Jun Zhang
Papers: 53
Authored papers
MegaStyle: Constructing Diverse and Scalable Style Dataset via Consistent Text-to-Image Style Mapping
arXiv 2026
TC-AE: Unlocking Token Capacity for Deep Compression Autoencoders
arXiv 2026
On the Step Length Confounding in LLM Reasoning Data Selection
arXiv 2026
Focus-dLLM: Accelerating Long-Context Diffusion LLM Inference via Confidence-Guided Context Focusing
arXiv 2026
Reinforcing Few-step Generators via Reward-Tilted Distribution Matching
arXiv 2026
Forcing-KV: Hybrid KV Cache Compression for Efficient Autoregressive Video Diffusion Models
arXiv 2026
Distribution-Aligned Sequence Distillation for Superior Long-CoT Reasoning
arXiv 2026
Salt: Self-Consistent Distribution Matching with Cache-Aware Training for Fast Video Generation
arXiv 2026
STEP3-VL-10B Technical Report
arXiv 2026
PaddleOCR-VL: Boosting Multilingual Document Parsing via a 0.9B Ultra-Compact Vision-Language Model
arXiv 2025
AdaWorld: Learning Adaptable World Models with Latent Actions
arXiv 2025
OpenThinkIMG: Learning to Think with Images via Visual Tool Reinforcement Learning
arXiv 2025
SenseFlow: Scaling Distribution Matching for Flow-based Text-to-Image Distillation
arXiv 2025
TimeLens: Rethinking Video Temporal Grounding with Multimodal LLMs
arXiv 2025
Solla: Towards a Speech-Oriented LLM That Hears Acoustic Context
arXiv 2025
Joint Automatic Speech Recognition And Structure Learning For Better Speech Understanding
arXiv 2025
Dual Mutual Learning Network with Global-local Awareness for RGB-D Salient Object Detection
arXiv 2025
Reg-DPO: SFT-Regularized Direct Preference Optimization with GT-Pair for Improving Video Generation
arXiv 2025
WorldPlay: Towards Long-Term Geometric Consistency for Real-Time Interactive World Modeling
arXiv 2025
WebWeaver: Structuring Web-Scale Evidence with Dynamic Outlines for Open-Ended Deep Research
arXiv 2025
Train Small, Infer Large: Memory-Efficient LoRA Training for Large Language Models
arXiv 2025
UniFork: Exploring Modality Alignment for Unified Multimodal Understanding and Generation
arXiv 2025
Prot2Chat: Protein LLM with Early-Fusion of Text, Sequence and Structure
arXiv 2025
ByteLoom: Weaving Geometry-Consistent Human-Object Interactions through Progressive Curriculum Learning
arXiv 2025
DeFine: A Decomposed and Fine-Grained Annotated Dataset for Long-form Article Generation
arXiv 2025
FullStack Bench: Evaluating LLMs as Full Stack Coders
arXiv 2024
DimensionX: Create Any 3D and 4D Scenes from a Single Image with Controllable Video Diffusion
arXiv 2024
GenAD: Generalized Predictive Model for Autonomous Driving
CVPR 2024
Training-Free Long-Context Scaling of Large Language Models
arXiv 2024
Vista: A Generalizable Driving World Model with High Fidelity and Versatile Controllability
arXiv 2024
CityBench: Evaluating the Capabilities of Large Language Models for Urban Tasks
arXiv 2024
p-MoD: Building Mixture-of-Depths MLLMs via Progressive Ratio Decay
arXiv 2024
Timo: Towards Better Temporal Reasoning for Language Models
arXiv 2024
Ensembling Diffusion Models via Adaptive Feature Aggregation
arXiv 2024
Exploring Selective Layer Fine-Tuning in Federated Learning
arXiv 2024
V-Express: Conditional Dropout for Progressive Training of Portrait Video Generation
arXiv 2024
GaussianImage: 1000 FPS Image Representation and Compression by 2D Gaussian Splatting
arXiv 2024
FAN: Fourier Analysis Networks
arXiv 2024
ConflictBank: A Benchmark for Evaluating the Influence of Knowledge Conflicts in LLM
arXiv 2024
SWIFT: On-the-Fly Self-Speculative Decoding for LLM Inference Acceleration
arXiv 2024
SD-Eval: A Benchmark Dataset for Spoken Dialogue Understanding Beyond Words
arXiv 2024
Boosting Neural Representations for Videos with a Conditional Decoder
CVPR 2024
HarmoniCa: Harmonizing Training and Inference for Better Feature Caching in Diffusion Transformer Acceleration
arXiv 2024
Bidirectional Stereo Image Compression with Cross-Dimensional Entropy Model
arXiv 2024
Draft & Verify: Lossless Large Language Model Acceleration via Self-Speculative Decoding
arXiv 2023
L-Eval: Instituting Standardized Evaluation for Long Context Language Models
arXiv 2023
Advancing Pose-Guided Image Synthesis with Progressive Conditional Diffusion Models
arXiv 2023
Image Super-resolution Via Latent Diffusion: A Sampling-space Mixture Of Experts And Frequency-augmented Decoder Approach
arXiv 2023
In-Context Learning with Many Demonstration Examples
arXiv 2023
KBioXLM: A Knowledge-anchored Biomedical Multilingual Pretrained Language Model
arXiv 2023
CAB: Comprehensive Attention Benchmarking on Long Sequence Modeling
arXiv 2022
SoccerNet 2022 Challenges Results
arXiv 2022
Sparse Mixture-of-Experts are Domain Generalizable Learners
arXiv 2022
Frequent co-authors: 10 (from 53 papers)