Zhen Li
- Papers: 37
Authored papers
D-OPSD: On-Policy Self-Distillation for Continuously Tuning Step-Distilled Diffusion Models
arXiv 2026
WildWorld: A Large-Scale Dataset for Dynamic World Modeling with Actions and Explicit State toward Generative ARPG
arXiv 2026
MiniCPM4: Ultra-Efficient LLMs on End Devices
arXiv 2025
Sekai: A Video Dataset towards World Exploration
arXiv 2025
Lumina-Image 2.0: A Unified and Efficient Image Generative Framework
ICCV 2025
VisualCloze: A Universal Image Generation Framework via Visual In-Context Learning
ICCV 2025
K-LoRA: Unlocking Training-Free Fusion of Any Subject and Style LoRAs
CVPR 2025
ChartGalaxy: A Dataset for Infographic Chart Understanding and Generation
arXiv 2025
OmniCaptioner: One Captioner to Rule Them All
arXiv 2025
Yume-1.5: A Text-Controlled Interactive World Generation Model
arXiv 2025
Decoupled DMD: CFG Augmentation as the Spear, Distribution Matching as the Shield
arXiv 2025
Z-Image: An Efficient Image Generation Foundation Model with Single-Stream Diffusion Transformer
arXiv 2025
LeX-Art: Rethinking Text Generation via Scalable High-Quality Data Synthesis
arXiv 2025
DSPNet: Dual-vision Scene Perception for Robust 3D Question Answering
CVPR 2025
IMAGINE-E: Image Generation Intelligence Evaluation of State-of-the-art Text-to-Image Models
arXiv 2025
MDK12-Bench: A Multi-Discipline Benchmark for Evaluating Reasoning in Multimodal Large Language Models
CLEA: Closed-Loop Embodied Agent for Enhancing Task Execution in Dynamic Environments
arXiv 2025
MM-BrowseComp: A Comprehensive Benchmark for Multimodal Browsing Agents
arXiv 2025
InfiR: Crafting Effective Small Language Models and Multimodal Small Language Models in Reasoning
arXiv 2025
Multi-Sourced Compositional Generalization in Visual Question Answering
arXiv 2025
Distribution Matching Distillation Meets Reinforcement Learning
arXiv 2025
DriveGEN: Generalized and Robust 3D Detection in Driving via Controllable Text-to-Image Diffusion Generation
CVPR 2025
Emu3: Next-Token Prediction is All You Need
arXiv 2024
Leveraging Large Language Models for NLG Evaluation: Advances and Challenges
arXiv 2024
GUI-World: A Video Benchmark and Dataset for Multimodal GUI-oriented Understanding
arXiv 2024
Time Evidence Fusion Network: Multi-source View in Long-Term Time Series Forecasting
arXiv 2024
LATR: 3D Lane Detection from Monocular Images with Transformer
ICCV 2023
Visual Programming for Zero-shot Open-Vocabulary 3D Visual Grounding
CVPR 2024
SupFusion: Supervised LiDAR-Camera Fusion for 3D Object Detection
ICCV 2023
PhotoMaker: Customizing Realistic Human Photos via Stacked ID Embedding
CVPR 2024
SRFormerV2: Taking a Closer Look at Permuted Self-Attention for Image Super-Resolution
ICCV 2023
AMT: All-Pairs Multi-Field Transforms for Efficient Frame Interpolation
CVPR 2023
SkeletonMAE: Graph-based Masked Autoencoder for Skeleton Sequence Pre-training
ICCV 2023
Towards An End-to-End Framework for Flow-Guided Video Inpainting
CVPR 2022
Composable Text Controls in Latent Space with ODEs
arXiv 2022
Don't Take It Literally: An Edit-Invariant Sequence Loss for Text Generation
VulDeePecker: A Deep Learning-Based System for Vulnerability Detection
arXiv 2018
Affiliations
Frequent co-authors
10 from 37 papers