Le Zhuo
Papers: 20
Authored papers
From Statics to Dynamics: Physics-Aware Image Editing with Latent Transition Priors
arXiv 2026
Lumina-DiMOO: An Omni Diffusion Large Language Model for Multi-Modal Generation and Understanding
arXiv 2025
Lumina-Image 2.0: A Unified and Efficient Image Generative Framework
ICCV 2025
VisualCloze: A Universal Image Generation Framework via Visual In-Context Learning
ICCV 2025
From Reflection to Perfection: Scaling Inference-Time Optimization for Text-to-Image Diffusion Models via Reflection Tuning
ICCV 2025
T2I-R1: Reinforcing Image Generation with Collaborative Semantic-level and Token-level CoT
arXiv 2025
OmniCaptioner: One Captioner to Rule Them All
arXiv 2025
Vision-to-Music Generation: A Survey
arXiv 2025
PICABench: How Far Are We from Physically Realistic Image Editing?
arXiv 2025
Factuality Matters: When Image Generation and Editing Meet Structured Visuals
arXiv 2025
IMAGINE-E: Image Generation Intelligence Evaluation of State-of-the-art Text-to-Image Models
arXiv 2025
Lumina-Next: Making Lumina-T2X Stronger and Faster with Next-DiT
arXiv 2024
VideoEspresso: A Large-Scale Chain-of-Thought Dataset for Fine-Grained Video Reasoning via Core Frame Selection
CVPR 2025
Lumina-mGPT: Illuminate Flexible Photorealistic Text-to-Image Generation with Multimodal Generative Pretraining
arXiv 2024
Multimodal Music Generation with Explicit Bridges and Retrieval Augmentation
arXiv 2024
LLaVA-MoD: Making LLaVA Tiny via MoE Knowledge Distillation
arXiv 2024
PixWizard: Versatile Image-to-Image Visual Assistant with Open-Language Instructions
arXiv 2024
I-Max: Maximize the Resolution Potential of Pre-trained Rectified Flow Transformers with Projected Flow
arXiv 2024
LyricWhiz: Robust Multilingual Zero-shot Lyrics Transcription by Whispering to ChatGPT
arXiv 2023
Video Background Music Generation: Dataset, Method and Evaluation
ICCV 2023
Frequent co-authors
10 from 20 papers