Li Yuan
- Papers
- 57
Authored papers
57
Helios: Real Real-Time Long Video Generation Model
arXiv 2026
SERE: Similarity-based Expert Re-routing for Efficient Batch Decoding in MoE Models
arXiv 2026
LatentChem: From Textual CoT to Latent Thinking in Chemical Reasoning
arXiv 2026
iFSQ: Improving FSQ for Image Generation with 1 Line of Code
arXiv 2026
Turning the TIDE: Cross-Architecture Distillation for Diffusion Large Language Models
arXiv 2026
MAXS: Meta-Adaptive Exploration with LLM Agents
arXiv 2026
ImgEdit: A Unified Image Editing Dataset and Benchmark
arXiv 2025
UniWorld-V1: High-Resolution Semantic Encoders for Unified Visual Understanding and Generation
arXiv 2025
OpenS2V-Nexus: A Detailed Benchmark and Million-Scale Dataset for Subject-to-Video Generation
arXiv 2025
WISE: A World Knowledge-Informed Semantic Evaluation for Text-to-Image Generation
arXiv 2025
SwapAnyone: Consistent and Realistic Video Synthesis for Swapping Any Person into Any Video
arXiv 2025
HoloTime: Taming Video Diffusion Models for Panoramic 4D Scene Generation
arXiv 2025
Rethinking Text-based Protein Understanding: Retrieval or LLM?
arXiv 2025
BioProBench: Comprehensive Dataset and Benchmark in Biological Protocol Understanding and Reasoning
arXiv 2025
UltraShape 1.0: High-Fidelity 3D Shape Generation via Scalable Geometric Refinement
arXiv 2025
Uniworld-V2: Reinforce Image Editing with Diffusion Negative-aware Finetuning and MLLM Implicit Feedback
arXiv 2025
Epona: Autoregressive Diffusion World Model for Autonomous Driving
ICCV 2025
Does Understanding Inform Generation in Unified Multimodal Models? From Analysis to Path Forward
arXiv 2025
Magic 1-For-1: Generating One Minute Video Clips within One Minute
arXiv 2025
GPT-ImgEval: A Comprehensive Benchmark for Diagnosing GPT4o in Image Generation
arXiv 2025
Sci-Fi: Symmetric Constraint for Frame Inbetweening
arXiv 2025
Can Understanding and Generation Truly Benefit Together -- or Just Coexist?
arXiv 2025
MoE-LLaVA: Mixture of Experts for Large Vision-Language Models
arXiv 2024
ViewCrafter: Taming Video Diffusion Models for High-fidelity Novel View Synthesis
arXiv 2024
Open-Sora Plan: Open-Source Large Video Generation Model
arXiv 2024
Identity-Preserving Text-to-Video Generation by Frequency Decomposition
CVPR 2025 1
LLaVA-CoT: Let Vision Language Models Reason Step-by-Step
arXiv 2024
Mora: Enabling Generalist Video Generation via A Multi-Agent Framework
arXiv 2024
HoloDreamer: Holistic 3D Panoramic World Generation from Text Descriptions
arXiv 2024
DF40: Toward Next-Generation Deepfake Detection
arXiv 2024
MoH: Multi-Head Attention as Mixture-of-Head Attention
arXiv 2024
MagicTime: Time-lapse Video Generation Models as Metamorphic Simulators
arXiv 2024
WF-VAE: Enhancing Video VAE by Wavelet-Driven Energy Flow for Latent Video Diffusion Model
CVPR 2025 1
Med-MoE: Mixture of Domain-Specific Experts for Lightweight Medical Vision-Language Models
arXiv 2024
Next Patch Prediction for Autoregressive Visual Generation
arXiv 2024
LOOK-M: Look-Once Optimization in KV Cache for Efficient Multimodal Long-Context Inference
arXiv 2024
PiCO: Peer Review in LLMs based on the Consistency Optimization
arXiv 2024
ChronoMagic-Bench: A Benchmark for Metamorphic Evaluation of Text-to-Time-lapse Video Generation
arXiv 2024
Spikformer V2: Join the High Accuracy Club on ImageNet with an SNN Ticket
arXiv 2024
MoE++: Accelerating Mixture-of-Experts Methods with Zero-Computation Experts
arXiv 2024
ProLLaMA: A Protein Large Language Model for Multi-Task Protein Language Processing
arXiv 2024
Orthogonal Subspace Decomposition for Generalizable AI-Generated Image Detection
arXiv 2024
Envision3D: One Image to 3D with Anchor Views Interpolation
arXiv 2024
GraCo: Granularity-Controllable Interactive Segmentation
CVPR 2024 1
Chatlaw: A Multi-Agent Collaborative Legal Assistant with Knowledge Graph Enhanced Mixture-of-Experts Large Language Model
arXiv 2023
Video-LLaVA: Learning United Visual Representation by Alignment Before Projection
Machine Mindset: An MBTI Exploration of Large Language Models
arXiv 2023
LanguageBind: Extending Video-Language Pretraining to N-modality by Language-based Semantic Alignment
arXiv 2023
DiffusionRet: Generative Text-Video Retrieval with Diffusion Model
ICCV 2023 1
Video-Text as Game Players: Hierarchical Banzhaf Interaction for Cross-Modal Representation Learning
CVPR 2023 1
Repaint123: Fast and High-quality One Image to 3D Generation with Progressive Controllable 2D Repainting
arXiv 2023
Video-Bench: A Comprehensive Benchmark and Toolkit for Evaluating Video-based Large Language Models
arXiv 2023
FreestyleRet: Retrieving Images from Style-Diversified Queries
arXiv 2023
Text-Video Retrieval with Disentangled Conceptualization and Set-to-Set Alignment
arXiv 2023
Chat-UniVi: Unified Visual Representation Empowers Large Language Models with Image and Video Understanding
CVPR 2024 1
Masked Autoencoders for Point Cloud Self-supervised Learning
arXiv 2022
Tokens-to-Token ViT: Training Vision Transformers from Scratch on ImageNet
ICCV 2021 10
Affiliations
Frequent co-authors
10 from 57 papers