Yexin Liu
Papers: 14
Authored papers
LoopViT: Scaling Visual ARC with Looped Transformers
arXiv 2026
When Semantics Mislead Vision: Mitigating Large Multimodal Models Hallucinations in Scene Text Spotting and Understanding
arXiv 2025
MMRC: A Large-Scale Benchmark for Understanding Multimodal Large Language Model in Real-World Conversation
arXiv 2025
EditThinker: Unlocking Iterative Reasoning for Any Image Editor
arXiv 2025
OpenSubject: Leveraging Video-Derived Identity and Diversity Priors for Subject-driven Image Generation and Manipulation
arXiv 2025
Architecture Decoupling Is Not All You Need For Unified Multimodal Model
arXiv 2025
TalkVid: A Large-Scale Diversified Dataset for Audio-Driven Talking Head Synthesis
arXiv 2025
OmniGen2: Exploration to Advanced Multimodal Generation
arXiv 2025
LightGen: Efficient Image Generation through Knowledge Distillation and Direct Preference Optimization
arXiv 2025
Model Reveals What to Cache: Profiling-Based Feature Reuse for Video Diffusion Models
ICCV 2025
Efficient Multimodal Large Language Models: A Survey
arXiv 2024
VideoGen-of-Thought: A Collaborative Framework for Multi-Shot Video Generation
arXiv 2024
Efficient Multimodal Learning from Data-centric Perspective
arXiv 2024
Unveiling the Ignorance of MLLMs: Seeing Clearly, Answering Incorrectly
CVPR 2025
Frequent co-authors
10 from 14 papers