Yu Cheng
- Papers: 81
Authored papers
MiMo-V2-Flash Technical Report
arXiv 2026
AdaReasoner: Dynamic Tool Orchestration for Iterative Visual Reasoning
arXiv 2026
GEMS: Agent-Native Multimodal Generation with Memory and Skills
arXiv 2026
A Survey of Large Audio Language Models: Generalization, Trustworthiness, and Outlook
arXiv 2026
DrawMotion: Generating 3D Human Motions by Freehand Drawing
arXiv 2026
Unify-Agent: A Unified Multimodal Agent for World-Grounded Image Synthesis
arXiv 2026
π-Bench: Evaluating Proactive Personal Assistant Agents in Long-Horizon Workflows
arXiv 2026
Memory Intelligence Agent
arXiv 2026
TEMPO: Scaling Test-time Training for Large Reasoning Models
arXiv 2026
Persistent Visual Memory: Sustaining Perception for Deep Generation in LVLMs
arXiv 2026
TabEmbed: Benchmarking and Learning Generalist Embeddings for Tabular Understanding
arXiv 2026
LatentMem: Customizing Latent Memory for Multi-Agent Systems
arXiv 2026
Think Longer to Explore Deeper: Learn to Explore In-Context via Length-Incentivized Reinforcement Learning
arXiv 2026
Query as Anchor: Scenario-Adaptive User Representation via Large Language Model
arXiv 2026
P1-VL: Bridging Visual Perception and Scientific Reasoning in Physics Olympiads
arXiv 2026
How Do Decoder-Only LLMs Perceive Users? Rethinking Attention Masking for User Representation Learning
arXiv 2026
The Entropy Mechanism of Reinforcement Learning for Reasoning Language Models
arXiv 2025
Learning to Reason under Off-Policy Guidance
arXiv 2025
VideoREPA: Learning Physics for Video Generation through Relational Alignment with Foundation Models
arXiv 2025
Process Reinforcement through Implicit Rewards
arXiv 2025
SeerAttention-R: Sparse Attention Adaptation for Long Reasoning
arXiv 2025
Linear-MoE: Linear Sequence Modeling Meets Mixture-of-Experts
arXiv 2025
OpenThinkIMG: Learning to Think with Images via Visual Tool Reinforcement Learning
arXiv 2025
Make LoRA Great Again: Boosting LoRA with Adaptive Singular Values and Mixture-of-Experts Optimization Alignment
arXiv 2025
Divide and Conquer: Grounding LLMs as Efficient Decision-Making Agents via Offline Hierarchical Reinforcement Learning
arXiv 2025
Text-to-Decision Agent: Learning Generalist Policies from Natural Language Supervision
arXiv 2025
SATORI-R1: Incentivizing Multimodal Reasoning with Spatial Grounding and Verifiable Rewards
arXiv 2025
DiffThinker: Towards Generative Multimodal Reasoning with Diffusion Models
arXiv 2025
Thinking with Images for Multimodal Reasoning: Foundations, Methods, and Future Frontiers
arXiv 2025
Liger: Linearizing Large Language Models to Gated Recurrent Structures
arXiv 2025
Extrapolating and Decoupling Image-to-Video Generation Models: Motion Modeling is Easier Than You Think
CVPR 2025
Reasoning over Boundaries: Enhancing Specification Alignment via Test-time Deliberation
arXiv 2025
UltraIF: Advancing Instruction Following from the Wild
arXiv 2025
From Drafts to Answers: Unlocking LLM Potential via Aggregation Fine-Tuning
arXiv 2025
Visually Interpretable Subtask Reasoning for Visual Question Answering
arXiv 2025
Native Hybrid Attention for Efficient Sequence Modeling
arXiv 2025
Advancing Multimodal Reasoning: From Optimized Cold Start to Staged Reinforcement Learning
arXiv 2025
VideoSSR: Video Self-Supervised Reinforcement Learning
arXiv 2025
P1: Mastering Physics Olympiads with Reinforcement Learning
arXiv 2025
ThinkMorph: Emergent Properties in Multimodal Interleaved Chain-of-Thought Reasoning
arXiv 2025
TiViBench: Benchmarking Think-in-Video Reasoning for Video Generative Models
arXiv 2025
Interleaving Reasoning for Better Text-to-Image Generation
arXiv 2025
ExGRPO: Learning to Reason from Experience
arXiv 2025
FrameThinker: Learning to Think with Long Videos via Multi-Turn Frame Spotlighting
arXiv 2025
Speed Always Wins: A Survey on Efficient Architectures for Large Language Models
arXiv 2025
From AI for Science to Agentic Science: A Survey on Autonomous Scientific Discovery
arXiv 2025
Spotlight on Token Perception for Multimodal Reinforcement Learning
arXiv 2025
Scaling Reasoning, Losing Control: Evaluating Instruction Following in Large Reasoning Models
arXiv 2025
FullFront: Benchmarking MLLMs Across the Full Front-End Engineering Workflow
arXiv 2025
A Survey of Efficient Reasoning for Large Reasoning Models: Language, Multimodality, and Beyond
arXiv 2025
Test-Time Preference Optimization: On-the-Fly Alignment via Iterative Textual Feedback
arXiv 2025
LLaMA-MoE: Building Mixture-of-Experts from LLaMA with Continual Pre-training
arXiv 2024
CLIP-MoE: Towards Building Mixture of Experts for CLIP with Diversified Multiplet Upcycling
arXiv 2024
LLaMA-MoE v2: Exploring Sparsity of LLaMA from Perspective of Mixture-of-Experts with Post-Training
arXiv 2024
Grounded-VideoLLM: Sharpening Fine-grained Temporal Grounding in Video Large Language Models
arXiv 2024
Towards World Simulator: Crafting Physical Commonsense-Based Benchmark for Video Generation
arXiv 2024
Twin-Merging: Dynamic Integration of Modular Expertise in Model Merging
arXiv 2024
Towards Stabilized and Efficient Diffusion Transformers through Long-Skip-Connections with Spectral Constraints
ICCV 2025
ConflictBank: A Benchmark for Evaluating the Influence of Knowledge Conflicts in LLM
arXiv 2024
Dynamic Data Mixing Maximizes Instruction Tuning for Mixture-of-Experts
arXiv 2024
Timo: Towards Better Temporal Reasoning for Language Models
arXiv 2024
Look, Compare, Decide: Alleviating Hallucination in Large Vision-Language Models via Multi-View Multi-Path Reasoning
arXiv 2024
Learning the Unlearned: Mitigating Feature Suppression in Contrastive Learning
arXiv 2024
What makes your model a low-empathy or warmth person: Exploring the Origins of Personality in LLMs
arXiv 2024
SURf: Teaching Large Vision-Language Models to Selectively Utilize Retrieved Information
arXiv 2024
MAGIS: LLM-Based Multi-Agent Framework for GitHub Issue Resolution
arXiv 2024
Continuous Speech Tokenizer in Text To Speech
arXiv 2024
A Survey of Reasoning with Foundation Models
arXiv 2023
Merge, Then Compress: Demystify Efficient SMoE with Hints from Its Routing Policy
arXiv 2023
Hiding Data Helps: On the Benefits of Masking for Sparse Coding
arXiv 2023
AdaLoRA: Adaptive Budget Allocation for Parameter-Efficient Fine-Tuning
arXiv 2023
Enhancing Low-Resource Relation Representations through Multi-View Decoupling
arXiv 2023
M$^3$ViT: Mixture-of-Experts Vision Transformer for Efficient Multi-task Learning with Model-Accelerator Co-design
arXiv 2022
RASAT: Integrating Relational Structures into Pretrained Seq2Seq Model for Text-to-SQL
arXiv 2022
Local Byte Fusion for Neural Machine Translation
arXiv 2022
DSEE: Dually Sparsity-embedded Efficient Tuning of Pre-trained Language Models
HERO: Hierarchical Encoder for Video+Language Omni-representation Pre-training
EMNLP 2020
VIOLIN: A Large-Scale Dataset for Video-and-Language Inference
Graph Optimal Transport for Cross-Domain Alignment
ICML 2020
UNITER: UNiversal Image-TExt Representation Learning
ECCV 2020
EnlightenGAN: Deep Light Enhancement without Paired Supervision
arXiv 2019