Yu Cheng

Papers
81

Authored papers

MiMo-V2-Flash Technical Report

arXiv 2026

2026

AdaReasoner: Dynamic Tool Orchestration for Iterative Visual Reasoning

arXiv 2026

2026

GEMS: Agent-Native Multimodal Generation with Memory and Skills

arXiv 2026

2026

A Survey of Large Audio Language Models: Generalization, Trustworthiness, and Outlook

arXiv 2026

2026

DrawMotion: Generating 3D Human Motions by Freehand Drawing

arXiv 2026

2026

Unify-Agent: A Unified Multimodal Agent for World-Grounded Image Synthesis

arXiv 2026

2026

π-Bench: Evaluating Proactive Personal Assistant Agents in Long-Horizon Workflows

arXiv 2026

2026

Memory Intelligence Agent

arXiv 2026

2026

TEMPO: Scaling Test-time Training for Large Reasoning Models

arXiv 2026

2026

Persistent Visual Memory: Sustaining Perception for Deep Generation in LVLMs

arXiv 2026

2026

TabEmbed: Benchmarking and Learning Generalist Embeddings for Tabular Understanding

arXiv 2026

2026

LatentMem: Customizing Latent Memory for Multi-Agent Systems

arXiv 2026

2026

Think Longer to Explore Deeper: Learn to Explore In-Context via Length-Incentivized Reinforcement Learning

arXiv 2026

2026

Query as Anchor: Scenario-Adaptive User Representation via Large Language Model

arXiv 2026

2026

P1-VL: Bridging Visual Perception and Scientific Reasoning in Physics Olympiads

arXiv 2026

2026

How Do Decoder-Only LLMs Perceive Users? Rethinking Attention Masking for User Representation Learning

arXiv 2026

2026

The Entropy Mechanism of Reinforcement Learning for Reasoning Language Models

arXiv 2025

2025

Learning to Reason under Off-Policy Guidance

arXiv 2025

2025

VideoREPA: Learning Physics for Video Generation through Relational Alignment with Foundation Models

arXiv 2025

2025

Process Reinforcement through Implicit Rewards

arXiv 2025

2025

SeerAttention-R: Sparse Attention Adaptation for Long Reasoning

arXiv 2025

2025

Linear-MoE: Linear Sequence Modeling Meets Mixture-of-Experts

arXiv 2025

2025

OpenThinkIMG: Learning to Think with Images via Visual Tool Reinforcement Learning

arXiv 2025

2025

Make LoRA Great Again: Boosting LoRA with Adaptive Singular Values and Mixture-of-Experts Optimization Alignment

arXiv 2025

2025

Divide and Conquer: Grounding LLMs as Efficient Decision-Making Agents via Offline Hierarchical Reinforcement Learning

arXiv 2025

2025

Text-to-Decision Agent: Learning Generalist Policies from Natural Language Supervision

arXiv 2025

2025

SATORI-R1: Incentivizing Multimodal Reasoning with Spatial Grounding and Verifiable Rewards

arXiv 2025

2025

DiffThinker: Towards Generative Multimodal Reasoning with Diffusion Models

arXiv 2025

2025

Thinking with Images for Multimodal Reasoning: Foundations, Methods, and Future Frontiers

arXiv 2025

2025

Liger: Linearizing Large Language Models to Gated Recurrent Structures

arXiv 2025

2025

Extrapolating and Decoupling Image-to-Video Generation Models: Motion Modeling is Easier Than You Think

CVPR 2025

2025

Reasoning over Boundaries: Enhancing Specification Alignment via Test-time Delibration

arXiv 2025

2025

UltraIF: Advancing Instruction Following from the Wild

arXiv 2025

2025

From Drafts to Answers: Unlocking LLM Potential via Aggregation Fine-Tuning

arXiv 2025

2025

Visually Interpretable Subtask Reasoning for Visual Question Answering

arXiv 2025

2025

Native Hybrid Attention for Efficient Sequence Modeling

arXiv 2025

2025

Advancing Multimodal Reasoning: From Optimized Cold Start to Staged Reinforcement Learning

arXiv 2025

2025

VideoSSR: Video Self-Supervised Reinforcement Learning

arXiv 2025

2025

P1: Mastering Physics Olympiads with Reinforcement Learning

arXiv 2025

2025

ThinkMorph: Emergent Properties in Multimodal Interleaved Chain-of-Thought Reasoning

arXiv 2025

2025

TiViBench: Benchmarking Think-in-Video Reasoning for Video Generative Models

arXiv 2025

2025

Interleaving Reasoning for Better Text-to-Image Generation

arXiv 2025

2025

ExGRPO: Learning to Reason from Experience

arXiv 2025

2025

FrameThinker: Learning to Think with Long Videos via Multi-Turn Frame Spotlighting

arXiv 2025

2025

Speed Always Wins: A Survey on Efficient Architectures for Large Language Models

arXiv 2025

2025

From AI for Science to Agentic Science: A Survey on Autonomous Scientific Discovery

arXiv 2025

2025

Spotlight on Token Perception for Multimodal Reinforcement Learning

arXiv 2025

2025

Scaling Reasoning, Losing Control: Evaluating Instruction Following in Large Reasoning Models

arXiv 2025

2025

FullFront: Benchmarking MLLMs Across the Full Front-End Engineering Workflow

arXiv 2025

2025

A Survey of Efficient Reasoning for Large Reasoning Models: Language, Multimodality, and Beyond

arXiv 2025

2025

Test-Time Preference Optimization: On-the-Fly Alignment via Iterative Textual Feedback

arXiv 2025

2025

LLaMA-MoE: Building Mixture-of-Experts from LLaMA with Continual Pre-training

arXiv 2024

2024

CLIP-MoE: Towards Building Mixture of Experts for CLIP with Diversified Multiplet Upcycling

arXiv 2024

2024

LLaMA-MoE v2: Exploring Sparsity of LLaMA from Perspective of Mixture-of-Experts with Post-Training

arXiv 2024

2024

Grounded-VideoLLM: Sharpening Fine-grained Temporal Grounding in Video Large Language Models

arXiv 2024

2024

Towards World Simulator: Crafting Physical Commonsense-Based Benchmark for Video Generation

arXiv 2024

2024

Twin-Merging: Dynamic Integration of Modular Expertise in Model Merging

arXiv 2024

2024

Towards Stabilized and Efficient Diffusion Transformers through Long-Skip-Connections with Spectral Constraints

ICCV 2025

2024

ConflictBank: A Benchmark for Evaluating the Influence of Knowledge Conflicts in LLM

arXiv 2024

2024

Dynamic Data Mixing Maximizes Instruction Tuning for Mixture-of-Experts

arXiv 2024

2024

Timo: Towards Better Temporal Reasoning for Language Models

arXiv 2024

2024

Look, Compare, Decide: Alleviating Hallucination in Large Vision-Language Models via Multi-View Multi-Path Reasoning

arXiv 2024

2024

Learning the Unlearned: Mitigating Feature Suppression in Contrastive Learning

arXiv 2024

2024

What makes your model a low-empathy or warmth person: Exploring the Origins of Personality in LLMs

arXiv 2024

2024

SURf: Teaching Large Vision-Language Models to Selectively Utilize Retrieved Information

arXiv 2024

2024

MAGIS: LLM-Based Multi-Agent Framework for GitHub Issue Resolution

arXiv 2024

2024

Continuous Speech Tokenizer in Text To Speech

arXiv 2024

2024

A Survey of Reasoning with Foundation Models

arXiv 2023

2023

Merge, Then Compress: Demystify Efficient SMoE with Hints from Its Routing Policy

arXiv 2023

2023

Hiding Data Helps: On the Benefits of Masking for Sparse Coding

arXiv 2023

2023

AdaLoRA: Adaptive Budget Allocation for Parameter-Efficient Fine-Tuning

arXiv 2023

2023

Enhancing Low-Resource Relation Representations through Multi-View Decoupling

arXiv 2023

2023

M³ViT: Mixture-of-Experts Vision Transformer for Efficient Multi-task Learning with Model-Accelerator Co-design

arXiv 2022

2022

RASAT: Integrating Relational Structures into Pretrained Seq2Seq Model for Text-to-SQL

arXiv 2022

2022

Local Byte Fusion for Neural Machine Translation

arXiv 2022

2022

DSEE: Dually Sparsity-embedded Efficient Tuning of Pre-trained Language Models

2021

HERO: Hierarchical Encoder for Video+Language Omni-representation Pre-training

EMNLP 2020

2020

VIOLIN: A Large-Scale Dataset for Video-and-Language Inference

2020

Graph Optimal Transport for Cross-Domain Alignment

ICML 2020

2020

UNITER: UNiversal Image-TExt Representation Learning

ECCV 2020

2019

EnlightenGAN: Deep Light Enhancement without Paired Supervision

arXiv 2019

2019

Affiliations

No known affiliations.
