Zehui Chen
Authored papers (23)
Flow-OPD: On-Policy Distillation for Flow Matching Models
arXiv 2026
VideoSeeker: Incentivizing Instance-level Video Understanding via Native Agentic Tool Invocation
arXiv 2026
SaaSBench: Exploring the Boundaries of Coding Agents in Long-Horizon Enterprise SaaS Engineering
arXiv 2026
SkillFlow: Benchmarking Lifelong Skill Discovery and Evolution for Autonomous Agents
arXiv 2026
SCOPE: Structured Decomposition and Conditional Skill Orchestration for Complex Image Generation
arXiv 2026
Vision-DeepResearch: Incentivizing DeepResearch Capability in Multimodal Large Language Models
arXiv 2026
UniCorn: Towards Self-Improving Unified Multimodal Models through Self-Generated Supervision
arXiv 2026
Beyond Accuracy: Unveiling Inefficiency Patterns in Tool-Integrated Reasoning
arXiv 2026
VRAG-RL: Empower Vision-Perception-Based RAG for Visually Rich Information Understanding via Iterative Reasoning with Reinforcement Learning
arXiv 2025
ViDoRAG: Visual Document Retrieval-Augmented Generation via Dynamic Iterative Reasoning Agents
arXiv 2025
VCR-Bench: A Comprehensive Evaluation Framework for Video Chain-of-Thought Reasoning
arXiv 2025
Agent-R: Training Language Model Agents to Reflect via Iterative Self-Training
arXiv 2025
UI-TARS-2 Technical Report: Advancing GUI Agent with Multi-Turn Reinforcement Learning
arXiv 2025
AgentGym-RL: Training LLM Agents for Long-Horizon Decision Making through Multi-Turn Reinforcement Learning
arXiv 2025
Agentic Jigsaw Interaction Learning for Enhancing Visual Perception and Reasoning in Vision-Language Models
arXiv 2025
Critique-RL: Training Language Models for Critiquing through Two-Stage Reinforcement Learning
arXiv 2025
CRITICTOOL: Evaluating Self-Critique Capabilities of Large Language Models in Tool-Calling Error Scenarios
arXiv 2025
V2P-Bench: Evaluating Video-Language Understanding with Visual Prompts for Better Human-Model Interaction
arXiv 2025
MindSearch: Mimicking Human Minds Elicits Deep AI Searcher
arXiv 2024
Are We on the Right Way for Evaluating Large Vision-Language Models?
arXiv 2024
PlainMamba: Improving Non-Hierarchical Mamba in Visual Recognition
arXiv 2024
Agent-FLAN: Designing Data and Methods of Effective Agent Tuning for Large Language Models
arXiv 2024
T-Eval: Evaluating the Tool Utilization Capability of Large Language Models Step by Step
arXiv 2023