Zehui Chen

Papers
23

Authored papers

23

Flow-OPD: On-Policy Distillation for Flow Matching Models

arXiv 2026

VideoSeeker: Incentivizing Instance-level Video Understanding via Native Agentic Tool Invocation

arXiv 2026

SaaSBench: Exploring the Boundaries of Coding Agents in Long-Horizon Enterprise SaaS Engineering

arXiv 2026

SkillFlow: Benchmarking Lifelong Skill Discovery and Evolution for Autonomous Agents

arXiv 2026

SCOPE: Structured Decomposition and Conditional Skill Orchestration for Complex Image Generation

arXiv 2026

Vision-DeepResearch: Incentivizing DeepResearch Capability in Multimodal Large Language Models

arXiv 2026

UniCorn: Towards Self-Improving Unified Multimodal Models through Self-Generated Supervision

arXiv 2026

Beyond Accuracy: Unveiling Inefficiency Patterns in Tool-Integrated Reasoning

arXiv 2026

VRAG-RL: Empower Vision-Perception-Based RAG for Visually Rich Information Understanding via Iterative Reasoning with Reinforcement Learning

arXiv 2025

ViDoRAG: Visual Document Retrieval-Augmented Generation via Dynamic Iterative Reasoning Agents

arXiv 2025

VCR-Bench: A Comprehensive Evaluation Framework for Video Chain-of-Thought Reasoning

arXiv 2025

Agent-R: Training Language Model Agents to Reflect via Iterative Self-Training

arXiv 2025

UI-TARS-2 Technical Report: Advancing GUI Agent with Multi-Turn Reinforcement Learning

arXiv 2025

AgentGym-RL: Training LLM Agents for Long-Horizon Decision Making through Multi-Turn Reinforcement Learning

arXiv 2025

Agentic Jigsaw Interaction Learning for Enhancing Visual Perception and Reasoning in Vision-Language Models

arXiv 2025

Critique-RL: Training Language Models for Critiquing through Two-Stage Reinforcement Learning

arXiv 2025

CRITICTOOL: Evaluating Self-Critique Capabilities of Large Language Models in Tool-Calling Error Scenarios

arXiv 2025

V2P-Bench: Evaluating Video-Language Understanding with Visual Prompts for Better Human-Model Interaction

arXiv 2025

MindSearch: Mimicking Human Minds Elicits Deep AI Searcher

arXiv 2024

Are We on the Right Way for Evaluating Large Vision-Language Models?

arXiv 2024

PlainMamba: Improving Non-Hierarchical Mamba in Visual Recognition

arXiv 2024

Agent-FLAN: Designing Data and Methods of Effective Agent Tuning for Large Language Models

arXiv 2024

T-Eval: Evaluating the Tool Utilization Capability of Large Language Models Step by Step

arXiv 2023

Affiliations

No known affiliations.

Frequent co-authors

10

from 23 papers