Feng Zhao
Papers: 29
Authored papers
Flow-OPD: On-Policy Distillation for Flow Matching Models
arXiv 2026
VideoSeeker: Incentivizing Instance-level Video Understanding via Native Agentic Tool Invocation
arXiv 2026
SaaSBench: Exploring the Boundaries of Coding Agents in Long-Horizon Enterprise SaaS Engineering
arXiv 2026
SkillFlow: Benchmarking Lifelong Skill Discovery and Evolution for Autonomous Agents
arXiv 2026
SCOPE: Structured Decomposition and Conditional Skill Orchestration for Complex Image Generation
arXiv 2026
Vision-DeepResearch: Incentivizing DeepResearch Capability in Multimodal Large Language Models
arXiv 2026
UniCorn: Towards Self-Improving Unified Multimodal Models through Self-Generated Supervision
arXiv 2026
Beyond Accuracy: Unveiling Inefficiency Patterns in Tool-Integrated Reasoning
arXiv 2026
Tracing the Roots: A Multi-Agent Framework for Uncovering Data Lineage in Post-Training LLMs
arXiv 2026
VRAG-RL: Empower Vision-Perception-Based RAG for Visually Rich Information Understanding via Iterative Reasoning with Reinforcement Learning
arXiv 2025
ViDoRAG: Visual Document Retrieval-Augmented Generation via Dynamic Iterative Reasoning Agents
arXiv 2025
VCR-Bench: A Comprehensive Evaluation Framework for Video Chain-of-Thought Reasoning
arXiv 2025
V2P-Bench: Evaluating Video-Language Understanding with Visual Prompts for Better Human-Model Interaction
arXiv 2025
RLFR: Extending Reinforcement Learning for LLMs with Flow Environment
arXiv 2025
VL-LN Bench: Towards Long-horizon Goal-oriented Navigation with Active Dialogs
arXiv 2025
InstructVLA: Vision-Language-Action Instruction Tuning from Understanding to Manipulation
arXiv 2025
Gaussian Variation Field Diffusion for High-fidelity Video-to-4D Synthesis
ICCV 2025
Agentic Jigsaw Interaction Learning for Enhancing Visual Perception and Reasoning in Vision-Language Models
arXiv 2025
CRITICTOOL: Evaluating Self-Critique Capabilities of Large Language Models in Tool-Calling Error Scenarios
arXiv 2025
MindSearch: Mimicking Human Minds Elicits Deep AI Searcher
arXiv 2024
Are We on the Right Way for Evaluating Large Vision-Language Models?
arXiv 2024
Varformer: Adapting VAR's Generative Prior for Image Restoration
arXiv 2024
Agent-FLAN: Designing Data and Methods of Effective Agent Tuning for Large Language Models
arXiv 2024
SPA-VL: A Comprehensive Safety Preference Alignment Dataset for Vision Language Model
arXiv 2024
ShareGPT4V: Improving Large Multi-Modal Models with Better Captions
arXiv 2023
T-Eval: Evaluating the Tool Utilization Capability of Large Language Models Step by Step
arXiv 2023
Unmasking Bias in Diffusion Model Training
arXiv 2023
Empowering Low-Light Image Enhancer through Customized Learnable Priors
ICCV 2023
P2B: Point-to-Box Network for 3D Object Tracking in Point Clouds