Feng Zhao

Papers
29

Authored papers

29

Flow-OPD: On-Policy Distillation for Flow Matching Models

arXiv 2026

2026

VideoSeeker: Incentivizing Instance-level Video Understanding via Native Agentic Tool Invocation

arXiv 2026

2026

SaaSBench: Exploring the Boundaries of Coding Agents in Long-Horizon Enterprise SaaS Engineering

arXiv 2026

2026

SkillFlow: Benchmarking Lifelong Skill Discovery and Evolution for Autonomous Agents

arXiv 2026

2026

SCOPE: Structured Decomposition and Conditional Skill Orchestration for Complex Image Generation

arXiv 2026

2026

Vision-DeepResearch: Incentivizing DeepResearch Capability in Multimodal Large Language Models

arXiv 2026

2026

UniCorn: Towards Self-Improving Unified Multimodal Models through Self-Generated Supervision

arXiv 2026

2026

Beyond Accuracy: Unveiling Inefficiency Patterns in Tool-Integrated Reasoning

arXiv 2026

2026

Tracing the Roots: A Multi-Agent Framework for Uncovering Data Lineage in Post-Training LLMs

arXiv 2026

2026

VRAG-RL: Empower Vision-Perception-Based RAG for Visually Rich Information Understanding via Iterative Reasoning with Reinforcement Learning

arXiv 2025

2025

ViDoRAG: Visual Document Retrieval-Augmented Generation via Dynamic Iterative Reasoning Agents

arXiv 2025

2025

VCR-Bench: A Comprehensive Evaluation Framework for Video Chain-of-Thought Reasoning

arXiv 2025

2025

V2P-Bench: Evaluating Video-Language Understanding with Visual Prompts for Better Human-Model Interaction

arXiv 2025

2025

RLFR: Extending Reinforcement Learning for LLMs with Flow Environment

arXiv 2025

2025

VL-LN Bench: Towards Long-horizon Goal-oriented Navigation with Active Dialogs

arXiv 2025

2025

InstructVLA: Vision-Language-Action Instruction Tuning from Understanding to Manipulation

arXiv 2025

2025

Gaussian Variation Field Diffusion for High-fidelity Video-to-4D Synthesis

ICCV 2025

2025

Agentic Jigsaw Interaction Learning for Enhancing Visual Perception and Reasoning in Vision-Language Models

arXiv 2025

2025

CRITICTOOL: Evaluating Self-Critique Capabilities of Large Language Models in Tool-Calling Error Scenarios

arXiv 2025

2025

MindSearch: Mimicking Human Minds Elicits Deep AI Searcher

arXiv 2024

2024

Are We on the Right Way for Evaluating Large Vision-Language Models?

arXiv 2024

2024

Varformer: Adapting VAR's Generative Prior for Image Restoration

arXiv 2024

2024

Agent-FLAN: Designing Data and Methods of Effective Agent Tuning for Large Language Models

arXiv 2024

2024

SPA-VL: A Comprehensive Safety Preference Alignment Dataset for Vision Language Model

arXiv 2024

2024

ShareGPT4V: Improving Large Multi-Modal Models with Better Captions

arXiv 2023

2023

T-Eval: Evaluating the Tool Utilization Capability of Large Language Models Step by Step

arXiv 2023

2023

Unmasking Bias in Diffusion Model Training

arXiv 2023

2023

Empowering Low-Light Image Enhancer through Customized Learnable Priors

ICCV 2023

2023

P2B: Point-to-Box Network for 3D Object Tracking in Point Clouds

CVPR 2020

2020

Affiliations

No known affiliations.

Frequent co-authors

10

from 29 papers