Yuhao Dong

Papers
21


Authored papers

21

Artifact-Bench: Evaluating MLLMs on Detecting and Assessing the Artifacts of AI-Generated Videos

arXiv 2026

2026

LongAV-Compass: Towards Unified Evaluation of Minute-Scale Audio-Visual Generation Across T2AV, I2AV, and V2AV

arXiv 2026

2026

Video-MME-v2: Towards the Next Stage in Benchmarks for Comprehensive Video Understanding

arXiv 2026

2026

VTC-Bench: Evaluating Agentic Multimodal Models via Compositional Visual Tool Chaining

arXiv 2026

2026

PerceptionComp: A Video Benchmark for Complex Perception-Centric Reasoning

arXiv 2026

2026

Demo-ICL: In-Context Learning for Procedural Video Knowledge Acquisition

arXiv 2026

2026

FileGram: Grounding Agent Personalization in File-System Behavioral Traces

arXiv 2026

2026

Kimi K2.5: Visual Agentic Intelligence

arXiv 2026

2026

EgoLife: Towards Egocentric Life Assistant

CVPR 2025

2025

Are VLMs Ready for Autonomous Driving? An Empirical Study from the Reliability, Data, and Metric Perspectives

ICCV 2025

2025

Ego-R1: Chain-of-Tool-Thought for Ultra-Long Egocentric Video Reasoning

arXiv 2025

2025

Uni-MMMU: A Massive Multi-discipline Multimodal Unified Benchmark

arXiv 2025

2025

RealUnify: Do Unified Models Truly Benefit from Unification? A Comprehensive Benchmark

arXiv 2025

2025

ShotBench: Expert-Level Cinematic Understanding in Vision-Language Models

arXiv 2025

2025

High-Resolution Visual Reasoning via Multi-Turn Grounding-Based Reinforcement Learning

arXiv 2025

2025

Ola: Pushing the Frontiers of Omni-Modal Language Model

arXiv 2025

2025

Oryx MLLM: On-Demand Spatial-Temporal Understanding at Arbitrary Resolution

arXiv 2024

2024

Insight-V: Exploring Long-Chain Visual Reasoning with Multimodal Large Language Models

CVPR 2025

2024

Efficient Inference of Vision Instruction-Following Models with Elastic Cache

arXiv 2024

2024

3DTopia-XL: Scaling High-quality 3D Asset Generation via Primitive Diffusion

CVPR 2025

2024

Octopus: Embodied Vision-Language Programmer from Environmental Feedback

arXiv 2023

2023

Affiliations

No known affiliations.

Frequent co-authors

10

from 21 papers