Yuhao Dong
- Papers: 21
Authored papers
Artifact-Bench: Evaluating MLLMs on Detecting and Assessing the Artifacts of AI-Generated Videos
arXiv 2026
LongAV-Compass: Towards Unified Evaluation of Minute-Scale Audio-Visual Generation Across T2AV, I2AV, and V2AV
arXiv 2026
Video-MME-v2: Towards the Next Stage in Benchmarks for Comprehensive Video Understanding
arXiv 2026
VTC-Bench: Evaluating Agentic Multimodal Models via Compositional Visual Tool Chaining
arXiv 2026
PerceptionComp: A Video Benchmark for Complex Perception-Centric Reasoning
arXiv 2026
Demo-ICL: In-Context Learning for Procedural Video Knowledge Acquisition
arXiv 2026
FileGram: Grounding Agent Personalization in File-System Behavioral Traces
arXiv 2026
Kimi K2.5: Visual Agentic Intelligence
arXiv 2026
EgoLife: Towards Egocentric Life Assistant
CVPR 2025
Are VLMs Ready for Autonomous Driving? An Empirical Study from the Reliability, Data, and Metric Perspectives
ICCV 2025
Ego-R1: Chain-of-Tool-Thought for Ultra-Long Egocentric Video Reasoning
arXiv 2025
Uni-MMMU: A Massive Multi-discipline Multimodal Unified Benchmark
arXiv 2025
RealUnify: Do Unified Models Truly Benefit from Unification? A Comprehensive Benchmark
arXiv 2025
ShotBench: Expert-Level Cinematic Understanding in Vision-Language Models
arXiv 2025
High-Resolution Visual Reasoning via Multi-Turn Grounding-Based Reinforcement Learning
arXiv 2025
Ola: Pushing the Frontiers of Omni-Modal Language Model
arXiv 2025
Oryx MLLM: On-Demand Spatial-Temporal Understanding at Arbitrary Resolution
arXiv 2024
Insight-V: Exploring Long-Chain Visual Reasoning with Multimodal Large Language Models
CVPR 2025
Efficient Inference of Vision Instruction-Following Models with Elastic Cache
arXiv 2024
3DTopia-XL: Scaling High-quality 3D Asset Generation via Primitive Diffusion
CVPR 2025
Octopus: Embodied Vision-Language Programmer from Environmental Feedback
arXiv 2023
Frequent co-authors
10 from 21 papers