Haoyu Cao

Papers: 10

Attribution

Affiliations & profile: Semantic Scholar

Authored papers

RISE-Video: Can Video Generators Decode Implicit World Rules?

arXiv 2026

Video-MME-v2: Towards the Next Stage in Benchmarks for Comprehensive Video Understanding

arXiv 2026

Youtu-VL: Unleashing Visual Potential via Unified Vision-Language Supervision

arXiv 2026

Youtu-Parsing: Perception, Structuring and Recognition via High-Parallelism Decoding

arXiv 2026

Omni-Diffusion: Unified Multimodal Understanding and Generation with Masked Discrete Diffusion

arXiv 2026

VITA-E: Natural Embodied Interaction with Concurrent Seeing, Hearing, Speaking, and Acting

arXiv 2025

VITA-Audio: Fast Interleaved Cross-Modal Token Generation for Efficient Large Speech-Language Model

arXiv 2025

VITA-1.5: Towards GPT-4o Level Real-Time Vision and Speech Interaction

arXiv 2025

Long-VITA: Scaling Large Multi-modal Models to 1 Million Tokens with Leading Short-Context Accuracy

arXiv 2025

Talk With Human-like Agents: Empathetic Dialogue Through Perceptible Acoustic Reception and Reaction

arXiv 2024

Affiliations

No known affiliations.

Frequent co-authors

from 10 papers

Xing Sun

Yunhang Shen

Chaoyou Fu

Caifeng Shan

Deqiang Jiang

Ke Li

Peixian Chen

Ran He

Zuwei Long

Bing Liu