Mengdan Zhang
5 papers
Authored papers (5)
Youtu-VL: Unleashing Visual Potential via Unified Vision-Language Supervision
arXiv 2026
VITA-Audio: Fast Interleaved Cross-Modal Token Generation for Efficient Large Speech-Language Model
arXiv 2025
Streaming Video Instruction Tuning
arXiv 2025
Long-VITA: Scaling Large Multi-modal Models to 1 Million Tokens with Leading Short-Context Accuracy
arXiv 2025
Aligning and Prompting Everything All at Once for Universal Visual Perception
arXiv 2023
Affiliations
No known affiliations.
Frequent co-authors
10 from 5 papers