Yunhang Shen

Papers: 15

Attribution

Affiliations & profile: Semantic Scholar

Authored papers

Toward Native Multimodal Modeling: A Roadmap

arXiv 2026

2026

Video-MME-v2: Towards the Next Stage in Benchmarks for Comprehensive Video Understanding

arXiv 2026

2026

Youtu-VL: Unleashing Visual Potential via Unified Vision-Language Supervision

arXiv 2026

2026

Omni-Diffusion: Unified Multimodal Understanding and Generation with Masked Discrete Diffusion

arXiv 2026

2026

VITA-E: Natural Embodied Interaction with Concurrent Seeing, Hearing, Speaking, and Acting

arXiv 2025

2025

Aligning Multimodal LLM with Human Preference: A Survey

arXiv 2025

2025

VITA-Audio: Fast Interleaved Cross-Modal Token Generation for Efficient Large Speech-Language Model

arXiv 2025

2025

VITA-1.5: Towards GPT-4o Level Real-Time Vision and Speech Interaction

arXiv 2025

2025

Long-VITA: Scaling Large Multi-modal Models to 1 Million Tokens with Leading Short-Context Accuracy

arXiv 2025

2025

Solving the Catastrophic Forgetting Problem in Generalized Category Discovery

2025

FlashSloth: Lightning Multimodal Large Language Models via Embedded Visual Compression

arXiv 2024

2024

T2Vid: Translating Long Text into Multi-Image is the Catalyst for Video-LLMs

arXiv 2024

2024

Woodpecker: Hallucination Correction for Multimodal Large Language Models

arXiv 2023

2023

Aligning and Prompting Everything All at Once for Universal Visual Perception

arXiv 2023

2023

FoPro: Few-Shot Guided Robust Webly-Supervised Prototypical Learning

arXiv 2022

2022

Affiliations

No known affiliations.

Frequent co-authors

from 15 papers

Xing Sun

Chaoyou Fu

Ke Li

Haoyu Cao

Caifeng Shan

Ran He

Rongrong Ji

Zuwei Long

Mengdan Zhang

Peixian Chen