Caifeng Shan
- Papers: 10
Authored papers
Video-MME-v2: Towards the Next Stage in Benchmarks for Comprehensive Video Understanding
arXiv 2026
PersonaVLM: Long-Term Personalized Multimodal LLMs
arXiv 2026
NOVA: Sparse Control, Dense Synthesis for Pair-Free Video Editing
arXiv 2026
NGM: A Plug-and-Play Training-Free Memory Module for LLMs
arXiv 2026
Omni-Diffusion: Unified Multimodal Understanding and Generation with Masked Discrete Diffusion
arXiv 2026
VideoDetective: Clue Hunting via both Extrinsic Query and Intrinsic Relevance for Long Video Understanding
arXiv 2026
VITA-E: Natural Embodied Interaction with Concurrent Seeing, Hearing, Speaking, and Acting
arXiv 2025
PosterCopilot: Toward Layout Reasoning and Controllable Editing for Professional Graphic Design
arXiv 2025
VITA-1.5: Towards GPT-4o Level Real-Time Vision and Speech Interaction
arXiv 2025
T2Vid: Translating Long Text into Multi-Image is the Catalyst for Video-LLMs
arXiv 2024