Caifeng Shan

Papers: 10

Attribution

Affiliations & profile: Semantic Scholar

Authored papers

Video-MME-v2: Towards the Next Stage in Benchmarks for Comprehensive Video Understanding

arXiv 2026

PersonaVLM: Long-Term Personalized Multimodal LLMs

arXiv 2026

NOVA: Sparse Control, Dense Synthesis for Pair-Free Video Editing

arXiv 2026

NGM: A Plug-and-Play Training-Free Memory Module for LLMs

arXiv 2026

VideoDetective: Clue Hunting via both Extrinsic Query and Intrinsic Relevance for Long Video Understanding

arXiv 2026

Omni-Diffusion: Unified Multimodal Understanding and Generation with Masked Discrete Diffusion

arXiv 2026

VITA-E: Natural Embodied Interaction with Concurrent Seeing, Hearing, Speaking, and Acting

arXiv 2025

PosterCopilot: Toward Layout Reasoning and Controllable Editing for Professional Graphic Design

arXiv 2025

VITA-1.5: Towards GPT-4o Level Real-Time Vision and Speech Interaction

arXiv 2025

T2Vid: Translating Long Text into Multi-Image is the Catalyst for Video-LLMs

arXiv 2024

Affiliations

No known affiliations.

Frequent co-authors

from 10 papers

Chaoyou Fu

Ran He

Xing Sun

Yunhang Shen

Haoyu Cao

Chenyang Si

Yi-Fan Zhang

Zuwei Long

Chu Wu

Heting Gao