Attribution
video-SALMONN 2: Captioning-Enhanced Audio-Visual Large Language Models
arXiv 2025
video-SALMONN-o1: Reasoning-enhanced Audio-visual Large Language Model
ACVUBench: Audio-Centric Video Understanding Benchmark
from 3 papers
Changli Tang
Chao Zhang
Guangzhi Sun
Jimin Zhuang
Wei Li
Yixuan Li
Zejun Ma
Peihan Li
Yifan Jiang