Attribution
Do Audio-Visual Large Language Models Really See and Hear?
arXiv 2026
MMAU: A Massive Multi-Task Audio Understanding and Reasoning Benchmark
arXiv 2024
from 2 papers
Dinesh Manocha
S Sakshi
Sreyan Ghosh
Ashish Seth
Kaousheik Jayakumar
Oriol Nieto
Ramani Duraiswami
Ruohan Gao
Sonal Kumar
Utkarsh Tyagi