Attribution
See, Hear, and Understand: Benchmarking Audiovisual Human Speech Understanding in Multimodal Large Language Models
arXiv 2025
Describe Anything Model for Visual Question Answering on Text-rich Images
from 2 papers
Le Thien Phuc Nguyen
Anh-Khoi Nguyen
Dinh-Thang Duong
Jeongik Lee
Jianhua Xing
JuWan Maeng
Min Xu
Samuel Low Yu Hang
SeungEun Chung
Soochahn Lee