Attribution
SpeakerVid-5M: A Large-Scale High-Quality Dataset for Audio-Visual Dyadic Interactive Human Generation
arXiv 2025
Evaluating and Advancing Multimodal Large Language Models in Ability Lens
arXiv 2024
Recognize Anything: A Strong Image Tagging Model
arXiv 2023
from 3 papers
Bohan Zhuang
Chenhui Gou
Deyu Zhou
Duomin Wang
Feng Chen
Gang Yu
Jiahe Zhang
Jing Liu
Jinyu Ma
Jiyuan Zhang