Attribution
CoGenAV: Versatile Audio-Visual Representation Learning via Contrastive-Generative Synchronization
arXiv 2025
HumanOmniV2: From Understanding to Omni-Modal Reasoning with Context
HumanOmni: A Large Vision-Speech Language Model for Human-Centric Video Understanding
from 3 papers
Xihan Wei
Boyuan Sun
Jiaxing Zhao
Liefeng Bo
Qize Yang
Shenghao Fu
Shimin Yao
Weixuan Chen
Bowen Yin
Jingren Zhou