Attribution
VALOR: Vision-Audio-Language Omni-Perception Pretraining Model and Dataset
arXiv 2023
COSA: Concatenated Sample Pretrained Vision-Language Foundation Model
Authors (from 2 papers):
Jing Liu
Sihan Chen
Handong Li
Jiashi Feng
Jinhui Tang
Longteng Guo
Weining Wang
Xiaojie Jin
Xinxin Zhu