Attribution
Qwen3-VL Technical Report
arXiv 2025
JavisGPT: A Unified Multi-modal LLM for Sounding-Video Comprehension and Generation
CLIP-ViP: Adapting Pre-trained Image-Text Model to Video-Language Representation Alignment
arXiv 2022
from 3 papers
Fan Zhou
An Yang
Bei Liu
Binyuan Hui
Bo Zheng
Bowen Yu
Chang Gao
Chenglong Liu
Chenxu Lv
Chunjiang Ge