Attribution
CoMP: Continual Multimodal Pre-training for Vision Foundation Models
arXiv 2025
Inst-IT: Boosting Multimodal Instance Understanding via Explicit Visual Prompt Instruction Tuning
arXiv 2024
Synthesize, Diagnose, and Optimize: Towards Fine-Grained Vision-Language Understanding
arXiv 2023
Authors (from 3 papers)
Zuxuan Wu
Lingchen Meng
Yitong Chen
Yu-Gang Jiang
Hang Xu
Shiyi Lan
Sicheng Xie
Tao Gui
Xipeng Qiu
Yang Liu