Attribution
A Picture is Worth More Than 77 Text Tokens: Evaluating CLIP-Style Models on Dense Captions
CVPR 2024 1
Improved baselines for vision-language pre-training
arXiv 2023
from 2 papers
Adriana Romero-Soriano
Enrico Fini
Florian Bordes
Jack Urbanek
Jakob Verbeek
Mary Williamson
Michal Drozdzal
Vasu Sharma