Attribution
Do Vision and Language Encoders Represent the World Similarly?
CVPR 2024
From Unimodal to Multimodal: Scaling up Projectors to Align Modalities
arXiv 2024
EgoSchema: A Diagnostic Benchmark for Very Long-form Video Language Understanding
from 3 papers
Karttikeya Mangalam
Mayug Maniparambil
Noel E. O'Connor
Sanath Narayan
Yasser Abdelaziz Dahou Djilali
Ankit Singh
Jitendra Malik
Mohamed El Amine Seddik