Attribution
MM-Spatial: Exploring 3D Spatial Understanding in Multimodal LLMs
arXiv 2025
FastVLM: Efficient Vision Encoding for Vision Language Models
CVPR 2025
MIA-Bench: Towards Better Instruction Following Evaluation of Multimodal LLMs
arXiv 2024
from 3 papers
Yinfei Yang
Afshin Dehghan
Albert Antony
Cem Koc
Chun-Liang Li
David Griffiths
Erik Daxberger
Fartash Faghri
Gefen Kohavi
Gokul Santhanam