MM-Spatial: Exploring 3D Spatial Understanding in Multimodal LLMs
arXiv 2025
LAMIC: Layout-Aware Multi-Image Composition via Scalability of Multimodal Diffusion Transformer
SlowFast-LLaVA: A Strong Training-Free Baseline for Video Large Language Models
arXiv 2024
Co-authors (from 3 papers)
Afshin Dehghan
Haiming Gang
David Griffiths
Erik Daxberger
Gefen Kohavi
Hong-You Chen
Jianhua Wang
Justin Lazarow
Marcin Eichner
Mingfei Gao