Attribution
MM-GEN: Enhancing Task Performance Through Targeted Multimodal Data Curation
arXiv 2025
Inference-Time Scaling for Complex Tasks: Where We Stand and What Lies Ahead
Is A Picture Worth A Thousand Words? Delving Into Spatial Reasoning for Vision Language Models
arXiv 2024
from 3 papers
Vibhav Vineet
Besmira Nushi
Vidhisha Balachandran
Baharan Mirzasoleiman
Jiayu Wang
Jingya Chen
John Langford
Lingjiao Chen
Safoora Yousefi
Shivam Garg