Aniruddha Kembhavi
- Papers: 17
Authored papers (17)
Scaling Text-Rich Image Understanding via Code-Guided Synthetic Multimodal Data Generation
arXiv 2025
Molmo and PixMo: Open Weights and Open Data for State-of-the-Art Vision-Language Models
CVPR 2025 1
One Diffusion to Generate Them All
CVPR 2025 1
From an Image to a Scene: Learning to Imagine the World from a Million 360 Videos
arXiv 2024
SAT: Dynamic Spatial Aptitude Training for Multimodal Language Models
arXiv 2024
Holodeck: Language Guided Generation of 3D Embodied AI Environments
CVPR 2024 1
Unified-IO 2: Scaling Autoregressive Multimodal Models with Vision, Language, Audio, and Action
arXiv 2023
MIMIC: Masked Image Modeling with Image Correspondences
arXiv 2023
Exposing and Addressing Cross-Task Inconsistency in Unified Vision-Language Models
arXiv 2023
SugarCrepe: Fixing Hackable Benchmarks for Vision-Language Compositionality
SatlasPretrain: A Large-Scale Dataset for Remote Sensing Image Understanding
ICCV 2023 1
I Can't Believe There's No Images! Learning Visual Tasks Using only Language Supervision
ICCV 2023 1
Towards General Purpose Vision Systems
arXiv 2021
RobustNav: Towards Benchmarking Robustness in Embodied Navigation
ICCV 2021 10
Visual Semantic Role Labeling for Video Understanding
CVPR 2021 1
AI2-THOR: An Interactive 3D Environment for Visual AI
arXiv 2017
A Diagram Is Worth A Dozen Images
arXiv 2016
Affiliations
Frequent co-authors
10 from 17 papers