Aniruddha Kembhavi

Papers: 17

Attribution

Affiliations & profile: Semantic Scholar

Authored papers

Scaling Text-Rich Image Understanding via Code-Guided Synthetic Multimodal Data Generation

arXiv 2025

2025

Molmo and PixMo: Open Weights and Open Data for State-of-the-Art Vision-Language Models

CVPR 2025 1

2024

One Diffusion to Generate Them All

CVPR 2025 1

2024

From an Image to a Scene: Learning to Imagine the World from a Million 360 Videos

arXiv 2024

2024

SAT: Dynamic Spatial Aptitude Training for Multimodal Language Models

arXiv 2024

2024

Holodeck: Language Guided Generation of 3D Embodied AI Environments

CVPR 2024 1

2023

Unified-IO 2: Scaling Autoregressive Multimodal Models with Vision, Language, Audio, and Action

arXiv 2023

2023

SugarCrepe: Fixing Hackable Benchmarks for Vision-Language Compositionality

2023

MIMIC: Masked Image Modeling with Image Correspondences

arXiv 2023

2023

Exposing and Addressing Cross-Task Inconsistency in Unified Vision-Language Models

arXiv 2023

2023

SatlasPretrain: A Large-Scale Dataset for Remote Sensing Image Understanding

ICCV 2023 1

2022

I Can't Believe There's No Images! Learning Visual Tasks Using only Language Supervision

ICCV 2023 1

2022

Towards General Purpose Vision Systems

arXiv 2021

2021

RobustNav: Towards Benchmarking Robustness in Embodied Navigation

ICCV 2021 10

2021

Visual Semantic Role Labeling for Video Understanding

CVPR 2021 1

2021

AI2-THOR: An Interactive 3D Environment for Visual AI

arXiv 2017

2017

A Diagram Is Worth A Dozen Images

arXiv 2016

2016

Affiliations

No known affiliations.

Frequent co-authors

from 17 papers

Christopher Clark

7 shared papers

Ranjay Krishna

7 shared papers

Ali Farhadi

CEO

Luca Weihs

Mark Yatskar

Matt Deitke

Tanmay Gupta

Chris Callison-Burch

Eli VanderBilt

Jiasen Lu