Sai Rajeswar
Authored papers (13)
AlignVLM: Bridging Vision and Language Latent Spaces for Multimodal Understanding
arXiv 2025
Grounding Computer Use Agents on Human Demonstrations
arXiv 2025
The Promise of RL for Autoregressive Image Editing
arXiv 2025
AU-Harness: An Open-Source Toolkit for Holistic Evaluation of Audio LLMs
arXiv 2025
StarFlow: Generating Structured Workflow Outputs From Sketch Images
arXiv 2025
GenRL: Multimodal-foundation world models for generalization in embodied agents
arXiv 2024
VCR: A Task for Pixel-Level Complex Reasoning in Vision Language Models via Restoring Occluded Text
arXiv 2024
RepLiQA: A Question-Answering Dataset for Benchmarking LLMs on Unseen Reference Content
arXiv 2024
StarVector: Generating Scalable Vector Graphics Code from Images and Text
CVPR 2025
Mastering the Unsupervised Reinforcement Learning Benchmark from Pixels
arXiv 2022
Choreographer: Learning and Adapting Skills in Imagination
arXiv 2022
Touch-based Curiosity for Sparse-Reward Tasks
arXiv 2021
Pix2Shape: Towards Unsupervised Learning of 3D Scenes from Images using a View-based Representation
arXiv 2020
Frequent co-authors
10 from 13 papers
David Vazquez
researcher
Christopher Pal
Perouz Taslakian
Aaron Courville
Juan A Rodriguez
Bart Dhoedt
Nicolas Chapados
researcher
Pietro Mazzaglia
Spandana Gella
Tim Verbelen