Xiaojian Ma
Authored papers (13)
Probing Visual Planning in Image Editing Models
arXiv 2026
UltraEdit: Instruction-based Fine-Grained Image Editing at Scale
arXiv 2024
Multi-modal Situated Reasoning in 3D Scenes
arXiv 2024
ROCKET-1: Mastering Open-World Interaction with Visual-Temporal Context Prompting
CVPR 2025
Bongard-OpenWorld: Few-Shot Reasoning for Free-form Visual Concepts in the Real World
arXiv 2023
An Embodied Generalist Agent in 3D World
arXiv 2023
MMICL: Empowering Vision-language Model with Multi-Modal In-Context Learning
arXiv 2023
Describe, Explain, Plan and Select: Interactive Planning with Large Language Models Enables Open-World Multi-Task Agents
arXiv 2023
GROOT: Learning to Follow Instructions by Watching Gameplay Videos
arXiv 2023
Open-World Multi-Task Control Through Goal-Aware Representation Learning and Adaptive Horizon Prediction
CVPR 2023
Perceive, Ground, Reason, and Act: A Benchmark for General-purpose Visual Representation
arXiv 2022
SQA3D: Situated Question Answering in 3D Scenes
arXiv 2022
Bongard-HOI: Benchmarking Few-Shot Visual Reasoning for Human-Object Interactions
CVPR 2022