Xiaojian Ma

Papers: 13

Attribution

Affiliations & profile: Semantic Scholar

Authored papers

Probing Visual Planning in Image Editing Models

arXiv 2026

2026

ROCKET-1: Mastering Open-World Interaction with Visual-Temporal Context Prompting

CVPR 2025 1

2024

UltraEdit: Instruction-based Fine-Grained Image Editing at Scale

arXiv 2024

2024

Multi-modal Situated Reasoning in 3D Scenes

arXiv 2024

2024

Open-World Multi-Task Control Through Goal-Aware Representation Learning and Adaptive Horizon Prediction

CVPR 2023 1

2023

Bongard-OpenWorld: Few-Shot Reasoning for Free-form Visual Concepts in the Real World

arXiv 2023

2023

An Embodied Generalist Agent in 3D World

arXiv 2023

2023

MMICL: Empowering Vision-language Model with Multi-Modal In-Context Learning

arXiv 2023

2023

Describe, Explain, Plan and Select: Interactive Planning with Large Language Models Enables Open-World Multi-Task Agents

arXiv 2023

2023

GROOT: Learning to Follow Instructions by Watching Gameplay Videos

arXiv 2023

2023

Perceive, Ground, Reason, and Act: A Benchmark for General-purpose Visual Representation

arXiv 2022

2022

SQA3D: Situated Question Answering in 3D Scenes

arXiv 2022

2022

Bongard-HOI: Benchmarking Few-Shot Visual Reasoning for Human-Object Interactions

CVPR 2022 1

2022

Affiliations

No known affiliations.

Frequent co-authors

from 13 papers

Qing Li

Yitao Liang

Anji Liu

Shaofei Cai

Siyuan Huang

Song-Chun Zhu

ZiHao Wang

Baoxiong Jia

Jiangyong Huang

Baobao Chang