Jiajun Wu
- Papers: 47
Authored papers (47)
World Model for Robot Learning: A Comprehensive Survey
arXiv 2026
ESI-Bench: Towards Embodied Spatial Intelligence that Closes the Perception-Action Loop
arXiv 2026
RAGEN-2: Reasoning Collapse in Agentic RL
arXiv 2026
Learning from Trials and Errors: Reflective Test-Time Planning for Embodied LLMs
arXiv 2026
Neuro-Symbolic Decoding of Neural Activity
arXiv 2026
IQuest-Coder-V1 Technical Report
arXiv 2026
RealWonder: Real-Time Physical Action-Conditioned Video Generation
arXiv 2026
Theory of Space: Can Foundation Models Construct Spatial Beliefs through Active Exploration?
arXiv 2026
InCoder-32B: Code Foundation Model for Industrial Scenarios
arXiv 2026
Talk2Move: Reinforcement Learning for Text-Instructed Object-Level Geometric Transformation in Scenes
arXiv 2026
RAGEN: Understanding Self-Evolution in LLM Agents via Multi-Turn Reinforcement Learning
arXiv 2025
Re-thinking Temporal Search for Long-Form Video Understanding
CVPR 2025
BEHAVIOR Robot Suite: Streamlining Real-World Whole-Body Manipulation for Everyday Household Activities
arXiv 2025
FluidNexus: 3D Fluid Reconstruction and Prediction from a Single Video
CVPR 2025
WonderZoom: Multi-Scale 3D World Generation
arXiv 2025
Explain Before You Answer: A Survey on Compositional Visual Reasoning
arXiv 2025
Spatial Mental Modeling from Limited Views
arXiv 2025
Chain-of-Experts: Unlocking the Communication Power of Mixture-of-Experts Models
arXiv 2025
Taming Generative Video Models for Zero-Shot Optical Flow Extraction
arXiv 2025
ENACT: Evaluating Embodied Cognition with World Modeling of Egocentric Interaction
arXiv 2025
Why Is Spatial Reasoning Hard for VLMs? An Attention Mechanism Perspective on Focus Areas
arXiv 2025
Evaluating Real-World Robot Manipulation Policies in Simulation
arXiv 2024
WonderWorld: Interactive 3D Scene Generation from a Single Image
CVPR 2025
Generalizable Humanoid Manipulation with 3D Diffusion Policies
arXiv 2024
Embodied Agent Interface: Benchmarking LLMs for Embodied Decision Making
arXiv 2024
HourVideo: 1-Hour Video-Language Understanding
arXiv 2024
TRANSIC: Sim-to-Real Policy Transfer by Learning from Online Correction
arXiv 2024
Diffusion Self-Distillation for Zero-Shot Customized Image Generation
CVPR 2025
IKEA Manuals at Work: 4D Grounding of Assembly Instructions on Internet Videos
arXiv 2024
Visually Descriptive Language Model for Vector Graphics Reasoning
arXiv 2024
View-Invariant Policy Learning via Zero-Shot Novel View Synthesis
arXiv 2024
Foundation Models in Robotics: Applications, Challenges, and the Future
arXiv 2023
ULIP-2: Towards Scalable Multimodal Pre-training for 3D Understanding
CVPR 2024
VoxPoser: Composable 3D Value Maps for Robotic Manipulation with Language Models
arXiv 2023
Holodeck: Language Guided Generation of 3D Embodied AI Environments
CVPR 2024
ZeroNVS: Zero-Shot 360-Degree View Synthesis from a Single Image
CVPR 2024
Open X-Embodiment: Robotic Learning Datasets and RT-X Models
arXiv 2023
SkyScript: A Large and Semantically Diverse Vision-Language Dataset for Remote Sensing
arXiv 2023
Language-Informed Visual Concept Learning
arXiv 2023
3D Copy-Paste: Physically Plausible Object Insertion for Monocular 3D Detection
Mini-BEHAVIOR: A Procedurally Generated Benchmark for Long-horizon Decision-Making in Embodied AI
arXiv 2023
Patched Denoising Diffusion Models For High-Resolution Image Synthesis
arXiv 2023
Disentanglement via Latent Quantization
Motion Question Answering via Modular Motion Programs
arXiv 2023
SDEdit: Guided Image Synthesis and Editing with Stochastic Differential Equations
End-to-End Optimization of Scene Layout
The Neuro-Symbolic Concept Learner: Interpreting Scenes, Words, and Sentences From Natural Supervision
Frequent co-authors
10 from 47 papers