Jiangmiao Pang
- Papers: 49
Authored papers (49)
InternVLA-A1: Unifying Understanding, Generation and Action for Robotic Manipulation
arXiv 2026
UltraDexGrasp: Learning Universal Dexterous Grasping for Bimanual Robots with Synthetic Data
arXiv 2026
SIM1: Physics-Aligned Simulator as Zero-Shot Data Scaler in Deformable Worlds
arXiv 2026
M^3: Dense Matching Meets Multi-View Foundation Models for Monocular Gaussian Splatting SLAM
arXiv 2026
EgoSim: Egocentric World Simulator for Embodied Interaction Generation
arXiv 2026
SynthVerse: A Large-Scale Diverse Synthetic Dataset for Point Tracking
arXiv 2026
RoboVIP: Multi-View Video Generation with Visual Identity Prompting Augments Robot Manipulation
arXiv 2026
SimpleVLA-RL: Scaling VLA Training via Reinforcement Learning
arXiv 2025
Sekai: A Video Dataset towards World Exploration
arXiv 2025
Learning Humanoid Standing-up Control across Diverse Postures
arXiv 2025
GLEAM: Learning Generalizable Exploration Policy for Active Mapping in Complex 3D Indoor Scenes
ICCV 2025
Ground Slow, Move Fast: A Dual-System Foundation Model for Generalizable Vision-and-Language Navigation
arXiv 2025
Yume-1.5: A Text-Controlled Interactive World Generation Model
arXiv 2025
LoGoPlanner: Localization Grounded Navigation Policy with Metric-aware Visual Geometry
arXiv 2025
MM-ACT: Learn from Multimodal Parallel Generation to Act
arXiv 2025
G^2VLM: Geometry Grounded Vision Language Model with Unified 3D Reconstruction and Spatial Reasoning
arXiv 2025
VL-LN Bench: Towards Long-horizon Goal-oriented Navigation with Active Dialogs
arXiv 2025
A Vision-Language-Action-Critic Model for Robotic Real-World Reinforcement Learning
arXiv 2025
F1: A Vision-Language-Action Model Bridging Understanding and Generation to Actions
arXiv 2025
InstructVLA: Vision-Language-Action Instruction Tuning from Understanding to Manipulation
arXiv 2025
ObjectGS: Object-aware Scene Reconstruction and Scene Understanding via Gaussian Splatting
ICCV 2025
OST-Bench: Evaluating the Capabilities of MLLMs in Online Spatio-temporal Scene Understanding
arXiv 2025
InternScenes: A Large-scale Simulatable Indoor Scene Dataset with Realistic Layouts
arXiv 2025
MesaTask: Towards Task-Driven Tabletop Scene Generation via 3D Spatial Reasoning
arXiv 2025
Infinite Mobility: Scalable High-Fidelity Synthesis of Articulated Objects via Procedural Generation
arXiv 2025
A Data-Centric Revisit of Pre-Trained Vision Models for Robot Learning
CVPR 2025 1
X-VLA: Soft-Prompted Transformer as Scalable Cross-Embodiment Vision-Language-Action Model
arXiv 2025
MeshCoder: LLM-Powered Structured Mesh Code Generation from Point Clouds
arXiv 2025
$π^3$: Scalable Permutation-Equivariant Visual Geometry Learning
arXiv 2025
MMSI-Video-Bench: A Holistic Benchmark for Video-Based Spatial Intelligence
arXiv 2025
ARTDECO: Towards Efficient and High-Fidelity On-the-Fly 3D Reconstruction with Structured Scene Representation
arXiv 2025
StreamVLN: Streaming Vision-and-Language Navigation via SlowFast Context Modeling
arXiv 2025
InternVLA-M1: A Spatially Guided Vision-Language-Action Framework for Generalist Robot Policy
arXiv 2025
Vlaser: Vision-Language-Action Model with Synergistic Embodied Reasoning
arXiv 2025
OmniWorld: A Multi-Domain and Multi-Modal Dataset for 4D World Modeling
arXiv 2025
Aether: Geometric-Aware Unified World Modeling
ICCV 2025
Evolving Symbolic 3D Visual Grounder with Weakly Supervised Reflection
arXiv 2025
GRUtopia: Dream General Robots in a City at Scale
arXiv 2024
MMScan: A Multi-Modal 3D Scene Dataset with Hierarchical Grounded Language Annotations
arXiv 2024
GenNBV: Generalizable Next-Best-View Policy for Active 3D Reconstruction
CVPR 2024 1
Grounded 3D-LLM with Referent Tokens
arXiv 2024
What Makes CLIP More Robust to Long-Tailed Pre-Training Data? A Controlled Study for Transferable Insights
arXiv 2024
Hybrid Internal Model: Learning Agile Legged Locomotion with Simulated Robot Response
arXiv 2023
Unified Human-Scene Interaction via Prompted Chain-of-Contacts
arXiv 2023
PointLLM: Empowering Large Language Models to Understand Point Clouds
arXiv 2023
OV-PARTS: Towards Open-Vocabulary Part Segmentation
NeurIPS 2023 11
Chat-Scene: Bridging 3D Scene and Large Language Models with Object Identifiers
arXiv 2023
MV-JAR: Masked Voxel Jigsaw and Reconstruction for LiDAR-Based Self-Supervised Pre-Training
CVPR 2023 1
Observation-Centric SORT: Rethinking SORT for Robust Multi-Object Tracking
CVPR 2023 1
Frequent co-authors
10 from 49 papers