Tai Wang
- Papers: 27
Authored papers
The Python Simulations of Chemistry Framework: 10 years of an open-source quantum chemistry project
arXiv 2026
InternVLA-A1: Unifying Understanding, Generation and Action for Robotic Manipulation
arXiv 2026
GLEAM: Learning Generalizable Exploration Policy for Active Mapping in Complex 3D Indoor Scenes
ICCV 2025
Ground Slow, Move Fast: A Dual-System Foundation Model for Generalizable Vision-and-Language Navigation
arXiv 2025
LoGoPlanner: Localization Grounded Navigation Policy with Metric-aware Visual Geometry
arXiv 2025
G^2VLM: Geometry Grounded Vision Language Model with Unified 3D Reconstruction and Spatial Reasoning
arXiv 2025
VL-LN Bench: Towards Long-horizon Goal-oriented Navigation with Active Dialogs
arXiv 2025
MMSI-Video-Bench: A Holistic Benchmark for Video-Based Spatial Intelligence
arXiv 2025
StreamVLN: Streaming Vision-and-Language Navigation via SlowFast Context Modeling
arXiv 2025
InternVLA-M1: A Spatially Guided Vision-Language-Action Framework for Generalist Robot Policy
arXiv 2025
InstructVLA: Vision-Language-Action Instruction Tuning from Understanding to Manipulation
arXiv 2025
OST-Bench: Evaluating the Capabilities of MLLMs in Online Spatio-temporal Scene Understanding
arXiv 2025
InternScenes: A Large-scale Simulatable Indoor Scene Dataset with Realistic Layouts
arXiv 2025
Evolving Symbolic 3D Visual Grounder with Weakly Supervised Reflection
arXiv 2025
X-VLA: Soft-Prompted Transformer as Scalable Cross-Embodiment Vision-Language-Action Model
arXiv 2025
GRUtopia: Dream General Robots in a City at Scale
arXiv 2024
MMScan: A Multi-Modal 3D Scene Dataset with Hierarchical Grounded Language Annotations
arXiv 2024
Grounded 3D-LLM with Referent Tokens
arXiv 2024
GenNBV: Generalizable Next-Best-View Policy for Active 3D Reconstruction
CVPR 2024
Omni6D: Large-Vocabulary 3D Object Dataset for Category-Level 6D Object Pose Estimation
arXiv 2024
Scene as Occupancy
ICCV 2023
Unified Human-Scene Interaction via Prompted Chain-of-Contacts
arXiv 2023
PointLLM: Empowering Large Language Models to Understand Point Clouds
arXiv 2023
Chat-Scene: Bridging 3D Scene and Large Language Models with Object Identifiers
arXiv 2023
GeoMIM: Towards Better 3D Knowledge Transfer via Masked Image Modeling for Multi-view 3D Understanding
ICCV 2023
MV-JAR: Masked Voxel Jigsaw and Reconstruction for LiDAR-Based Self-Supervised Pre-Training
CVPR 2023
MonoDETR: Depth-guided Transformer for Monocular 3D Object Detection
ICCV 2023