Tai Wang

Papers
27

Authored papers

27

The Python Simulations of Chemistry Framework: 10 years of an open-source quantum chemistry project

arXiv 2026

2026

InternVLA-A1: Unifying Understanding, Generation and Action for Robotic Manipulation

arXiv 2026

2026

GLEAM: Learning Generalizable Exploration Policy for Active Mapping in Complex 3D Indoor Scenes

ICCV 2025

2025

Ground Slow, Move Fast: A Dual-System Foundation Model for Generalizable Vision-and-Language Navigation

arXiv 2025

2025

LoGoPlanner: Localization Grounded Navigation Policy with Metric-aware Visual Geometry

arXiv 2025

2025

G^2VLM: Geometry Grounded Vision Language Model with Unified 3D Reconstruction and Spatial Reasoning

arXiv 2025

2025

VL-LN Bench: Towards Long-horizon Goal-oriented Navigation with Active Dialogs

arXiv 2025

2025

MMSI-Video-Bench: A Holistic Benchmark for Video-Based Spatial Intelligence

arXiv 2025

2025

StreamVLN: Streaming Vision-and-Language Navigation via SlowFast Context Modeling

arXiv 2025

2025

InternVLA-M1: A Spatially Guided Vision-Language-Action Framework for Generalist Robot Policy

arXiv 2025

2025

InstructVLA: Vision-Language-Action Instruction Tuning from Understanding to Manipulation

arXiv 2025

2025

OST-Bench: Evaluating the Capabilities of MLLMs in Online Spatio-temporal Scene Understanding

arXiv 2025

2025

InternScenes: A Large-scale Simulatable Indoor Scene Dataset with Realistic Layouts

arXiv 2025

2025

Evolving Symbolic 3D Visual Grounder with Weakly Supervised Reflection

arXiv 2025

2025

X-VLA: Soft-Prompted Transformer as Scalable Cross-Embodiment Vision-Language-Action Model

arXiv 2025

2025

GRUtopia: Dream General Robots in a City at Scale

arXiv 2024

2024

MMScan: A Multi-Modal 3D Scene Dataset with Hierarchical Grounded Language Annotations

arXiv 2024

2024

Grounded 3D-LLM with Referent Tokens

arXiv 2024

2024

GenNBV: Generalizable Next-Best-View Policy for Active 3D Reconstruction

CVPR 2024 1

2024

Omni6D: Large-Vocabulary 3D Object Dataset for Category-Level 6D Object Pose Estimation

arXiv 2024

2024

Scene as Occupancy

ICCV 2023 1

2023

Unified Human-Scene Interaction via Prompted Chain-of-Contacts

arXiv 2023

2023

PointLLM: Empowering Large Language Models to Understand Point Clouds

arXiv 2023

2023

Chat-Scene: Bridging 3D Scene and Large Language Models with Object Identifiers

arXiv 2023

2023

GeoMIM: Towards Better 3D Knowledge Transfer via Masked Image Modeling for Multi-view 3D Understanding

ICCV 2023 1

2023

MV-JAR: Masked Voxel Jigsaw and Reconstruction for LiDAR-Based Self-Supervised Pre-Training

CVPR 2023 1

2023

MonoDETR: Depth-guided Transformer for Monocular 3D Object Detection

ICCV 2023 1

2022

Affiliations

No known affiliations.

Frequent co-authors

10

from 27 papers