Dong Wang
- Papers: 33
Authored papers
Xiaomi-Robotics-0: An Open-Sourced Vision-Language-Action Model with Real-Time Execution
arXiv 2026
Dr. Zero: Self-Evolving Search Agents without Training Data
arXiv 2026
Search-R1: Training LLMs to Reason and Leverage Search Engines with Reinforcement Learning
arXiv 2025
SpatialVLA: Exploring Spatial Representations for Visual-Language-Action Model
arXiv 2025
Hume: Introducing System-2 Thinking in Visual-Language-Action Model
arXiv 2025
Hybrid Latent Reasoning via Reinforcement Learning
arXiv 2025
In Pursuit of Pixel Supervision for Visual Pre-training
arXiv 2025
AWorld: Orchestrating the Training Recipe for Agentic AI
arXiv 2025
Meta CLIP 2: A Worldwide Scaling Recipe
arXiv 2025
Exploring the Potential of Encoder-free Architectures in 3D LMMs
arXiv 2025
Cross from Left to Right Brain: Adaptive Text Dreamer for Vision-and-Language Navigation
arXiv 2025
Sagalee: an Open Source Automatic Speech Recognition Dataset for Oromo Language
arXiv 2025
EvMic: Event-based Non-contact sound recovery from effective spatial-temporal modeling
arXiv 2025
EmbodiedOneVision: Interleaved Vision-Text-Action Pretraining for General Robot Control
arXiv 2025
Towards Generalist Robot Policies: What Matters in Building Vision-Language-Action Models
arXiv 2024
Learning Manipulation by Predicting Interaction
arXiv 2024
LiveScene: Language Embedding Interactive Radiance Fields for Physical Scene Rendering and Control
arXiv 2024
Train Once, Deploy Anywhere: Matryoshka Representation Learning for Multimodal Recommendation
arXiv 2024
Open-Vocabulary Federated Learning with Multimodal Prototyping
arXiv 2024
Off-Policy Primal-Dual Safe Reinforcement Learning
arXiv 2024
Any2Point: Empowering Any-modality Large Models for Efficient 3D Understanding
arXiv 2024
AlignBot: Aligning VLM-powered Customized Task Planning with User Reminders Through Fine-Tuning for Household Robots
arXiv 2024
Exploring Lightweight Hierarchical Vision Transformers for Efficient Visual Tracking
ICCV 2023
Diffusion Model is an Effective Planner and Data Synthesizer for Multi-Task Reinforcement Learning
Point-PEFT: Parameter-Efficient Fine-Tuning for 3D Pre-trained Models
arXiv 2023
ViewRefer: Grasp the Multi-view Knowledge for 3D Visual Grounding with GPT and Prototype Guidance
arXiv 2023
Kinematic-aware Prompting for Generalizable Articulated Object Manipulation with LLMs
arXiv 2023
Towards Nonlinear-Motion-Aware and Occlusion-Robust Rolling Shutter Correction
ICCV 2023
Tracking Anything in High Quality
arXiv 2023
Not All Features Matter: Enhancing Few-shot CLIP with Adaptive Prior Refinement
ICCV 2023
Safe Offline Reinforcement Learning with Real-Time Budget Constraints
arXiv 2023
LightTrack: Finding Lightweight Neural Networks for Object Tracking via One-Shot Architecture Search
CVPR 2021
MOS: A Low Latency and Lightweight Framework for Face Detection, Landmark Localization, and Head Pose Estimation
arXiv 2021
Frequent co-authors
10 from 33 papers