
Dong Wang

Papers: 33



Authored papers

Xiaomi-Robotics-0: An Open-Sourced Vision-Language-Action Model with Real-Time Execution

arXiv 2026

2026

Dr. Zero: Self-Evolving Search Agents without Training Data

arXiv 2026

2026

Search-R1: Training LLMs to Reason and Leverage Search Engines with Reinforcement Learning

arXiv 2025

2025

SpatialVLA: Exploring Spatial Representations for Visual-Language-Action Model

arXiv 2025

2025

Hume: Introducing System-2 Thinking in Visual-Language-Action Model

arXiv 2025

2025

Hybrid Latent Reasoning via Reinforcement Learning

arXiv 2025

2025

In Pursuit of Pixel Supervision for Visual Pre-training

arXiv 2025

2025

AWorld: Orchestrating the Training Recipe for Agentic AI

arXiv 2025

2025

Meta CLIP 2: A Worldwide Scaling Recipe

arXiv 2025

2025

Exploring the Potential of Encoder-free Architectures in 3D LMMs

arXiv 2025

2025

Cross from Left to Right Brain: Adaptive Text Dreamer for Vision-and-Language Navigation

arXiv 2025

2025

Sagalee: an Open Source Automatic Speech Recognition Dataset for Oromo Language

arXiv 2025

2025

EvMic: Event-based Non-contact sound recovery from effective spatial-temporal modeling

arXiv 2025

2025

EmbodiedOneVision: Interleaved Vision-Text-Action Pretraining for General Robot Control

arXiv 2025

2025

Towards Generalist Robot Policies: What Matters in Building Vision-Language-Action Models

arXiv 2024

2024

Learning Manipulation by Predicting Interaction

arXiv 2024

2024

LiveScene: Language Embedding Interactive Radiance Fields for Physical Scene Rendering and Control

arXiv 2024

2024

Train Once, Deploy Anywhere: Matryoshka Representation Learning for Multimodal Recommendation

arXiv 2024

2024

Open-Vocabulary Federated Learning with Multimodal Prototyping

arXiv 2024

2024

Off-Policy Primal-Dual Safe Reinforcement Learning

arXiv 2024

2024

Any2Point: Empowering Any-modality Large Models for Efficient 3D Understanding

arXiv 2024

2024

AlignBot: Aligning VLM-powered Customized Task Planning with User Reminders Through Fine-Tuning for Household Robots

arXiv 2024

2024

Exploring Lightweight Hierarchical Vision Transformers for Efficient Visual Tracking

ICCV 2023

2023

Diffusion Model is an Effective Planner and Data Synthesizer for Multi-Task Reinforcement Learning


2023

Point-PEFT: Parameter-Efficient Fine-Tuning for 3D Pre-trained Models

arXiv 2023

2023

ViewRefer: Grasp the Multi-view Knowledge for 3D Visual Grounding with GPT and Prototype Guidance

arXiv 2023

2023

Kinematic-aware Prompting for Generalizable Articulated Object Manipulation with LLMs

arXiv 2023

2023

Towards Nonlinear-Motion-Aware and Occlusion-Robust Rolling Shutter Correction

ICCV 2023

2023

Tracking Anything in High Quality

arXiv 2023

2023

Not All Features Matter: Enhancing Few-shot CLIP with Adaptive Prior Refinement

ICCV 2023

2023

Safe Offline Reinforcement Learning with Real-Time Budget Constraints

arXiv 2023

2023

LightTrack: Finding Lightweight Neural Networks for Object Tracking via One-Shot Architecture Search

CVPR 2021

2021

MOS: A Low Latency and Lightweight Framework for Face Detection, Landmark Localization, and Head Pose Estimation

arXiv 2021

2021

Affiliations

No known affiliations.

Frequent co-authors: 10 (from 33 papers)