Dongbin Zhao
- Papers: 9
Authored papers (9)
Revisiting On-Policy Distillation: Empirical Failure Modes and Simple Fixes
arXiv 2026
Spec-o3: A Tool-Augmented Vision-Language Agent for Rare Celestial Object Candidate Vetting via Automated Spectral Inspection
arXiv 2026
ConRFT: A Reinforced Fine-tuning Method for VLA Models via Consistency Policy
arXiv 2025
ReasonPlan: Unified Scene Prediction and Decision Reasoning for Closed-loop Autonomous Driving
arXiv 2025
UncAD: Towards Safe End-to-end Autonomous Driving via Online Map Uncertainty
arXiv 2025
Enhancing LLM Reasoning with Iterative DPO: A Comprehensive Empirical Investigation
arXiv 2025
Learning When to Think: Shaping Adaptive Reasoning in R1-Style Models via Multi-Stage RL
arXiv 2025
Meta-DT: Offline Meta-RL as Conditional Sequence Modeling with World Model Disentanglement
arXiv 2024
In-Dataset Trajectory Return Regularization for Offline Preference-based Reinforcement Learning
arXiv 2024
Affiliations
Frequent co-authors
10 from 9 papers