Zhaoran Wang
Authored papers (8)
Beyond Markovian: Reflective Exploration via Bayes-Adaptive RL for LLM Reasoning
arXiv 2025
Self-Exploring Language Models: Active Preference Elicitation for Online Alignment
arXiv 2024
Reward-Augmented Data Enhances Direct Preference Alignment of LLMs
arXiv 2024
Reason for Future, Act for Now: A Principled Framework for Autonomous LLM Agents with Provable Sample Efficiency
arXiv 2023
Maximize to Explore: One Objective Function Fusing Estimation, Planning, and Exploration
NeurIPS 2023
Local Optimization Achieves Global Optimality in Multi-Agent Reinforcement Learning
arXiv 2023
RORL: Robust Offline Reinforcement Learning via Conservative Smoothing
arXiv 2022
Contrastive UCB: Provably Efficient Contrastive Self-Supervised Learning in Online Reinforcement Learning
arXiv 2022