Han Zhong
- Papers: 6
Authored papers
Less is More: Improving LLM Alignment via Preference Data Selection
arXiv 2025
Rewards-in-Context: Multi-objective Alignment of Foundation Models with Dynamic Preference Adjustment
arXiv 2024
Self-Exploring Language Models: Active Preference Elicitation for Online Alignment
arXiv 2024
DPO Meets PPO: Reinforced Token Optimization for RLHF
arXiv 2024
Towards Robust Offline Reinforcement Learning under Diverse Data Corruption
arXiv 2023
Maximize to Explore: One Objective Function Fusing Estimation, Planning, and Exploration
NeurIPS 2023 11
Affiliations
No known affiliations.
Frequent co-authors
10 from 6 papers