WenHao Zhang
Papers: 7
Authored papers
On the Entropy Dynamics in Reinforcement Fine-Tuning of Large Language Models
arXiv 2026
TCOD: Exploring Temporal Curriculum in On-Policy Distillation for Multi-turn Autonomous Agents
arXiv 2026
Trinity-RFT: A General-Purpose and Unified Framework for Reinforcement Fine-Tuning of Large Language Models
arXiv 2025
StereoVLA: Enhancing Vision-Language-Action Models with Stereo Vision
arXiv 2025
On-Policy RL Meets Off-Policy Experts: Harmonizing Supervised Fine-Tuning and Reinforcement Learning via Dynamic Weighting
arXiv 2025
Group-Relative REINFORCE Is Secretly an Off-Policy Algorithm: Demystifying Some Myths About GRPO and Its Friends
arXiv 2025
Leveraging Dual Process Theory in Language Agent Framework for Real-time Simultaneous Human-AI Collaboration
arXiv 2025
Frequent co-authors
10 from 7 papers