On the Design of KL-Regularized Policy Gradient Algorithms for LLM Reasoning
arXiv 2025
Huizhuo Yuan
Quanquan Gu
Yang Yuan
Yifan Zhang
Yifeng Liu