Attribution
Group-Relative REINFORCE Is Secretly an Off-Policy Algorithm: Demystifying Some Myths About GRPO and Its Friends
arXiv 2025
Bolin Ding
WenHao Zhang
Xuchen Pan
Yaliang Li
Yanxi Chen
Yuchang Sun
Yushuo Chen