Attribution
Cooper: Co-Optimizing Policy and Reward Models in Reinforcement Learning for Large Language Models
arXiv 2025
from 1 paper
Guiyang Hou
Jun Xiao
Weiming Lu
Wenqi Zhang
Xingyu Wu
Yongliang Shen
Yuchen Yan