Pre-Trained Policy Discriminators are General Reward Models
arXiv 2025
Chenhao Huang
Demin Song
Enyu Zhou
Haijun Lv
Honglin Guo
Kai Chen
Qi Zhang
Qiming Ge
Qipeng Guo
Shichun Liu