Attribution
Pre-Trained Policy Discriminators are General Reward Models
arXiv 2025
CoLLiE: Collaborative Training of Large Language Models in an Efficient Way
arXiv 2023
from 2 papers
Honglin Guo
Qipeng Guo
Xipeng Qiu
Chengqi Lv
Chenhao Huang
Demin Song
Enyu Zhou
Haijun Lv
Hang Yan
Jiawei Hong