Attribution
Prior Constraints-based Reward Model Training for Aligning Large Language Models
arXiv 2024
ESRL: Efficient Sampling-based Reinforcement Learning for Sequence Generation
arXiv 2023
from 2 papers
Chenglong Wang
Hang Zhou
Jingbo Zhu
Tong Xiao
Bei Li
Chunliang Zhang
Tongran Liu
Yifu Huo