Attribution
Regressing the Relative Future: Efficient Policy Optimization for Multi-turn RLHF
arXiv 2024
REBEL: Reinforcement Learning via Regressing Relative Rewards
Dataset Reset Policy Optimization for RLHF
from 3 papers
Jason D. Lee
Jonathan D. Chang
Kianté Brantley
Wen Sun
Gokul Swamy
Owen Oertell
Zhaolin Gao
Dipendra Misra
J. Andrew Bagnell
Thorsten Joachims