Attribution
DPO Meets PPO: Reinforced Token Optimization for RLHF
arXiv 2024
Di He
Guhao Feng
Han Zhong
Jiang Bian
Li Zhao
Liwei Wang
Wei Xiong
Xinle Cheng