Attribution
Understanding the Performance Gap in Preference Learning: A Dichotomy of RLHF and DPO
arXiv 2025
Extragradient Preference Optimization (EGPO): Beyond Last-Iterate Convergence for Nash Learning from Human Feedback
from 2 papers
Runlong Zhou
Simon S. Du
Minhak Song
Ruizhe Shi
Zihan Zhang