Maryam Fazel

2 papers

Authored papers

Extragradient Preference Optimization (EGPO): Beyond Last-Iterate Convergence for Nash Learning from Human Feedback

arXiv 2025

Understanding the Performance Gap in Preference Learning: A Dichotomy of RLHF and DPO

arXiv 2025

No known affiliations.

Co-authors (from 2 papers)

Runlong Zhou

Simon S. Du

Minhak Song

Ruizhe Shi

Zihan Zhang