Runlong Zhou

Attribution

4 papers

Authored papers

Extragradient Preference Optimization (EGPO): Beyond Last-Iterate Convergence for Nash Learning from Human Feedback

arXiv 2025

RLVE: Scaling Up Reinforcement Learning for Language Models with Adaptive Verifiable Environments

arXiv 2025

Understanding the Performance Gap in Preference Learning: A Dichotomy of RLHF and DPO

arXiv 2025

Free from Bellman Completeness: Trajectory Stitching via Model-based Return-conditioned Supervised Learning

arXiv 2023

No known affiliations.

Co-authors (from 4 papers)

Maryam Fazel

Simon S. Du

Simon Shaolei Du

Abhishek Gupta

Chenyang Zhao

Chuning Zhu

Hamish Ivison (grad student)

Hannaneh Hajishirzi (professor)

Hao Peng

Jacqueline He