Attribution
Understanding the Performance Gap in Preference Learning: A Dichotomy of RLHF and DPO
arXiv 2025
Unleashing the Power of Pre-trained Language Models for Offline Reinforcement Learning
arXiv 2023
from 2 papers
Simon S. Du
Huazhe Xu
Maryam Fazel
Minhak Song
Runlong Zhou
Yanjie Ze
Yuyao Liu
Zihan Zhang