Attribution
Contrastive Preference Optimization: Pushing the Boundaries of LLM Performance in Machine Translation (arXiv 2024)
It Takes Two: On the Seamlessness between Reward and Policy Model in RLHF
Flatness-Aware Prompt Selection Improves Accuracy and Sample Efficiency (arXiv 2023)
Authors (from 3 papers):
Lingfeng Shen
Amr Sharaf
Beidi Chen
Benjamin Van Durme
Boyuan Zheng
Daniel Khashabi
Haoran Xu
Huaxiu Yao
Kenton Murray
Taiming Lu