Helping or Herding? Reward Model Ensembles Mitigate but do not Eliminate Reward Hacking
arXiv 2023
Adam Fisch
Ahmad Beirami
Alekh Agarwal
Alex D'Amour
Chirag Nagpal
Deepak Ramachandran
Jacob Eisenstein
Jonathan Berant
Katherine Heller
Peter Shaw