RewardBench 2: Advancing Reward Model Evaluation

Allen AI's expanded benchmark for reward models and LLM judges, with explicit reward-hacking probes that surface judges fooled by length, formatting, sycophancy, or self-preference.

Year: 2025
Venue: preprint
Authors: 5
Hosting: External source, license unknown

Introduces 1 artifact - 1 eval

TL;DR (Semantic Scholar)

This paper introduces RewardBench 2, a multi-skill reward modeling benchmark that provides new, challenging data for accuracy-based reward model evaluation while remaining highly correlated with downstream performance.
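
To make "accuracy-based" concrete, here is a minimal generic sketch, not the benchmark's own harness. It assumes each item pairs a prompt with one preferred completion and one or more rejected ones, and counts an item as correct only when the preferred completion scores strictly highest. The item layout, the `accuracy` helper, and the toy `length_score` scorer are all hypothetical; the actual data format and scoring rules of RewardBench 2 are not described on this page.

```python
from typing import Callable, Sequence

# Hypothetical item layout: (prompt, chosen completion, rejected completions).
Item = tuple[str, str, Sequence[str]]


def accuracy(score: Callable[[str, str], float], items: Sequence[Item]) -> float:
    """Fraction of items where the chosen completion outscores every rejected one."""
    if not items:
        raise ValueError("items must be non-empty")
    correct = 0
    for prompt, chosen, rejected in items:
        chosen_score = score(prompt, chosen)
        # Ties count as failures: the chosen completion must win strictly.
        if all(chosen_score > score(prompt, r) for r in rejected):
            correct += 1
    return correct / len(items)


def length_score(prompt: str, completion: str) -> float:
    """Toy stand-in for a reward model: longer completions score higher."""
    return float(len(completion))


# Illustrative data only, not drawn from the benchmark.
items = [
    ("2+2?", "4", ["5", "22"]),
    ("Capital of France?", "Paris is the capital.", ["Lyon", "Nice"]),
]
print(accuracy(length_score, items))
```

The length-only scorer fails the first item because a wrong answer is longer than the right one, which is the kind of shortcut the reward-hacking probes mentioned above are meant to expose.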
