RewardBench 2: Advancing Reward Model Evaluation
Allen AI's expanded benchmark for reward models and LLM judges, with explicit reward-hacking probes that surface judges fooled by length, formatting, sycophancy, or self-preference.
- Publisher: Allen Institute for AI (Ai2)
- Year: 2025
- Venue: preprint
- Authors: 5
- Hosting: External source (license unknown)
Introduces 1 artifact (1 eval)
TL;DR (via Semantic Scholar)
This paper introduces RewardBench 2, a new multi-skill reward modeling benchmark designed to bring new, challenging data for accuracy-based reward model evaluation, while being highly correlated with downstream performance.
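Accuracy-based reward model evaluation can be illustrated with a minimal sketch. It assumes a best-of-N layout in which each prompt has one chosen (correct) completion and several rejected ones. A prompt counts as correct only if the reward model scores the chosen completion strictly above every rejected completion. The function `best_of_n_accuracy`, the dict keys `prompt`/`chosen`/`rejected`, and the toy scorer are hypothetical illustrations. They are not the official RewardBench 2 code or data schema. With four completions per prompt and independent continuous scores, a random scorer would succeed about 1 time in 4.

```python
from typing import Callable, Sequence


def best_of_n_accuracy(
    examples: Sequence[dict],
    score: Callable[[str, str], float],
) -> float:
    """Fraction of examples whose chosen completion strictly outscores every rejected one.

    Hypothetical schema: each example is a dict with keys "prompt" (str),
    "chosen" (str) and "rejected" (list of str).
    """
    if not examples:
        raise ValueError("need at least one example")
    correct = 0
    for ex in examples:
        chosen_score = score(ex["prompt"], ex["chosen"])
        rejected_scores = [score(ex["prompt"], r) for r in ex["rejected"]]
        # Ties count as failures: the chosen completion must be strictly best.
        if all(chosen_score > s for s in rejected_scores):
            correct += 1
    return correct / len(examples)


# Toy data (made up for illustration, not benchmark data).
examples = [
    {"prompt": "2+2?", "chosen": "4",
     "rejected": ["5", "22", "It is definitely five."]},
    {"prompt": "Capital of France?", "chosen": "Paris is the capital of France.",
     "rejected": ["Lyon", "Nice", "Rome"]},
]


def length_judge(prompt: str, completion: str) -> float:
    # A deliberately length-biased "judge": longer answers get higher reward.
    return float(len(completion))


print(best_of_n_accuracy(examples, length_judge))  # 0.5
```

The length-biased judge fails the first prompt because a long wrong answer outscores the short correct one. This is the kind of failure that the length-focused reward-hacking probes are meant to surface.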
Artifacts
Evals (1)