RewardBench 2: Advancing Reward Model Evaluation

Allen AI's expanded benchmark for reward models and LLM judges, with explicit reward-hacking probes that surface judges fooled by length, formatting, sycophancy, or self-preference.

Publisher: Allen Institute for AI (Ai2)
Year: 2025
Venue: preprint
ArXiv: arxiv.org/abs/2506.01937
Code: github.com/allenai/reward-bench
Authors: 5
Hosting: External source (license unknown)

Attribution

Abstract & full text: arxiv.org/abs/2506.01937
TL;DR: semanticscholar.org/paper/6069fa0fc5cccb8ebd0061c5e1816f5069cc255b
Code: github.com/allenai/reward-bench

Introduces 1 artifact - 1 eval

TL;DR

Semantic Scholar

This paper introduces RewardBench 2, a multi-skill reward modeling benchmark that brings new, challenging data to accuracy-based reward model evaluation while remaining highly correlated with downstream performance.
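
To make "accuracy-based" evaluation concrete, here is a minimal sketch. It assumes each prompt has one preferred response and one or more rejected responses, each already scored by the reward model. The function name `pairwise_accuracy` and the data layout are hypothetical illustrations, not the benchmark's own API, and the benchmark's actual scoring rules may differ.

```python
from typing import Sequence


def pairwise_accuracy(
    chosen_scores: Sequence[float],
    rejected_scores: Sequence[Sequence[float]],
) -> float:
    """Fraction of prompts where the preferred response outscores every rejected one.

    chosen_scores[i] is the reward assigned to the preferred response for prompt i.
    rejected_scores[i] holds the rewards assigned to the rejected responses for prompt i.
    (Hypothetical layout chosen for illustration only.)
    """
    if len(chosen_scores) != len(rejected_scores):
        raise ValueError("chosen_scores and rejected_scores must have the same length")
    if not chosen_scores:
        raise ValueError("at least one prompt is required")

    # A prompt counts as correct only if the preferred response beats all rejected ones.
    correct = sum(
        1
        for chosen, rejected in zip(chosen_scores, rejected_scores)
        if all(chosen > r for r in rejected)
    )
    return correct / len(chosen_scores)


# Usage: two prompts; the reward model ranks the first correctly and the second incorrectly.
accuracy = pairwise_accuracy([1.0, 0.2], [[0.5, 0.1], [0.3]])
```

A strict inequality is used here so that ties count as failures; that is a design choice of this sketch, not something stated in the source.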

Artifacts

Evals

RewardBench 2

Authors

Hamish Ivison, Jacob Morrison, Nathan Lambert, Saumya Malik, Valentina Pyatkin