HelpSteer2

Fresh

NVIDIA's permissively licensed, human-annotated preference dataset with 5-axis Likert ratings (helpfulness, correctness, coherence, complexity, verbosity), built for training reward models.

Type: Preference
Publisher: NVIDIA
Runtime: hf_parquet
License: CC-BY-4.0
Size: 21k rated prompt-response samples, ~10k preference pairs
Published: May 2026
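
Because each prompt comes with two separately rated responses, preference pairs can be derived from the per-axis ratings. The following is a minimal sketch of one way to do that, assuming each row is a dict with `prompt`, `response`, and the five integer rating fields. The function name `build_preference_pairs`, the `margin` field, and the sample rows are hypothetical and used only for illustration; this is not NVIDIA's own pairing procedure.

```python
from collections import defaultdict

# The five rating axes, assumed here to be integer scores per response.
AXES = ("helpfulness", "correctness", "coherence", "complexity", "verbosity")


def build_preference_pairs(rows, axis="helpfulness"):
    """Group rated responses by prompt and emit chosen/rejected pairs.

    Hypothetical helper: the pairing rule (higher score on one axis wins,
    ties dropped) is an assumption, not the dataset's documented method.
    """
    by_prompt = defaultdict(list)
    for row in rows:
        by_prompt[row["prompt"]].append(row)

    pairs = []
    for prompt, responses in by_prompt.items():
        if len(responses) != 2:
            continue  # this sketch only handles prompts with exactly two responses
        a, b = responses
        if a[axis] == b[axis]:
            continue  # a tie carries no preference signal on this axis
        chosen, rejected = (a, b) if a[axis] > b[axis] else (b, a)
        pairs.append({
            "prompt": prompt,
            "chosen": chosen["response"],
            "rejected": rejected["response"],
            "margin": chosen[axis] - rejected[axis],
        })
    return pairs


# Hypothetical rows for illustration only; not taken from the dataset.
sample_rows = [
    {"prompt": "What is 2 + 2?", "response": "4",
     "helpfulness": 4, "correctness": 4, "coherence": 4, "complexity": 0, "verbosity": 0},
    {"prompt": "What is 2 + 2?", "response": "5",
     "helpfulness": 1, "correctness": 0, "coherence": 4, "complexity": 0, "verbosity": 0},
    {"prompt": "Name a prime.", "response": "7",
     "helpfulness": 3, "correctness": 4, "coherence": 4, "complexity": 0, "verbosity": 0},
    {"prompt": "Name a prime.", "response": "11",
     "helpfulness": 3, "correctness": 4, "coherence": 4, "complexity": 0, "verbosity": 0},
]

pairs = build_preference_pairs(sample_rows)
```

Rows in this shape could presumably be obtained with the Hugging Face `datasets` library, for example via `load_dataset("nvidia/HelpSteer2")`; that step is left out so the sketch runs offline. Keeping the score gap as `margin` is a design choice that lets a reward-model loss weight clear preferences more heavily than near-ties.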

Models

Notable models trained on it

Llama-3-70B-SteerLM-RM (top of RewardBench at release)
Nemotron-4 340B reward model
Many open reward models in 2024-2025
