HelpSteer2
Fresh
NVIDIA's permissively licensed, human-annotated preference dataset with 5-axis Likert ratings (helpfulness, correctness, coherence, complexity, verbosity), engineered to train high-quality reward models.
- Type
- Preference
- Publisher
- NVIDIA
- Capabilities
- Hallucination, Safety, Instruction Following
- Runtime
- hf_parquet
- License
- CC-BY-4.0
- Size
- 21k rated responses (~10k prompts, two responses each), ~10k preference pairs
- Published
- May 2026
Lift evidence
| Eval | Tools known to lift | Source paper |
|---|---|---|
| RewardBench | HelpSteer2 | - |
| Arena-Hard | HelpSteer2 | - |
| MT-Bench | HelpSteer2 | - |
Models
Notable models trained on it
- Llama-3-70B-SteerLM-RM (top of RewardBench at release)
- Nemotron-4 340B reward model
- many open reward models in 2024-2025
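
The card mentions both per-response Likert ratings and roughly 10k preference pairs. One plausible way to get from the first to the second is sketched below: group the responses that share a prompt and prefer the one with the higher helpfulness score. The sample rows are hypothetical, the function name `build_preference_pairs` and the `margin` field are illustrative choices, and dropping ties is an assumption rather than a documented rule of the dataset.

```python
from itertools import groupby

# Hypothetical rows; column names follow the five rated axes plus prompt/response.
rows = [
    {"prompt": "Explain DNS.", "response": "DNS maps names to IP addresses.",
     "helpfulness": 4, "correctness": 4, "coherence": 4, "complexity": 2, "verbosity": 1},
    {"prompt": "Explain DNS.", "response": "It is a thing on the internet.",
     "helpfulness": 1, "correctness": 2, "coherence": 3, "complexity": 1, "verbosity": 0},
    {"prompt": "Name a prime.", "response": "7",
     "helpfulness": 3, "correctness": 4, "coherence": 4, "complexity": 0, "verbosity": 0},
    {"prompt": "Name a prime.", "response": "13",
     "helpfulness": 3, "correctness": 4, "coherence": 4, "complexity": 0, "verbosity": 0},
    {"prompt": "Unpaired prompt.", "response": "Only one answer here.",
     "helpfulness": 2, "correctness": 2, "coherence": 3, "complexity": 1, "verbosity": 1},
]


def build_preference_pairs(rows, key="helpfulness"):
    """Return chosen/rejected pairs for prompts that have exactly two responses."""
    pairs = []
    # groupby only merges adjacent items, so sort by prompt first.
    ordered = sorted(rows, key=lambda r: r["prompt"])
    for prompt, group in groupby(ordered, key=lambda r: r["prompt"]):
        responses = list(group)
        if len(responses) != 2:
            continue  # assumption: only two-response prompts form a pair
        first, second = responses
        if first[key] == second[key]:
            continue  # assumption: a tie carries no preference signal
        chosen, rejected = (first, second) if first[key] > second[key] else (second, first)
        pairs.append({
            "prompt": prompt,
            "chosen": chosen["response"],
            "rejected": rejected["response"],
            "margin": chosen[key] - rejected[key],
        })
    return pairs


pairs = build_preference_pairs(rows)
for pair in pairs:
    print(pair["prompt"], "| margin:", pair["margin"])
```

The `margin` value keeps the size of the rating gap, which some reward-model training setups use as a weight; whether to use it, and which axis to rank on, is a design choice left open here.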