Anthropic HH-RLHF

Anthropic's foundational helpful-and-harmless (HH) human preference dataset: one of the first major public RLHF corpora and a long-time community baseline.

Type: Preference
Publisher: Anthropic
Runtime: jsonl
License: MIT
Size: ~170k preference pairs
Published: Apr 2022
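
Since the corpus ships as jsonl preference pairs, a reader might iterate over it roughly as follows. This is a minimal sketch, not a documented loader: the `chosen` and `rejected` field names and the `Human:` / `Assistant:` turn markers are assumptions not stated on this card, and `SAMPLE_JSONL` and `read_preference_pairs` are hypothetical names used only for illustration.

```python
import io
import json

# Hypothetical two-record sample standing in for a real .jsonl file.
# ASSUMPTION: each record holds a preferred ("chosen") and a dispreferred
# ("rejected") transcript; these field names are not given on this card.
SAMPLE_JSONL = "\n".join([
    json.dumps({
        "chosen": "\n\nHuman: Hi\n\nAssistant: Hello! How can I help?",
        "rejected": "\n\nHuman: Hi\n\nAssistant: Go away.",
    }),
    json.dumps({
        "chosen": "\n\nHuman: What is 2 + 2?\n\nAssistant: 4.",
        "rejected": "\n\nHuman: What is 2 + 2?\n\nAssistant: 5.",
    }),
])


def read_preference_pairs(stream):
    """Yield (chosen, rejected) tuples from a JSONL stream, skipping blank lines."""
    for line in stream:
        line = line.strip()
        if not line:
            continue
        record = json.loads(line)  # one JSON object per line
        yield record["chosen"], record["rejected"]


# In practice the stream would be an open file handle rather than StringIO.
pairs = list(read_preference_pairs(io.StringIO(SAMPLE_JSONL)))
print(len(pairs))
```

Yielding pairs lazily keeps memory use flat, which matters little for two records but helps at the ~170k-pair scale listed above.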

Lift evidence

Eval          Tools known to lift    Source paper
TruthfulQA    Anthropic HH-RLHF      -
HarmBench     Anthropic HH-RLHF      -

Models

Notable models trained on it

early Llama-2-Chat-style reproductions
countless academic RLHF / DPO baselines
reward models for research benchmarks

Papers: 1

Contributors: 2