Anthropic HH-RLHF

Anthropic's foundational helpful-and-harmless (HH) human preference dataset: the first major public RLHF corpus and a long-standing community baseline.
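Each example is a pair of dialogue transcripts: one preferred ("chosen") and one dispreferred ("rejected"). The sketch below assumes the layout of the Hugging Face release, where both fields are full transcripts with turns marked `\n\nHuman:` and `\n\nAssistant:` that differ only in the final assistant turn; verify this on your copy. `split_pair` is a hypothetical helper, not part of any library. It separates the shared prompt from the two competing responses, which is the form most reward-model and DPO pipelines expect.

```python
# Hypothetical helper: split one HH-RLHF-style record into
# (prompt, chosen_response, rejected_response).
# ASSUMPTION: each field is a full transcript whose turns are prefixed by
# "\n\nHuman:" / "\n\nAssistant:", and the two transcripts share every
# turn except the last assistant reply.

ASSISTANT_TAG = "\n\nAssistant:"


def split_pair(chosen: str, rejected: str) -> tuple[str, str, str]:
    """Return (prompt, chosen_response, rejected_response)."""
    # The prompt is everything up to and including the last assistant tag.
    cut_c = chosen.rfind(ASSISTANT_TAG)
    cut_r = rejected.rfind(ASSISTANT_TAG)
    if cut_c < 0 or cut_r < 0:
        raise ValueError("transcript has no assistant turn")
    prompt_c = chosen[: cut_c + len(ASSISTANT_TAG)]
    prompt_r = rejected[: cut_r + len(ASSISTANT_TAG)]
    if prompt_c != prompt_r:
        # The transcripts differ before the final reply; let the caller decide
        # whether to skip or handle such pairs.
        raise ValueError("chosen and rejected do not share a prompt")
    return prompt_c, chosen[len(prompt_c):], rejected[len(prompt_r):]


if __name__ == "__main__":
    # Toy record in the assumed format (not taken from the dataset).
    record = {
        "chosen": "\n\nHuman: Hi\n\nAssistant: Hello! How can I help?",
        "rejected": "\n\nHuman: Hi\n\nAssistant: Go away.",
    }
    prompt, good, bad = split_pair(record["chosen"], record["rejected"])
    print(repr(prompt), repr(good), repr(bad))
```

Raising on a mismatched prefix rather than silently truncating keeps malformed pairs visible; a training script would typically catch the error and drop those rows.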

Lift evidence

Eval	Tools known to lift	Source paper
TruthfulQA	Anthropic HH-RLHF	-
HarmBench	Anthropic HH-RLHF	-

Notable models trained on it

early Llama-2-Chat-style reproductions
many academic RLHF / DPO baselines
reward models for research benchmarks
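Reward models and DPO baselines both consume the chosen/rejected pairs through a pairwise logistic loss. The sketch below shows the standard Bradley-Terry reward-model loss, -log σ(r_chosen − r_rejected), and the per-pair DPO loss. It assumes the scalar rewards and the summed response log-probabilities (under the policy and a frozen reference model) have already been computed elsewhere; the default `beta=0.1` is an illustrative assumption, not a value tied to this dataset.

```python
import math


def log_sigmoid(x: float) -> float:
    # Numerically stable log(sigmoid(x)) for large |x|.
    if x >= 0:
        return -math.log1p(math.exp(-x))
    return x - math.log1p(math.exp(x))


def reward_model_loss(r_chosen: float, r_rejected: float) -> float:
    """Bradley-Terry pairwise loss: -log sigmoid(r_chosen - r_rejected)."""
    return -log_sigmoid(r_chosen - r_rejected)


def dpo_loss(
    logp_chosen: float,
    logp_rejected: float,
    ref_logp_chosen: float,
    ref_logp_rejected: float,
    beta: float = 0.1,  # ASSUMPTION: illustrative default
) -> float:
    """Per-pair DPO loss from summed response log-probs (policy and reference)."""
    # Implicit reward margin: how much more the policy favors "chosen"
    # over "rejected" relative to the reference model.
    margin = (logp_chosen - ref_logp_chosen) - (logp_rejected - ref_logp_rejected)
    return -log_sigmoid(beta * margin)


if __name__ == "__main__":
    print(reward_model_loss(2.0, -1.0))  # small: chosen already scored higher
    print(dpo_loss(-10.0, -12.0, -11.0, -11.0))
```

Both losses equal log 2 when the model has no preference between the two responses, and they shrink as the chosen response is favored more strongly.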