Anthropic HH-RLHF

Active

Anthropic's helpful and harmless (HH) preference dataset: pairs of assistant responses ranked by human raters, widely used both as a preference-training corpus and as a reward-model benchmark.

Publisher
Anthropic
Domain
safety
Format
HF Dataset
Size
~170k preference pairs (helpful: ~118k across base, online, and rejection-sampled; harmless: ~42k)
License
MIT
Published
Apr 2022
Notable for
A benchmark for evaluating safety, harmful-content handling, and instruction following.

FAQ

What is Anthropic HH-RLHF?
Anthropic's helpful and harmless (HH) preference dataset: pairs of assistant responses ranked by human raters, widely used both as a preference-training corpus and as a reward-model benchmark.
What capabilities does Anthropic HH-RLHF test?
Anthropic HH-RLHF evaluates safety, harmful-content handling, and instruction following.
What license is Anthropic HH-RLHF under?
Anthropic HH-RLHF is available under the MIT license.
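
How can the preference pairs be read in practice?
The following is a minimal sketch, not an official loader. It assumes the dataset is fetched from the Hugging Face Hub as `Anthropic/hh-rlhf` with the third-party `datasets` library, that each record has `chosen` and `rejected` transcript strings, and that both transcripts share the same dialogue history and differ only in the final assistant turn. The helper names `load_pairs` and `split_pair` are hypothetical and introduced only for illustration.

```python
from typing import Tuple

# Turn marker used in the transcripts (assumed format: "\n\nHuman: ...\n\nAssistant: ...").
ASSISTANT_TAG = "\n\nAssistant:"


def load_pairs(subset: str = "harmless-base", split: str = "train"):
    """Load one subset of the dataset from the Hugging Face Hub.

    Requires `pip install datasets` and network access; imported lazily
    so the rest of this sketch runs without the dependency.
    """
    from datasets import load_dataset

    return load_dataset("Anthropic/hh-rlhf", data_dir=subset, split=split)


def split_pair(chosen: str, rejected: str) -> Tuple[str, str, str]:
    """Split a transcript pair into (prompt, chosen_reply, rejected_reply).

    Assumption: the two transcripts are identical up to and including the
    last assistant marker. Records that violate this raise ValueError.
    """
    idx_c = chosen.rfind(ASSISTANT_TAG)
    idx_r = rejected.rfind(ASSISTANT_TAG)
    if idx_c == -1 or idx_r == -1:
        raise ValueError("transcript has no assistant turn")

    end_c = idx_c + len(ASSISTANT_TAG)
    end_r = idx_r + len(ASSISTANT_TAG)
    prompt = chosen[:end_c]
    if rejected[:end_r] != prompt:
        raise ValueError("chosen and rejected do not share a dialogue history")

    return prompt, chosen[end_c:].strip(), rejected[end_r:].strip()


if __name__ == "__main__":
    # Hand-written toy record, not taken from the dataset.
    toy_chosen = "\n\nHuman: Hi\n\nAssistant: Hello, how can I help?"
    toy_rejected = "\n\nHuman: Hi\n\nAssistant: Go away."
    print(split_pair(toy_chosen, toy_rejected))
```

Separating the shared prompt from the two final replies is a common first step before preference training or reward-model scoring, since both typically compare two completions of the same prompt.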