Anthropic HH-RLHF
Active
Anthropic's helpful and harmless (HH) preference dataset: pairs of assistant responses ranked by human raters, widely used both as a preference-training corpus and as a reward-model benchmark.
- Publisher
- Anthropic
- Capabilities
- Safety, Harmful Content, Instruction Following
- Domain
- safety
- Format
- HF Dataset
- Size
- ~170k preference pairs (helpful: ~118k across the base, online, and rejection-sampled subsets; harmless: ~42k)
- License
- MIT
- Published
- Apr 2022
- Notable for
- Preference data and benchmark for evaluating safety, harmful-content handling, and instruction following.
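Because each entry is a pair of full transcripts, most uses start by separating the shared dialogue context from the two competing final responses. The following is a minimal sketch of that step, not an official loader. It assumes the record layout commonly seen in the Hugging Face release: two string fields named `chosen` and `rejected`, with turns marked by "Human:" and "Assistant:" on their own paragraphs. The helper `split_pair`, the `PreferencePair` type, and the sample record are hypothetical and made up for illustration.

```python
from typing import NamedTuple

# Turn marker assumed from the public release; not stated on this card.
ASSISTANT = "\n\nAssistant:"


class PreferencePair(NamedTuple):
    prompt: str    # shared context, ending with the final "Assistant:" marker
    chosen: str    # final response the rater preferred
    rejected: str  # final response the rater did not prefer


def split_pair(record: dict) -> PreferencePair:
    """Split one record into (prompt, chosen response, rejected response)."""
    chosen, rejected = record["chosen"], record["rejected"]

    # The last assistant turn is where the two transcripts are expected to diverge.
    cut_c = chosen.rfind(ASSISTANT)
    cut_r = rejected.rfind(ASSISTANT)
    if cut_c == -1 or cut_r == -1:
        raise ValueError("record has no assistant turn")

    prompt_c = chosen[: cut_c + len(ASSISTANT)]
    prompt_r = rejected[: cut_r + len(ASSISTANT)]
    if prompt_c != prompt_r:
        # Surface records that break the shared-prefix assumption instead of guessing.
        raise ValueError("transcripts do not share the same context")

    return PreferencePair(
        prompt=prompt_c,
        chosen=chosen[cut_c + len(ASSISTANT):].strip(),
        rejected=rejected[cut_r + len(ASSISTANT):].strip(),
    )


# Illustrative record only; it is not taken from the dataset.
record = {
    "chosen": "\n\nHuman: How do I boil an egg?"
              "\n\nAssistant: Simmer it in water for about ten minutes.",
    "rejected": "\n\nHuman: How do I boil an egg?"
                "\n\nAssistant: I don't know.",
}
pair = split_pair(record)
print(pair.chosen)
print(pair.rejected)
```

With the Hugging Face `datasets` library installed, real records can usually be obtained with `load_dataset("Anthropic/hh-rlhf")` and passed to the same helper. Raising on a mismatched context, rather than silently truncating, is a deliberate choice so that records that do not fit the assumed layout are noticed.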
FAQ
- What is Anthropic HH-RLHF?
- Anthropic's helpful and harmless (HH) preference dataset: pairs of assistant responses ranked by human raters, widely used both as a preference-training corpus and as a reward-model benchmark.
- What capabilities does Anthropic HH-RLHF test?
- Anthropic HH-RLHF evaluates safety, harmful-content handling, and instruction following.
- What license is Anthropic HH-RLHF under?
- Anthropic HH-RLHF is available under the MIT license.