PKU-SafeRLHF
Fresh
Peking University's dual-axis safety + helpfulness preference dataset with explicit harm-category labels, designed for Safe RLHF training.
- Type: Preference
- Publisher: PKU-Alignment
- Capabilities: Safety, Jailbreak Resistance, Instruction Following
- Runtime: hf_parquet
- License: CC-BY-NC-4.0
- Size: 330k+ preference pairs
- Published: May 2026
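To make the dual-axis idea concrete, here is a minimal sketch of how one record can yield two different preference pairs, one per axis. The record layout and field names (`better_response_id`, `safer_response_id`, `harm_categories`) and the category label are illustrative assumptions, not a schema confirmed by this page; check the published dataset card before relying on them.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass
class PreferenceRecord:
    # Hypothetical record layout; field names are assumptions for illustration.
    prompt: str
    response_0: str
    response_1: str
    better_response_id: int  # index (0 or 1) of the more helpful response
    safer_response_id: int   # index (0 or 1) of the safer response
    harm_categories: Tuple[str, ...] = ()  # explicit harm-category labels


def to_pair(record: PreferenceRecord, axis: str) -> Tuple[str, str, str]:
    """Return (prompt, chosen, rejected) for the requested axis."""
    if axis == "helpfulness":
        winner = record.better_response_id
    elif axis == "safety":
        winner = record.safer_response_id
    else:
        raise ValueError(f"unknown axis: {axis!r}")
    responses = (record.response_0, record.response_1)
    # With two responses, the rejected one is simply the other index.
    return record.prompt, responses[winner], responses[1 - winner]


# Placeholder text only; the two axes disagree on this record.
example = PreferenceRecord(
    prompt="<user prompt>",
    response_0="<detailed but risky answer>",
    response_1="<cautious answer>",
    better_response_id=0,
    safer_response_id=1,
    harm_categories=("example_category",),
)

helpfulness_pair = to_pair(example, "helpfulness")
safety_pair = to_pair(example, "safety")
```

When the two axes disagree, as in the placeholder record, the same prompt contributes opposite chosen/rejected orderings, which is why the helpfulness and safety signals are kept separate rather than collapsed into a single label.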
Lift evidence
| Eval | Tools known to lift | Source paper |
|---|---|---|
| AdvBench | PKU-SafeRLHF | - |
| HarmBench | PKU-SafeRLHF | - |
| XSTest | PKU-SafeRLHF | - |
Models
Notable models trained on it
- Beaver-7B
- many academic Safe RLHF reproductions
- components of safety mixtures in Tülu 3, Llama-Guard training research