PKU-SafeRLHF

Fresh

Peking University's dual-axis safety + helpfulness preference dataset with explicit harm-category labels, designed for Safe RLHF training.

Type: Preference
Publisher: PKU-Alignment
Capabilities: Safety, Jailbreak Resistance, Instruction Following
Runtime: hf_parquet
License: CC-BY-NC-4.0
Size: 330k+ preference pairs
Published: May 2026
Canonical: huggingface.co/datasets/PKU-Alignment/PKU-SafeRLHF
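The dual-axis labelling described above means each record can yield two preference pairs: one ranked by helpfulness and one ranked by safety. The following is a minimal sketch of that split, not the publisher's own tooling. The field names (`prompt`, `response_0`, `response_1`, `better_response_id`, `safer_response_id`) are assumptions about the schema and should be checked against the dataset card. `split_preferences` and `iter_pairs` are hypothetical helper names.

```python
from typing import Any, Dict, Iterator


def split_preferences(record: Dict[str, Any]) -> Dict[str, Dict[str, str]]:
    """Turn one dual-axis record into a helpfulness pair and a safety pair.

    Assumed schema (verify against the dataset card):
      prompt, response_0, response_1: str
      better_response_id: 0 or 1, index of the more helpful response
      safer_response_id:  0 or 1, index of the safer response
    """
    responses = [record["response_0"], record["response_1"]]
    better = record["better_response_id"]
    safer = record["safer_response_id"]
    return {
        "helpfulness": {
            "prompt": record["prompt"],
            "chosen": responses[better],
            "rejected": responses[1 - better],
        },
        "safety": {
            "prompt": record["prompt"],
            "chosen": responses[safer],
            "rejected": responses[1 - safer],
        },
    }


def iter_pairs(split: str = "train") -> Iterator[Dict[str, Dict[str, str]]]:
    """Stream pairs from the Hub. Needs network access and the `datasets` package.

    Assumes the repository loads without an explicit configuration name.
    """
    from datasets import load_dataset  # imported lazily so the helper above works offline

    for record in load_dataset("PKU-Alignment/PKU-SafeRLHF", split=split):
        yield split_preferences(record)


# Hand-written record for illustration only; it is not taken from the dataset.
example = {
    "prompt": "How do I dispose of old medication?",
    "response_0": "Use a pharmacy take-back programme.",
    "response_1": "Detailed but risky advice.",
    "better_response_id": 1,
    "safer_response_id": 0,
}
pairs = split_preferences(example)
```

In this illustrative record the two axes disagree: the response marked more helpful is not the one marked safer. Keeping the pairs separate lets a Safe RLHF setup train a reward model on one and a cost model on the other.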

Lift evidence (3)

Eval	Tools known to lift	Source paper
AdvBench	PKU-SafeRLHF	-
HarmBench	PKU-SafeRLHF	-
XSTest	PKU-SafeRLHF	-

Models

Notable models trained on it

Beaver-7B
many academic Safe RLHF reproductions
components of safety mixtures in Tülu 3, Llama-Guard training research

Papers (1)

Introduces: BeaverTails: Towards Improved Safety Alignment of LLM via a Human-Preference Dataset

Contributors (3)

Josef Dai, Xuehai Pan, Yaodong Yang