BeaverTails: Towards Improved Safety Alignment of LLM via a Human-Preference Dataset
Peking University's PKU-SafeRLHF / BeaverTails dataset and the Safe-RLHF framework, which uses separate helpfulness and harmlessness reward models for safety alignment.
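Separating the two signals lets helpfulness be maximized while harmlessness acts as a constraint. The sketch below is a minimal illustration under stated assumptions. Each response gets a scalar helpfulness reward and a scalar harmlessness cost. The two are combined through a Lagrange multiplier that is updated by dual ascent. The class name `LagrangianCombiner`, its parameters, and the exact update rule are hypothetical choices made for illustration. They are not taken from the Safe-RLHF codebase or the paper's formulation.

```python
from dataclasses import dataclass


@dataclass
class LagrangianCombiner:
    """Hypothetical helper: trades off a helpfulness reward against a
    harmlessness cost with a Lagrange multiplier (illustrative only)."""

    cost_limit: float = 0.0   # assumed threshold on the average cost
    lambda_lr: float = 0.1    # assumed step size for the multiplier update
    lam: float = 0.0          # Lagrange multiplier, kept non-negative

    def combined_score(self, reward: float, cost: float) -> float:
        # Quantity the policy would maximize: reward minus weighted cost.
        return reward - self.lam * cost

    def update_multiplier(self, batch_costs: list[float]) -> float:
        # Dual ascent: raise lambda when the mean cost exceeds the limit,
        # lower it (never below zero) when the constraint is satisfied.
        mean_cost = sum(batch_costs) / len(batch_costs)
        self.lam = max(0.0, self.lam + self.lambda_lr * (mean_cost - self.cost_limit))
        return self.lam


combiner = LagrangianCombiner(cost_limit=0.0, lambda_lr=0.5)
combiner.update_multiplier([1.0, 3.0])          # mean cost 2.0, so lam becomes 1.0
print(combiner.combined_score(2.0, 1.5))        # 2.0 - 1.0 * 1.5 = 0.5
```

The design choice here is that the multiplier grows only while responses are, on average, too costly. Harmlessness therefore pulls against helpfulness only when the constraint is violated.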
- Publisher: Peking University
- Year: 2023
- Venue: NeurIPS
- Authors: 10
- Hosting: External source (license unknown)
Introduces 2 artifacts: 1 eval, 1 tool.
TL;DR (Semantic Scholar)
The BeaverTails dataset is introduced, aimed at fostering research on safety alignment in large language models (LLMs) and providing vital resources for the community, contributing towards the safe development and deployment of LLMs.
Artifacts (2)
- Evals
- Tools