BeaverTails: Towards Improved Safety Alignment of LLM via a Human-Preference Dataset

Peking University's PKU-SafeRLHF / BeaverTails dataset and Safe-RLHF framework with separated helpfulness and harmlessness reward models for safety alignment.

Year
2023
Venue
NeurIPS
Authors
10
Hosting
External source (license unknown)

Introduces 2 artifacts - 1 eval, 1 tool

TL;DR

Semantic Scholar

The BeaverTails dataset is introduced, aimed at fostering research on safety alignment in large language models (LLMs) and providing vital resources for the community, contributing towards the safe development and deployment of LLMs.
