
Training language models to follow instructions with human feedback

The InstructGPT paper: it introduced the SFT + reward-model + PPO recipe for RLHF and showed that human labelers prefer outputs from a 1.3B-parameter aligned model over those of the 175B-parameter base GPT-3.
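The two learned-reward pieces of the recipe can be sketched in a few lines. This is a minimal sketch with scalar floats standing in for model outputs. The reward model is trained on labeler rankings of K responses per prompt, averaging -log σ(r_w - r_l) over all C(K, 2) pairs. The PPO stage then maximizes the reward-model score minus a KL-style penalty, β · log(π_RL(y|x) / π_SFT(y|x)). The function names (`comparison_loss`, `kl_shaped_reward`) and the best-first input ordering are hypothetical choices made for illustration. The pretraining-mix term of the PPO-ptx variant is omitted.

```python
import math
from itertools import combinations


def softplus(z: float) -> float:
    # Numerically stable log(1 + exp(z)); avoids overflow for large |z|.
    return max(z, 0.0) + math.log1p(math.exp(-abs(z)))


def comparison_loss(rewards_ranked: list[float]) -> float:
    """Average pairwise loss -log sigmoid(r_w - r_l) over all C(K, 2) pairs.

    Hypothetical input format: rewards_ranked[i] is the reward-model score of
    the response the labeler ranked i-th, ordered best-first.
    """
    if len(rewards_ranked) < 2:
        raise ValueError("need at least two ranked responses")
    pairs = list(combinations(range(len(rewards_ranked)), 2))
    # For i < j, response i was ranked above j, so it is the preferred one.
    # -log sigmoid(d) == softplus(-d)
    total = sum(softplus(-(rewards_ranked[i] - rewards_ranked[j])) for i, j in pairs)
    return total / len(pairs)


def kl_shaped_reward(rm_score: float, logp_policy: float, logp_sft: float, beta: float) -> float:
    # Reward fed to PPO: r(x, y) - beta * log(pi_RL(y|x) / pi_SFT(y|x)),
    # which discourages the policy from drifting far from the SFT model.
    return rm_score - beta * (logp_policy - logp_sft)


if __name__ == "__main__":
    print(comparison_loss([1.0, 0.0, -1.0]))          # well-ordered scores: low loss
    print(comparison_loss([-1.0, 0.0, 1.0]))          # reversed scores: high loss
    print(kl_shaped_reward(1.0, -2.0, -3.0, beta=0.1))
```

Equal scores give a loss of log 2 per pair, and scoring the preferred response higher drives the loss toward zero. The value of `beta` above is illustrative only, not a setting taken from the paper.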

Publisher
OpenAI
Year
2022
Venue
NeurIPS
Authors
21
Hosting
External source, license unknown


Introduces 1 artifact: 1 model

TL;DR (Semantic Scholar)

The results show that fine-tuning with human feedback is a promising direction for aligning language models with human intent: it improves truthfulness and reduces toxic output generation, with minimal performance regressions on public NLP datasets.
