TruthfulQA: Measuring How Models Mimic Human Falsehoods

Introduces TruthfulQA, a benchmark of 817 adversarial questions designed so that imitating common human misconceptions yields false answers; in the paper's experiments, the largest models were generally the least truthful.

Publisher: University of Oxford
Year: 2021
Venue: ACL
ArXiv: arxiv.org/abs/2109.07958
Code: github.com/sylinrl/TruthfulQA
Authors: 3
Hosting: External source, license unknown

Attribution

Abstract & full text: arxiv.org/abs/2109.07958
TL;DR: semanticscholar.org/paper/77d956cdab4508d569ae5741549b78e715fd0749
Code: github.com/sylinrl/TruthfulQA

Introduces 1 artifact - 1 eval

TL;DR

Semantic Scholar

The authors suggest that scaling up models alone is less promising for improving truthfulness than fine-tuning with training objectives other than imitation of text from the web.

Artifacts

Evals

TruthfulQA

Authors

Jacob Hilton, Owain Evans, Stephanie Lin