TruthfulQA

817 questions targeting common human misconceptions, measuring whether a model gives factually true answers or repeats popular falsehoods.

Open

Publisher: Future of Humanity Institute (Oxford)
Capabilities: Hallucination, Factual Recall
Format: HF Dataset
Size: 817 tasks
License: Apache-2.0
Published: Sep 2021
Notable for: Benchmark for evaluating hallucination and factual recall.
Canonical: github.com/sylinrl/TruthfulQA
Also on: huggingface.co/datasets/truthful_qa
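
The description above frames the benchmark as checking whether a model picks the true answer over a popular falsehood. As a rough illustration only, here is a minimal sketch of a single-true-answer multiple-choice accuracy in that spirit. The `Item` layout, the function names, the two toy questions, and the hard-coded scores are all assumptions made for this example; nothing here is taken from the dataset or from any official evaluation code.

```python
from dataclasses import dataclass
from typing import Callable, Sequence


@dataclass(frozen=True)
class Item:
    """One multiple-choice question (hypothetical layout, not the dataset schema)."""
    question: str
    choices: Sequence[str]
    true_index: int  # position of the single true answer in `choices`


def single_true_accuracy(items: Sequence[Item],
                         score: Callable[[str, str], float]) -> float:
    """Fraction of items where the highest-scoring choice is the true one."""
    if not items:
        raise ValueError("need at least one item")
    hits = 0
    for item in items:
        # `score` stands in for a model's preference for an answer,
        # for example a log-probability; higher means more preferred.
        scores = [score(item.question, choice) for choice in item.choices]
        picked = max(range(len(scores)), key=scores.__getitem__)
        hits += picked == item.true_index
    return hits / len(items)


# Toy items written for this sketch (not drawn from TruthfulQA).
toy_items = [
    Item("Do people use only 10% of their brains?",
         ("No, people use virtually all of the brain.",
          "Yes, people use only 10% of the brain."),
         true_index=0),
    Item("Do goldfish have a three-second memory?",
         ("No, goldfish can remember things for months.",
          "Yes, goldfish forget everything after three seconds."),
         true_index=0),
]

# Made-up scores: this imaginary model prefers the falsehood on the second item.
TOY_SCORES = {
    "No, people use virtually all of the brain.": -1.0,
    "Yes, people use only 10% of the brain.": -2.0,
    "No, goldfish can remember things for months.": -3.0,
    "Yes, goldfish forget everything after three seconds.": -1.5,
}


def toy_score(question: str, answer: str) -> float:
    """Look up the made-up score for an answer; the question is ignored here."""
    return TOY_SCORES[answer]


print(single_true_accuracy(toy_items, toy_score))  # 0.5
```

In practice `toy_score` would be replaced by a call into a real model, and the items would come from the dataset itself rather than from hand-written examples.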

Where it's ranked

Open LLM Leaderboard (Hugging Face): aggregated with 6 others · live

Related tools

Implementations, trainers, datasets and scaffolds linked to this eval.

Anthropic HH-RLHF

Anthropic

Anthropic's foundational helpful-and-harmless human preference dataset - the first major public RLHF corpus and a long-time community baseline.

Training data · Preference · Safety · Jailbreak Resistance · Multi Turn Dialog

Papers

TruthfulQA: Measuring How Models Mimic Human Falsehoods

ACL · 2022 (preprint Sep 2021)

Introduces TruthfulQA, 817 adversarial questions designed so that imitating common human misconceptions yields wrong answers; in the paper's experiments, larger models were often less truthful.

Contributors

Stephanie Lin, Jacob Hilton, Owain Evans

FAQ

What is TruthfulQA? TruthfulQA is a benchmark of 817 questions targeting common human misconceptions; it measures whether a model gives factually true answers or repeats popular falsehoods.
What capabilities does TruthfulQA test? TruthfulQA evaluates hallucination and factual recall.
How can a model improve its TruthfulQA score? The tools linked to TruthfulQA on Sophon (RL environments, datasets, and scaffolds that target this eval) include Anthropic HH-RLHF.
What license is TruthfulQA under? TruthfulQA is available under the Apache-2.0 license.