
SimpleQA

Active

4,326 short factual questions, each with a single unambiguous answer, designed to measure short-form factuality and hallucination rate.
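The hallucination side of SimpleQA comes from how graded answers are aggregated. The SimpleQA paper grades each response as correct, incorrect, or not attempted, so a model that abstains is scored differently from one that answers confidently and wrongly. The sketch below assumes that three-way scheme. `Grade`, `simpleqa_metrics`, and the result keys are hypothetical names chosen for illustration, not OpenAI's evaluation code.

```python
from collections import Counter
from enum import Enum


class Grade(Enum):
    """Assumed three-way per-question grade (correct / incorrect / not attempted)."""
    CORRECT = "correct"
    INCORRECT = "incorrect"
    NOT_ATTEMPTED = "not_attempted"


def simpleqa_metrics(grades: list[Grade]) -> dict[str, float]:
    """Hypothetical helper: aggregate per-question grades into summary metrics."""
    total = len(grades)
    if total == 0:
        raise ValueError("no grades to aggregate")
    counts = Counter(grades)  # missing grades count as 0

    correct = counts[Grade.CORRECT] / total
    incorrect = counts[Grade.INCORRECT] / total  # confident wrong answers
    not_attempted = counts[Grade.NOT_ATTEMPTED] / total  # abstentions

    # Accuracy restricted to the questions the model actually tried to answer.
    attempted = counts[Grade.CORRECT] + counts[Grade.INCORRECT]
    correct_given_attempted = counts[Grade.CORRECT] / attempted if attempted else 0.0

    # F-score: harmonic mean of overall correct and correct-given-attempted.
    denom = correct + correct_given_attempted
    f_score = 2 * correct * correct_given_attempted / denom if denom else 0.0

    return {
        "correct": correct,
        "incorrect": incorrect,
        "not_attempted": not_attempted,
        "correct_given_attempted": correct_given_attempted,
        "f_score": f_score,
    }


if __name__ == "__main__":
    sample = [Grade.CORRECT] * 3 + [Grade.INCORRECT, Grade.NOT_ATTEMPTED]
    for name, value in simpleqa_metrics(sample).items():
        print(f"{name}: {value:.3f}")
```

With 3 correct, 1 incorrect, and 1 not-attempted answer, overall correct is 0.6 while correct-given-attempted is 0.75 (F-score about 0.667). Abstaining lowers the first number but, unlike a wrong answer, does not add to the incorrect (hallucination) share.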

Open
Publisher: OpenAI
Format: Custom
Size: 4,326 tasks
License: MIT
Published: Apr 2024
Notable for: Benchmark for evaluating factual recall and hallucination.

Leaderboard scores
prime-hub

Top score: 33.3% by GPT-4.1 Mini (1 model reporting, 1 from a frontier lab).

Top models
1. GPT-4.1 Mini: 33.3%

Related tools (6)

Implementations, trainers, datasets and scaffolds linked to this eval.

FAQ

What is SimpleQA?
4,326 short factual questions, each with a single unambiguous answer, designed to measure short-form factuality and hallucination rate.
What capabilities does SimpleQA test?
SimpleQA evaluates factual recall and hallucination.
What is the current top score on SimpleQA?
The top reported score is 33.3%, by GPT-4.1 Mini. Only 1 model has reported a score so far, and it comes from a frontier lab.
How can a model improve its SimpleQA score?
Sophon links SimpleQA to RL environments, datasets, and scaffolds that target this eval, including Simpleqa RL Env (Prime Intellect), Simpleqa RL Env (Community), DeepDive (Serper-powered web QA), and VF Openbench RL Env (Community).
What license is SimpleQA under?
SimpleQA is available under the MIT License.