SimpleQA
Active
4,326 short factual questions with a single unambiguous answer, designed to measure short-form factuality and calibrate hallucination rate.
- Publisher
- OpenAI
- Capabilities
- Factual Recall, Hallucination
- Format
- Custom
- Size
- 4,326 tasks
- License
- MIT
- Published
- Oct 2024
- Notable for
- Benchmark for evaluating factual recall and hallucination.
Top score: 33.3% by GPT-4.1 Mini (1 model reporting, 1 from frontier labs)
Top models (1)
Related tools (6)
Implementations, trainers, datasets and scaffolds linked to this eval.
FAQ
- What is SimpleQA?
- SimpleQA is a set of 4,326 short factual questions, each with a single unambiguous answer, designed to measure short-form factuality and calibrate hallucination rate.
- What capabilities does SimpleQA test?
- SimpleQA evaluates factual recall and hallucination.
- What is the current top score on SimpleQA?
- The top reported score is 33.3% by GPT-4.1 Mini, across 1 model reporting (1 from frontier labs).
- How can a model improve its SimpleQA score?
- Tools linked to SimpleQA on Sophon include Simpleqa RL Env (Prime Intellect), Simpleqa RL Env (Community), DeepDive (Serper-powered web QA), and VF Openbench RL Env (Community). These are RL environments, datasets, and scaffolds that target this eval.
- What license is SimpleQA under?
- SimpleQA is available under the MIT license.
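
To make the "short-form factuality" and "hallucination rate" measurements mentioned above more concrete, here is a minimal sketch of how per-question grades might be rolled up into headline rates. It assumes each model answer has already been graded as correct, incorrect, or not attempted. The label names, the `summarize` function, and the metric names are hypothetical choices for illustration, and nothing on this page confirms that the reported 33.3% score was computed this way.

```python
from collections import Counter

# Hypothetical grade labels; the three-way split is an assumption of this sketch.
CORRECT = "correct"
INCORRECT = "incorrect"
NOT_ATTEMPTED = "not_attempted"


def summarize(grades):
    """Aggregate a list of per-question grade labels into simple rates."""
    total = len(grades)
    if total == 0:
        raise ValueError("no grades to summarize")
    counts = Counter(grades)
    attempted = counts[CORRECT] + counts[INCORRECT]
    return {
        # Share of all questions answered correctly.
        "accuracy": counts[CORRECT] / total,
        # Share of all questions answered confidently but wrongly.
        "hallucination_rate": counts[INCORRECT] / total,
        # Share of questions the model declined to answer.
        "not_attempted_rate": counts[NOT_ATTEMPTED] / total,
        # Accuracy restricted to questions the model actually attempted.
        "accuracy_given_attempted": counts[CORRECT] / attempted if attempted else 0.0,
    }


# Toy grades for six questions (illustrative data, not real results).
grades = [CORRECT, INCORRECT, NOT_ATTEMPTED, CORRECT, INCORRECT, INCORRECT]
summary = summarize(grades)
for name, value in summary.items():
    print(f"{name}: {value:.3f}")
```

Keeping "not attempted" separate from "incorrect" lets a declined answer count differently from a wrong one, which matters when the goal is to track hallucination rather than raw accuracy alone.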