SimpleQA/SimpleQA Verified: Measuring short-form factuality in large language models
Active
A benchmark that evaluates the ability of language models to answer short, fact-seeking questions.
- Publisher: OpenAI
- Domain: Knowledge
- License: MIT
- Published: Feb 2025
- Notable for: Benchmark for evaluating knowledge.
Related tools
4 implementations, trainers, datasets, and scaffolds linked to this eval.
FAQ
- What is SimpleQA/SimpleQA Verified: Measuring short-form factuality in large language models?
- A benchmark that evaluates the ability of language models to answer short, fact-seeking questions.
- How can a model improve its SimpleQA/SimpleQA Verified: Measuring short-form factuality in large language models score?
- Tools linked to SimpleQA/SimpleQA Verified: Measuring short-form factuality in large language models on Sophon include Simpleqa RL Env (Prime Intellect), Simpleqa RL Env (Community), Simpleqa Verified RL Env (Prime Intellect), and Simpleqa Verified RL Env (Community). These are RL environments, datasets, and scaffolds that target this eval.
- What license is SimpleQA/SimpleQA Verified: Measuring short-form factuality in large language models under?
- SimpleQA/SimpleQA Verified: Measuring short-form factuality in large language models is available under the MIT license.