
SimpleQA/SimpleQA Verified: Measuring short-form factuality in large language models

Active

A benchmark that evaluates the ability of language models to answer short, fact-seeking questions.

Publisher
OpenAI
Domain
Knowledge
License
MIT
Published
Feb 2025
Notable for
Benchmark for evaluating short-form factual knowledge.



Related tools


Implementations, trainers, datasets and scaffolds linked to this eval.

FAQ

What is SimpleQA/SimpleQA Verified: Measuring short-form factuality in large language models?
A benchmark that evaluates the ability of language models to answer short, fact-seeking questions.
How can a model improve its SimpleQA/SimpleQA Verified: Measuring short-form factuality in large language models score?
Sophon links several tools that target this eval (RL environments, datasets, and scaffolds): Simpleqa RL Env (Prime Intellect), Simpleqa RL Env (Community), Simpleqa Verified RL Env (Prime Intellect), and Simpleqa Verified RL Env (Community).
What license is SimpleQA/SimpleQA Verified: Measuring short-form factuality in large language models under?
SimpleQA/SimpleQA Verified: Measuring short-form factuality in large language models is available under the MIT License.
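For context on what a "score" means here: the original SimpleQA paper grades each model answer as correct, incorrect, or not attempted. It reports overall correct (correct / all questions), correct given attempted (correct / (correct + incorrect)), and an F-score that is the harmonic mean of the two. The sketch below computes those aggregates from a list of grade labels. It is illustrative only. The label strings and the function name `simpleqa_metrics` are hypothetical, and it is not the official grading code, which also uses a model-based grader to assign the labels in the first place.

```python
from collections import Counter
from typing import Iterable

# Hypothetical label spellings; the official grader may use different strings.
CORRECT, INCORRECT, NOT_ATTEMPTED = "correct", "incorrect", "not_attempted"
VALID_LABELS = {CORRECT, INCORRECT, NOT_ATTEMPTED}


def simpleqa_metrics(grades: Iterable[str]) -> dict[str, float]:
    """Aggregate per-question grades into SimpleQA-style summary metrics."""
    counts = Counter(grades)
    unknown = set(counts) - VALID_LABELS
    if unknown:
        raise ValueError(f"unknown grade labels: {sorted(unknown)}")

    total = sum(counts.values())
    correct = counts[CORRECT]
    attempted = correct + counts[INCORRECT]  # "not attempted" answers are excluded

    overall_correct = correct / total if total else 0.0
    correct_given_attempted = correct / attempted if attempted else 0.0

    # Harmonic mean of the two rates; defined as 0.0 when both are zero.
    denom = overall_correct + correct_given_attempted
    f_score = 2 * overall_correct * correct_given_attempted / denom if denom else 0.0

    return {
        "overall_correct": overall_correct,
        "correct_given_attempted": correct_given_attempted,
        "f_score": f_score,
    }


if __name__ == "__main__":
    print(simpleqa_metrics(["correct", "correct", "incorrect", "not_attempted"]))
```

Declining to answer lowers overall correct but leaves correct given attempted unchanged, so the F-score rewards a model that answers when it knows and abstains when it does not, rather than either guessing on everything or refusing everything.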