GDPval

Active

OpenAI's economic-impact eval - 220 expert-curated tasks weighted by US-GDP contribution across 44 occupations, evaluating whether models can do real white-collar work.

Publisher: OpenAI
Capabilities: Factual Recall, Instruction Following, Planning
Format: Custom
Size: 220 tasks
License: Closed
Published: Oct 2025
Notable for: Benchmark for evaluating factual recall, instruction following and planning.
Canonical: openai.com/index/gdpval

Attribution

Leaderboard scores: prime-hub

Top score 0.0% by GPT-5 - 1 model reporting (1 frontier)

Top models

GDPval bar chart with 1 bar. Highest value: GPT-5 at 0.

1 model

Related tools

Implementations, trainers, datasets and scaffolds linked to this eval.

Gdpval RL Env (Community)

GDPval-style evaluation environment (LLM-judged) for verifiers

Implementation · RL Env

Papers

GDPval: Evaluating AI Model Performance on Real-World Economically Valuable Tasks

preprint · 2025

OpenAI's eval of frontier models against expert deliverables in 44 occupations spanning the top GDP-contributing sectors of the US economy, judged blind by industry experts.

Relation: introduces GDPval

FAQ

What is GDPval?

OpenAI's economic-impact eval: 220 expert-curated tasks weighted by US-GDP contribution across 44 occupations, evaluating whether models can do real white-collar work.

What capabilities does GDPval test?

GDPval evaluates factual recall, instruction following, and planning.

What is the current top score on GDPval?

The top reported score is 0.0%, by GPT-5, the only model with a reported score (a frontier-lab model).

How can a model improve its GDPval score?

Sophon links tools that target this eval, such as RL environments, datasets, and scaffolds. The one currently listed is Gdpval RL Env (Community).

What license is GDPval under?

GDPval is distributed under a closed license.