GDPval

Active

OpenAI's economic-impact eval: 220 expert-curated tasks across 44 occupations, weighted by contribution to US GDP, that test whether models can do real white-collar work.

Publisher: OpenAI
Format: Custom
Size: 220 tasks
License: Closed
Published: Oct 2025
Notable for: Benchmark for evaluating factual recall, instruction following and planning.

Attribution

Leaderboard scores: prime-hub

Top score: 0.0% by GPT-5 (1 model reporting, 1 from a frontier lab)

Top models (1 model)

[Bar chart: GDPval scores, 1 bar. Highest value: GPT-5 at 0.]

Related tools (1)

Implementations, trainers, datasets and scaffolds linked to this eval.

Papers (2)

FAQ

What is GDPval?
OpenAI's economic-impact eval: 220 expert-curated tasks across 44 occupations, weighted by contribution to US GDP, that test whether models can do real white-collar work.
What capabilities does GDPval test?
GDPval evaluates factual recall, instruction following, and planning.
What is the current top score on GDPval?
The top reported score is 0.0% by GPT-5, across 1 model reporting (1 from frontier labs).
How can a model improve its GDPval score?
Tools linked to GDPval on Sophon include Gdpval RL Env (Community). Linked tools cover RL environments, datasets, and scaffolds that target this eval.
What license is GDPval under?
GDPval is distributed under a closed license.