GDPval
Active
OpenAI's economic-impact eval: 220 expert-curated tasks weighted by US-GDP contribution across 44 occupations, evaluating whether models can do real white-collar work.
- Publisher: OpenAI
- Capabilities: Factual Recall, Instruction Following, Planning
- Format: Custom
- Size: 220 tasks
- License: Closed
- Published: Oct 2025
- Notable for: Benchmark for evaluating factual recall, instruction following and planning.
- Canonical: openai.com/index/gdpval
Top score: 0.0% by GPT-5 (1 model reporting, 1 frontier)
Top models (1)
Related tools (1)
Implementations, trainers, datasets and scaffolds linked to this eval.
Papers (2)
FAQ
- What is GDPval?
- OpenAI's economic-impact eval: 220 expert-curated tasks weighted by US-GDP contribution across 44 occupations, evaluating whether models can do real white-collar work.
- What capabilities does GDPval test?
- GDPval evaluates factual recall, instruction following, and planning.
- What is the current top score on GDPval?
- The top reported score is 0.0% by GPT-5, across 1 model reporting (1 from frontier labs).
- How can a model improve its GDPval score?
- Sophon links one tool to GDPval: Gdpval RL Env (Community). Linked tools cover RL environments, datasets, and scaffolds that target this eval.
- What license is GDPval under?
- GDPval's license is listed as Closed.