FrontierScience: Expert-Level Scientific Reasoning

Active

Evaluates expert-level scientific reasoning in AI models across physics, chemistry, and biology. Contains 160 problems in two evaluation formats: Olympiad (100 problems with reference answers) and Research (60 problems with rubrics).

Publisher
OpenAI
Domain
Knowledge
License
MIT
Published
Jan 2026
Notable for
Benchmark for evaluating expert-level scientific knowledge and reasoning.


Top score: 44.8% by GPT-5.5 (7 models reporting, 1 from a frontier lab)

Score history

[Chart: score over time. Y-axis: score, 25% to 100%. X-axis: Feb 26 to Apr 26. Labeled models: Qwen3.5 397B A17B, GPT-5.5.]

Top models

[Bar chart: 7 models. Highest value: GPT-5.5 at 44.8%.]

Related tools

2 tools

Implementations, trainers, datasets and scaffolds linked to this eval.

FAQ

What is FrontierScience: Expert-Level Scientific Reasoning?
It evaluates expert-level scientific reasoning in AI models across physics, chemistry, and biology. It contains 160 problems in two evaluation formats: Olympiad (100 problems with reference answers) and Research (60 problems with rubrics).
What is the current top score on FrontierScience: Expert-Level Scientific Reasoning?
The top reported score is 44.8%, achieved by GPT-5.5. Seven models have reported scores, one of them from a frontier lab.
How can a model improve its FrontierScience: Expert-Level Scientific Reasoning score?
Tools linked to FrontierScience: Expert-Level Scientific Reasoning on Sophon include Frontierscience RL Env (Prime Intellect) and Frontierscience RL Env (Wazupsteve). Linked tools are RL environments, datasets, and scaffolds that target this eval.
What license is FrontierScience: Expert-Level Scientific Reasoning under?
FrontierScience: Expert-Level Scientific Reasoning is available under the MIT License.
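
How might the two evaluation formats be scored?
This page does not describe the grading procedure, so the following is only a minimal sketch under stated assumptions: problems with reference answers are scored by a normalized exact match, and problems with rubrics are scored as the fraction of rubric points awarded. The names `OlympiadItem`, `ResearchItem`, `score_olympiad`, `score_research`, and `mean_percent` are hypothetical and are not part of any published FrontierScience tooling.

```python
from dataclasses import dataclass


@dataclass
class OlympiadItem:
    # Hypothetical shape: one reference answer per problem.
    reference_answer: str


@dataclass
class ResearchItem:
    # Hypothetical shape: maximum points for each rubric criterion.
    rubric_points: list[float]


def normalize(text: str) -> str:
    """Lowercase and collapse whitespace so trivial formatting differences do not matter."""
    return " ".join(text.lower().split())


def score_olympiad(item: OlympiadItem, model_answer: str) -> float:
    """Assumed scoring: 1.0 on a normalized exact match with the reference answer, else 0.0."""
    return 1.0 if normalize(model_answer) == normalize(item.reference_answer) else 0.0


def score_research(item: ResearchItem, awarded: list[float]) -> float:
    """Assumed scoring: fraction of total rubric points awarded by a grader."""
    if len(awarded) != len(item.rubric_points):
        raise ValueError("one awarded value is required per rubric criterion")
    total = sum(item.rubric_points)
    # Clamp each awarded value to the range [0, maximum for that criterion].
    earned = sum(min(max(a, 0.0), m) for a, m in zip(awarded, item.rubric_points))
    return earned / total


def mean_percent(scores: list[float]) -> float:
    """Average per-problem scores in [0, 1] and express the result as a percentage."""
    return 100.0 * sum(scores) / len(scores)
```

Under these assumptions, a leaderboard percentage would be the mean of per-problem scores. Real graders for this eval may accept equivalent answers or use model-based judging, which this sketch does not attempt.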