FrontierScience: Expert-Level Scientific Reasoning
Active
Evaluates AI capabilities for expert-level scientific reasoning across physics, chemistry, and biology. It contains 160 problems split across two evaluation formats: Olympiad (100 samples graded against reference answers) and Research (60 samples graded with rubrics).
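The two formats differ mainly in how a response is graded: one reference answer per Olympiad problem versus a multi-criterion rubric per Research problem. The sketch below is a minimal, hypothetical illustration of that difference, not the inspect_evals implementation. The class and function names (OlympiadSample, ResearchSample, RubricItem, score_olympiad, score_research) are invented for this example. The Olympiad grader is a simple normalized exact match, whereas the real eval may use a model-based grader. Which rubric criteria a response satisfies is assumed to be decided by an external grader and passed in.

```python
from __future__ import annotations

from dataclasses import dataclass


# Hypothetical record types; names are illustrative, not the eval's real schema.
@dataclass
class OlympiadSample:
    question: str
    reference_answer: str  # single reference answer to compare against


@dataclass
class RubricItem:
    criterion: str
    points: float


@dataclass
class ResearchSample:
    question: str
    rubric: list[RubricItem]  # partial credit is awarded per criterion


def _normalize(text: str) -> str:
    # Lowercase and collapse whitespace so trivial formatting differences don't matter.
    return " ".join(text.lower().split())


def score_olympiad(sample: OlympiadSample, model_answer: str) -> float:
    # Assumption: binary exact match after normalization (a stand-in for a real grader).
    return 1.0 if _normalize(model_answer) == _normalize(sample.reference_answer) else 0.0


def score_research(sample: ResearchSample, satisfied: set[str]) -> float:
    # `satisfied` holds the criteria a grader judged the response to meet (assumed input).
    total = sum(item.points for item in sample.rubric)
    earned = sum(item.points for item in sample.rubric if item.criterion in satisfied)
    # Fraction of rubric points earned; an empty rubric scores 0.
    return earned / total if total else 0.0


if __name__ == "__main__":
    olympiad = OlympiadSample("Hypothetical question", "3.2 eV")
    research = ResearchSample(
        "Hypothetical open-ended question",
        [RubricItem("identifies mechanism", 3.0), RubricItem("justifies estimate", 7.0)],
    )
    print(score_olympiad(olympiad, "  3.2 EV "))  # 1.0
    print(score_research(research, {"justifies estimate"}))  # 0.7
```

Returning a fraction of rubric points keeps partial credit visible. How the actual eval turns rubric points into a pass/fail result, or combines both formats into a single headline percentage, is not specified on this page, so any threshold or weighting would be a further assumption.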
- Publisher: OpenAI
- Domain: Knowledge
- License: MIT
- Published: Jan 2026
- Notable for: Benchmark for evaluating Knowledge.
Attribution
- README: github.com/UKGovernmentBEIS/inspect_evals/blob/main/src/inspect_evals/frontierscience/eval.yaml (MIT)
- Leaderboard scores: prime-hub
Top score: 44.8% by GPT-5.5 (7 models reporting, 1 from a frontier lab)
Score history (6)
Top models (7)
Related tools (2)
Implementations, trainers, datasets and scaffolds linked to this eval.
FAQ
- What is FrontierScience: Expert-Level Scientific Reasoning?
- It evaluates AI capabilities for expert-level scientific reasoning across physics, chemistry, and biology. It contains 160 problems split across two evaluation formats: Olympiad (100 samples graded against reference answers) and Research (60 samples graded with rubrics).
- What is the current top score on FrontierScience: Expert-Level Scientific Reasoning?
- The top reported score is 44.8%, achieved by GPT-5.5. Seven models have reported scores, one of them from a frontier lab.
- How can a model improve its FrontierScience: Expert-Level Scientific Reasoning score?
- Sophon links two tools to this eval: Frontierscience RL Env (Prime Intellect) and Frontierscience RL Env (Wazupsteve). Both are RL environments built to target FrontierScience: Expert-Level Scientific Reasoning.
- What license is FrontierScience: Expert-Level Scientific Reasoning under?
- FrontierScience: Expert-Level Scientific Reasoning is available under the MIT license.
