SciCode

Frontier

80 expert-curated scientific coding problems (PDE solvers, quantum simulation, condensed-matter calculations) with hidden test cases.

Domain: science
Format: HF Dataset
Size: 80 tasks
License: Apache-2.0
Published: Jul 2024
Notable for: Benchmark for evaluating code generation and scientific reasoning in the science domain.

Attribution

Leaderboard scores
AAprime-hub

Top score: 58.9% by Gemini 3.1 Pro Preview, with 387 models reporting (79 from frontier labs).

Score history

[Chart: SciCode score (0% to 100%) over time, Jul 2023 to Nov 2025. Labeled models: Claude 2.0, GPT-4 Turbo, GPT-4o (2024-08-06), R1 Distill Qwen 32B, o4 Mini, Gemini 3 Pro, Gemini 3.1 Pro Preview.]

Top models

[Bar chart: SciCode scores for 21 models. Highest value: Gemini 3.1 Pro Preview at 58.9%.]

Related tools

Implementations, trainers, datasets and scaffolds linked to this eval.

FAQ

What is SciCode?
80 expert-curated scientific coding problems (PDE solvers, quantum simulation, condensed-matter calculations) with hidden test cases.
What capabilities does SciCode test?
SciCode evaluates code generation and scientific reasoning.
What is the current top score on SciCode?
The top reported score is 58.9% by Gemini 3.1 Pro Preview, across 387 models reporting (79 from frontier labs).
How can a model improve its SciCode score?
Tools linked to SciCode on Sophon include Scicode RL Env (Prime Community) and Scicode RL Env (Prime Intellect). Linked tools cover RL environments, datasets, and scaffolds that target this eval.
What license is SciCode under?
SciCode is available under Apache-2.0.