MATH

Saturated

12,500 high-school competition math problems with full LaTeX-formatted step-by-step solutions, spanning algebra through number theory.

Capabilities: Math
Domain: math
Format: HF Dataset
Size: 12,500 tasks
License: MIT
Published: Mar 2021
Notable for: Benchmark for evaluating competition-level mathematical problem solving.

Attribution

Leaderboard scores: prime-hub

Top score: 100.0% by GPT-4.1 Mini, with 2 models reporting (2 from frontier labs).

Score history

[Chart: score history from Apr 25 to Aug 25; y-axis from 95% to 100%; series shown: GPT-4.1 Mini]

Top models

[Bar chart: MATH scores for 2 models; highest value: GPT-4.1 Mini at 100]

Related tools (10)

Implementations, trainers, datasets and scaffolds linked to this eval.

Papers (2)

Contributors (1)

FAQ

What is MATH?
12,500 high-school competition math problems with full LaTeX-formatted step-by-step solutions, spanning algebra through number theory.
What capabilities does MATH test?
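MATH evaluates mathematical problem solving on high-school competition problems, in subjects spanning algebra through number theory.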
What is the current top score on MATH?
The top reported score is 100.0% by GPT-4.1 Mini, across 2 models reporting (2 from frontier labs).
How can a model improve its MATH score?
Tools linked to MATH on Sophon include Hendrycks MATH RL Env (Community), Hendrycks MATH RL Env (Prime Intellect), Verifiers Math (math-python), and VF Openbench RL Env (Community). These are RL environments, datasets, and scaffolds that target this eval.
What license is MATH under?
MATH is available under the MIT license.
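
How might a MATH score be computed?
This page does not describe the grading procedure, so the following is only a rough sketch under stated assumptions: that each LaTeX solution marks its final answer with a \boxed{...} command, and that a prediction counts as correct when its boxed answer matches the reference exactly as a string. The function names extract_boxed and accuracy are hypothetical and are not taken from any of the tools listed on this page. Real graders usually also normalize equivalent forms (for example 0.5 versus \frac{1}{2}), which this sketch does not attempt.

```python
from typing import Optional, Sequence


def extract_boxed(solution: str) -> Optional[str]:
    r"""Return the contents of the last \boxed{...} in a LaTeX string, or None.

    Assumption: the final answer is wrapped in \boxed{...}.
    """
    marker = "\\boxed{"
    start = solution.rfind(marker)
    if start == -1:
        return None
    i = start + len(marker)
    depth = 1  # already inside the opening brace of \boxed{
    chars = []
    while i < len(solution):
        ch = solution[i]
        if ch == "{":
            depth += 1
        elif ch == "}":
            depth -= 1
            if depth == 0:
                # Matching close brace found: everything collected is the answer.
                return "".join(chars).strip()
        chars.append(ch)
        i += 1
    return None  # braces never balanced


def accuracy(predictions: Sequence[str], references: Sequence[str]) -> float:
    """Percentage of problems whose boxed answers match exactly (assumed metric)."""
    if len(predictions) != len(references):
        raise ValueError("predictions and references must have the same length")
    if not references:
        return 0.0
    correct = 0
    for pred, ref in zip(predictions, references):
        p = extract_boxed(pred)
        r = extract_boxed(ref)
        # A missing boxed answer on the prediction side never counts as correct.
        if p is not None and p == r:
            correct += 1
    return 100.0 * correct / len(references)
```

For example, calling accuracy with one matching and one non-matching boxed answer yields a score halfway between 0 and 100. Exact string match is the strictest possible choice and will undercount answers that are mathematically equal but written differently.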