MATH
Saturated
12,500 high-school competition math problems with full LaTeX-formatted step-by-step solutions, spanning algebra through number theory.
- Publisher: University of California, Berkeley
- Capabilities: Math
- Domain: math
- Format: HF Dataset
- Size: 12,500 tasks
- License: MIT
- Published: Mar 2021
- Notable for: Benchmark for evaluating mathematical problem solving on high-school competition problems.
- Canonical: github.com/hendrycks/math
Top score: 100.0% by GPT-4.1 Mini; 2 models reporting (2 frontier)
Score history (2)
Top models (2)
Related tools (10)
Implementations, trainers, datasets and scaffolds linked to this eval.
Papers (2)
Contributors (1)
FAQ
- What is MATH?
- 12,500 high-school competition math problems with full LaTeX-formatted step-by-step solutions, spanning algebra through number theory.
- What capabilities does MATH test?
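- MATH evaluates mathematical problem solving on high-school competition problems, with topics ranging from algebra through number theory.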
- What is the current top score on MATH?
- The top reported score is 100.0% by GPT-4.1 Mini, across 2 models reporting (2 from frontier labs).
- How can a model improve its MATH score?
- Tools linked to MATH on Sophon include Hendrycks MATH RL Env (Community), Hendrycks MATH RL Env (Prime Intellect), Verifiers Math (math-python), and VF Openbench RL Env (Community). These are RL environments, datasets, and scaffolds that target this eval.
- What license is MATH under?
- MATH is available under the MIT license.
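Because MATH solutions are LaTeX-formatted, scoring a model usually comes down to comparing final answers. The sketch below assumes the common convention that the final answer is wrapped in `\boxed{...}` in both the reference solution and the model output. The helper names `extract_boxed` and `is_correct` are hypothetical, and the plain string comparison is a simplification: real graders typically normalize equivalent forms (for example `0.5` and `\frac{1}{2}`) before comparing.

```python
from typing import Optional


def extract_boxed(text: str) -> Optional[str]:
    """Return the contents of the last \\boxed{...} in a LaTeX string, or None."""
    marker = "\\boxed{"
    start = text.rfind(marker)
    if start == -1:
        return None
    i = start + len(marker)
    depth = 1  # we are already inside the opening brace of \boxed{
    chars = []
    while i < len(text):
        ch = text[i]
        if ch == "{":
            depth += 1
        elif ch == "}":
            depth -= 1
            if depth == 0:
                # Matching close brace found: everything collected is the answer.
                return "".join(chars)
        chars.append(ch)
        i += 1
    return None  # braces never balanced


def is_correct(prediction: str, reference_solution: str) -> bool:
    """Exact-match comparison of boxed answers (assumption: no normalization)."""
    pred = extract_boxed(prediction)
    ref = extract_boxed(reference_solution)
    return pred is not None and ref is not None and pred.strip() == ref.strip()
```

Tracking brace depth, rather than using a simple regular expression, keeps nested LaTeX such as `\boxed{\frac{1}{2}}` intact. A model output with no boxed answer is counted as incorrect.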