MATH-500
Saturated
500-problem subset of the Hendrycks MATH competition-math benchmark, popularized by OpenAI's PRM800K work as a standard evaluation slice.
- Publisher: OpenAI
- Domain: math
- Format: HF Dataset
- Size: 500 tasks
- License: MIT
- Published: Mar 2021
- Notable for: Benchmark for evaluating math and planning on competition-style problems.
- Canonical: github.com/openai/prm800k
Top score: 99.2% by Grok 3 mini; 178 models reporting (46 from frontier labs).
FAQ
- What is MATH-500?
- 500-problem subset of the Hendrycks MATH competition-math benchmark, popularized by OpenAI's PRM800K work as a standard evaluation slice.
- What capabilities does MATH-500 test?
- MATH-500 evaluates math and planning.
- What is the current top score on MATH-500?
- The top reported score is 99.2% by Grok 3 mini, across 178 models reporting (46 from frontier labs).
- How can a model improve its MATH-500 score?
- Tools linked to MATH-500 on Sophon include MATH 500 RL Env (Community), MATH 500 RL Env (Prime Intellect), VF Openbench RL Env (Community), and NuminaMath. These are RL environments, datasets, and scaffolds that target this eval.
- What license is MATH-500 under?
- MATH-500 is available under the MIT license.
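
The top-score figure quoted in the FAQ can be read as a plain accuracy over the 500 problems. The sketch below is one minimal way to compute such a score, assuming each model response ends with a `\boxed{...}` final answer and that answers are compared by exact string match. Both are assumptions made for illustration: real graders typically apply more forgiving mathematical-equivalence checks, and the helper names `extract_boxed` and `accuracy` are hypothetical, not part of any official harness.

```python
from typing import Optional, Sequence


def extract_boxed(text: str) -> Optional[str]:
    r"""Return the contents of the last \boxed{...} in text, or None if absent."""
    marker = r"\boxed{"
    start = text.rfind(marker)
    if start == -1:
        return None
    i = start + len(marker)
    depth = 1  # we are already inside the opening brace of \boxed{
    chars = []
    while i < len(text):
        ch = text[i]
        if ch == "{":
            depth += 1
        elif ch == "}":
            depth -= 1
            if depth == 0:
                return "".join(chars).strip()
        chars.append(ch)
        i += 1
    return None  # braces never balanced


def accuracy(predictions: Sequence[str], references: Sequence[str]) -> float:
    """Percentage of predictions whose boxed answer exactly matches the reference.

    Assumption: exact string match after stripping whitespace. This is
    stricter than the equivalence checks used by most published graders.
    """
    if len(predictions) != len(references):
        raise ValueError("predictions and references must have the same length")
    if not references:
        raise ValueError("need at least one problem to score")
    correct = sum(
        extract_boxed(pred) == ref.strip()
        for pred, ref in zip(predictions, references)
    )
    return 100.0 * correct / len(references)


if __name__ == "__main__":
    # Toy data for illustration only; not taken from the benchmark.
    preds = [r"So the answer is \boxed{\frac{1}{2}}.", r"\boxed{3}", "no answer"]
    refs = [r"\frac{1}{2}", "4", "7"]
    print(f"{accuracy(preds, refs):.1f}%")
```

Under this formula, and if the reported number is a single-run accuracy, 99.2% on 500 problems would correspond to 496 correct answers.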
