MATH: Measuring Mathematical Problem Solving

Active

A dataset of 12,500 challenging competition mathematics problems. Demonstrates few-shot prompting and custom scorers. NOTE: The dataset has been taken down following a DMCA notice from The Art of Problem Solving.
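As a rough illustration of the two techniques named above, the sketch below builds a few-shot prompt from worked examples and scores a completion by comparing its final boxed answer to a target. It assumes answers are wrapped in `\boxed{...}`, as in MATH solutions. The function names (`build_few_shot_prompt`, `extract_boxed`, `score`) and the prompt layout are hypothetical and are not taken from any implementation linked on this page.

```python
# Hypothetical sketch: few-shot prompt construction plus a custom
# exact-match scorer for answers written as \boxed{...}.

def build_few_shot_prompt(examples, question):
    """Place worked (problem, solution) pairs ahead of the target problem."""
    parts = [f"Problem: {p}\nSolution: {s}" for p, s in examples]
    parts.append(f"Problem: {question}\nSolution:")
    return "\n\n".join(parts)


def extract_boxed(text):
    """Return the contents of the last \\boxed{...} in text, or None."""
    marker = "\\boxed{"
    start = text.rfind(marker)
    if start == -1:
        return None
    depth = 1  # we are inside the opening brace of \boxed{
    out = []
    for ch in text[start + len(marker):]:
        if ch == "{":
            depth += 1
        elif ch == "}":
            depth -= 1
            if depth == 0:
                return "".join(out)
        out.append(ch)
    return None  # braces never closed


def score(completion, target):
    """True if the boxed answer matches the target, ignoring spaces."""
    answer = extract_boxed(completion)
    if answer is None:
        return False
    return answer.replace(" ", "") == target.replace(" ", "")


prompt = build_few_shot_prompt([("1+1?", "\\boxed{2}")], "2+2?")
print(score("The answer is \\boxed{ 4 }", "4"))
```

Exact string matching is deliberately simple here. A real scorer would likely also need to treat mathematically equivalent forms (for example `0.5` and `\frac{1}{2}`) as equal.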

Domain
Mathematics
License
MIT
Published
May 2026
Notable for
Benchmark for evaluating mathematical problem solving.

Related tools

4

Implementations, trainers, datasets and scaffolds linked to this eval.

Papers

1

FAQ

What is MATH: Measuring Mathematical Problem Solving?
A dataset of 12,500 challenging competition mathematics problems. Demonstrates few-shot prompting and custom scorers. NOTE: The dataset has been taken down following a DMCA notice from The Art of Problem Solving.
How can a model improve its MATH: Measuring Mathematical Problem Solving score?
Sophon links four tools to MATH: Measuring Mathematical Problem Solving: Hendrycks MATH RL Env (Community), Hendrycks MATH RL Env (Prime Intellect), Hendrycksmath RL Env (Community), and MATH 500 RL Env (Prime Intellect). These are RL environments, datasets, and scaffolds that target this eval.
What license is MATH: Measuring Mathematical Problem Solving under?
MATH: Measuring Mathematical Problem Solving is available under the MIT license.