Measuring Mathematical Problem Solving With the MATH Dataset

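Introduces the MATH benchmark of 12,500 problems drawn from high-school mathematics competitions (such as AMC 10, AMC 12, and AIME). Each problem comes with a step-by-step solution, and the set spans seven subjects, from prealgebra to precalculus, across five difficulty levels.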

Publisher: University of California, Berkeley
Year: 2021
Venue: NeurIPS
ArXiv: arxiv.org/abs/2103.03874
Code: github.com/hendrycks/math
Authors: 8
Hosting: External source; license unknown

Attribution

Abstract & full text: arxiv.org/abs/2103.03874
TL;DR: semanticscholar.org/paper/57d1e7ac339e783898f2c3b1af55737cbeee9fc5
Code: github.com/hendrycks/math

Introduces 2 artifacts (both evals)

TL;DR (Semantic Scholar)

This work introduces MATH, a new dataset of 12,500 challenging competition mathematics problems, which can be used to teach models to generate answer derivations and explanations. It shows that accuracy remains relatively low, even with enormous Transformer models.

Artifacts

Evals

MATH, MATH-500

Authors

Akul Arora, Collin Burns, Dan Hendrycks, Dawn Song, Eric Tang, Jacob Steinhardt, Saurav Kadavath, Steven Basart