FrontierMath: A Benchmark for Evaluating Advanced Mathematical Reasoning in AI

A benchmark from Epoch AI comprising hundreds of original, research-level math problems authored by professional mathematicians, each with an automatically verifiable answer.

Publisher: Epoch AI
Year: 2024
Venue: preprint
ArXiv: arxiv.org/abs/2411.04872
Authors: 10
Hosting: External source
License: unknown

Attribution

Abstract & full text: arxiv.org/abs/2411.04872
TL;DR: Semantic Scholar

Introduces 1 artifact (1 eval)

Authors

Alex Gunning, Caroline Falkman Olsson, Diego Chicharro, Ege Erdil, Elliot Glazer, Evan Chen, Hugh Zhang, Jean-Marie Bourgade, Mateusz Puciarski, Tamay Besiroglu