AIME as an LLM Evaluation Benchmark
The convention of using the American Invitational Mathematics Examination (AIME) to evaluate frontier reasoning models. Each exam has 15 hard math problems, and two exams (AIME I and AIME II) are given each year.
- Year: 2024
- Venue: blog
Attribution
- Abstract & full text: artofproblemsolving.com/wiki/index.php/AIME_Problems_and_Solutions
- TL;DR: Semantic Scholar
Introduces 1 artifact: 1 eval
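
As a rough illustration of how such an eval is commonly scored: every AIME answer is an integer from 000 to 999, so a model's response can be graded by exact match against the official answer. The sketch below is only one possible approach, not a reference harness. The names `extract_answer` and `score_aime` are hypothetical, and treating the last standalone 1-3 digit integer in a response as the final answer is an assumption; real harnesses may parse answers differently.

```python
import re
from typing import List, Optional


def extract_answer(response: str) -> Optional[int]:
    """Return the last standalone 1-3 digit integer in the response, or None.

    Assumption: the model states its final answer last. Longer digit runs
    (e.g. "1234") are ignored because AIME answers lie in 0..999.
    """
    candidates = re.findall(r"(?<!\d)\d{1,3}(?!\d)", response)
    return int(candidates[-1]) if candidates else None


def score_aime(responses: List[str], answers: List[int]) -> float:
    """Exact-match accuracy: fraction of responses whose extracted answer is correct."""
    if not answers or len(responses) != len(answers):
        raise ValueError("responses and answers must be non-empty and the same length")
    correct = sum(extract_answer(r) == a for r, a in zip(responses, answers))
    return correct / len(answers)


if __name__ == "__main__":
    # Made-up responses and answers, for illustration only.
    demo_responses = ["So the result is 204", "I think 17, or maybe 25", "no idea"]
    demo_answers = [204, 25, 3]
    print(score_aime(demo_responses, demo_answers))
```

Leading zeros are handled by the integer conversion, so a response ending in "042" matches the answer 42. A response with no usable integer counts as wrong rather than raising an error.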