Arena Math

An LMArena subcategory that ranks models by users' pairwise votes, restricted to math-related prompts.

Operator: LMArena
Kind: Human preference
Updates: live
Notable for: The user-preference complement to math accuracy benchmarks like MATH, AIME, and FrontierMath.
Tracks: Preference voting (no benchmark)

Backing benchmark

Human-preference voting. There is no underlying benchmark: models are ranked by pairwise votes, not by a test you can run.
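
To make "ranked by pairwise votes" concrete, here is a minimal sketch of one common way to turn head-to-head votes into a ranking: fitting a Bradley-Terry model. This page does not say how the votes are actually aggregated, so the method, the function name `bradley_terry`, the model names, and the vote counts below are all illustrative assumptions, not the operator's implementation or real data.

```python
from collections import defaultdict


def bradley_terry(votes, iterations=200):
    """Fit Bradley-Terry strengths from (winner, loser) vote pairs.

    Illustrative sketch only: assumes no ties and that the vote graph
    is connected enough for the strengths to be well defined.
    """
    models = sorted({m for pair in votes for m in pair})
    wins = defaultdict(int)    # total wins per model
    games = defaultdict(int)   # comparisons per unordered pair of models
    for winner, loser in votes:
        wins[winner] += 1
        games[frozenset((winner, loser))] += 1

    strength = {m: 1.0 for m in models}
    for _ in range(iterations):
        updated = {}
        for m in models:
            # Standard minorize-maximize update: wins divided by the
            # expected number of comparisons weighted by current strengths.
            denom = sum(
                games[frozenset((m, o))] / (strength[m] + strength[o])
                for o in models
                if o != m and frozenset((m, o)) in games
            )
            updated[m] = wins[m] / denom if denom > 0 else strength[m]
        # Rescale so strengths sum to the number of models.
        total = sum(updated.values())
        strength = {m: s * len(models) / total for m, s in updated.items()}
    return strength


# Hypothetical votes on math prompts: (winner, loser).
votes = (
    [("model-a", "model-b")] * 7 + [("model-b", "model-a")] * 3
    + [("model-b", "model-c")] * 6 + [("model-c", "model-b")] * 4
    + [("model-a", "model-c")] * 8 + [("model-c", "model-a")] * 2
)

strengths = bradley_terry(votes)
ranking = sorted(strengths, key=strengths.get, reverse=True)
print(ranking)
```

The point of the sketch is that the ranking is a statistical estimate from who beat whom, which is why there is no fixed test set to rerun: the result depends on which prompts users submit and how they vote.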