Arena Hard Prompts
An LMArena subcategory that ranks models on a filtered slice of Arena prompts, namely those an automatic classifier labels as hard along multiple difficulty axes.
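The filtering step can be pictured as a simple rule over per-axis classifier labels. The sketch below is only an illustration. The axis names, the `Prompt` type, the `is_hard`/`hard_slice` helpers, and the threshold of 6 are all assumptions, not LMArena's published classifier or criteria.

```python
from dataclasses import dataclass, field

# Illustrative axis names (assumed; this page does not list the actual axes).
AXES = ("specificity", "domain_knowledge", "complexity", "problem_solving",
        "creativity", "technical_accuracy", "real_world")


@dataclass
class Prompt:
    text: str
    # Hypothetical per-axis labels from an automatic classifier:
    # True means the prompt meets that criterion.
    criteria: dict = field(default_factory=dict)


def is_hard(prompt: Prompt, min_criteria: int = 6) -> bool:
    """Hypothetical rule: a prompt is 'hard' if it meets at least `min_criteria` axes."""
    met = sum(bool(prompt.criteria.get(axis, False)) for axis in AXES)
    return met >= min_criteria


def hard_slice(prompts, min_criteria=6):
    """Keep only the prompts the rule labels as hard."""
    return [p for p in prompts if is_hard(p, min_criteria)]
```

Raising or lowering `min_criteria` trades slice size against difficulty; the real cut-off used by LMArena may differ.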
- Operator: LMArena
- Kind: Human preference
- Updates: Live
- Notable for: Shows how user preferences separate frontier models on hard prompts; correlates with the offline Arena-Hard benchmark.
- Tracks: Preference voting (no benchmark)
Backing benchmark
Human-preference voting. There is no underlying benchmark: models are ranked by pairwise votes, not by a test you can run.
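One common way to turn pairwise votes into a ranking is a Bradley-Terry model, and LMArena has described using a Bradley-Terry-style fit for its scores. The sketch below fits one with the classic minorization-maximization (MM) update on `(winner, loser)` pairs. It is a minimal illustration under that assumption, not LMArena's actual pipeline. In particular, it ignores ties and reports no confidence intervals, and the function name and vote data are hypothetical.

```python
from collections import defaultdict


def bradley_terry(votes, iters=200):
    """Fit Bradley-Terry strengths from (winner, loser) votes via the MM update.

    Ties are ignored in this sketch. Strengths are only defined up to a common
    scale, so they are rescaled to sum to the number of models each iteration.
    """
    wins = defaultdict(int)    # total wins per model
    games = defaultdict(int)   # number of votes per unordered model pair
    models = set()
    for winner, loser in votes:
        if winner == loser:
            continue  # a model cannot be compared with itself
        models.update((winner, loser))
        wins[winner] += 1
        games[frozenset((winner, loser))] += 1

    strength = {m: 1.0 for m in models}
    for _ in range(iters):
        new = {}
        for i in models:
            # MM update: p_i = W_i / sum_j n_ij / (p_i + p_j)
            denom = 0.0
            for pair, n in games.items():
                if i in pair:
                    (j,) = pair - {i}
                    denom += n / (strength[i] + strength[j])
            new[i] = wins[i] / denom
        total = sum(new.values())
        strength = {m: len(models) * v / total for m, v in new.items()}
    return strength


# Hypothetical votes: A wins 3 of 4 against B and C; B wins 3 of 4 against C.
votes = ([("A", "B")] * 3 + [("B", "A")]
         + [("B", "C")] * 3 + [("C", "B")]
         + [("A", "C")] * 3 + [("C", "A")])
scores = bradley_terry(votes)
print(sorted(scores, key=scores.get, reverse=True))  # ['A', 'B', 'C']
```

Leaderboards often present such strengths on a logarithmic, Elo-like scale. A model that never wins a vote is driven to zero strength here, which is why production systems typically add regularization or bootstrapped intervals.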