Arena Coding

An LMArena subcategory that ranks models by users' pairwise votes, restricted to coding prompts.

Operator: LMArena
Kind: Human preference
Updates: live
Notable for: The reference public ranking for code-LLM preference, complementing execution-based benchmarks like SWE-bench and LiveCodeBench.
Tracks: Preference voting (no benchmark)

Backing benchmark

Human-preference voting. There is no underlying benchmark: models are ranked by pairwise votes, not by a test you can run.
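
To make "ranked by pairwise votes" concrete, here is a minimal sketch of how a ranking can be derived from head-to-head votes. It assumes a Bradley-Terry model fitted with a simple iterative update; this page does not state which method LMArena actually uses, so treat the choice of model as an assumption. The function names (`bradley_terry`, `rank_models`), the model names, and the votes are all hypothetical and exist only for illustration.

```python
from collections import defaultdict


def bradley_terry(votes, iterations=200):
    """Estimate a strength per model from (winner, loser) vote pairs.

    Illustrative only: assumes a Bradley-Terry model, which is not
    confirmed by the source as the method behind this leaderboard.
    """
    wins = defaultdict(float)         # total wins per model
    pair_counts = defaultdict(float)  # number of votes per unordered pair
    models = set()
    for winner, loser in votes:
        wins[winner] += 1
        pair_counts[frozenset((winner, loser))] += 1
        models.update((winner, loser))

    strength = {m: 1.0 for m in models}
    for _ in range(iterations):
        updated = {}
        for m in models:
            denom = 0.0
            for other in models:
                if other == m:
                    continue
                n = pair_counts.get(frozenset((m, other)), 0.0)
                if n:
                    denom += n / (strength[m] + strength[other])
            updated[m] = wins[m] / denom if denom else strength[m]
        # Rescale so the average strength stays at 1.0.
        total = sum(updated.values())
        strength = {m: s * len(models) / total for m, s in updated.items()}
    return strength


def rank_models(votes):
    """Return model names ordered from strongest to weakest."""
    strength = bradley_terry(votes)
    return sorted(strength, key=strength.get, reverse=True)


# Hypothetical votes: each tuple is (preferred model, other model).
example_votes = (
    [("model-a", "model-b")] * 3
    + [("model-b", "model-a")] * 1
    + [("model-b", "model-c")] * 2
    + [("model-c", "model-b")] * 1
    + [("model-a", "model-c")] * 2
)

if __name__ == "__main__":
    print(rank_models(example_votes))
```

The point of the sketch is that the ranking depends only on which response voters preferred, not on whether any code passed a test, which is why it complements rather than replaces execution-based benchmarks.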