
LMArena

Crowdsourced human-preference evaluation platform for LLMs; the dominant public arena leaderboard, spun out of the LMSYS project at UC Berkeley.

Type: eval co
HQ: Berkeley, CA, USA
Founded: 2023 (as Chatbot Arena under LMSYS); incorporated as LMArena Inc. in 2025
Funding: Private; raised a $100M seed round at a $600M valuation in 2025, led by a16z and UC Investments
Website: lmarena.ai


Evals: 2
Tools: 1
Models: 0
Papers: 3
Boards: 18
People: 4

Leaderboards (18)

Arena Hard Prompts: LMArena subcategory ranking models on a filtered slice of Arena prompts auto-classified as hard along multiple difficulty axes.
Arena Math: LMArena subcategory ranking models on user pairwise votes restricted to math-related prompts.
Arena Overall: The headline LMArena (formerly Chatbot Arena) ranking aggregating crowd pairwise preference votes across all prompt categories into a Bradley-Terry Elo-style rating.
Arena - Document: Crowdsourced document model ratings from LMArena. Elo-style scores computed from pairwise human preference votes.
Arena - Document Style Control: Crowdsourced document style control model ratings from LMArena. Elo-style scores computed from pairwise human preference votes.
Arena - Image Edit: Crowdsourced image edit model ratings from LMArena. Elo-style scores computed from pairwise human preference votes.
Arena - Image to Video: Crowdsourced image to video model ratings from LMArena. Elo-style scores computed from pairwise human preference votes.
Arena - Search: Crowdsourced search model ratings from LMArena. Elo-style scores computed from pairwise human preference votes.
Arena - Search Style Control: Crowdsourced search style control model ratings from LMArena. Elo-style scores computed from pairwise human preference votes.
Arena - Text: Crowdsourced text model ratings from LMArena. Elo-style scores computed from pairwise human preference votes.
Arena - Text Style Control: Crowdsourced text style control model ratings from LMArena. Elo-style scores computed from pairwise human preference votes.
Arena - Text to Image: Crowdsourced text to image model ratings from LMArena. Elo-style scores computed from pairwise human preference votes.
Arena - Text to Video: Crowdsourced text to video model ratings from LMArena. Elo-style scores computed from pairwise human preference votes.
Arena - Video Edit: Crowdsourced video edit model ratings from LMArena. Elo-style scores computed from pairwise human preference votes.
Arena - Vision Style Control: Crowdsourced vision style control model ratings from LMArena. Elo-style scores computed from pairwise human preference votes.
Arena - Webdev: Crowdsourced webdev model ratings from LMArena. Elo-style scores computed from pairwise human preference votes.
Arena Coding: LMArena subcategory ranking models on user pairwise votes restricted to coding prompts.
Arena Vision: LMArena subcategory for vision-language models (VLMs), ranked by user pairwise votes on prompts that include an image.
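
The leaderboard descriptions mention Bradley-Terry, Elo-style ratings computed from pairwise preference votes. As a rough illustration of that idea only, the sketch below fits Bradley-Terry strengths with a simple fixed-point iteration and maps them onto an Elo-like scale. The function name `fit_bradley_terry`, the 400-point scale, the 1000-point centre, and the vote format are all assumptions made for this example. It ignores ties and style control, and is not LMArena's actual pipeline.

```python
import math
from collections import defaultdict


def fit_bradley_terry(votes, iters=200, scale=400.0, base=1000.0):
    """Fit Bradley-Terry strengths from (winner, loser) votes.

    Hypothetical helper for illustration. Returns {model: rating}, where
    ratings are on an Elo-like log10 scale centred on `base`.
    """
    wins = defaultdict(float)         # total wins per model
    pair_counts = defaultdict(float)  # number of battles per unordered pair
    models = set()
    for winner, loser in votes:
        wins[winner] += 1.0
        pair_counts[frozenset((winner, loser))] += 1.0
        models.update((winner, loser))

    strength = {m: 1.0 for m in models}
    for _ in range(iters):
        updated = {}
        for i in models:
            # Sum of n_ij / (p_i + p_j) over every opponent j that i has met.
            denom = 0.0
            for j in models:
                if i == j:
                    continue
                n_ij = pair_counts.get(frozenset((i, j)), 0.0)
                if n_ij:
                    denom += n_ij / (strength[i] + strength[j])
            updated[i] = wins[i] / denom if denom else strength[i]
        # Rescale so strengths sum to the number of models (fixes the scale).
        total = sum(updated.values())
        strength = {m: s * len(models) / total for m, s in updated.items()}

    # Map to a log scale; the floor guards against models with zero wins.
    logs = {m: math.log10(max(s, 1e-12)) for m, s in strength.items()}
    mean_log = sum(logs.values()) / len(logs)
    return {m: base + scale * (v - mean_log) for m, v in logs.items()}


if __name__ == "__main__":
    # Toy data: model "A" is preferred over "B" in 3 of 4 votes.
    example_votes = [("A", "B")] * 3 + [("B", "A")]
    for model, rating in sorted(fit_bradley_terry(example_votes).items()):
        print(model, round(rating, 1))
```

Under the Bradley-Terry model, the probability that model i is preferred over model j is p_i / (p_i + p_j), so a 3-to-1 vote split corresponds to a strength ratio of 3 and a rating gap of 400 * log10(3) on this assumed scale.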
