Open LLM Leaderboard
Hugging Face's automated leaderboard running a fixed evaluation harness across thousands of open-weight LLMs, reporting per-task and aggregate scores.
- Operator: Hugging Face
- Kind: Aggregated
- Updates: live (updated 9h ago)
- Notable for: The dominant public ranking of open-weight LLMs; it runs without API access and surfaces small and specialty models that closed-API leaderboards ignore.
- Tracks: 7 evals, aggregated
Per-eval breakdown
347 models
| Model | Eval 1 | Eval 2 | Eval 3 | Eval 4 | Eval 5 | Eval 6 | Eval 7 | Aggregate |
|---|---|---|---|---|---|---|---|---|
| Gemini 3.1 Pro Preview Google (Alphabet Inc.) | - | 93.2% | - | - | - | - | - | 93.2% |
| Gemini 3 Pro Google (Alphabet Inc.) | 89.8% | - | - | - | - | - | - | 89.8% |
| Gemini 3 Pro Preview Google (Alphabet Inc.) | 89.5% | - | - | - | - | - | - | 89.5% |
| Gemini 3 Flash Preview Google (Alphabet Inc.) | 89.0% | 88.4% | - | - | - | - | - | 88.7% |
| GPT-5.5 OpenAI | 88.6% | - | - | - | - | - | - | 88.6% |
| GPT-5 OpenAI | 82.0% | 83.3% | 100.0% | - | - | - | - | 88.4% |
| Gemini 3 Flash Google (Alphabet Inc.) | 88.2% | - | - | - | - | - | - | 88.2% |
| Claude 4.1 Opus Anthropic | 88.0% | - | - | - | - | - | - | 88.0% |
| MiniMax M2.1 Minimax | 87.5% | - | - | - | - | - | - | 87.5% |
| Qwen3.5 397B A17B Alibaba | 87.3% | - | - | - | - | - | - | 87.3% |
| GPT-5.4 OpenAI | - | 87.2% | - | - | - | - | - | 87.2% |
| GPT-4.1 Mini OpenAI | 78.1% | - | 100.0% | 83.5% | - | - | - | 87.2% |
| Claude Opus 4.5 Anthropic | 88.9% | 84.7% | - | - | - | - | - | 86.8% |
| Grok 4 xAI | 86.6% | - | - | - | - | - | - | 86.6% |
| GPT-5 Codex OpenAI | 86.5% | - | - | - | - | - | - | 86.5% |
| DeepSeek V3.2 Speciale DeepSeek | 86.3% | - | - | - | - | - | - | 86.3% |
| Gemini 2.5 Pro Google (Alphabet Inc.) | 86.2% | - | - | - | - | - | - | 86.2% |
| Claude 4 Opus Anthropic | 86.0% | - | - | - | - | - | - | 86.0% |
| GPT-5.1-Codex OpenAI | 86.0% | - | - | - | - | - | - | 86.0% |
| Gemini 2.5 Pro Preview (Mar' 25) Google (Alphabet Inc.) | 85.8% | - | - | - | - | - | - | 85.8% |
| Doubao Seed Code ByteDance Seed | 85.4% | - | - | - | - | - | - | 85.4% |
| GLM 5.1 Zai | 85.4% | - | - | - | - | - | - | 85.4% |
| o3 OpenAI | 85.3% | - | - | - | - | - | - | 85.3% |
| DeepSeek V3.1 DeepSeek | 85.1% | - | - | - | - | - | - | 85.1% |
| MiMo-V2.5-Pro Xiaomi | 85.1% | - | - | - | - | - | - | 85.1% |
| Claude Sonnet 4.5 Anthropic | 86.0% | 83.9% | - | - | - | - | - | 85.0% |
| Kimi K2 Thinking Kimi | 84.8% | - | - | - | - | - | - | 84.8% |
| MiniMax M2.5 Minimax | - | 84.5% | - | - | - | - | - | 84.5% |
| R1 DeepSeek | 84.4% | - | - | - | - | - | - | 84.4% |
| Qwen3 235B A22B Thinking 2507 Alibaba | 84.3% | - | - | - | - | - | - | 84.3% |
| Gemini 2.5 Flash Preview (Sep '25) (Reasoning) Google (Alphabet Inc.) | 84.2% | - | - | - | - | - | - | 84.2% |
| o1 OpenAI | 84.1% | - | - | - | - | - | - | 84.1% |
| Qwen3 Max Alibaba | 84.1% | - | - | - | - | - | - | 84.1% |
| Qwen3 Max (Preview) Alibaba | 83.8% | - | - | - | - | - | - | 83.8% |
| DeepSeek V3.2 DeepSeek | 83.7% | - | - | - | - | - | - | 83.7% |
| Gemini 2.5 Pro Preview (May' 25) Google (Alphabet Inc.) | 83.7% | - | - | - | - | - | - | 83.7% |
| DeepSeek V3.1 Terminus DeepSeek | 83.6% | - | - | - | - | - | - | 83.6% |
| DeepSeek V3.2 Exp DeepSeek | 83.6% | - | - | - | - | - | - | 83.6% |
| Gemini 2.5 Flash Preview (Sep '25) (Non-reasoning) Google (Alphabet Inc.) | 83.6% | - | - | - | - | - | - | 83.6% |
| Qwen3 VL 235B A22B Thinking Alibaba | 83.6% | - | - | - | - | - | - | 83.6% |
| GLM 4.5 Zai | 83.5% | - | - | - | - | - | - | 83.5% |
| o4 Mini OpenAI | 83.2% | - | - | - | - | - | - | 83.2% |
| ERNIE 5.0 Thinking Preview Baidu | 83.0% | - | - | - | - | - | - | 83.0% |
| Grok 3 mini xAI | 82.8% | - | - | - | - | - | - | 82.8% |
| Qwen3 235B A22B Instruct 2507 Alibaba | 82.8% | - | - | - | - | - | - | 82.8% |
| Llama 3.1 Nemotron Ultra 253B v1 (Reasoning) NVIDIA | 82.5% | - | - | - | - | - | - | 82.5% |
| Kimi K2 0711 Moonshot AI | 82.4% | - | - | - | - | - | - | 82.4% |
| Qwen3 Max Thinking (Preview) Alibaba | 82.4% | - | - | - | - | - | - | 82.4% |
| Qwen3 Next 80B A3B Thinking Alibaba | 82.4% | - | - | - | - | - | - | 82.4% |
| Qwen3 VL 235B A22B Instruct Alibaba | 82.3% | - | - | - | - | - | - | 82.3% |
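
In the rows shown here, the final column of each row is consistent with an unweighted mean over only the evals a model has a score for, with `-` cells skipped rather than counted as zero. That is an inference from the visible data, not a documented formula. A minimal sketch of the calculation follows; the helper names `parse_row` and `aggregate` are hypothetical and not part of any Hugging Face tooling.

```python
from typing import List, Optional, Sequence, Tuple


def aggregate(scores: Sequence[Optional[float]]) -> Optional[float]:
    """Unweighted mean over the evals that have a score, rounded to 0.1.

    Assumption: missing evals are skipped, not counted as zero.
    Returns None when no eval has a score.
    """
    present = [s for s in scores if s is not None]
    if not present:
        return None
    return round(sum(present) / len(present), 1)


def parse_row(line: str) -> Tuple[str, List[Optional[float]], Optional[float]]:
    """Split one table row into (label, per-eval scores, reported final column)."""
    cells = [c.strip() for c in line.strip().strip("|").split("|")]
    values = [None if c == "-" else float(c.rstrip("%")) for c in cells[1:]]
    return cells[0], values[:-1], values[-1]


row = "| GPT-5 OpenAI | 82.0% | 83.3% | 100.0% | - | - | - | - | 88.4% |"
label, evals, reported = parse_row(row)
print(label, aggregate(evals), reported)
```

One consequence of this scheme is that a model scored on a single eval can rank above a model scored on several, so the ordering is only directly comparable between rows that cover the same evals. Exact half-way cases may also round differently from the page, since the site's rounding rule is not stated.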