Artificial Analysis - Intelligence Index

Name: Artificial Analysis - Intelligence Index
Creator: Artificial Analysis

Aggregate Intelligence Index (0-100) computed over MMLU-Pro, GPQA Diamond, HumanEval, MATH-500, and other reasoning benchmarks. Artificial Analysis publishes it alongside per-model pricing, throughput, and latency.

Operator: Artificial Analysis
Kind: Aggregated
Updates: weekly
Notable for: intelligence-index
URL: artificialanalysis.ai/leaderboards/models
Tracks: 12 evals · aggregated

Attribution

Scores: artificialanalysis.ai/leaderboards/models

Intelligence ranking

Intelligence Index bar chart (21 models); highest value: Claude Fable 5 at 59.9.

Per-eval breakdown

505 models
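In the per-eval table below, the last column is consistent with a plain unweighted mean over whichever of the 12 evals a model has a score for ("-" cells are skipped). It is not the Intelligence Index itself: Claude Fable 5 has 59.9 on the index but 71.8% in that column. Writing $S_m$ for the set of evals scored for model $m$ and $s_{m,e}$ for its score on eval $e$ (notation introduced here, not the site's):

```latex
\bar{s}_m = \frac{1}{|S_m|} \sum_{e \in S_m} s_{m,e}
```

For example, Gemini 3 Pro's nine scores sum to 661.9, and 661.9 / 9 ≈ 73.5%, the value listed for it.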

Model	Eval 1	Eval 2	Eval 3	Eval 4	Eval 5	Eval 6	Eval 7	Eval 8	Eval 9	Eval 10	Eval 11	Eval 12	Mean of available scores
R1 1776 Perplexity AI	-	-	-	-	-	-	95.4%	-	-	-	-	-	95.4%
o1 Preview OpenAI	-	-	-	-	-	-	92.4%	-	-	-	-	-	92.4%
Qwen3-235B-A22B Alibaba Qwen (Tongyi Qianwen)	85.7%	-	-	-	-	-	-	-	-	-	-	-	85.7%
o3 Pro OpenAI	-	-	84.5%	-	-	-	-	-	-	-	-	-	84.5%
Hermes 4 (405B) Nous Research	81.9%	-	-	-	-	-	-	-	-	-	-	-	81.9%
DeepSeek-V2.5 (Dec '24) DeepSeek	-	-	-	-	-	-	76.3%	-	-	-	-	-	76.3%
DeepSeek-Coder-V2 DeepSeek	-	-	-	-	-	-	74.3%	-	-	-	-	-	74.3%
Gemini 3 Pro Google (Alphabet Inc.)	-	95.7%	91.9%	37.5%	70.4%	91.7%	-	89.8%	56.1%	87.1%	41.7%	-	73.5%
Gemini 3 Flash Preview Google (Alphabet Inc.)	-	97.0%	89.8%	34.7%	78.0%	90.8%	-	89.0%	50.6%	80.4%	38.6%	-	72.1%
Claude Fable 5 Anthropic	-	-	92.6%	53.3%	63.5%	-	-	-	60.2%	98.5%	62.9%	-	71.8%
o3 OpenAI	96.7%	88.3%	87.7%	20.0%	71.4%	80.8%	99.2%	85.3%	41.0%	80.7%	37.1%	-	71.7%
Grok 4 xAI	94.3%	92.7%	87.7%	23.9%	53.7%	81.9%	99.0%	86.6%	45.7%	74.9%	37.9%	-	70.7%
Gemini 3.1 Pro Preview Google (Alphabet Inc.)	-	-	94.1%	44.7%	77.1%	-	-	-	58.9%	95.6%	53.8%	-	70.7%
GPT-5.6 Sol OpenAI	-	-	94.1%	47.2%	72.7%	-	-	-	56.1%	85.1%	65.9%	-	70.2%
Gemini 2.5 Pro Preview (Mar '25) Google (Alphabet Inc.)	87.0%	-	83.6%	17.1%	-	77.8%	98.0%	85.8%	39.5%	-	-	-	69.8%
Gemini 2.5 Pro Preview (May '25) Google (Alphabet Inc.)	84.3%	-	82.2%	15.4%	-	77.0%	98.6%	83.7%	41.6%	-	-	-	69.0%
GPT-5 Codex OpenAI	-	98.7%	83.7%	25.6%	74.1%	84.0%	-	86.5%	40.9%	86.8%	37.9%	-	68.7%
GPT-5.6 Sol (xhigh) OpenAI	-	-	93.1%	44.7%	71.0%	-	-	-	56.0%	84.8%	61.4%	-	68.5%
GPT-5.6 Sol (high) OpenAI	-	-	92.8%	44.1%	69.2%	-	-	-	56.9%	83.3%	62.1%	-	68.1%
Claude Opus 4.8 Anthropic	-	-	92.0%	45.7%	62.2%	-	-	-	53.5%	94.4%	58.3%	-	67.7%
Qwen3.7 Max Alibaba	-	-	92.3%	38.1%	80.5%	-	-	-	48.8%	94.7%	50.8%	-	67.5%
Gemini 3 Deep Think Google DeepMind	-	-	93.8%	41.0%	-	-	-	-	-	-	-	-	67.4%
Kimi K2 Thinking Kimi	-	94.7%	83.8%	22.3%	68.1%	85.3%	-	84.8%	42.4%	93.0%	31.1%	-	67.3%
GLM 5.2 Zai	-	-	89.5%	40.1%	73.3%	-	-	-	50.5%	99.1%	50.8%	-	67.2%
GPT-5.6 Terra OpenAI	-	-	92.5%	41.8%	71.2%	-	-	-	53.9%	86.3%	57.6%	-	67.2%
GPT-5.1-Codex OpenAI	-	95.7%	86.0%	23.4%	70.0%	84.9%	-	86.0%	40.2%	83.0%	34.8%	-	67.1%
GPT-5.6 Sol (medium) OpenAI	-	-	92.6%	39.7%	69.6%	-	-	-	56.5%	81.0%	62.9%	-	67.0%
o4 Mini OpenAI	94.0%	90.7%	78.4%	17.5%	68.7%	85.9%	98.9%	83.2%	46.5%	55.6%	15.2%	-	66.8%
GPT-5.3-Codex OpenAI	-	-	91.5%	39.9%	75.4%	-	-	-	53.2%	86.0%	53.0%	-	66.5%
Kimi K2.6 Moonshot AI	-	-	91.1%	35.9%	76.0%	-	-	-	53.5%	95.9%	43.9%	-	66.1%
Qwen3.5 397B A17B Alibaba	-	-	89.3%	27.3%	78.8%	-	-	87.3%	42.0%	95.6%	40.9%	-	65.9%
Kimi K3 Kimi	-	-	93.5%	44.3%	-	-	-	-	58.7%	-	-	-	65.5%
Muse Spark Meta Platforms	-	-	88.4%	39.9%	75.9%	-	-	-	51.5%	91.5%	45.5%	-	65.4%
GPT-5.6 Terra (xhigh) OpenAI	-	-	90.8%	40.0%	66.3%	-	-	-	51.6%	80.4%	62.9%	-	65.3%
Gemini 2.5 Pro Google (Alphabet Inc.)	88.7%	87.7%	84.4%	21.1%	48.7%	80.1%	96.7%	86.2%	42.8%	54.1%	26.5%	-	65.2%
MiniMax M3 Minimax	-	-	92.9%	37.1%	82.9%	-	-	-	45.4%	88.9%	42.4%	-	64.9%
Grok 3 mini xAI	93.3%	84.7%	79.1%	11.1%	45.9%	69.6%	99.2%	82.8%	40.6%	90.4%	17.4%	-	64.9%
Claude Mythos Preview Anthropic	-	-	-	64.7%	-	-	-	-	-	-	-	-	64.7%
MiniMax M2.1 Minimax	-	82.7%	83.0%	22.2%	69.9%	81.0%	-	87.5%	40.7%	85.4%	28.8%	-	64.6%
Qwen3.7 Plus Alibaba	-	-	90.0%	33.4%	78.0%	-	-	-	45.5%	93.0%	47.0%	-	64.5%
Gemini 3 Pro Preview Google (Alphabet Inc.)	-	86.7%	88.7%	27.6%	49.7%	85.7%	-	89.5%	49.9%	68.1%	34.1%	-	64.4%
Muse Spark 1.1 Meta Platforms	-	-	89.8%	45.1%	-	-	-	-	58.2%	-	-	-	64.4%
GPT-5.6 Sol (low) OpenAI	-	-	89.8%	36.6%	66.5%	-	-	-	55.4%	76.0%	60.6%	-	64.2%
GPT-5.2-Codex OpenAI	-	-	89.9%	33.5%	77.6%	-	-	-	54.6%	92.1%	37.1%	-	64.1%
Qwen3 235B A22B Thinking 2507 Alibaba	94.0%	91.0%	79.0%	15.0%	51.2%	78.8%	98.4%	84.3%	42.4%	53.2%	13.6%	-	63.7%
Qwen3.6 Max Preview Alibaba	-	-	88.8%	28.9%	76.6%	-	-	-	46.9%	95.9%	43.9%	-	63.5%
GPT-5.6 Terra (high) OpenAI	-	-	89.6%	36.7%	64.4%	-	-	-	50.1%	78.4%	57.6%	-	62.8%
KAT-Coder-Pro V1 KwaiKAT	-	94.7%	76.4%	33.4%	68.4%	74.7%	-	81.3%	36.6%	88.6%	9.1%	-	62.6%
Grok 4.5 xAI	-	-	93.1%	40.3%	-	-	-	-	54.1%	-	-	-	62.5%
GPT-5.1-Codex-Mini OpenAI	-	91.7%	81.3%	16.9%	67.9%	83.6%	-	82.0%	42.6%	62.9%	33.3%	-	62.5%
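Because missing cells are skipped rather than penalized, a model with one strong score can outrank models measured on many evals: R1 1776 leads the table with a single 95.4%. The sketch below, assuming the last column is an unweighted mean over the non-missing scores, recomputes it for four rows copied from the table and adds a hypothetical `min_evals` coverage filter. The names `mean_of_available`, `rank`, and the threshold of 6 are illustrative only, not part of Artificial Analysis's method.

```python
from statistics import mean

# Per-eval scores (%) copied from four rows of the table; None marks a "-" cell.
# Positions follow the table's 12 eval columns; the second value is the listed last column.
ROWS = {
    "R1 1776": ([None] * 6 + [95.4] + [None] * 5, 95.4),
    "Gemini 3 Pro": ([None, 95.7, 91.9, 37.5, 70.4, 91.7, None, 89.8, 56.1, 87.1, 41.7, None], 73.5),
    "Claude Fable 5": ([None, None, 92.6, 53.3, 63.5, None, None, None, 60.2, 98.5, 62.9, None], 71.8),
    "Claude Mythos Preview": ([None, None, None, 64.7] + [None] * 8, 64.7),
}


def mean_of_available(scores):
    """Unweighted mean over the non-missing scores, or None if there are none."""
    present = [s for s in scores if s is not None]
    return mean(present) if present else None


def rank(rows, min_evals=1):
    """Sort models by mean_of_available, keeping only those with >= min_evals scores.

    min_evals is a hypothetical coverage threshold, not something the leaderboard applies.
    """
    kept = []
    for name, (scores, _listed) in rows.items():
        n = sum(s is not None for s in scores)
        if n >= min_evals:
            kept.append((name, n, round(mean_of_available(scores), 1)))
    return sorted(kept, key=lambda t: t[2], reverse=True)


if __name__ == "__main__":
    # The recomputed mean matches the listed last column for every sample row.
    for name, (scores, listed) in ROWS.items():
        assert round(mean_of_available(scores), 1) == listed, name
    print(rank(ROWS))               # single-score R1 1776 comes first
    print(rank(ROWS, min_evals=6))  # only models with at least six scores remain
```

With no filter, the single-score R1 1776 ranks first; with `min_evals=6`, only Gemini 3 Pro (nine scores) and Claude Fable 5 (six scores) remain.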

Evals tracked

AIME 2024: Problems from the American Invitational Mathematics Examination

AIME 2025: Problems from the American Invitational Mathematics Examination

GPQA Diamond

Humanity's Last Exam (HLE)

τ²-bench (Tau²-bench)

Terminal-Bench (Hard)