LiveBench

Frontier

Rolling, contamination-free benchmark that updates its questions monthly, covering math, coding, reasoning, language, instruction following, and data analysis.

Format: Custom
Size: 1000 tasks
License: Apache-2.0
Published: Jun 2024
Notable for: Benchmark for evaluating math, code generation and instruction following.
Canonical: livebench.ai

Attribution

Leaderboard scores: LiveBench

Top score: 82.4% by Gemini 3.1 Pro Preview; 137 models reporting (59 frontier).

Score history

[Chart: scores over time. Y-axis: 20% to 100%. X-axis: Nov 22, Aug 23, May 24, Feb 25, Nov 25. Labeled models: GPT-3.5 Turbo, GPT-4, GPT-4 Turbo, GPT-4o (2024-05-13), o1 Preview, Claude Sonnet 3.7, Claude Opus 4.5, Gemini 3.1 Pro Preview.]

Top models

[Bar chart: 21 of 137 models shown. Highest value: Gemini 3.1 Pro Preview at 82.4.]

FAQ

What is LiveBench?
LiveBench is a rolling, contamination-free benchmark that updates its questions monthly, covering math, coding, reasoning, language, instruction following, and data analysis.
What capabilities does LiveBench test?
LiveBench evaluates math, code generation, instruction following, and factual recall.
What is the current top score on LiveBench?
The top reported score is 82.4% by Gemini 3.1 Pro Preview, across 137 models reporting (59 from frontier labs).
What license is LiveBench under?
LiveBench is available under Apache-2.0.