LiveBench

Frontier

Rolling contamination-free benchmark that updates questions monthly across math, coding, reasoning, language, instruction-following, and data analysis.

Open

Publisher: Meta FAIR (Fundamental AI Research)
Capabilities: Math, Code Generation, Instruction Following, Factual Recall
Format: Custom
Size: 1000 tasks
License: Apache-2.0
Published: Jun 2024
Notable for: Benchmark for evaluating math, code generation and instruction following.
Canonical: livebench.ai
Also on: github.com/LiveBench/LiveBench

Attribution

Leaderboard scores: LiveBench

Top score 82.4% by Gemini 3.1 Pro Preview - 142 models reporting (61 frontier)

Score history (106)

Top models (142)

[Bar chart: LiveBench scores for 21 models. Highest value: Gemini 3.1 Pro Preview at 82.4.]

Where it's ranked

LiveBench

Abacus.AI

Aggregated

aggregated with 6 others · monthly

Contributors

Yann LeCun

FAQ

What is LiveBench?: LiveBench is a rolling, contamination-free benchmark that updates its questions monthly across math, coding, reasoning, language, instruction-following, and data analysis.
What capabilities does LiveBench test?: LiveBench evaluates math, code generation, instruction following, and factual recall.
What is the current top score on LiveBench?: The top reported score is 82.4% by Gemini 3.1 Pro Preview, across 142 models reporting (61 from frontier labs).
What license is LiveBench under?: LiveBench is available under Apache-2.0.