What capabilities does LiveCodeBench test?

LiveCodeBench evaluates code generation and debugging.

What is the current top score on LiveCodeBench?

The top reported score is 91.7% by Gemini 3 Pro, across 279 models reporting (62 from frontier labs).

How can a model improve its LiveCodeBench score?

Tools linked to LiveCodeBench on Sophon include Livecodebench RL Env (Prime Intellect) and OpenThoughts. These are RL environments, datasets, and scaffolds that target this eval.

What license is LiveCodeBench under?

LiveCodeBench is available under the MIT license.

LiveCodeBench

Rolling competitive-programming benchmark that scrapes LeetCode / AtCoder / Codeforces problems after a known cutoff to fight contamination.

Publisher: University of California, Berkeley
Capabilities: Code Generation, Debugging
Domain: code
Format: Custom
Size: 1055 tasks
License: MIT
Published: Mar 2024
Updates: Monthly
Notable for: The reference contamination-free coding leaderboard. Problems are date-stamped, so each model is scored only on problems released after its training cutoff.
Canonical: livecodebench.github.io
Official leaderboard: livecodebench.github.io/leaderboard.html
Also on: huggingface.co/datasets/livecodebench/code_generation
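
The date-stamping described under "Notable for" can be illustrated with a minimal sketch. It is not the benchmark's actual harness: the `Problem` record, its field names, the `score_after_cutoff` helper, and the sample data are all hypothetical, and the score is assumed to be a plain percentage of eligible problems solved.

```python
from dataclasses import dataclass
from datetime import date
from typing import Iterable, Optional


@dataclass(frozen=True)
class Problem:
    """Hypothetical record; field names are assumptions, not the dataset schema."""
    problem_id: str
    released: date   # date the problem was published on its source site
    solved: bool     # whether the model's submission passed all tests


def score_after_cutoff(problems: Iterable[Problem],
                       training_cutoff: date) -> Optional[float]:
    """Percentage solved among problems released strictly after the cutoff.

    Returns None when no problem falls inside the evaluation window.
    """
    eligible = [p for p in problems if p.released > training_cutoff]
    if not eligible:
        return None
    return 100.0 * sum(p.solved for p in eligible) / len(eligible)


# Made-up records for illustration only; not real LiveCodeBench data.
SAMPLE = [
    Problem("a", date(2024, 1, 10), True),
    Problem("b", date(2024, 3, 5), True),
    Problem("c", date(2024, 4, 20), False),
    Problem("d", date(2024, 6, 1), True),
]

if __name__ == "__main__":
    # Problem "a" predates the cutoff, so only "b", "c", "d" are scored.
    print(score_after_cutoff(SAMPLE, date(2024, 2, 29)))
```

Because the filter depends on each model's own cutoff, two models can be scored on different subsets of the same problem pool, so their scores are only directly comparable over a shared date window.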

Attribution

Leaderboard scores: AA prime-hub

Top models (bar chart, 21 models shown): the highest value is Gemini 3 Pro at 91.7.

Where it's ranked

Official leaderboard: livecodebench.github.io (single benchmark · monthly)
LiveBench (Abacus.AI): aggregated with 6 others · monthly

Related tools

Implementations, trainers, datasets and scaffolds linked to this eval.

Livecodebench RL Env (Prime Intellect)

Prime Intellect

LiveCodeBench evaluation environment

Implementation · RL Env · Code

OpenThoughts

Open Thoughts

A fully-open distillation of long DeepSeek-R1 reasoning traces - the community's flagship "open R1" SFT corpus for reasoning models.

Training data · SFT Dataset · Math · Code Generation · Scientific Reasoning

Papers

LiveCodeBench: Holistic and Contamination Free Evaluation of Large Language Models for Code

NeurIPS · 2024

UC Berkeley benchmark that continuously scrapes new LeetCode/AtCoder/CodeForces problems to give a contamination-free, time-stamped coding leaderboard.

Contributors

Naman Jain
