Aider Polyglot Benchmark

Frontier

225 Exercism coding exercises across six programming languages, run through the Aider CLI to measure real-world code-editing agent performance.

Open

Publisher: Aider
Capabilities: Code Editing, Code Generation
Domain: code
Format: Custom
Size: 225 tasks
License: Apache-2.0
Published: Dec 2024
Updates: Weekly
Notable for: The most cited public leaderboard specifically for code-editing capability (as opposed to synthesis-only, HumanEval-style benchmarks).
Canonical: aider.chat/docs/leaderboards
Official leaderboard: aider.chat/docs/leaderboards
Also on: github.com/Aider-AI/polyglot-benchmark

Attribution

Leaderboard scores: Aider, prime-hub

Top score: 88.0% by GPT-5; 37 models reporting (25 frontier)
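As a rough illustration of how a percentage like the one above relates to the 225-task size, here is a minimal sketch. It assumes the score is simply the share of exercises whose tests pass. The `pass_rate` helper and the count of 198 passing tasks are hypothetical and chosen only to make the arithmetic visible; they are not reported per-task results.

```python
def pass_rate(results):
    """Return the percentage of tasks that passed, rounded to one decimal."""
    if not results:
        raise ValueError("no results to score")
    passed = sum(1 for ok in results if ok)
    return round(100.0 * passed / len(results), 1)


TOTAL_TASKS = 225  # benchmark size stated on this page

# Hypothetical run: 198 passing exercises out of 225 (illustrative only).
example_results = [True] * 198 + [False] * (TOTAL_TASKS - 198)
print(pass_rate(example_results))  # 88.0
```

Under that assumption, each exercise is worth about 0.44 percentage points, so small gaps between models can correspond to only a handful of tasks.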

Top models

[Bar chart: Aider Polyglot Benchmark scores for 21 models. Highest value: GPT-5 at 88.]

Where it's ranked

Official leaderboard: aider.chat (single benchmark, updated weekly)

Related tools

Implementations, trainers, datasets and scaffolds linked to this eval.

Aiderpolyglot RL Env (Community)

Multi-turn environment for testing coding abilities across multiple programming languages using Exercism exercises

Tags: Implementation, RL Env, Coding, Polyglot, Code

Aider Polyglot RL Env (Prime Community)

Prime Community

Multi-turn environment for testing coding abilities across multiple programming languages using Exercism exercises

Tags: Implementation, RL Env, Coding, Polyglot, Code

Aiderpolyglot RL Env (Prime Intellect)

Prime Intellect

Multi-turn environment for testing coding abilities across multiple programming languages using Exercism exercises

Tags: Implementation, RL Env, Coding, Polyglot, Code
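The environments listed above are described as multi-turn. One way to picture that is a loop in which a model proposes an edit, tests run, and any failure output is fed back for the next attempt. The sketch below is an assumption-laden illustration, not the API of any listed environment: `run_episode`, `propose_edit`, `run_tests`, and the default of two turns are all hypothetical names and choices.

```python
from typing import Callable, List, Tuple


def run_episode(
    propose_edit: Callable[[str, List[str]], str],
    run_tests: Callable[[str], Tuple[bool, str]],
    task_prompt: str,
    max_turns: int = 2,  # illustrative default, not taken from this page
) -> Tuple[bool, int]:
    """Let a model attempt a task, feeding test failures back each turn."""
    feedback: List[str] = []
    for turn in range(1, max_turns + 1):
        candidate = propose_edit(task_prompt, feedback)
        passed, output = run_tests(candidate)
        if passed:
            return True, turn
        feedback.append(output)  # the next attempt sees the failure output
    return False, max_turns


# Toy stand-ins so the loop can be run without a real model or test suite.
def toy_model(prompt: str, feedback: List[str]) -> str:
    # Wrong on the first try, corrected once any feedback is available.
    return "return a + b" if feedback else "return a - b"


def toy_tests(candidate: str) -> Tuple[bool, str]:
    ok = "a + b" in candidate
    return ok, "" if ok else "expected add(1, 2) == 3"


print(run_episode(toy_model, toy_tests, "add two numbers"))  # (True, 2)
```

Passing the model and the test runner in as callables keeps the loop independent of any particular language toolchain, which fits a benchmark that spans six languages.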

Papers

Aider's Polyglot Coding Benchmark

blog · 2024

Aider's polyglot benchmark of 225 hard Exercism exercises across six languages used to rank coding LLMs by edit-correctness.

introduces

Contributors

Paul Gauthier

FAQ

What is Aider Polyglot Benchmark? A set of 225 Exercism coding exercises across six programming languages, run through the Aider CLI to measure real-world code-editing agent performance.
What capabilities does Aider Polyglot Benchmark test? It evaluates code editing and code generation.
What is the current top score on Aider Polyglot Benchmark? The top reported score is 88.0%, achieved by GPT-5. In total, 37 models report scores, 25 of them from frontier labs.
How can a model improve its Aider Polyglot Benchmark score? Sophon links three tools to this benchmark: Aiderpolyglot RL Env (Community), Aider Polyglot RL Env (Prime Community), and Aiderpolyglot RL Env (Prime Intellect). Linked tools are RL environments, datasets, and scaffolds that target this eval.
What license is Aider Polyglot Benchmark under? It is available under Apache-2.0.