IDE-Bench

Frontier

Agentic software-engineering tasks evaluated inside a real IDE development workflow, not isolated patch generation. By AfterQuery.

Publisher
AfterQuery
Domain
Coding
Published
Jun 2026
Notable for
Scores coding agents on end-to-end IDE workflows; open dataset on GitHub (AfterQuery/ide-bench).
Canonical
ide-bench.com
Official leaderboard
ide-bench.com

Attribution

Leaderboard scores
AfterQuery

Top score: 87.5% by Claude Sonnet 4.5; 13 models reporting (7 from frontier labs).

Score history

Chart of scores (0% to 100%) from Jan 2025 to Oct 2025, with labeled models R1, Qwen3 Coder 480B A35B, Qwen3 Max, and Claude Sonnet 4.5.

Top models

Bar chart of 13 models. Highest value: Claude Sonnet 4.5 at 87.5%.

FAQ

What is IDE-Bench?
IDE-Bench is a benchmark of agentic software-engineering tasks evaluated inside a real IDE development workflow rather than as isolated patch generation. It is published by AfterQuery.
What is the current top score on IDE-Bench?
The top reported score is 87.5%, by Claude Sonnet 4.5. Thirteen models have reported results, 7 of them from frontier labs.