IDE-Bench
Frontier
A benchmark of agentic software-engineering tasks evaluated inside a real IDE development workflow rather than as isolated patch generation. By AfterQuery.
- Publisher: AfterQuery
- Domain: Coding
- Published: Jun 2026
- Notable for: Scores coding agents on end-to-end IDE workflows; open dataset on GitHub (AfterQuery/ide-bench).
- Canonical: ide-bench.com
- Official leaderboard: ide-bench.com
Top score: 87.5% by Claude Sonnet 4.5, with 13 models reporting (7 from frontier labs).
FAQ
- What is IDE-Bench?
  IDE-Bench is a benchmark of agentic software-engineering tasks evaluated inside a real IDE development workflow rather than as isolated patch generation. It is published by AfterQuery.
- What is the current top score on IDE-Bench?
  The top reported score is 87.5% by Claude Sonnet 4.5, across 13 models reporting (7 from frontier labs).