SWE-bench Verified: Resolving Real-World GitHub Issues

Active

Evaluates AI's ability to resolve genuine software engineering issues sourced from 12 popular Python GitHub repositories, reflecting realistic coding and debugging scenarios.

Domain: Coding
License: MIT
Published: May 2026
Notable for: Benchmark for evaluating coding.

Related tools

4

Implementations, trainers, datasets and scaffolds linked to this eval.

Papers

1

FAQ

What is SWE-bench Verified: Resolving Real-World GitHub Issues?
Evaluates AI's ability to resolve genuine software engineering issues sourced from 12 popular Python GitHub repositories, reflecting realistic coding and debugging scenarios.
How can a model improve its SWE-bench Verified: Resolving Real-World GitHub Issues score?
Tools linked to SWE-bench Verified: Resolving Real-World GitHub Issues on Sophon include Agent Bench RL Env (Prime Community), Deepswe RL Env (Prime Intellect), Agent PLUS RL Env (Prime Intellect), and Opencode SWE RL Env (Prime Intellect). These are RL environments, datasets, and scaffolds that target this eval.
What license is SWE-bench Verified: Resolving Real-World GitHub Issues under?
SWE-bench Verified: Resolving Real-World GitHub Issues is available under the MIT license.