SWE-bench Verified: Resolving Real-World GitHub Issues
Active
Evaluates AI's ability to resolve genuine software engineering issues sourced from 12 popular Python GitHub repositories, reflecting realistic coding and debugging scenarios.
- Publisher: Princeton University
- Domain: Coding
- License: MIT
- Published: May 2026
- Notable for: Benchmark for evaluating coding ability.
Related tools (4)
Implementations, trainers, datasets and scaffolds linked to this eval.
Papers (1)
FAQ
- What is SWE-bench Verified: Resolving Real-World GitHub Issues?
- Evaluates AI's ability to resolve genuine software engineering issues sourced from 12 popular Python GitHub repositories, reflecting realistic coding and debugging scenarios.
- How can a model improve its SWE-bench Verified: Resolving Real-World GitHub Issues score?
- Tools linked to SWE-bench Verified: Resolving Real-World GitHub Issues on Sophon include Agent Bench RL Env (Prime Community), Deepswe RL Env (Prime Intellect), Agent PLUS RL Env (Prime Intellect), and Opencode SWE RL Env (Prime Intellect). These are RL environments, datasets, and scaffolds that target this eval.
- What license is SWE-bench Verified: Resolving Real-World GitHub Issues under?
- SWE-bench Verified: Resolving Real-World GitHub Issues is available under the MIT license.