SWE-bench Lite
Frontier
300-issue subset of SWE-bench focused on functional bug fixes that are easier to evaluate - used for fast iteration before full SWE-bench runs.
- Publisher: Princeton University
- Capabilities: Code Editing, Debugging, Tool Calling
- Domain: code
- Format: Custom
- Size: 300 tasks
- License: MIT
- Published: Oct 2023
- Notable for: Benchmark for evaluating code editing, debugging and tool calling in the code domain.
- Canonical: swebench.com/lite.html
Top score 58.3% by Claude 4 Sonnet - 11 models reporting (10 frontier)
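To put the percentage in terms of tasks, here is a minimal sketch that converts between a score and a task count. It assumes the score is the share of the 300 tasks that a model resolves; the function names `resolved_rate` and `implied_resolved` are hypothetical, and the resulting task count is implied by the arithmetic, not reported by the leaderboard.

```python
TOTAL_TASKS = 300  # size of SWE-bench Lite as listed on this page


def resolved_rate(resolved: int, total: int = TOTAL_TASKS) -> float:
    """Return the percentage of tasks resolved, rounded to one decimal place."""
    if total <= 0:
        raise ValueError("total must be positive")
    if not 0 <= resolved <= total:
        raise ValueError("resolved must be between 0 and total")
    return round(100 * resolved / total, 1)


def implied_resolved(score_pct: float, total: int = TOTAL_TASKS) -> int:
    """Return the nearest whole number of tasks implied by a percentage score."""
    if not 0 <= score_pct <= 100:
        raise ValueError("score_pct must be between 0 and 100")
    return round(score_pct / 100 * total)


if __name__ == "__main__":
    # Assumption: the 58.3% top score is a resolved-task percentage.
    print(implied_resolved(58.3))  # 175
    print(resolved_rate(175))      # 58.3
```

Under that assumption, a 58.3% score corresponds to roughly 175 of the 300 tasks, and each additional resolved task moves the score by about a third of a percentage point.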
Score history (10)
Top models (11)
Where it's ranked (1)
Related tools (4): Implementations, trainers, datasets and scaffolds linked to this eval.
Papers (1)
FAQ
- What is SWE-bench Lite?
- SWE-bench Lite is a 300-issue subset of SWE-bench focused on functional bug fixes that are easier to evaluate. It is used for fast iteration before full SWE-bench runs.
- What capabilities does SWE-bench Lite test?
- SWE-bench Lite evaluates code editing, debugging, and tool calling.
- What is the current top score on SWE-bench Lite?
- The top reported score is 58.3% by Claude 4 Sonnet, across 11 models reporting (10 from frontier labs).
- How can a model improve its SWE-bench Lite score?
- Tools linked to SWE-bench Lite on Sophon include SWE-Gym, Agent Bench RL Env (Prime Community), Deepswe RL Env (Prime Intellect), and Agent PLUS RL Env (Prime Intellect). These are RL environments, datasets, and scaffolds that target this eval.
- What license is SWE-bench Lite under?
- SWE-bench Lite is available under the MIT license.