
SWE-bench Lite

Frontier

A 300-issue subset of SWE-bench focused on functional bug fixes that are easier to evaluate. It is used for fast iteration before full SWE-bench runs.

Domain: code
Format: Custom
Size: 300 tasks
License: MIT
Published: Oct 2023
Notable for: Benchmark for evaluating code editing, debugging, and tool calling in the code domain.

Attribution

Leaderboard scores: SWE-bench

Top score: 58.3% by Claude 4 Sonnet, with 11 models reporting (10 from frontier labs).
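To put the percentage in terms of tasks, here is a minimal sketch that converts between a score and a task count. It assumes the reported score is the share of the 300 tasks that a model resolves, which this page does not state explicitly; the function names `resolved_count` and `to_percent` are hypothetical helpers for illustration only.

```python
TOTAL_TASKS = 300  # benchmark size as listed on this page


def resolved_count(score_percent: float, total: int = TOTAL_TASKS) -> int:
    """Approximate number of tasks implied by a percentage score.

    Assumption: the score is (resolved tasks / total tasks) * 100.
    """
    return round(score_percent / 100 * total)


def to_percent(resolved: int, total: int = TOTAL_TASKS) -> float:
    """Percentage score for a given number of resolved tasks, to one decimal."""
    return round(100 * resolved / total, 1)


print(resolved_count(58.3))  # 175
print(to_percent(175))       # 58.3
```

Under that assumption, each task is worth about a third of a percentage point, so small differences between scores correspond to only a handful of tasks.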

Score history

[Chart: score over time. Y-axis 0% to 100%; x-axis Mar 23 to Mar 25. Labeled models: GPT-4, GPT-4o (2024-05-13), GPT-4o (2024-08-06), Claude Sonnet 3.7, Claude 4 Sonnet.]

Top models

[Bar chart: SWE-bench Lite scores for 11 models. Highest value: Claude 4 Sonnet at 58.3.]

Related tools

Implementations, trainers, datasets and scaffolds linked to this eval.

FAQ

What is SWE-bench Lite?
SWE-bench Lite is a 300-issue subset of SWE-bench focused on functional bug fixes that are easier to evaluate. It is used for fast iteration before full SWE-bench runs.
What capabilities does SWE-bench Lite test?
SWE-bench Lite evaluates code editing, debugging, and tool calling.
What is the current top score on SWE-bench Lite?
The top reported score is 58.3% by Claude 4 Sonnet, across 11 models reporting (10 from frontier labs).
How can a model improve its SWE-bench Lite score?
Tools linked to SWE-bench Lite on Sophon include SWE-Gym, Agent Bench RL Env (Prime Community), Deepswe RL Env (Prime Intellect), and Agent PLUS RL Env (Prime Intellect). These are RL environments, datasets, and scaffolds that target this eval.
What license is SWE-bench Lite under?
SWE-bench Lite is available under the MIT license.