
Mini Swe Agent Bench

Frontier

Benchmarking model performance on SWE Bench in the Mini SWE Agent harness.

Domain: rl-env
License: unknown
Published: Mar 2026


Attribution

Leaderboard scores: prime-hub

Top score: 93.3% by GPT-5. 3 models reporting (3 frontier).

Score history

[Chart: score history. Y-axis: 75% to 100%. X-axis: May 25 to Aug 25. Labeled models: Claude 4 Sonnet, GPT-5.]

Top models

[Bar chart with 3 bars, one per model. Highest value: GPT-5 at 93.3%.]

Related tools

1

Implementations, trainers, datasets and scaffolds linked to this eval.

FAQ

What is Mini Swe Agent Bench?
Mini Swe Agent Bench is an eval that benchmarks model performance on SWE Bench in the Mini SWE Agent harness.
What is the current top score on Mini Swe Agent Bench?
The top reported score is 93.3% by GPT-5, across 3 models reporting (3 from frontier labs).
How can a model improve its Mini Swe Agent Bench score?
Tools linked to Mini Swe Agent Bench on Sophon include Agent Bench RL Env (Prime Community). Linked tools are RL environments, datasets, and scaffolds that target this eval.
What license is Mini Swe Agent Bench under?
The license for Mini Swe Agent Bench is listed as unknown.