SWE-bench Multilingual

Frontier

Cross-language extension of SWE-bench Verified: real GitHub issues across multiple programming languages.

Open
Published
May 2026

Attribution

Leaderboard scores: SWE-bench

Top score: 72.7% by Gemini 3 Flash; 11 models reporting (7 frontier).

Score history

[Chart: score history for 11 models, Aug 25 to Feb 26; y-axis from 30% to 100%. Labeled models: GPT-5 Mini, Claude Sonnet 4.5, Gemini 3 Pro, Claude Opus 4.5, Gemini 3 Flash.]

Top models

[Bar chart: SWE-bench Multilingual scores for 11 models. Highest value: Gemini 3 Flash at 72.7.]

Related tools

Implementations, trainers, datasets and scaffolds linked to this eval.

FAQ

What is SWE-bench Multilingual?
Cross-language extension of SWE-bench Verified: real GitHub issues across multiple programming languages.
What is the current top score on SWE-bench Multilingual?
The top reported score is 72.7% by Gemini 3 Flash, across 11 models reporting (7 from frontier labs).
How can a model improve its SWE-bench Multilingual score?
Tools linked to SWE-bench Multilingual on Sophon include Agent Bench RL Env (Prime Community) and SWE RL Env (Prime Intellect). These are RL environments, datasets, and scaffolds that target this eval.