SWE Bench Multilingual
Fresh
A benchmark of 300 software engineering tasks across 42 repositories and 9 programming languages: C, C++, Go, Java, JavaScript, TypeScript, PHP, Ruby, and Rust. Each instance is derived from a real GitHub pull request, following the same format and evaluation protocol as SWE-b…
- Type: RL Env
- Publisher: General Reasoning
- Capabilities: Code Generation
- Runtime: ORS
- License: unknown
- Size: 0 tasks
- Published: Jan 2026
Public scores on this env
88 vf-eval reports across 8 models
1. Claude Mythos Preview (Anthropic): 87.3
2. DeepSeek V4 Pro Max (DeepSeek): 76.2
3. MiniMax M2.5 (Minimax): 74.1
4. Qwen3.6 Plus (Alibaba): 73.8
5. GLM 5 (Zai): 73.3
6. Kimi K2.5 (Moonshot AI): 73
7. Gemini 3.1 Pro (Google (Alphabet Inc.)): 72
8. Qwen 3 Coder Next (Alibaba): 64.3