SWE-bench Multimodal
Frontier
Multimodal extension of SWE-bench - software-engineering tasks with visual context (screenshots, diagrams).
- Publisher: Princeton University
- Published: May 2026
- Canonical: swebench.com/multimodal.html
Top score 36.0% by o3 - 7 models reporting (7 frontier)
Score history (6)
Top models (7)
Related tools (1)
Implementations, trainers, datasets and scaffolds linked to this eval.
FAQ
- What is SWE-bench Multimodal?
- Multimodal extension of SWE-bench - software-engineering tasks with visual context (screenshots, diagrams).
- What is the current top score on SWE-bench Multimodal?
- The top reported score is 36.0% by o3, across 7 models reporting (7 from frontier labs).
- How can a model improve its SWE-bench Multimodal score?
- Tools linked to SWE-bench Multimodal on Sophon include Agent Bench RL Env (Prime Community). These linked tools are RL environments, datasets, and scaffolds that target this eval.