Question 1

What is SWE-bench Multimodal?

Accepted Answer

SWE-bench Multimodal is a multimodal extension of SWE-bench: software-engineering tasks that come with visual context, such as screenshots and diagrams.

Question 2

What is the current top score on SWE-bench Multimodal?

Accepted Answer

The top reported score is 36.0%, achieved by o3. Seven models have reported scores, all seven from frontier labs.

Question 3

How can a model improve its SWE-bench Multimodal score?

Accepted Answer

Sophon lists tools linked to SWE-bench Multimodal: RL environments, datasets, and scaffolds that target this eval. One such tool is Agent Bench RL Env (Prime Community).
