MiniWoB++

A suite of 100+ small synthetic web-page tasks (click a button, fill in a form, drag a slider). It is the original web-agent benchmark and is still used as a quick sanity check, much like a unit test, for browser agents.

Domain: agentic
Format: Custom
Size: 125 tasks
License: MIT
Published: Feb 2018
Notable for: Benchmark for evaluating browser use and tool calling in the agentic domain.

Leaderboard scores
prime-hub

Top score: 80.0% by gpt-oss-120b (1 model reporting, 1 from a frontier lab)

Top models

[Bar chart: MiniWoB++ scores, 1 model. Highest value: gpt-oss-120b at 80.]

Related tools (6)

Implementations, trainers, datasets and scaffolds linked to this eval.

FAQ

What is MiniWoB++?
MiniWoB++ is a suite of 100+ small synthetic web-page tasks (click a button, fill in a form, drag a slider). It is the original web-agent benchmark and is still used as a quick sanity check, much like a unit test, for browser agents.
What capabilities does MiniWoB++ test?
MiniWoB++ evaluates browser use and tool calling.
What is the current top score on MiniWoB++?
The top reported score is 80.0%, achieved by gpt-oss-120b. Only 1 model has reported a score so far, and it comes from a frontier lab.
How can a model improve its MiniWoB++ score?
Tools linked to MiniWoB++ on Sophon include Browser Miniwob RL Env (Community), BrowserGym, and Openenv Browsergym RL Env (Hugging Face). These are RL environments, datasets, and scaffolds that target this eval.
What license is MiniWoB++ under?
MiniWoB++ is available under the MIT license.
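
How might a percentage score like 80.0% be computed?
This page does not describe the scoring procedure, so the following is only a minimal sketch under stated assumptions: an episode counts as a success when its final reward is positive, and the benchmark score is the unweighted mean of per-task success rates. The function names (`success_rate`, `benchmark_score`) and the sample rewards are hypothetical and are not taken from any leaderboard or library.

```python
from statistics import mean


def success_rate(episode_rewards, threshold=0.0):
    """Percentage of episodes whose final reward exceeds `threshold`.

    Assumption: a positive final reward means the task was solved.
    """
    if not episode_rewards:
        raise ValueError("no episodes to score")
    successes = sum(1 for r in episode_rewards if r > threshold)
    return 100.0 * successes / len(episode_rewards)


def benchmark_score(per_task_rewards):
    """Unweighted mean of per-task success rates (assumed aggregation)."""
    return mean(success_rate(r) for r in per_task_rewards.values())


# Made-up rewards for two tasks, five episodes each (illustration only).
sample_rewards = {
    "click-button": [1.0, 1.0, 0.7, -1.0, 1.0],
    "enter-text": [1.0, -1.0, 1.0, 1.0, 1.0],
}

print(f"{benchmark_score(sample_rewards):.1f}%")  # prints "80.0%"
```

Averaging per task rather than per episode keeps a task with many episodes from dominating the result. A leaderboard may use a different aggregation, so check the scoring rules of the specific harness before comparing numbers.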