MiniWoB++
100+ small synthetic web-page tasks (click button, fill form, drag slider) - the original web-agent benchmark, still used as a unit test.
- Publisher
- Stanford University
- Capabilities
- Browser Use, Tool Calling
- Domain
- agentic
- Format
- Custom
- Size
- 125 tasks
- License
- MIT
- Published
- Feb 2018
- Notable for
- Benchmark for evaluating browser use and tool calling in the agentic domain.
- Canonical
- miniwob.farama.org
Top score 80.0% by gpt-oss-120b - 1 model reporting (1 frontier)
Top models: 1
Related tools: 6
Implementations, trainers, datasets and scaffolds linked to this eval.
FAQ
- What is MiniWoB++?
- 100+ small synthetic web-page tasks (click button, fill form, drag slider) - the original web-agent benchmark, still used as a unit test.
- What capabilities does MiniWoB++ test?
- MiniWoB++ evaluates browser use and tool calling.
- What is the current top score on MiniWoB++?
- The top reported score is 80.0% by gpt-oss-120b, across 1 model reporting (1 from frontier labs).
- How can a model improve its MiniWoB++ score?
- Tools linked to MiniWoB++ on Sophon include Browser Miniwob RL Env (Community), BrowserGym, and Openenv Browsergym RL Env (Hugging Face). These are RL environments, datasets, and scaffolds that target this eval.
- What license is MiniWoB++ under?
- MiniWoB++ is available under MIT.
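The top score above is reported as a single percentage. This page does not say how that figure is aggregated, so the following is only a minimal sketch of one common approach: count an episode as a success when its reward is positive, average within each task, then average across tasks. The function name `success_rate`, the positive-reward threshold, the macro-averaging choice, and the sample episodes are all assumptions for illustration, not the scoring method used by Sophon or by any particular submission.

```python
from collections import defaultdict


def success_rate(episodes):
    """Return a macro-averaged success rate as a percentage.

    `episodes` is an iterable of (task_name, reward) pairs.
    Assumption: an episode is a success when its reward is positive.
    """
    per_task = defaultdict(list)
    for task, reward in episodes:
        per_task[task].append(reward > 0)
    if not per_task:
        raise ValueError("no episodes given")
    # Average within each task first so that tasks with more
    # episodes do not dominate the overall score.
    task_rates = [sum(flags) / len(flags) for flags in per_task.values()]
    return 100.0 * sum(task_rates) / len(task_rates)


# Illustrative data only; these rewards are made up.
episodes = [
    ("click-button", 1.0),
    ("click-button", 0.6),
    ("enter-text", -1.0),
    ("enter-text", 0.8),
    ("use-slider", -1.0),
]
print(f"{success_rate(episodes):.1f}%")  # prints "50.0%"
```

Macro-averaging weights every task equally regardless of how many episodes were run on it. A micro-average over all episodes would give a different number on the same data (60.0% here), so a reported score is only comparable when the aggregation is stated.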