VisualWebArena
Active
910 visually grounded web tasks across three self-hosted sites (Classifieds, Shopping, Reddit) requiring image understanding to complete.
- Publisher: Carnegie Mellon University
- Capabilities: Browser Use, Image Understanding, Planning
- Domain: agentic
- Format: Web Arena
- Size: 910 tasks
- License: Apache-2.0
- Published: Jan 2024
- Notable for: Benchmark for evaluating browser use, image understanding, and planning in the agentic domain.
- Canonical: jykoh.com/vwa
Related tools (1)
Implementations, trainers, datasets and scaffolds linked to this eval.
Papers (2)
Contributors (2)
FAQ
- What is VisualWebArena?
- 910 visually grounded web tasks across three self-hosted sites (Classifieds, Shopping, Reddit) requiring image understanding to complete.
- What capabilities does VisualWebArena test?
- VisualWebArena evaluates browser use, image understanding, and planning.
- How can a model improve its VisualWebArena score?
- Tools linked to VisualWebArena on Sophon, such as BrowserGym, are RL environments, datasets, and scaffolds that target this eval.
- What license is VisualWebArena under?
- VisualWebArena is available under Apache-2.0.