VisualWebArena

Active

910 visually grounded web tasks across three self-hosted sites (Classifieds, Shopping, Reddit) requiring image understanding to complete.

Domain: agentic
Format: Web Arena
Size: 910 tasks
License: Apache-2.0
Published: Jan 2024
Notable for: Benchmark for evaluating browser use, image understanding and planning in the agentic domain.
Canonical: jykoh.com/vwa

Related tools (1)

Implementations, trainers, datasets and scaffolds linked to this eval.

Papers (2)

Contributors (2)

FAQ

What is VisualWebArena?
VisualWebArena is a benchmark of 910 visually grounded web tasks across three self-hosted sites (Classifieds, Shopping, Reddit); the tasks require image understanding to complete.
What capabilities does VisualWebArena test?
VisualWebArena evaluates browser use, image understanding, and planning.
How can a model improve its VisualWebArena score?
Sophon links tools that target this eval, including RL environments, datasets, and scaffolds; BrowserGym is one of them.
What license is VisualWebArena under?
VisualWebArena is available under the Apache-2.0 license.