browser use
- Slug: browser-use
- Evals: 7
- Tools: 14
- Models: 6
- Papers: 5
Evals testing this capability: 7
Tools lifting evals here: 14
Top models on this capability: 6 (by average parsed score across evals here)
Papers in this area: 5
- Introduces BrowseComp: A Simple Yet Challenging Benchmark for Browsing Agents
- Introduces GAIA: A Benchmark for General AI Assistants
- Introduces VisualWebArena: Evaluating Multimodal Agents on Realistic Visual Web Tasks
- Introduces WebArena: A Realistic Web Environment for Building Autonomous Agents
- Introduces WorkArena: How Capable Are Web Agents at Solving Common Knowledge Work Tasks?