WorkArena: How Capable Are Web Agents at Solving Common Knowledge Work Tasks?
A benchmark from ServiceNow and Mila of 33 enterprise knowledge-work tasks (forms, dashboards, service catalogs) run on a real ServiceNow instance.
- Publisher: ServiceNow Research
- Year: 2024
- Venue: ICML
- Authors: 12
- Hosting: External source (license unknown)
Introduces 3 artifacts - 1 eval, 2 tools
TL;DR
Semantic Scholar
This work proposes WorkArena, a remote-hosted benchmark of 33 tasks based on the widely used ServiceNow platform, and introduces BrowserGym, an environment for designing and evaluating web agents that offers a rich set of actions as well as multimodal observations.