BrowseComp: A Simple Yet Challenging Benchmark for Browsing Agents

OpenAI benchmark of 1,266 web-research questions that require persistent, creative browsing to find a single short verifiable answer.

Publisher
OpenAI
Year
2025
Venue
preprint
Authors
10
Hosting
External source, license unknown

Introduces 1 artifact - 1 eval

TL;DR (Semantic Scholar)

While BrowseComp sidesteps challenges of a true user query distribution, like generating long answers or resolving ambiguity, it measures the important core capability of exercising persistence and creativity in finding information.
