BrowseComp: A Simple Yet Challenging Benchmark for Browsing Agents
OpenAI benchmark of 1,266 web-research questions that require persistent, creative browsing to find a single short verifiable answer.
- Publisher: OpenAI
- Year: 2025
- Venue: preprint
- Authors: 10
- Hosting: External source, license unknown
Introduces 1 artifact - 1 eval
TL;DR (Semantic Scholar)
While BrowseComp sidesteps challenges of a true user query distribution, like generating long answers or resolving ambiguity, it measures the important core capability of exercising persistence and creativity in finding information.
Artifacts
Evals: 1