BrowseComp
Frontier
1,266 hard fact-finding questions on the open web requiring persistent browsing and reasoning over scattered, obscure sources.
- Publisher: OpenAI
- Capabilities: Browser Use, Retrieval, Planning
- Domain: agentic
- Format: Custom
- Size: 1,266 tasks
- License: MIT
- Published: Apr 2025
- Updates: Monthly
- Notable for: The canonical benchmark for "deep research" / browsing agents, where GPT-4o scored near 0% at launch but GPT-5.5 Pro now exceeds 90%.
- Canonical: openai.com/index/browsecomp
- Official leaderboard: openai.com/index/browsecomp
Top score 20.0% by GPT-5 Mini - 5 models reporting (5 frontier)
Score history (5)
Top models (5)
Where it's ranked (1)
Related tools (7): Implementations, trainers, datasets and scaffolds linked to this eval.
Papers (2)
Contributors (1)
FAQ
- What is BrowseComp?
- BrowseComp is a benchmark of 1,266 hard fact-finding questions on the open web that require persistent browsing and reasoning over scattered, obscure sources.
- What capabilities does BrowseComp test?
- BrowseComp evaluates browser use, retrieval, and planning.
- What is the current top score on BrowseComp?
- The top reported score is 20.0% by GPT-5 Mini, across 5 models reporting (5 from frontier labs).
- How can a model improve its BrowseComp score?
- Tools linked to BrowseComp on Sophon include BB Browsecomp RL Env (Prime Intellect), Browsecomp RL Env (Prime Intellect), DeepDive (Serper-powered web QA), and Browsecomp Openai RL Env (Community). These are RL environments, datasets, and scaffolds that target this eval.
- What license is BrowseComp under?
- BrowseComp is available under the MIT license.