BrowseComp: A Simple Yet Challenging Benchmark for Browsing Agents
Active
A benchmark for evaluating agents' ability to browse the web. The dataset consists of challenging questions that generally require web access to answer correctly.
- Publisher: OpenAI
- Domain: Assistants
- License: MIT
- Published: Jun 2025
- Notable for: Benchmark for evaluating Assistants.
Related tools
6 implementations, trainers, datasets, and scaffolds linked to this eval.
Papers: 1
FAQ
- What is BrowseComp: A Simple Yet Challenging Benchmark for Browsing Agents?
- A benchmark for evaluating agents' ability to browse the web. The dataset consists of challenging questions that generally require web access to answer correctly.
- How can a model improve its BrowseComp: A Simple Yet Challenging Benchmark for Browsing Agents score?
- Sophon lists tools that target this eval, including RL environments, datasets, and scaffolds: Browsecomp RL Env (Prime Intellect), BB Browsecomp RL Env (Prime Intellect), Browsecomp Openai RL Env (Community), and Browsecomp Openai RL Env (Prime Intellect).
- What license is BrowseComp: A Simple Yet Challenging Benchmark for Browsing Agents under?
- BrowseComp: A Simple Yet Challenging Benchmark for Browsing Agents is available under the MIT license.