BrowseComp: A Simple Yet Challenging Benchmark for Browsing Agents

Active

A benchmark for evaluating agents' ability to browse the web. The dataset consists of challenging questions that generally require web access to answer correctly.

Publisher
OpenAI
Domain
Assistants
License
MIT
Published
Jun 2025
Notable for
Benchmark for evaluating Assistants.

Related tools

6

Implementations, trainers, datasets and scaffolds linked to this eval.

Papers

1

FAQ

What is BrowseComp: A Simple Yet Challenging Benchmark for Browsing Agents?
A benchmark for evaluating agents' ability to browse the web. The dataset consists of challenging questions that generally require web access to answer correctly.
How can a model improve its BrowseComp: A Simple Yet Challenging Benchmark for Browsing Agents score?
Tools linked to BrowseComp: A Simple Yet Challenging Benchmark for Browsing Agents on Sophon include Browsecomp RL Env (Prime Intellect), BB Browsecomp RL Env (Prime Intellect), Browsecomp Openai RL Env (Community), and Browsecomp Openai RL Env (Prime Intellect). These are RL environments, datasets, and scaffolds that target this eval.
What license is BrowseComp: A Simple Yet Challenging Benchmark for Browsing Agents under?
BrowseComp: A Simple Yet Challenging Benchmark for Browsing Agents is available under the MIT license.