BrowseComp: A Simple Yet Challenging Benchmark for Browsing Agents

Active

A benchmark for evaluating agents' ability to browse the web. The dataset consists of challenging questions that generally require web access to answer correctly.

Open

Publisher: OpenAI
Domain: Assistants
License: MIT
Published: Jun 2025
Notable for: Benchmark for evaluating assistants' ability to browse the web.
Canonical: github.com/UKGovernmentBEIS/inspect_evals/tree/main/src/inspect_evals/browse_comp

Attribution

README: github.com/UKGovernmentBEIS/inspect_evals/blob/main/src/inspect_evals/browse_comp/README.md (MIT)

Related tools

Implementations, trainers, datasets and scaffolds linked to this eval.

Browsecomp RL Env (Prime Intellect)

Prime Intellect

BrowseComp evaluation environment

Implementation · RL Env · Web Search · Tool Use · Web

BB Browsecomp RL Env (Prime Intellect)

Prime Intellect

BrowserEnv[DOM] environment on BrowseComp

Implementation · RL Env

Browsecomp Openai RL Env (Community)

Tool-use environment for the model to browse the web and locate hard-to-find information; scored using an LLM-as-judge rubric

Trains toward · RL Env · Web Search · Tool Use · Web

Browsecomp Openai RL Env (Prime Intellect)

Prime Intellect

Tool-use environment for the model to browse the web and locate hard-to-find information; scored using an LLM-as-judge rubric

Trains toward · RL Env · Web Search · Tool Use · Web

DDBC RL Env (Prime Intellect)

Prime Intellect

BrowseComp with DeepDive tools

Trains toward · RL Env · SearchQA

DDBC RLM RL Env (Prime Intellect)

Prime Intellect

BrowseComp with DeepDive tools using RLM

Trains toward · RL Env · SearchQA

Papers

BrowseComp: A Simple Yet Challenging Benchmark for Browsing Agents

preprint · 2025

OpenAI benchmark of 1,266 web-research questions that require persistent, creative browsing to find a single short verifiable answer.

FAQ

What is BrowseComp?
A benchmark for evaluating agents' ability to browse the web. The dataset consists of challenging questions that generally require web access to answer correctly.

How can a model improve its BrowseComp score?
Sophon links RL environments, datasets, and scaffolds that target this eval. They include Browsecomp RL Env (Prime Intellect), BB Browsecomp RL Env (Prime Intellect), Browsecomp Openai RL Env (Community), and Browsecomp Openai RL Env (Prime Intellect).

What license is BrowseComp under?
BrowseComp is available under the MIT license.