Browsecomp Plus
Verifiers environment for the BrowseComp-Plus deep-research agent benchmark. It supports controlled agent and retriever evaluation on the benchmark's fixed, human-verified corpus.
- Domain: rl-env
- License: apache-2.0
- Published: Oct 2025
Top score 1.11 by GPT-4.1 Mini - 3 models reporting (1 frontier)
Score history (3)
Top models (3)
Related tools (1): implementations, trainers, datasets and scaffolds linked to this eval.
FAQ
- What is Browsecomp Plus?
- Browsecomp Plus is a Verifiers environment for the BrowseComp-Plus deep-research agent benchmark. It supports controlled agent and retriever evaluation on the benchmark's fixed, human-verified corpus.
- What is the current top score on Browsecomp Plus?
- The top reported score is 1.11 by GPT-4.1 Mini, across 3 models reporting (1 from frontier labs).
- How can a model improve its Browsecomp Plus score?
- Sophon links tools that target this eval, such as RL environments, datasets, and scaffolds. The tools currently linked to Browsecomp Plus include Browsecomp PLUS RL Env (Prime Intellect).
- What license is Browsecomp Plus under?
- Browsecomp Plus is available under apache-2.0.
