ANTI BOT RL Env (Browserbase)

WebVoyager browser benchmark with filtered dataset (600 tasks from sites without anti-bot protection)

Type: RL Env
Publisher: Browserbase
License: unknown
Size: v0.1.4
Published: Feb 2026

WebVoyager Browser Benchmark (No Anti-Bot)

A browser benchmark environment for evaluating LLM agents on WebVoyager web navigation tasks using Browserbase.

This version uses a filtered dataset that excludes websites with anti-bot protection for more reliable evaluation.

WebVoyager tasks span multiple real-world websites. A task is scored on whether it was completed successfully, not by matching an explicit ground-truth answer. The LLM judge in this environment evaluates the recorded browser interaction transcript, not just the agent's final free-text claim.

Dataset

  • Total tasks: 600 (93.3% of the original 643)
  • Websites: Allrecipes, Amazon, Apple, ArXiv, BBC News, Booking, Coursera, ESPN, GitHub, Google Flights, Google Map, Google Search, Hugging Face, Wolfram Alpha
  • Excluded sites: dictionary.cambridge.org (Cloudflare protection)
  • Removed tasks: 43, all from the one excluded site with anti-bot detection
  • Task format: Web navigation tasks
  • Evaluation: Task completion judging via LLM over the browser interaction transcript

Installation

First, install the browser extras for verifiers:

uv pip install -e ".[browser]"

Then install the webvoyager-no-anti-bot environment locally:

uv pip install -e ./environments/webvoyager_no_anti_bot

Or install from Prime hub:

prime env install browserbase/webvoyager-no-anti-bot

Usage

Quick Start

# Run WebVoyager benchmark with OpenAI (clean dataset)
prime eval run webvoyager-no-anti-bot -m gpt-4.1-mini -b https://api.openai.com/v1 -k OPENAI_API_KEY

Configuration

Set your Browserbase credentials:

export BROWSERBASE_API_KEY="your-api-key"
# Optional: export BROWSERBASE_PROJECT_ID="your-project-id"

For DOM mode (default), you'll also need:

export OPENAI_API_KEY="your-openai-key"  # For agent model and judge
export MODEL_API_KEY="your-openai-key"   # For Stagehand browser operations
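A missing credential typically only shows up once a run has started, so it can help to check the variables first. The sketch below is a hypothetical helper, not part of the environment: `missing_credentials` is an invented name, and it assumes that only `BROWSERBASE_API_KEY` is needed outside DOM mode, which the text above does not state explicitly.

```python
import os

# Variable names are taken from the Configuration section above.
BASE_VARS = ("BROWSERBASE_API_KEY",)
DOM_VARS = ("OPENAI_API_KEY", "MODEL_API_KEY")


def missing_credentials(env=None, dom_mode=True):
    """Return the names of required variables that are unset or empty.

    Hypothetical helper. Assumption: non-DOM runs need only BASE_VARS.
    """
    env = os.environ if env is None else env
    required = BASE_VARS + (DOM_VARS if dom_mode else ())
    return [name for name in required if not env.get(name)]


# Example with an explicit mapping instead of the real process environment.
print(missing_credentials({"BROWSERBASE_API_KEY": "your-api-key"}))
```

Calling `missing_credentials()` with no arguments checks the current process environment instead.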

Website Filtering

By default every task in the filtered dataset is run. To run only the tasks for one website, pass its name in the web_filter argument:

# Run all tasks (clean dataset)
prime eval run webvoyager-no-anti-bot -m gpt-4.1-mini -b https://api.openai.com/v1 -k OPENAI_API_KEY

# Run only Amazon tasks
prime eval run webvoyager-no-anti-bot -m gpt-4.1-mini -b https://api.openai.com/v1 -k OPENAI_API_KEY -a '{"web_filter": "Amazon"}'

# Run only Allrecipes tasks
prime eval run webvoyager-no-anti-bot -m gpt-4.1-mini -b https://api.openai.com/v1 -k OPENAI_API_KEY -a '{"web_filter": "Allrecipes"}'
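Hand-writing the JSON passed to `-a` is easy to get wrong when quoting it for the shell. One option is to generate the command from a dictionary, as in the sketch below. `eval_command` is a hypothetical helper that uses only the Python standard library and the flags shown in the commands above; it builds the command string and does not run it.

```python
import json
import shlex


def eval_command(model="gpt-4.1-mini", env_args=None):
    """Build a `prime eval run` command line as a single shell-safe string."""
    cmd = [
        "prime", "eval", "run", "webvoyager-no-anti-bot",
        "-m", model,
        "-b", "https://api.openai.com/v1",
        "-k", "OPENAI_API_KEY",
    ]
    if env_args:
        # json.dumps produces valid JSON; shlex.join adds the shell quoting.
        cmd += ["-a", json.dumps(env_args)]
    return shlex.join(cmd)


print(eval_command(env_args={"web_filter": "Amazon"}))
```

For the Amazon filter this reproduces the command shown above, single quotes included.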

Browser Modes

DOM Mode (default): Uses Stagehand SDK for natural language browser control.

prime eval run webvoyager-no-anti-bot -m gpt-4.1-mini -b https://api.openai.com/v1 -k OPENAI_API_KEY

CUA Mode: Uses vision-based primitives via a CUA server.

prime eval run webvoyager-no-anti-bot -m gpt-4.1-mini -b https://api.openai.com/v1 -k OPENAI_API_KEY -a '{"mode": "cua", "server_url": "http://localhost:3000"}'

Environment Arguments

| Argument | Default | Description |
|----------|---------|-------------|
| mode | "dom" | Browser control mode ("dom" or "cua") |
| max_turns | 15 | Maximum conversation turns (recommended: 50 for complex tasks) |
| judge_model | "gpt-4o-mini" | Model for task completion judging |
| num_examples | -1 | Number of examples (-1 for all) |
| web_filter | None | Filter by website name |
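To see how overrides passed with `-a` combine with the defaults listed above, the sketch below merges a dictionary of overrides onto them. It only illustrates the documented values: `resolve_args` is a hypothetical name, and the environment's actual argument handling may differ. Keys outside the table, such as the `server_url` used in CUA mode, are passed through unchanged.

```python
# Defaults as documented in the Environment Arguments table.
DEFAULTS = {
    "mode": "dom",
    "max_turns": 15,
    "judge_model": "gpt-4o-mini",
    "num_examples": -1,
    "web_filter": None,
}


def resolve_args(overrides=None):
    """Merge overrides onto the documented defaults (hypothetical helper)."""
    args = {**DEFAULTS, **(overrides or {})}
    if args["mode"] not in ("dom", "cua"):
        raise ValueError('mode must be "dom" or "cua"')
    return args


# Raise the turn limit to the value recommended for complex tasks.
print(resolve_args({"max_turns": 50, "web_filter": "Amazon"}))
```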

Requirements

  • Python >= 3.10
  • Browserbase account with API credentials
  • OpenAI API key (for agent model, judge, and Stagehand)