Browser DOM Mode Example
A simple example environment demonstrating DOM mode browser automation using Browserbase and Stagehand.
DOM mode uses the Stagehand SDK to translate natural language commands into browser actions.
How DOM Mode Works
DOM mode provides these natural language operations:
- act: Perform actions such as clicking buttons or filling in forms
- observe: Get information about visible elements
- extract: Extract structured data from the page
- navigate: Go to URLs
Stagehand uses an LLM (configured via stagehand_model) to understand the page DOM and execute the appropriate browser actions.
Installation
# Install browser extras
uv pip install -e ".[browser]"
# Install this example environment
uv pip install -e ./environments/browser_dom_example
Configuration
Required Environment Variables
# Browserbase credentials
export BROWSERBASE_API_KEY="your-api-key"
export BROWSERBASE_PROJECT_ID="your-project-id"
# API keys for models
export OPENAI_API_KEY="your-openai-key" # For agent model
export MODEL_API_KEY="your-openai-key" # For Stagehand (can be same as OPENAI_API_KEY)
Why MODEL_API_KEY?
Stagehand needs its own LLM to understand the DOM and translate natural language to actions. The MODEL_API_KEY environment variable provides the API key for this internal Stagehand model.
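A quick preflight check can catch a missing key before a run starts. The following is a minimal sketch and not part of this environment: the helper name `missing_env_vars` is hypothetical, and the variable list is copied from the Required Environment Variables section above.

```python
import os

# Variables listed under "Required Environment Variables" above.
REQUIRED_VARS = (
    "BROWSERBASE_API_KEY",
    "BROWSERBASE_PROJECT_ID",
    "OPENAI_API_KEY",
    "MODEL_API_KEY",
)


def missing_env_vars(env=None):
    """Return the required variable names that are unset or empty.

    Hypothetical helper for illustration; pass a dict to check something
    other than the current process environment.
    """
    env = os.environ if env is None else env
    return [name for name in REQUIRED_VARS if not env.get(name)]
```

Calling `missing_env_vars()` with no argument reads `os.environ`; an empty list means all four variables are set.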
Usage
prime eval run browser-dom-example -m gpt-4.1-mini -b https://api.openai.com/v1 -k OPENAI_API_KEY
Environment Arguments
| Argument | Default | Description |
|---|---|---|
| max_turns | 10 | Maximum conversation turns |
| judge_model | "gpt-4o-mini" | Model for task completion judging |
| stagehand_model | "openai/gpt-4o-mini" | Model for Stagehand DOM operations |
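Overrides for these arguments are easiest to reason about as a dictionary merged over the defaults. The sketch below is illustrative only: the defaults are copied from the table, `build_env_args` is a hypothetical helper, and the JSON encoding is an assumption about how the arguments might be serialized. Check the `prime eval run` help output for the actual way to pass environment arguments.

```python
import json

# Defaults copied from the Environment Arguments table.
DEFAULT_ENV_ARGS = {
    "max_turns": 10,
    "judge_model": "gpt-4o-mini",
    "stagehand_model": "openai/gpt-4o-mini",
}


def build_env_args(**overrides):
    """Merge overrides onto the documented defaults.

    Hypothetical helper: it only knows the three arguments listed in the
    table and rejects any other name, to catch typos early.
    """
    unknown = sorted(set(overrides) - set(DEFAULT_ENV_ARGS))
    if unknown:
        raise ValueError(f"Unknown environment arguments: {unknown}")
    return {**DEFAULT_ENV_ARGS, **overrides}


# Example: allow more turns while keeping the other defaults.
args = build_env_args(max_turns=20)
print(json.dumps(args))
```

Rejecting unknown names is a design choice for this sketch; the environment itself may accept arguments that the table does not list.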
Example Task
The smoke test navigates to the Prime Intellect homepage and asks the agent to read the headline. The agent uses DOM mode operations to:
- Navigate to the page
- Observe visible text
- Extract the headline content
- Report the answer
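The four steps above can be read as a short sequence of tool calls. The sketch below replays that sequence against a stub so the control flow is visible without a browser session. `FakeBrowser` and its methods are hypothetical stand-ins named after the DOM mode operations; they are not the Stagehand or Browserbase API, and the URL and headline text are placeholders.

```python
from dataclasses import dataclass, field


@dataclass
class FakeBrowser:
    """Hypothetical stand-in that records calls; it drives no real browser."""

    pages: dict
    url: str = ""
    log: list = field(default_factory=list)

    def navigate(self, url):
        self.log.append(("navigate", url))
        self.url = url

    def observe(self, instruction):
        self.log.append(("observe", instruction))
        # Return the names of the elements "visible" on the current page.
        return list(self.pages[self.url])

    def extract(self, instruction, element):
        self.log.append(("extract", instruction))
        return self.pages[self.url][element]


def read_headline(browser, url):
    """Navigate, observe, extract, then return the answer to report."""
    browser.navigate(url)
    visible = browser.observe("find the main headline")
    if "headline" not in visible:
        return None
    return browser.extract("get the headline text", "headline")


# Placeholder page content; the real task reads a live homepage.
browser = FakeBrowser(pages={"https://example.com": {"headline": "Placeholder headline"}})
answer = read_headline(browser, "https://example.com")
print(answer)
```

In the real environment the agent model chooses each operation turn by turn, so the fixed order here is a simplification.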
Requirements
- Python >= 3.10
- Browserbase account with API credentials
- OpenAI API key (for the agent model and for Stagehand)