Browser CUA Mode Example
A simple example environment demonstrating CUA (Computer Use Agent) mode browser automation using Browserbase.
CUA mode uses vision-based primitives to control the browser through screenshots, similar to how a human would interact with a screen.
How CUA Mode Works
CUA mode provides low-level vision-based operations:
- click(x, y): Click at screen coordinates
- type_text(text): Type text into focused element
- scroll(direction): Scroll the page
- screenshot(): Capture current screen state
- navigate(url): Go to a URL
The agent sees screenshots and decides which actions to take based on visual understanding.
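The observe-then-act cycle described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the environment's actual implementation: `FakeBrowser`, `Action`, `run_episode`, and `scripted_policy` are hypothetical names, and the policy is a fixed script standing in for a vision model.

```python
from dataclasses import dataclass, field

# Primitive names taken from the list above; everything else is hypothetical.
ALLOWED = {"click", "type_text", "scroll", "navigate"}


@dataclass
class Action:
    """One primitive call, e.g. Action("click", {"x": 10, "y": 20})."""
    name: str
    args: dict = field(default_factory=dict)


class FakeBrowser:
    """Hypothetical stand-in for a browser session; it only records calls."""

    def __init__(self) -> None:
        self.log: list[str] = []

    def screenshot(self) -> bytes:
        self.log.append("screenshot")
        return b""  # a real session would return image bytes

    def navigate(self, url: str) -> None:
        self.log.append(f"navigate {url}")

    def click(self, x: int, y: int) -> None:
        self.log.append(f"click {x},{y}")

    def type_text(self, text: str) -> None:
        self.log.append(f"type_text {text}")

    def scroll(self, direction: str) -> None:
        self.log.append(f"scroll {direction}")


def run_episode(browser, policy, max_turns: int = 15) -> int:
    """Take a screenshot, ask the policy for an action, apply it; repeat."""
    for turn in range(max_turns):
        image = browser.screenshot()
        action = policy(image, turn)
        if action is None:  # the policy signals that the task is done
            return turn
        if action.name not in ALLOWED:
            raise ValueError(f"unknown primitive: {action.name}")
        getattr(browser, action.name)(**action.args)
    return max_turns


def scripted_policy(image: bytes, turn: int):
    """Stub policy: replays a fixed script instead of reading the screenshot."""
    script = [
        Action("navigate", {"url": "https://example.com"}),
        Action("click", {"x": 512, "y": 384}),
        Action("type_text", {"text": "hello"}),
    ]
    return script[turn] if turn < len(script) else None
```

Calling `run_episode(FakeBrowser(), scripted_policy)` takes one screenshot per turn and stops once the script is exhausted. A real agent would replace `scripted_policy` with a model call that inspects the screenshot.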
Installation
# Install browser extras
uv pip install -e ".[browser]"
# Install this example environment
uv pip install -e ./environments/browser_cua_example
Configuration
Required Environment Variables
# Browserbase credentials
export BROWSERBASE_API_KEY="your-api-key"
# Optional: export BROWSERBASE_PROJECT_ID="your-project-id"
# API key for agent model
export OPENAI_API_KEY="your-openai-key"
Note: When running in manual server mode, ensure OPENAI_API_KEY is set in the terminal where the CUA server runs (Stagehand requires it internally).
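Before launching a run, it can help to confirm that the variables above are set. The following is a minimal sketch of such a preflight check. The helper name `missing_vars` is hypothetical and not part of this environment. The required and optional groupings follow the export lines above.

```python
import os
from collections.abc import Mapping

# Variable names come from the configuration section above.
REQUIRED_VARS = ("BROWSERBASE_API_KEY", "OPENAI_API_KEY")
OPTIONAL_VARS = ("BROWSERBASE_PROJECT_ID",)


def missing_vars(env: Mapping[str, str] = os.environ) -> list[str]:
    """Return the required variable names that are unset or blank in env."""
    return [name for name in REQUIRED_VARS if not env.get(name, "").strip()]
```

For example, `missing_vars()` returns an empty list when both keys are exported in the current shell. In manual server mode the check would need to run in the terminal that starts the CUA server as well.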
Usage
Quick Test Commands
# Default - pre-built image (fastest)
prime eval run browser-cua-example -m openai/gpt-4o-mini
# Binary upload (custom server)
prime eval run browser-cua-example -m openai/gpt-4o-mini -a '{"use_prebuilt_image": false}'
# Local development
prime eval run browser-cua-example -m openai/gpt-4o-mini -a '{"use_sandbox": false}'
Pre-built Docker Image (Default, Fastest)
By default, CUA mode uses a pre-built Docker image (deepdream19/cua-server:latest) for the fastest startup. The image ships with the CUA server binary and all dependencies pre-installed:
prime eval run browser-cua-example -m openai/gpt-4.1-mini -b https://api.openai.com/v1 -k OPENAI_API_KEY
This is the recommended approach for production use. Startup is ~5-10 seconds compared to ~30-60 seconds with binary upload.
Binary Upload Mode (Custom Server)
If you need to use a custom version of the CUA server, disable the prebuilt image to build and upload the binary at runtime:
prime eval run browser-cua-example -m openai/gpt-4.1-mini -b https://api.openai.com/v1 -k OPENAI_API_KEY -a '{"use_prebuilt_image": false}'
This mode:
- Builds the CUA server binary via Docker (first run only)
- Uploads the binary to a sandbox container
- Installs dependencies (curl) in the sandbox
- Starts the server
Manual Server Mode (Local Development)
For local development, you can run the CUA server manually:
- Start the CUA server (in a separate terminal):
cd assets/templates/browserbase/cua
export OPENAI_API_KEY="your-openai-key"
pnpm dev
The server runs on http://localhost:3000 by default.
- Run the evaluation with sandbox disabled:
prime eval run browser-cua-example -m openai/gpt-4.1-mini -b https://api.openai.com/v1 -k OPENAI_API_KEY -a '{"use_sandbox": false}'
Custom Server URL
If running the CUA server on a different port:
prime eval run browser-cua-example -m openai/gpt-4.1-mini -b https://api.openai.com/v1 -k OPENAI_API_KEY -a '{"use_sandbox": false, "server_url": "http://localhost:8080"}'
Environment Arguments
| Argument | Default | Description |
|---|---|---|
| max_turns | 15 | Maximum conversation turns (recommended: 50 for complex tasks) |
| judge_model | "gpt-4o-mini" | Model for task completion judging |
| use_sandbox | True | Auto-deploy CUA server to sandbox |
| use_prebuilt_image | True | Use pre-built Docker image (fastest startup) |
| prebuilt_image | "deepdream19/cua-server:latest" | Docker image to use when use_prebuilt_image=True |
| server_url | "http://localhost:3000" | CUA server URL (only used when use_sandbox=False) |
| viewport_width | 1024 | Browser viewport width |
| viewport_height | 768 | Browser viewport height |
| save_screenshots | False | Save screenshots during execution |
Execution Modes Summary
| Mode | Flag | Startup Time | Use Case |
|---|---|---|---|
| Pre-built image (default) | None | ~5-10s | Production, fastest startup |
| Binary upload | use_prebuilt_image=false | ~30-60s | Custom server version |
| Manual server | use_sandbox=false | Instant | Local development |
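The table above reduces to a decision over two flags. The sketch below makes that mapping explicit. The function name `execution_mode` is hypothetical. It also assumes that `use_sandbox=False` takes precedence over `use_prebuilt_image`, which the table implies but does not state outright.

```python
def execution_mode(use_sandbox: bool = True, use_prebuilt_image: bool = True) -> str:
    """Map the two environment arguments to the mode names used in the table."""
    if not use_sandbox:
        # Assumption: without a sandbox, the image setting is irrelevant
        return "manual server"
    if use_prebuilt_image:
        return "pre-built image"
    return "binary upload"
```

With both defaults left in place, `execution_mode()` selects the pre-built image, which matches the default row of the table.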
Building a Custom Docker Image
To build and push a custom CUA server image:
cd assets/templates/browserbase/cua
# Build and push a versioned tag
./build-and-push.sh bb-project-id-optional-20260326
# Same, under a specific Docker Hub user
DOCKERHUB_USER=myuser ./build-and-push.sh bb-project-id-optional-20260326
# Also move the latest tag
DOCKERHUB_USER=myuser PUSH_LATEST=true ./build-and-push.sh bb-project-id-optional-20260326
Then use your custom image:
prime eval run browser-cua-example -m openai/gpt-4.1-mini -a '{"prebuilt_image": "myuser/cua-server:bb-project-id-optional-20260326"}'
Use the versioned tag first for validation. Only set PUSH_LATEST=true once you want latest to move as well.
DOM vs CUA Mode Comparison
| Aspect | DOM Mode | CUA Mode |
|---|---|---|
| Control | Natural language via Stagehand | Vision-based coordinates |
| Server | None required | CUA server (auto-deployed) |
| MODEL_API_KEY | Required (for Stagehand) | Not required |
| Best for | Structured web interactions | Visual/complex UIs |
| Speed | Faster (direct DOM) | Slower (screenshots) |
Requirements
- Python >= 3.10
- Browserbase account with API credentials
- OpenAI API key