dojo
Overview
- Environment ID:
chakra-labs/dojo - Short description: Multi-turn agent evaluation using Dojo infrastructure for task execution
- Tags: multi-turn, tool-use, benchmark, agent-evaluation, multimodal, dojo, web
Datasets
- Primary dataset(s):
dojo-mini-bench- Collection of multi-turn tasks including LinkedIn, Linear, Gmail - Source links: Dojo Documentation
Task
- Type: multi-turn, tool use
- Parser: OpenAI-compatible tool calling format
- Rubric overview: Task-specific verification logic
Quickstart
Run an evaluation with default settings:
DOJO_API_KEY="your_key" uv run vf-eval dojo
Configure model and sampling:
DOJO_API_KEY="your_key" uv run vf-eval dojo -m gpt-4.1-mini -n 10 -r 1
If you want to run with browserbase
DOJO_API_KEY="your_key" BROWSERBASE_PROJECT_ID="project_id" BROWSERBASE_API_KEY="your_browserbase_key" DOJO_ENGINE=browserbase BROWSERBASE_CONCURRENT_LIMIT=1 uv run vf-eval dojo -m gpt-4.1-mini -n 10 -r 1
Notes:
- Use
-a/--env-argsto pass environment-specific configuration as a JSON object.
Metrics
Task-specific verification that returns a fraction between 0.0 and 1.0. Failure means 0.0, partial sucess is <= 1 and sucess is 1.0
| Metric | Meaning |
|---|---|
reward | Score between 0.0 and 1.0 |
For more information, see the Dojo Verifiers Integration Documentation.