0

Harbor ENV RL Env (Prime Intellect)

Fresh

Minimal Harbor environment for testing the CLI agent interception framework

Type
RL Env
Runtime
single-turn
License
unknown
Size
v0.1.0
Published
Dec 2025

Cite

Notes

Only stored in your browser.

dummy-harbor-env

Source Code

Overview

  • Environment ID: dummy-harbor-env
  • Short description: Minimal Harbor environment for testing the CLI agent interception framework
  • Tags: dummy, testing, cli-agent, harbor

Datasets

  • Primary dataset: Harbor-format tasks in tasks/ directory
  • Source: Bundled with environment
  • Tasks: 1 dummy task (hello-world)

Task

  • Type: single-turn (via HarborEnv)
  • Base class: HarborEnv (extends CliAgentEnv)
  • Rubric overview:
    • Reward computed by tests/test.sh which runs pytest on test_state.py
    • Returns 1.0 if /app/hello.txt contains "Hello, world!", 0.0 otherwise

Quickstart

Run an evaluation with default settings:

prime eval run dummy-harbor-env

Configure model and sampling:

prime eval run dummy-harbor-env -m gpt-4.1-mini -n 1 -r 1

How It Works

This environment demonstrates the HarborEnv/CliAgentEnv data flow:

  1. Harbor Task Loading: Task is loaded from tasks/hello-world/ with task.toml, instruction.md, and tests/
  2. Sandbox Creation: A Docker sandbox is created with the task instruction uploaded to /task/
  3. Agent Execution: A Python script reads the instruction and makes an OpenAI API call
  4. Interception: The API call is intercepted by CliAgentEnv's HTTP proxy server (via Cloudflare tunnel)
  5. LLM Response: The LLM returns a bash command to complete the task
  6. Execution: The agent executes the command in /app
  7. Testing: Harbor's tests/test.sh runs pytest to verify the result

Agent Script Details

The embedded agent script:

  • Reads task instruction from /task/instruction.md
  • Asks the LLM for a bash command to complete the task
  • Executes the returned command in /app

For the hello-world task, the LLM should respond with something like:

echo "Hello, world!" > hello.txt

Environment Arguments

ArgumentTypeDefaultDescription
dataset_pathstr | Path./tasksPath to Harbor-format tasks directory
taskslist[str] | NoneNoneSpecific task names to load (None = all)
agent_workdirstr/appWorking directory for agent in sandbox
docker_imagestrpython:3.11-slimDocker image for sandbox
timeout_secondsfloat300.0Overall rollout timeout
max_turnsint-1Max turns (-1 = unlimited)

Metrics

MetricMeaning
reward1.0 if pytest passes (hello.txt correct), 0.0 otherwise