CODE ENV RL Env (Prime Intellect)

Single-turn code training environment

Type
RL Env
Tags
Coding
Runtime
single-turn
License
unknown
Size
v0.3.2
Published
Dec 2025

code-env

Overview

  • Environment ID: code-env
  • Short description: Code training environment
  • Tags: single-turn, coding, sandbox

Datasets

  • Primary dataset(s): The code subset of PrimeIntellect/INTELLECT-3-RL
  • Source links: PrimeIntellect/INTELLECT-3-RL
  • Split sizes: 22k train examples (pre-filtering)

Task

  • Type: single-turn
  • Parser: StrictMaybeThinkParser with code extraction
  • Rubric overview: CodingRubric with passed, pass_rate, num_test_cases, and has_error metrics
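The parser line above can be pictured with a short sketch. This is only an illustration of what "maybe-think parsing with code extraction" could mean, not the real StrictMaybeThinkParser: the function name extract_code, the <think> tag convention, and the "last fenced block wins" rule are all assumptions made for this example.

```python
import re
from typing import Optional

# Hypothetical helper; the environment's actual parser may behave differently.
FENCE = "`" * 3  # built programmatically to avoid a literal fence in this example
FENCE_RE = re.compile(FENCE + r"[\w+-]*\n(.*?)" + FENCE, re.DOTALL)


def extract_code(completion: str) -> Optional[str]:
    """Return the last fenced code block that follows an optional think section."""
    if "</think>" in completion:
        # Reasoning was closed: only the text after the final closing tag counts.
        answer = completion.rsplit("</think>", 1)[1]
    elif "<think>" in completion:
        # Assumed "strict" behavior: unclosed reasoning yields no answer.
        return None
    else:
        # No think section at all: parse the whole completion.
        answer = completion

    blocks = FENCE_RE.findall(answer)
    if not blocks:
        return None
    return blocks[-1].strip()
```

Under these assumptions, a completion whose reasoning is never closed produces no code, so it would fail every test case rather than being partially credited.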

Quickstart

Create an API key for Prime Intellect sandboxes at https://app.primeintellect.ai/dashboard/tokens

Install the Prime Intellect CLI:

uv tool install prime

Set your API key in the Prime Intellect CLI:

prime config set-api-key <your-api-key>

Run an evaluation with default settings:

prime eval run code-env

Docker Image

For production use, build and deploy a custom Docker image with pre-installed dependencies:

cd environments/code_env
export GCP_PROJECT=your-project REGION=us-central1 REPO_NAME=your-repo
./scripts/build_and_push.sh

Environment Arguments

Arg                 Type        Default                          Description
dataset_name        str         "PrimeIntellect/INTELLECT-3-RL"  HuggingFace dataset name to load
dataset_subset      str         "code"                           Dataset subset to use
dataset_split       str         "train"                          Dataset split to use ("train" or "test")
dataset_shuffle     bool        False                            Whether to shuffle the dataset after loading
dataset_num_proc    int         1                                Number of processes to use for dataset mapping operations
difficulty_key      str         "avg@8_qwen3_4b_instruct_2507"   The key to use for the difficulty filter
min_solve_rate      float       0.0                              Minimum solve rate to include problem
max_solve_rate      float       1.0                              Maximum solve rate to include problem
timeout_per_test    int         10                               Maximum execution time (in seconds) for each test case
max_num_tests       int         15                               Maximum number of test cases per problem
skip_first          int         0                                Skip first N examples in dataset
docker_image        str | None  None                             Docker image to use for sandboxes (defaults to DEFAULT_DOCKER_IMAGE env var or us-central1-docker.pkg.dev/prime-intellect-platform/prod-sandbox/i3-code:latest)
instruction_prompt  str         DEFAULT_INSTRUCTION_PROMPT       The prompt to use for the instruction
random_seed         int | None  42                               Random seed to use for dataset shuffling and test case sampling
timeout_minutes     int         360                              Maximum execution time (in minutes) for each sandbox
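To make the filtering and sampling arguments concrete, here is a minimal sketch of how difficulty_key, min_solve_rate, max_solve_rate, max_num_tests, and random_seed might interact. The helper names in_solve_rate_range and cap_tests are hypothetical, and the inclusive bounds, the handling of examples with no recorded solve rate, and the use of random.Random are assumptions rather than the environment's confirmed behavior.

```python
import random
from typing import Any, Optional


def in_solve_rate_range(
    example: dict[str, Any],
    difficulty_key: str = "avg@8_qwen3_4b_instruct_2507",
    min_solve_rate: float = 0.0,
    max_solve_rate: float = 1.0,
) -> bool:
    """Keep an example if its recorded solve rate lies in [min, max]."""
    rate = example.get(difficulty_key)
    if rate is None:
        # Assumption: examples without a recorded solve rate are kept.
        return True
    return min_solve_rate <= rate <= max_solve_rate


def cap_tests(
    tests: list[Any],
    max_num_tests: int = 15,
    random_seed: Optional[int] = 42,
) -> list[Any]:
    """Return at most max_num_tests test cases, sampled reproducibly."""
    if len(tests) <= max_num_tests:
        return list(tests)
    # A dedicated Random instance keeps sampling independent of global state.
    rng = random.Random(random_seed)
    return rng.sample(tests, max_num_tests)


# Usage: keep only mid-difficulty problems, then cap their test cases.
examples = [
    {"id": "a", "avg@8_qwen3_4b_instruct_2507": 0.0, "tests": list(range(40))},
    {"id": "b", "avg@8_qwen3_4b_instruct_2507": 0.5, "tests": list(range(40))},
    {"id": "c", "avg@8_qwen3_4b_instruct_2507": 1.0, "tests": list(range(3))},
]
kept = [
    ex for ex in examples
    if in_solve_rate_range(ex, min_solve_rate=0.1, max_solve_rate=0.9)
]
sampled = cap_tests(kept[0]["tests"])
```

Narrowing the solve-rate window like this is one way to drop problems the reference model always or never solves, which tend to carry little training signal.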

Metrics

Metric          Meaning
passed          Whether the answer passed all test cases
pass_rate       The fraction of test cases that passed
num_test_cases  The number of test cases
has_error       Whether the answer caused an error in the sandbox

The main reward metric is identical to passed.
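The relationship between these metrics can be sketched as follows. This is an illustrative aggregation only, not the CodingRubric implementation: the CaseResult type, the score function, the "reward" key, and the treatment of an empty test list are assumptions for the example.

```python
from dataclasses import dataclass


@dataclass
class CaseResult:
    """Hypothetical per-test outcome reported by the sandbox."""
    passed: bool
    errored: bool = False


def score(results: list[CaseResult]) -> dict[str, float]:
    """Aggregate per-test outcomes into the four metrics plus the reward."""
    num_test_cases = len(results)
    num_passed = sum(r.passed for r in results)
    # Assumption: with no test cases, nothing counts as passed.
    pass_rate = num_passed / num_test_cases if num_test_cases else 0.0
    passed = num_test_cases > 0 and num_passed == num_test_cases
    return {
        "passed": float(passed),
        "pass_rate": pass_rate,
        "num_test_cases": float(num_test_cases),
        "has_error": float(any(r.errored for r in results)),
        # The reward mirrors passed, so partial credit is not given.
        "reward": float(passed),
    }
```

Because the reward follows passed rather than pass_rate, a solution that passes 14 of 15 test cases earns the same reward as one that passes none; pass_rate remains available as a diagnostic.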

Changelog

v0.3.1

  • Default sandbox_client_max_workers to None so the shared sandbox client uses the verifiers default worker cap unless callers explicitly override it.

v0.3.0 (Apr 17, 2026)

  • Replace custom SandboxPool with shared SandboxMixin from verifiers
  • Remove pool_size parameter (sandbox lifecycle now managed per-rollout)
  • Bump prime-sandboxes>=0.2.19, verifiers>=0.1.12.dev6

v0.1.0

  • Copy from single-turn-code