0

Prime GREP RL Env (Prime)

Fresh

Cross-repo code-search tasks over prime-rl, verifiers, vllm, pytorch

Type
RL Env
Publisher
Prime
License
unknown
Size
v0.4.7
Published
May 2026

Cite

Notes

Only stored in your browser.

prime-grep

Cross-repo code-search environment for the prime-swe-grep task set. Each task is a user-style question; the agent uses grep / list_files / read_file to search four pinned repos in a sandbox, then calls submit_spans with the spans that justify the answer.

Repos (pinned)

RepoCommit
prime-rl65919439195fb384eda593b3f93d62cba5b60cf3
verifiers58b119fa1b24eff85b74a75ccf3e132523b3c6c3
vllma171e6b52dff47dc567657e7d51f641bdcb22774
pytorchc200b7e590a77d52373861844be4287c8ef9507a

Bumping any of these requires re-verifying every span in tasks/ — spans store absolute line numbers, so cosmetic upstream edits will silently shift the ground truth.

Lifecycle

  1. setup (PrimeGrepEnv.setup_state) — creates one persistent prime sandbox for the rollout from the prebuilt image.
  2. rolloutvf.StatefulToolEnv alternates between tool calls (grep / list_files / read_file) and assistant messages until submit_spans records the final answer.
  3. stop (PrimeGrepEnv.answer_submitted) — fires once state["submitted"] is set.
  4. cleanup (PrimeGrepEnv.destroy_sandbox) — deletes the sandbox through src/sandbox_manager.
  5. reward (span_score) — compares state["submitted_spans"] against the task's gold spans.

Reward modes

Set per-task via reward_mode in the YAML, or globally via the PRIME_GREP_REWARD_MODE env var.

  • file_recall (default) — 1.0 per essential gold span whose (repo, path) appears in the submission. Lenient: ignores exact line ranges.
  • span_iou — line-range IoU averaged across essential gold spans. Strict: sloppy ranges hurt the score.

Supporting spans (role: supporting in the YAML) never affect the score in either mode. They exist so depth answers (e.g. citing a torch primitive at the bottom of a call chain) are accepted but not required.

Tasks

tasks/*.yaml is bundled with this package; the repo root has a symlink for authoring convenience. See ../../AUTHORING.md for the recipe.

Quickstart

prime eval run prime-grep -n 8 -r 1

Environment Arguments

ArgTypeDefaultDescription
num_examplesint-1Limit on number of tasks (-1 = all)
max_turnsint30Max rollout turns before forced stop
sandbox_configdictprebuilt prime-grep image configOverride sandbox request settings

For compatibility with previous v1-style configs, load_environment also accepts taskset.num_examples and harness.max_turns, but it always returns a plain PrimeGrepEnv(vf.StatefulToolEnv).

Metrics

MetricMeaning
span_scoreMean recall over essential gold spans (mode-dependent)
num_predictedNumber of spans the model submitted
submitted1.0 if the model called submit_spans, else 0.0
avg_parallel_tool_calls_per_turnMean tool-call batch size across assistant turns that call tools; single-call turns are 1.0

Sandbox management

Sandbox command execution and cleanup are handled by src/sandbox_manager. It centralizes app-level retries, lifecycle counters, recoverable tool-facing errors, fatal infra errors, and best-effort sandbox deletion. In particular, a transient DELETE /sandbox/{id} 500 during rollout cleanup is retried and then recorded as an orphaned sandbox instead of failing the worker after an otherwise successful rollout.