RLM Science RL Env (Community)

RLM agent solving science problems via ComposableEnv.

Type
RL Env
License
apache-2.0
Published
May 2026

rlm-science

RLM agent solving science problems inside Prime Sandboxes via ComposableEnv.

Overview

  • Environment ID: rlm_science
  • Agent: RLM, a minimal CLI agent with built-in ipython (Python + shell). The numpy, scipy, and sympy packages are installed into rlm's tool venv at agent install time via RLM_EXTRA_UV_ARGS, so they are importable from inside ipython.
  • Dataset: PrimeIntellect/INTELLECT-3-RL (subset science, split train) by default. Any HF dataset with question/answer columns works.
  • Scoring: RemoteHybridMathRubric reads the agent's answer from /app/answer.txt and runs math_verify against the gold answer, falling back to an LLM judge on a miss. The env installs math-verify into the sandbox's system Python during setup so the scorer can import it.

Quickstart

# From research-environments root
uv pip install -e ./environments/rlm_science

# Single debug rollout (GH_TOKEN required when the host needs to fill the rlm checkout cache)
GH_TOKEN=... uv run vf-eval rlm-science -d -v -n1 -r1

# Multiple rollouts, save results
GH_TOKEN=... uv run vf-eval rlm-science -n5 -r3 -s

Environment Arguments

| Argument | Default | Description |
|---|---|---|
| dataset_name | "PrimeIntellect/INTELLECT-3-RL" | HF dataset name |
| dataset_subset | "science" | HF subset |
| dataset_split | "train" | HF split |
| question_key | "question" | Column for questions |
| answer_key | "answer" | Column for expected answers |
| instruction_prompt | "Solve the following problem.\n\n" | Prefix prepended to each question |
| answer_path | "/app/answer.txt" | Path the agent writes its final answer to |
| difficulty_key | "avg@16_qwen3_4b_instruct_2507" | Column for difficulty filtering |
| min_avg_reward | 0.0 | Min reward for difficulty filter |
| max_avg_reward | 1.0 | Max reward for difficulty filter |
| judge_model | "openai/gpt-5-nano" | Judge model for fallback |
| judge_base_url | "https://api.pinference.ai/api/v1" | Judge API base URL |
| judge_api_key_var | "PRIME_API_KEY" | Env var for judge API key |
| use_judge_fallback | True | Use LLM judge if math_verify fails |
| judge_prompt | None | Override judge prompt |
| judge_timeout | 1200.0 | Judge HTTP timeout (s) |
| sandbox_docker_image | "python:3.11-slim" | Sandbox image; computation libs are installed live into rlm's tool venv |
| extra_pip_packages | ["numpy", "scipy", "sympy"] | Extra packages installed into rlm's tool venv via RLM_EXTRA_UV_ARGS. Pass [] to install nothing extra. Avoid shell metacharacters (e.g. >=): install.sh word-splits these tokens. |
| gh_token | $GH_TOKEN | GitHub token used on the host to fill the rlm checkout cache when needed |
| max_turns | 200 | Interception server turns |
| timeout_seconds | 3600.0 | Rollout timeout (1h); also drives sandbox lifetime |
| poll_interval | 1.0 | Intercept-queue poll cadence (s) |
| sandbox_cpu_cores | 1 | CPU cores per sandbox (matches opencode_science) |
| sandbox_memory_gb | 2 | Memory (GB) per sandbox |
| sandbox_disk_size_gb | 4 | Disk (GB) per sandbox |
| sandbox_client_max_workers | None | Max workers in shared sandbox client |
| labels | ["rlm-science"] | Sandbox labels |
| **rlm_kwargs | | Forwarded as-is to rlm_harness. Includes rlm_max_turns, rlm_exec_timeout, summarize_at_tokens, rlm_ref, rlm_repo_url, local_checkout, rlm_tools, append_to_system_prompt. |
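
The difficulty filter arguments (difficulty_key, min_avg_reward, max_avg_reward) can be pictured with a small sketch. This is not the env's actual code: filter_by_difficulty is a hypothetical helper, the rows are made-up examples, and inclusive bounds are an assumption based on the defaults 0.0 and 1.0 keeping every row.

```python
# Default difficulty_key from the arguments table.
DIFFICULTY_KEY = "avg@16_qwen3_4b_instruct_2507"


def filter_by_difficulty(rows, difficulty_key=DIFFICULTY_KEY,
                         min_avg_reward=0.0, max_avg_reward=1.0):
    """Keep rows whose recorded average reward lies in [min, max].

    Hypothetical helper; inclusive bounds are an assumption.
    """
    return [
        row for row in rows
        if min_avg_reward <= row[difficulty_key] <= max_avg_reward
    ]


# Made-up rows for illustration only.
rows = [
    {"question": "q1", "answer": "1", DIFFICULTY_KEY: 0.0},
    {"question": "q2", "answer": "2", DIFFICULTY_KEY: 0.5},
    {"question": "q3", "answer": "3", DIFFICULTY_KEY: 1.0},
]

medium = filter_by_difficulty(rows, min_avg_reward=0.25, max_avg_reward=0.75)
print([row["question"] for row in medium])  # ['q2']
```

Narrowing the window this way drops problems the reference model always solved or never solved, which is the usual reason to set these two arguments.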

Computation libraries inside the agent

rlm's ipython kernel runs in rlm's own tool venv (created by uv tool install in rlm's install.sh). To make numpy/scipy/sympy importable from inside the kernel, this env passes RLM_EXTRA_UV_ARGS="--with numpy --with scipy --with sympy" via ComposableEnv(install_env=...). The install script reads it and uv tool installs the extras alongside rlm itself. Override via extra_pip_packages.
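
As a rough illustration of how a package list could map to the RLM_EXTRA_UV_ARGS string shown above, here is a minimal sketch. build_extra_uv_args and the safe-token pattern are hypothetical and not taken from the env's source; the check only mirrors the advice to avoid shell metacharacters, since install.sh word-splits the value.

```python
import re

# Hypothetical conservative pattern: plain package names only.
_SAFE_TOKEN = re.compile(r"^[A-Za-z0-9._-]+$")


def build_extra_uv_args(packages):
    """Turn ["numpy", "scipy"] into "--with numpy --with scipy".

    Rejects tokens such as "numpy>=1.26" that a word-splitting
    shell script could misinterpret.
    """
    for name in packages:
        if not _SAFE_TOKEN.match(name):
            raise ValueError(f"unsafe package token: {name!r}")
    return " ".join(f"--with {name}" for name in packages)


print(build_extra_uv_args(["numpy", "scipy", "sympy"]))
# --with numpy --with scipy --with sympy
```

An empty list yields an empty string, which matches the documented behaviour of passing [] to install nothing extra.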

How scoring works

MathTaskSet.get_instruction appends "Write your final answer to `/app/answer.txt`." to each prompt. After the rollout, RemoteHybridMathRubric reads that file from the sandbox and runs math_verify against the gold answer. On a miss it falls back to an LLM judge. Reward is 1.0 on match, else 0.0.
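
The control flow described above can be sketched as follows. This is an illustration under stated assumptions, not RemoteHybridMathRubric itself: hybrid_reward is a hypothetical function, and the two callables are stand-ins for the math_verify check and the LLM judge so the example runs without a sandbox or network access.

```python
from typing import Callable, Optional

Check = Callable[[str, str], bool]


def hybrid_reward(answer_text: str, gold: str,
                  symbolic_match: Check,
                  judge_match: Optional[Check] = None) -> float:
    """Return 1.0 if either check accepts the answer, else 0.0."""
    answer = answer_text.strip()  # contents of the answer file
    if symbolic_match(answer, gold):
        return 1.0
    # Fallback only runs on a miss, and only if a judge is configured.
    if judge_match is not None and judge_match(answer, gold):
        return 1.0
    return 0.0


# Stand-in checks (hypothetical): exact match plays the role of
# math_verify, case-insensitive match plays the role of the judge.
def exact(a: str, g: str) -> bool:
    return a == g


def lenient(a: str, g: str) -> bool:
    return a.lower() == g.lower()


print(hybrid_reward("42\n", "42", exact))            # 1.0
print(hybrid_reward("H2O", "h2o", exact, lenient))   # 1.0
print(hybrid_reward("H2O", "h2o", exact))            # 0.0
```

Passing no judge corresponds to use_judge_fallback set to False: a symbolic miss then scores 0.0 directly.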

Changelog

v0.1.1

  • Default sandbox_client_max_workers to None so the shared sandbox client uses the verifiers default worker cap unless callers explicitly override it.

v0.1.0

  • Initial release.