rlm-lean
RLM agent solving Lean 4 theorem proving tasks inside Prime Sandboxes via ComposableEnv.
Overview
- Environment ID:
rlm-lean - Agent: RLM — minimalistic CLI agent. Defaults to the
ipythonbuiltin (which supports!shellescapes);bashandeditcan be enabled viarlm_tools. - TaskSet:
LeanTaskSet(shared withopencode-lean). Presets:deepseek-prover-v1(default),minif2f,goedel-pset,numina-lean,kimina,deepseek-proverbench. - Scoring: Compile
/tmp/proof.leanwithlake env leanfrom/workspace/mathlib4. Reward = 1.0 iff exit 0 and nodeclaration uses 'sorry'.
Quickstart
# From research-environments root
uv pip install -e ./environments/rlm_lean
# Single debug rollout (requires GH_TOKEN to fill the local RLM cache)
GH_TOKEN=... uv run vf-eval rlm-lean -d -v -n1 -r1
Environment Arguments
| Argument | Default | Description |
|---|---|---|
preset | "deepseek-prover-v1" | Lean dataset preset |
dataset_name | preset default | Override HF dataset name |
dataset_split | preset default | Override HF split |
sandbox_docker_image | LeanTaskSet default | Docker image with Lean 4 + Mathlib |
compile_timeout | 120 | Per-compile shell timeout (seconds) |
ds_keep_in_memory | True | Forwarded to LeanTaskSet dataset loader |
ds_num_proc | 8 | Forwarded to LeanTaskSet dataset loader |
rlm_max_turns | 100 | Max tool-calling turns for RLM |
rlm_max_turns_in_context | -1 | Max assistant turns retained in live context (-1 disables the limit) |
rlm_exec_timeout | 300 | Per-tool execution timeout forwarded as RLM_EXEC_TIMEOUT |
rlm_repo_url | harness default | Override the RLM repo URL |
rlm_ref | harness default | Override the RLM Git ref (branch, tag, or commit SHA) |
rlm_local_checkout | None | Use an existing host-side RLM checkout instead of the host cache |
rlm_tools | None | RLM builtins to enable. Harness default is ["ipython"]; also available: bash, edit, summarize. The default ipython tool supports !lake env lean ... shell escapes |
append_to_system_prompt | Lean workflow prompt | Lean-specific instructions appended to RLM's default system prompt |
gh_token | $GH_TOKEN | GitHub token for the private RLM repo (used host-side only) |
max_turns | 200 | Max interception server turns |
timeout_seconds | 5400 | Rollout deadline (90 min); also drives the sandbox lifetime |
poll_interval | 1.0 | Seconds between intercept-queue polls |
sandbox_cpu_cores | 4 | CPU cores per sandbox |
sandbox_memory_gb | 4 | Memory per sandbox |
sandbox_disk_size_gb | 10 | Disk per sandbox (Mathlib needs space) |
sandbox_client_max_workers | None | Shared sandbox client worker pool |
labels | ["rlm-lean"] | Sandbox labels |
Notes
The default ipython-only configuration is enough to drive the proof loop: the agent reads the file via !cat, edits the body below the lean-guard end marker via text.replace(" sorry\n", ...) + Path.write_text(...) (or via the edit tool if enabled), and compiles via !cd /workspace/mathlib4 && lake env lean /tmp/proof.lean. Pass rlm_tools=["ipython", "bash", "edit"] if you want explicit bash and edit tools as separate calls.
Reward-hacking guard
Inherits the LeanTaskSet lean-guard introduced in PrimeIntellect-ai/verifiers#1271: the starter file wraps the theorem signature in -- lean-guard: begin/end protected marker comments, and the rubric refuses reward if the protected region differs from the original after the rollout. The default system prompt tells the agent to only edit lines below the end marker. Marker convention matches hallerite/lean-guard.
Changelog
v0.1.2
- Bump
verifierspin from the10b9a6clean-guard merge commit to>=0.1.15.dev2. The release pin includes both lean-guard and the fix forThreadedAsyncSandboxClientacceptingmax_workers=None(the env worker was crashing on startup since v0.1.1).
v0.1.1
- Default
sandbox_client_max_workerstoNoneso the shared sandbox client uses the verifiers default worker cap unless callers explicitly override it.
v0.1.0
- Initial release —
LeanTaskSet+rlm_harnessvia ComposableEnv. Mirrors the structure ofrlm-swewith the SWE taskset swapped for the Lean taskset. - Pinned to the
verifierslean-guard merge commit (#1271,10b9a6c); will swap to a release pin once tagged. LEAN_APPEND_TO_SYSTEM_PROMPTdocuments the lean-guard markers and steers the agent toward marker-preserving edits (text.replace, not wholesalewrite_text(NEW_CONTENTS)).