0

RLM LEAN RL Env (Community)

Fresh

RLM agent on Lean 4 theorem proving via ComposableEnv.

Type
RL Env
License
apache-2.0
Published
Apr 2026

Cite

Notes

Only stored in your browser.

rlm-lean

RLM agent solving Lean 4 theorem proving tasks inside Prime Sandboxes via ComposableEnv.

Overview

  • Environment ID: rlm-lean
  • Agent: RLM — minimalistic CLI agent. Defaults to the ipython builtin (which supports !shell escapes); bash and edit can be enabled via rlm_tools.
  • TaskSet: LeanTaskSet (shared with opencode-lean). Presets: deepseek-prover-v1 (default), minif2f, goedel-pset, numina-lean, kimina, deepseek-proverbench.
  • Scoring: Compile /tmp/proof.lean with lake env lean from /workspace/mathlib4. Reward = 1.0 iff exit 0 and no declaration uses 'sorry'.

Quickstart

# From research-environments root
uv pip install -e ./environments/rlm_lean

# Single debug rollout (requires GH_TOKEN to fill the local RLM cache)
GH_TOKEN=... uv run vf-eval rlm-lean -d -v -n1 -r1

Environment Arguments

ArgumentDefaultDescription
preset"deepseek-prover-v1"Lean dataset preset
dataset_namepreset defaultOverride HF dataset name
dataset_splitpreset defaultOverride HF split
sandbox_docker_imageLeanTaskSet defaultDocker image with Lean 4 + Mathlib
compile_timeout120Per-compile shell timeout (seconds)
ds_keep_in_memoryTrueForwarded to LeanTaskSet dataset loader
ds_num_proc8Forwarded to LeanTaskSet dataset loader
rlm_max_turns100Max tool-calling turns for RLM
rlm_max_turns_in_context-1Max assistant turns retained in live context (-1 disables the limit)
rlm_exec_timeout300Per-tool execution timeout forwarded as RLM_EXEC_TIMEOUT
rlm_repo_urlharness defaultOverride the RLM repo URL
rlm_refharness defaultOverride the RLM Git ref (branch, tag, or commit SHA)
rlm_local_checkoutNoneUse an existing host-side RLM checkout instead of the host cache
rlm_toolsNoneRLM builtins to enable. Harness default is ["ipython"]; also available: bash, edit, summarize. The default ipython tool supports !lake env lean ... shell escapes
append_to_system_promptLean workflow promptLean-specific instructions appended to RLM's default system prompt
gh_token$GH_TOKENGitHub token for the private RLM repo (used host-side only)
max_turns200Max interception server turns
timeout_seconds5400Rollout deadline (90 min); also drives the sandbox lifetime
poll_interval1.0Seconds between intercept-queue polls
sandbox_cpu_cores4CPU cores per sandbox
sandbox_memory_gb4Memory per sandbox
sandbox_disk_size_gb10Disk per sandbox (Mathlib needs space)
sandbox_client_max_workersNoneShared sandbox client worker pool
labels["rlm-lean"]Sandbox labels

Notes

The default ipython-only configuration is enough to drive the proof loop: the agent reads the file via !cat, edits the body below the lean-guard end marker via text.replace(" sorry\n", ...) + Path.write_text(...) (or via the edit tool if enabled), and compiles via !cd /workspace/mathlib4 && lake env lean /tmp/proof.lean. Pass rlm_tools=["ipython", "bash", "edit"] if you want explicit bash and edit tools as separate calls.

Reward-hacking guard

Inherits the LeanTaskSet lean-guard introduced in PrimeIntellect-ai/verifiers#1271: the starter file wraps the theorem signature in -- lean-guard: begin/end protected marker comments, and the rubric refuses reward if the protected region differs from the original after the rollout. The default system prompt tells the agent to only edit lines below the end marker. Marker convention matches hallerite/lean-guard.

Changelog

v0.1.2

  • Bump verifiers pin from the 10b9a6c lean-guard merge commit to >=0.1.15.dev2. The release pin includes both lean-guard and the fix for ThreadedAsyncSandboxClient accepting max_workers=None (the env worker was crashing on startup since v0.1.1).

v0.1.1

  • Default sandbox_client_max_workers to None so the shared sandbox client uses the verifiers default worker cap unless callers explicitly override it.

v0.1.0

  • Initial release — LeanTaskSet + rlm_harness via ComposableEnv. Mirrors the structure of rlm-swe with the SWE taskset swapped for the Lean taskset.
  • Pinned to the verifiers lean-guard merge commit (#1271, 10b9a6c); will swap to a release pin once tagged.
  • LEAN_APPEND_TO_SYSTEM_PROMPT documents the lean-guard markers and steers the agent toward marker-preserving edits (text.replace, not wholesale write_text(NEW_CONTENTS)).