rlm-lean

RLM agent solving Lean 4 theorem proving tasks inside Prime Sandboxes via ComposableEnv.

Overview

Environment ID: rlm-lean
Agent: RLM — minimalistic CLI agent. Defaults to the ipython builtin (which supports !shell escapes); bash and edit can be enabled via rlm_tools.
TaskSet: LeanTaskSet (shared with opencode-lean). Presets: deepseek-prover-v1 (default), minif2f, goedel-pset, numina-lean, kimina, deepseek-proverbench.
Scoring: Compile /tmp/proof.lean with lake env lean from /workspace/mathlib4. Reward = 1.0 iff exit 0 and no declaration uses 'sorry'.

Quickstart

# From research-environments root
uv pip install -e ./environments/rlm_lean

# Single debug rollout (requires GH_TOKEN to fill the local RLM cache)
GH_TOKEN=... uv run vf-eval rlm-lean -d -v -n1 -r1

Environment Arguments

Argument	Default	Description
`preset`	`"deepseek-prover-v1"`	Lean dataset preset
`dataset_name`	preset default	Override HF dataset name
`dataset_split`	preset default	Override HF split
`sandbox_docker_image`	`LeanTaskSet` default	Docker image with Lean 4 + Mathlib
`compile_timeout`	120	Per-compile shell timeout (seconds)
`ds_keep_in_memory`	True	Forwarded to `LeanTaskSet` dataset loader
`ds_num_proc`	8	Forwarded to `LeanTaskSet` dataset loader
`rlm_max_turns`	100	Max tool-calling turns for RLM
`rlm_max_turns_in_context`	-1	Max assistant turns retained in live context (`-1` disables the limit)
`rlm_exec_timeout`	300	Per-tool execution timeout forwarded as `RLM_EXEC_TIMEOUT`
`rlm_repo_url`	harness default	Override the RLM repo URL
`rlm_ref`	harness default	Override the RLM Git ref (branch, tag, or commit SHA)
`rlm_local_checkout`	None	Use an existing host-side RLM checkout instead of the host cache
`rlm_tools`	None	RLM builtins to enable. Harness default is `["ipython"]`; also available: `bash`, `edit`, `summarize`. The default ipython tool supports `!lake env lean ...` shell escapes
`append_to_system_prompt`	Lean workflow prompt	Lean-specific instructions appended to RLM's default system prompt
`gh_token`	`$GH_TOKEN`	GitHub token for the private RLM repo (used host-side only)
`max_turns`	200	Max interception server turns
`timeout_seconds`	5400	Rollout deadline (90 min); also drives the sandbox lifetime
`poll_interval`	1.0	Seconds between intercept-queue polls
`sandbox_cpu_cores`	4	CPU cores per sandbox
`sandbox_memory_gb`	4	Memory per sandbox
`sandbox_disk_size_gb`	10	Disk per sandbox (Mathlib needs space)
`sandbox_client_max_workers`	None	Shared sandbox client worker pool
`labels`	`["rlm-lean"]`	Sandbox labels

Notes

The default ipython-only configuration is enough to drive the proof loop: the agent reads the file via !cat, edits the body below the lean-guard end marker via text.replace(" sorry\n", ...) + Path.write_text(...) (or via the edit tool if enabled), and compiles via !cd /workspace/mathlib4 && lake env lean /tmp/proof.lean. Pass rlm_tools=["ipython", "bash", "edit"] if you want explicit bash and edit tools as separate calls.

Reward-hacking guard

Inherits the LeanTaskSet lean-guard introduced in PrimeIntellect-ai/verifiers#1271: the starter file wraps the theorem signature in -- lean-guard: begin/end protected marker comments, and the rubric refuses reward if the protected region differs from the original after the rollout. The default system prompt tells the agent to only edit lines below the end marker. Marker convention matches hallerite/lean-guard.

Changelog

v0.1.2

Bump verifiers pin from the 10b9a6c lean-guard merge commit to >=0.1.15.dev2. The release pin includes both lean-guard and the fix for ThreadedAsyncSandboxClient accepting max_workers=None (the env worker was crashing on startup since v0.1.1).

v0.1.1

Default sandbox_client_max_workers to None so the shared sandbox client uses the verifiers default worker cap unless callers explicitly override it.

v0.1.0

Initial release — LeanTaskSet + rlm_harness via ComposableEnv. Mirrors the structure of rlm-swe with the SWE taskset swapped for the Lean taskset.
Pinned to the verifiers lean-guard merge commit (#1271, 10b9a6c); will swap to a release pin once tagged.
LEAN_APPEND_TO_SYSTEM_PROMPT documents the lean-guard markers and steers the agent toward marker-preserving edits (text.replace, not wholesale write_text(NEW_CONTENTS)).