RLM Secrets

A puzzle environment designed to thoroughly test RLM (Recursive Language Model) functionality.

Overview

This environment tests all major RLM components through a file-based puzzle:

Root-level tools: decrypt_position, unveil_file_number
Sub-LLM tools: get_code_from_file_data
Sub-LLM calls: via llm_batch
File operations: ls, cat, rm

The Puzzle

Several files with random names exist in the working directory
Each file contains a random UUID as content
Files have a hidden "correct order" (positions 1, 2, 3, ...)
One position is designated as the "file to keep"

Solution Flow

List files: Root LLM runs ls to discover files
Get codes: For each file:
- Read content with cat
- Call sub-LLM via llm_batch asking it to use get_code_from_file_data
- Sub-LLM calls the tool and returns the code
Decrypt positions: Root LLM calls decrypt_position(filename, code) to learn each file's position
Unveil answer: Root LLM calls unveil_file_number([sorted_filenames]) to learn which position to keep
Clean up: Delete all files except the one at the revealed position
Answer: Set RLM_CONTENT to the kept file's position, RLM_READY=1

Tools

Root-Level (called directly by root LLM)

decrypt_position <filename> <code>

Returns the 1-indexed position if the code is valid, error message otherwise.

unveil_file_number --json '{"sorted_filenames": ["file1.txt", "file2.txt", ...]}'

Returns which position's file to keep if order is correct, error message otherwise.

Sub-LLM Level (called by sub-LLMs via tool use)

get_code_from_file_data(filename: str, filecontent: str) -> str

Returns the decryption code if filename and content match, a fake code otherwise.

Usage

prime eval run rlm-secrets

Reward Functions

Both reward functions have equal weight (0.5 each):

correct_answer: 1.0 if final answer matches correct position
correct_filesystem_state: 1.0 if exactly one .txt file remains AND it's the correct one

Configuration

Parameter	Default	Description
`num_train_examples`	100	Training puzzles
`num_files`	4	Files per puzzle
`max_turns`	50	Max REPL iterations
`sub_llm_max_turns`	3	Max tool turns for sub-LLMs
`max_sub_llm_parallelism`	5	Concurrent sub-LLM calls
`code_execution_timeout`	120	Bash execution timeout (seconds)
`**kwargs`	-	Passed on `RLMEnv.__init__`

Note: The eval dataset is not built separately. For evaluation, re-instantiate the environment with a different seed to generate a new synthetic split.

Why This Environment?

This environment is specifically designed to test RLM capabilities:

Forces root-LLM usage: The correct order can only be obtained by calling the root-level tools
Forces sub-LLM usage: The code can only be obtained by having a sub-LLM call get_code_from_file_data
Forces sub-LLM tool use: Sub-LLMs must use their tool to get the code
Tests file operations: Model must list, read, and delete files
Tests information flow: Data must flow: file → sub-LLM → root-LLM → tool → answer

The puzzle is simple enough that models should be able to solve it, while being complex enough to exercise all RLM components.

Changelog

0.1.3: align arg names with simplified RLMEnv (max_iterations → max_turns, sub_tool_max_turns → sub_llm_max_turns)
0.1.2: sandbox labels no longer force in the default label
0.1.1:
- add default "rlm-secrets" label to the sandbox_labels no matter what the user passes ther in the kwargs
- dedupe sandbox_labels if passed via the kwargs