RLM Secrets
A puzzle environment designed to thoroughly test RLM (Recursive Language Model) functionality.
Overview
This environment tests all major RLM components through a file-based puzzle:
- Root-level tools: `decrypt_position`, `unveil_file_number`
- Sub-LLM tools: `get_code_from_file_data`
- Sub-LLM calls: via `llm_batch`
- File operations: `ls`, `cat`, `rm`
The Puzzle
- Several files with random names exist in the working directory
- Each file contains a random UUID as content
- Files have a hidden "correct order" (positions 1, 2, 3, ...)
- One position is designated as the "file to keep"
Solution Flow
- List files: Root LLM runs `ls` to discover files
- Get codes: For each file:
  - Read content with `cat`
  - Call sub-LLM via `llm_batch` asking it to use `get_code_from_file_data`
  - Sub-LLM calls the tool and returns the code
- Decrypt positions: Root LLM calls `decrypt_position(filename, code)` to learn each file's position
- Unveil answer: Root LLM calls `unveil_file_number([sorted_filenames])` to learn which position to keep
- Clean up: Delete all files except the one at the revealed position
- Answer: Set `RLM_CONTENT` to the kept file's position, `RLM_READY=1`
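The bookkeeping behind the last three steps can be sketched in a few lines of Python. This is only an illustration, not part of the environment: `plan_cleanup` is a hypothetical helper, and it assumes the positions have already been collected from `decrypt_position` and the position to keep has already been returned by `unveil_file_number`.

```python
def plan_cleanup(positions, keep_position):
    """Return (sorted_filenames, file_to_keep, files_to_delete).

    positions: dict mapping filename -> 1-indexed position
               (assumed to come from decrypt_position).
    keep_position: 1-indexed position to keep
                   (assumed to come from unveil_file_number).
    """
    expected = list(range(1, len(positions) + 1))
    # Every position 1..N must appear exactly once.
    if sorted(positions.values()) != expected:
        raise ValueError("positions must be exactly 1..N with no gaps or repeats")
    if keep_position not in expected:
        raise ValueError("keep_position is out of range")

    # Order filenames by their decrypted position.
    sorted_filenames = sorted(positions, key=positions.get)
    keep = sorted_filenames[keep_position - 1]
    delete = [name for name in sorted_filenames if name != keep]
    return sorted_filenames, keep, delete
```

In practice the sorted list is needed first (it is the argument to `unveil_file_number`), and the keep/delete split is only known once that call returns.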
Tools
Root-Level (called directly by root LLM)
decrypt_position <filename> <code>
Returns the 1-indexed position if the code is valid, an error message otherwise.
unveil_file_number --json '{"sorted_filenames": ["file1.txt", "file2.txt", ...]}'
Returns the position of the file to keep if the order is correct, an error message otherwise.
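Quoting the JSON payload by hand is easy to get wrong, so here is a minimal sketch of building the `unveil_file_number` command line from a Python list. `build_unveil_command` is a hypothetical helper (not provided by the environment); it relies only on the `--json '{"sorted_filenames": [...]}'` shape shown above and on the standard-library `json` and `shlex` modules.

```python
import json
import shlex


def build_unveil_command(sorted_filenames):
    """Build a shell-safe unveil_file_number invocation (hypothetical helper)."""
    # Serialize the argument object exactly as the tool signature shows it.
    payload = json.dumps({"sorted_filenames": list(sorted_filenames)})
    # shlex.quote wraps the payload so the shell passes it as one argument.
    return f"unveil_file_number --json {shlex.quote(payload)}"
```

The resulting string can be run from the bash REPL as-is; filenames containing quotes or spaces survive because the payload is quoted as a single shell word.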
Sub-LLM Level (called by sub-LLMs via tool use)
get_code_from_file_data(filename: str, filecontent: str) -> str
Returns the decryption code if filename and content match, a fake code otherwise.
Usage
prime eval run rlm-secrets
Reward Functions
Both reward functions have equal weight (0.5 each):
- correct_answer: 1.0 if final answer matches correct position
- correct_filesystem_state: 1.0 if exactly one .txt file remains AND it's the correct one
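As a rough illustration of how the two rewards combine, the sketch below scores a single rollout. It is not the environment's implementation: `combined_reward` and its parameters are hypothetical, and comparing the answer as a stripped string is an assumption about how the final answer is parsed.

```python
def combined_reward(answer, correct_position, remaining_txt_files, correct_filename):
    """Weighted sum of the two rewards, 0.5 each (illustrative only)."""
    # correct_answer: the final answer matches the correct position.
    # Assumption: the answer is compared as a whitespace-stripped string.
    correct_answer = 1.0 if str(answer).strip() == str(correct_position) else 0.0

    # correct_filesystem_state: exactly one .txt file remains and it is the right one.
    remaining = list(remaining_txt_files)
    correct_fs = 1.0 if remaining == [correct_filename] else 0.0

    return 0.5 * correct_answer + 0.5 * correct_fs
```

Under these assumptions, a rollout that reports the right position but forgets to delete the other files earns 0.5, as does one that cleans up correctly but reports the wrong position.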
Configuration
| Parameter | Default | Description |
|---|---|---|
| `num_train_examples` | 100 | Training puzzles |
| `num_files` | 4 | Files per puzzle |
| `max_turns` | 50 | Max REPL iterations |
| `sub_llm_max_turns` | 3 | Max tool turns for sub-LLMs |
| `max_sub_llm_parallelism` | 5 | Concurrent sub-LLM calls |
| `code_execution_timeout` | 120 | Bash execution timeout (seconds) |
| `**kwargs` | - | Passed on to `RLMEnv.__init__` |
Note: The eval dataset is not built separately. For evaluation, re-instantiate the
environment with a different seed to generate a new synthetic split.
Why This Environment?
This environment is specifically designed to test RLM capabilities:
- Forces root-LLM usage: The correct order can only be obtained by calling the root-level tools
- Forces sub-LLM usage: The code can only be obtained by having a sub-LLM call `get_code_from_file_data`
- Forces sub-LLM tool use: Sub-LLMs must use their tool to get the code
- Tests file operations: Model must list, read, and delete files
- Tests information flow: Data must flow: file → sub-LLM → root-LLM → tool → answer
The puzzle is simple enough that models should be able to solve it, while being complex enough to exercise all RLM components.
Changelog
- 0.1.3: align arg names with simplified RLMEnv (`max_iterations` → `max_turns`, `sub_tool_max_turns` → `sub_llm_max_turns`)
- 0.1.2: sandbox labels no longer force in the default label
- 0.1.1:
  - add default "rlm-secrets" label to the `sandbox_labels` no matter what the user passes there in the kwargs
  - dedupe `sandbox_labels` if passed via the kwargs