Meta Memory State

A deterministic multi-turn pocket-ledger state tracking environment.

Type: RL Env
Publisher: Abugoot
Runtime: multi-turn
License: apache-2.0
Size: v0.6.0
Published: Jun 2026

meta-memory-state

meta-memory-state is a small deterministic Verifiers environment for testing multi-turn state tracking.

Each example gives a pocket ledger with signed balances across a small set of accounts. The model receives 2-5 debit, credit, or transfer updates across user turns and must reply after each update with the full ledger inside a <ledger>...</ledger> JSON block.
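As a rough illustration of the task, the sketch below applies one update to a ledger and renders the reply. The update dictionary layout (`kind`, `account`, `source`, `target`, `amount`), the account names, the sign convention (a debit lowers a balance), and the exact JSON shape (`balances` plus `total`) are assumptions made for this example; the environment's real prompt and schema may differ.

```python
import json


def apply_update(balances, update):
    """Return a new balances dict after one debit, credit, or transfer.

    The update layout here is hypothetical, not the environment's own format.
    """
    new = dict(balances)
    kind, amount = update["kind"], update["amount"]
    if kind == "credit":
        new[update["account"]] += amount
    elif kind == "debit":
        new[update["account"]] -= amount
    elif kind == "transfer":
        # A transfer moves value between accounts, so the total is unchanged.
        new[update["source"]] -= amount
        new[update["target"]] += amount
    else:
        raise ValueError(f"unknown update kind: {kind!r}")
    return new


def render_ledger(balances):
    """Wrap the full ledger in a single tagged JSON block (assumed schema)."""
    payload = {"balances": balances, "total": sum(balances.values())}
    return "<ledger>" + json.dumps(payload, sort_keys=True) + "</ledger>"


# Balances are signed, so an account may start below zero.
balances = {"cash": 40, "savings": -15}
balances = apply_update(
    balances,
    {"kind": "transfer", "source": "cash", "target": "savings", "amount": 25},
)
reply = render_ledger(balances)
print(reply)
```

The point of the sketch is that every turn restates the whole ledger, not just the changed account, so an error made in one turn carries forward unless the model tracks state correctly.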

The recommended first smoke test uses only two accounts and one to two turns. Harder multi-turn settings are available through min_turns, max_turns, and account_count once the smoke test shows useful reward variance.

The reward is shaped and defensive:

  • per-account balance correctness
  • total correctness
  • schema adherence, including penalties for hallucinated accounts and extra keys inside balances
  • format credit for one valid tagged JSON block
  • anti-stuffing penalty for multiple ledger candidates, repeated tags, long outputs, or code fences
  • malformed, empty, or None outputs return low reward instead of raising
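The components listed above could be combined along the following lines. This is a minimal sketch, not the environment's actual rubric: the function name `score_reply`, the weights, the penalty size, the `max_chars` threshold, the zero floor for malformed output, and the `balances`/`total` JSON shape are all assumptions chosen for illustration.

```python
import json
import re

LEDGER_RE = re.compile(r"<ledger>(.*?)</ledger>", re.DOTALL)
FENCE = "`" * 3  # code-fence marker, built this way to keep the example fence-safe


def score_reply(reply, expected, max_chars=600):
    """Return a reward in [0, 1] for one reply; never raises on bad input.

    `expected` maps account name -> correct balance. All weights are hypothetical.
    """
    if not expected or not isinstance(reply, str) or not reply.strip():
        return 0.0  # None, empty, or non-string output: low reward, no exception
    candidates = LEDGER_RE.findall(reply)
    if not candidates:
        return 0.0
    try:
        parsed = json.loads(candidates[0])
    except ValueError:  # json.JSONDecodeError is a ValueError subclass
        return 0.0
    if not isinstance(parsed, dict) or not isinstance(parsed.get("balances"), dict):
        return 0.0
    balances = parsed["balances"]

    # Per-account correctness: fraction of expected accounts with the right value.
    correct = sum(1 for name, value in expected.items() if balances.get(name) == value)
    account_score = correct / len(expected)
    # Total correctness.
    total_score = 1.0 if parsed.get("total") == sum(expected.values()) else 0.0
    # Schema adherence: no hallucinated accounts or extra keys inside balances.
    schema_score = 1.0 if all(name in expected for name in balances) else 0.0
    # Format credit: exactly one tagged block.
    format_score = 1.0 if len(candidates) == 1 else 0.0

    reward = (0.5 * account_score + 0.25 * total_score
              + 0.125 * schema_score + 0.125 * format_score)
    # Anti-stuffing: multiple candidates, code fences, or overly long output.
    if len(candidates) > 1 or FENCE in reply or len(reply) > max_chars:
        reward -= 0.25
    return max(0.0, min(1.0, reward))
```

Scoring only the first candidate while penalizing extras means that emitting several guesses cannot raise the reward above what a single clean answer would earn.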

The environment also reports auxiliary metrics for parseability, exact one-ledger format adherence, multiple-candidate outputs, code fences, candidate count, and mean assistant output length. These metrics are meant for diagnosing reward hacking; they are reported alongside the reward, so held-out evals still score the same objective.
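Diagnostics of this kind can be computed directly from the raw replies. The sketch below is one possible way to do so; the function name `reply_metrics`, the metric key names, and the choice to measure length in characters are assumptions, not the environment's actual reporting code.

```python
import json
import re

LEDGER_RE = re.compile(r"<ledger>(.*?)</ledger>", re.DOTALL)
FENCE = "`" * 3  # code-fence marker


def _parses(text):
    """True if the first tagged block contains a JSON object."""
    match = LEDGER_RE.search(text)
    if match is None:
        return False
    try:
        return isinstance(json.loads(match.group(1)), dict)
    except ValueError:
        return False


def reply_metrics(replies):
    """Aggregate diagnostics over assistant replies (each a str or None)."""
    texts = [r if isinstance(r, str) else "" for r in replies]
    counts = [len(LEDGER_RE.findall(t)) for t in texts]
    n = max(len(texts), 1)  # avoid division by zero on an empty batch
    return {
        "parseable_rate": sum(_parses(t) for t in texts) / n,
        "exact_one_ledger_rate": sum(c == 1 for c in counts) / n,
        "multiple_candidate_rate": sum(c > 1 for c in counts) / n,
        "code_fence_rate": sum(FENCE in t for t in texts) / n,
        "mean_candidate_count": sum(counts) / n,
        "mean_output_chars": sum(len(t) for t in texts) / n,
    }
```

A rising multiple-candidate or code-fence rate alongside a rising reward would be one signal that the policy is exploiting the format rather than tracking state.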

The environment uses no tools, no sandbox, and no judge model.

Usage

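from verifiers import load_environment

# Smoke-test configuration: two accounts, one to two turns per example.
env = load_environment(
    "meta-memory-state",
    seed=1337420,
    num_examples=128,
    min_turns=1,       # raise min_turns/max_turns for harder multi-turn settings
    max_turns=2,
    account_count=2,   # number of accounts in each ledger
)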