Meta Sparse Shaping

A deterministic list-transformation environment for sparse-vs-shaped RL probes.

Type: RL Env
Publisher: Abugoot
License: apache-2.0
Size: v0.1.2
Published: Jun 2026

meta-sparse-shaping

meta-sparse-shaping is a deterministic Verifiers environment for comparing sparse terminal reward with shaped intermediate reward on the same prompts.

Each example gives an integer list and a sequence of list operations. The model must return exactly one JSON object inside a <result>...</result> tag:

<result>{"trace": [[...], [...]], "answer": [...]}</result>

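trace holds the list as it stands after each operation, in order, so it has one entry per operation. answer is the final list, which is also the last entry of trace.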
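The format rules above can be checked mechanically. The following is a minimal sketch of such a check, not the environment's own parser: the helper names `parse_result` and `is_int_list` are hypothetical, and the decision to tolerate extra JSON keys is an assumption.

```python
import json
import re

# Non-greedy match so that two separate <result> blocks are counted separately.
RESULT_RE = re.compile(r"<result>(.*?)</result>", re.DOTALL)


def is_int_list(value):
    """True for a list of integers (bool is excluded, since it subclasses int)."""
    return isinstance(value, list) and all(
        isinstance(x, int) and not isinstance(x, bool) for x in value
    )


def parse_result(completion):
    """Return {"trace": ..., "answer": ...}, or None if the completion is malformed.

    Hypothetical helper: the real environment may apply stricter or looser rules.
    """
    matches = RESULT_RE.findall(completion)
    if len(matches) != 1:  # exactly one <result> tag is required
        return None
    try:
        payload = json.loads(matches[0])
    except json.JSONDecodeError:
        return None
    if not isinstance(payload, dict):
        return None
    trace, answer = payload.get("trace"), payload.get("answer")
    if not isinstance(trace, list) or not all(is_int_list(step) for step in trace):
        return None
    if not is_int_list(answer):
        return None
    return {"trace": trace, "answer": answer}
```

A completion with zero or several `<result>` tags, invalid JSON, or a missing key yields `None` in this sketch, which corresponds to the schema-validity and exact-one-result checks.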

The environment exposes two reward modes:

  • reward_mode="sparse": format/schema credit is constant, but task reward mostly comes from exact final-answer correctness.
  • reward_mode="shaped": the same format/schema credit is used, plus partial credit for final-list element accuracy and exact intermediate trace steps.
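To make the difference between the two modes concrete, here is an illustrative sketch of how such rewards could be computed. The weights (0.1, 0.9, 0.5, 0.2) and the function names are hypothetical and chosen only for illustration; the environment's actual weighting is not stated in this document.

```python
FORMAT_CREDIT = 0.1  # hypothetical constant credit for a valid, well-formed result


def element_accuracy(predicted, expected):
    """Fraction of positions where two lists agree; length mismatch counts as misses."""
    longest = max(len(predicted), len(expected))
    if longest == 0:
        return 1.0
    hits = sum(p == e for p, e in zip(predicted, expected))
    return hits / longest


def trace_step_accuracy(predicted, expected):
    """Fraction of expected trace steps reproduced exactly, compared position by position."""
    if not expected:
        return 1.0
    hits = sum(p == e for p, e in zip(predicted, expected))
    return hits / len(expected)


def reward(parsed, expected, reward_mode):
    """Score a parsed result dict against the expected one (assumed weights)."""
    if parsed is None:  # malformed output earns nothing in this sketch
        return 0.0
    exact = 1.0 if parsed["answer"] == expected["answer"] else 0.0
    if reward_mode == "sparse":
        return FORMAT_CREDIT + 0.9 * exact
    if reward_mode == "shaped":
        return (
            FORMAT_CREDIT
            + 0.5 * exact
            + 0.2 * element_accuracy(parsed["answer"], expected["answer"])
            + 0.2 * trace_step_accuracy(parsed["trace"], expected["trace"])
        )
    raise ValueError(f"unknown reward_mode: {reward_mode!r}")
```

Under these assumed weights, a near-miss answer with a partly correct trace scores only the format credit in sparse mode but earns additional partial credit in shaped mode, while a fully correct result scores the same in both.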

This creates paired RL runs with identical data distributions and different reward surfaces.

Usage

from verifiers import load_environment

env = load_environment(
    "meta-sparse-shaping",
    seed=20260615,
    num_examples=128,
    min_length=4,
    max_length=7,
    min_ops=3,
    max_ops=5,
    operation_set="core",
    reward_mode="shaped",
)
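For a paired run, the two environments should differ only in reward_mode. One way to guarantee that is to derive both configurations from a single base dictionary. This is a sketch: `BASE_KWARGS` and `paired_kwargs` are hypothetical names, and the keyword arguments are copied from the call above.

```python
# Shared settings, copied from the load_environment call above.
BASE_KWARGS = dict(
    seed=20260615,
    num_examples=128,
    min_length=4,
    max_length=7,
    min_ops=3,
    max_ops=5,
    operation_set="core",
)


def paired_kwargs(base):
    """Return one kwargs dict per reward mode; everything else is identical."""
    return {mode: {**base, "reward_mode": mode} for mode in ("sparse", "shaped")}


configs = paired_kwargs(BASE_KWARGS)

# With verifiers installed, each config can be passed to the loader:
#   from verifiers import load_environment
#   envs = {
#       mode: load_environment("meta-sparse-shaping", **kwargs)
#       for mode, kwargs in configs.items()
#   }
```

Keeping the seed and generation settings in one place avoids accidentally comparing runs that saw different prompts.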

Operation Sets

operation_set="core" uses:

  • add
  • subtract
  • multiply
  • reverse
  • rotate_left
  • sort_asc

operation_set="extended" adds the following to the core set:

  • sort_desc
  • keep_even
  • keep_greater_than
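The sketch below shows how a trace and answer could be produced from a start list and a sequence of operations. The operation semantics here are assumptions inferred from the names: add, subtract, and multiply are taken to apply an integer argument to every element, rotate_left to shift by a given count, and keep_greater_than to filter by a threshold. The `(name, arg)` tuple representation and the function names are hypothetical as well.

```python
def apply_op(values, name, arg=None):
    """Apply one named operation and return a new list (assumed semantics)."""
    if name == "add":
        return [v + arg for v in values]
    if name == "subtract":
        return [v - arg for v in values]
    if name == "multiply":
        return [v * arg for v in values]
    if name == "reverse":
        return values[::-1]
    if name == "rotate_left":
        if not values:
            return []
        k = arg % len(values)  # wrap shifts larger than the list length
        return values[k:] + values[:k]
    if name == "sort_asc":
        return sorted(values)
    if name == "sort_desc":
        return sorted(values, reverse=True)
    if name == "keep_even":
        return [v for v in values if v % 2 == 0]
    if name == "keep_greater_than":
        return [v for v in values if v > arg]
    raise ValueError(f"unknown operation: {name!r}")


def run_ops(start, ops):
    """Return the expected trace (list after each op) and the final answer."""
    trace = []
    current = list(start)
    for name, arg in ops:
        current = apply_op(current, name, arg)
        trace.append(current)
    return {"trace": trace, "answer": current}


example = run_ops([3, 1, 2], [("add", 1), ("reverse", None), ("rotate_left", 1)])
```

Here `example["trace"]` has one entry per operation and `example["answer"]` matches its last entry, mirroring the output format described earlier.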

For a first comparison, run both reward modes on the same seed with a small model such as Qwen 2B or Llama 1B. The main metrics to compare are reward, exact final-answer match, answer element accuracy, trace step accuracy, schema validity, and whether the output contains exactly one <result> tag.