meta-sparse-shaping
meta-sparse-shaping is a deterministic Verifiers environment for comparing
sparse terminal reward with shaped intermediate reward on the same prompts.
Each example gives an integer list and a sequence of list operations. The model
must return exactly one JSON object inside a <result>...</result> tag:
<result>{"trace": [[...], [...]], "answer": [...]}</result>
trace is the list after each operation. answer is the final list.
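The output contract above can be checked with a few lines of standard-library Python. This is a minimal sketch, not the environment's actual parser: the names `parse_completion` and `is_int_list` are hypothetical, and rejecting extra JSON keys is an assumption about how strict the schema check is.

```python
import json
import re

# Non-greedy match so that two separate <result> blocks are counted as two.
RESULT_RE = re.compile(r"<result>(.*?)</result>", re.DOTALL)


def is_int_list(value):
    """True for a list of ints; bool is excluded because it subclasses int."""
    return isinstance(value, list) and all(
        isinstance(x, int) and not isinstance(x, bool) for x in value
    )


def parse_completion(text):
    """Return the parsed payload, or None if the format or schema is invalid."""
    blocks = RESULT_RE.findall(text)
    if len(blocks) != 1:  # exactly one <result> block is required
        return None
    try:
        payload = json.loads(blocks[0])
    except json.JSONDecodeError:
        return None
    # Assumption: the object must contain exactly the keys "trace" and "answer".
    if not isinstance(payload, dict) or set(payload) != {"trace", "answer"}:
        return None
    trace, answer = payload["trace"], payload["answer"]
    if not isinstance(trace, list) or not all(is_int_list(step) for step in trace):
        return None
    if not is_int_list(answer):
        return None
    return payload
```

A completion with zero or several `<result>` blocks, malformed JSON, or missing keys yields `None` under this sketch.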
The environment exposes two reward modes:
reward_mode="sparse": format/schema credit is constant, but task reward mostly comes from exact final-answer correctness.
reward_mode="shaped": the same format/schema credit is used, plus partial credit for final-list element accuracy and exact intermediate trace steps.
This creates paired RL runs with identical data distributions and different reward surfaces.
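To make the difference between the two reward surfaces concrete, here is an illustrative sketch of how such scoring could be combined. The weights, the function names, and the handling of length mismatches are all assumptions chosen for readability; the environment's actual values and rules may differ.

```python
# Hypothetical weights, NOT the environment's real values.
FORMAT_WEIGHT = 0.25   # constant format/schema credit in both modes
EXACT_WEIGHT = 0.25    # exact final answer (shaped mode)
ELEMENT_WEIGHT = 0.25  # per-element accuracy of the final list (shaped mode)
STEP_WEIGHT = 0.25     # exact intermediate trace steps (shaped mode)


def positional_accuracy(predicted, expected):
    """Fraction of matching positions; a length mismatch counts as misses."""
    longest = max(len(predicted), len(expected))
    if longest == 0:
        return 1.0
    matches = sum(p == e for p, e in zip(predicted, expected))
    return matches / longest


def reward(payload, expected_trace, expected_answer, reward_mode):
    """Score a parsed payload (a dict with "trace" and "answer"), or None."""
    if payload is None:  # invalid format or schema earns nothing in this sketch
        return 0.0
    exact = float(payload["answer"] == expected_answer)
    if reward_mode == "sparse":
        # All task reward rides on the exact final answer.
        return FORMAT_WEIGHT + (1.0 - FORMAT_WEIGHT) * exact
    if reward_mode == "shaped":
        element_acc = positional_accuracy(payload["answer"], expected_answer)
        # Each trace step is compared as a whole list, so a step is right or wrong.
        step_acc = positional_accuracy(payload["trace"], expected_trace)
        return (
            FORMAT_WEIGHT
            + EXACT_WEIGHT * exact
            + ELEMENT_WEIGHT * element_acc
            + STEP_WEIGHT * step_acc
        )
    raise ValueError(f"unknown reward_mode: {reward_mode!r}")
```

With these weights, a nearly correct answer earns only the format credit in sparse mode but additional partial credit in shaped mode, which is the contrast the paired runs are meant to expose.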
Usage
from verifiers import load_environment
env = load_environment(
"meta-sparse-shaping",
seed=20260615,
num_examples=128,
min_length=4,
max_length=7,
min_ops=3,
max_ops=5,
operation_set="core",
reward_mode="shaped",
)
Operation Sets
operation_set="core" uses:
add, subtract, multiply, reverse, rotate_left, sort_asc
operation_set="extended" also adds:
sort_desc, keep_even, keep_greater_than
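The sketch below shows how a trace and a final answer could be produced from a list and a sequence of operations. The operation names come from the lists above, but their semantics here are assumptions: arithmetic operations are taken to apply an integer argument to every element, rotate_left is taken to shift by its argument, and keep_greater_than is taken to filter against a threshold. The `(name, arg)` encoding and the helper names are hypothetical.

```python
def apply_operation(values, name, arg=None):
    """Apply one operation and return a new list (assumed semantics)."""
    if name == "add":
        return [v + arg for v in values]
    if name == "subtract":
        return [v - arg for v in values]
    if name == "multiply":
        return [v * arg for v in values]
    if name == "reverse":
        return values[::-1]
    if name == "rotate_left":
        if not values:
            return []
        k = (1 if arg is None else arg) % len(values)
        return values[k:] + values[:k]
    if name == "sort_asc":
        return sorted(values)
    if name == "sort_desc":
        return sorted(values, reverse=True)
    if name == "keep_even":
        return [v for v in values if v % 2 == 0]
    if name == "keep_greater_than":
        return [v for v in values if v > arg]
    raise ValueError(f"unknown operation: {name!r}")


def run_trace(values, operations):
    """Return (trace, answer): the list after each operation, and the final list."""
    current = list(values)
    trace = []
    for name, arg in operations:
        current = apply_operation(current, name, arg)
        trace.append(current)
    return trace, list(current)


trace, answer = run_trace([3, 1, 2], [("add", 1), ("reverse", None), ("sort_asc", None)])
print(trace)   # [[4, 2, 3], [3, 2, 4], [2, 3, 4]]
print(answer)  # [2, 3, 4]
```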
Recommended First Probes
Start by running both reward modes on the same seed with Qwen 2B or Llama 1B. The main metrics to compare are reward, exact final answer, answer element accuracy, trace step accuracy, schema validity, and exact-one-result.
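One way to set up such a paired probe is to build both configurations from a single base so that only reward_mode differs. This is a sketch: the helper names `paired_kwargs` and `load_paired_environments` are hypothetical, and the argument values are copied from the Usage example.

```python
BASE_KWARGS = {
    "seed": 20260615,
    "num_examples": 128,
    "min_length": 4,
    "max_length": 7,
    "min_ops": 3,
    "max_ops": 5,
    "operation_set": "core",
}


def paired_kwargs(base):
    """Return one kwargs dict per reward mode, identical except for reward_mode."""
    return {mode: {**base, "reward_mode": mode} for mode in ("sparse", "shaped")}


def load_paired_environments(base=BASE_KWARGS):
    """Load both environments; requires the verifiers package to be installed."""
    # Imported lazily so paired_kwargs can be used without verifiers installed.
    from verifiers import load_environment

    return {
        mode: load_environment("meta-sparse-shaping", **kwargs)
        for mode, kwargs in paired_kwargs(base).items()
    }
```

Keeping the seed and generation arguments in one shared dictionary makes it harder for the two runs to drift apart in anything other than the reward surface.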