meta-data-analysis-lite

meta-data-analysis-lite is a deterministic Verifiers environment for small tabular arithmetic questions.

Each example contains a generated CSV with columns for region, product, channel, units, returns, net units, unit price, and revenue. The model answers one question by returning exactly one JSON object inside a <result>...</result> tag:

<result>{"answer": 123}</result>
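A minimal sketch of how a completion in this format could be parsed. The function name extract_answer and the regular expression are illustrative assumptions, not the environment's actual parser:

```python
import json
import re

# Non-greedy match so two adjacent tags are counted as two candidates.
RESULT_RE = re.compile(r"<result>(.*?)</result>", re.DOTALL)


def extract_answer(completion: str):
    """Return the "answer" value, or None if the format is not satisfied.

    Hypothetical helper: requires exactly one <result> tag that wraps a
    JSON object whose only key is "answer".
    """
    matches = RESULT_RE.findall(completion)
    if len(matches) != 1:
        return None
    try:
        payload = json.loads(matches[0])
    except json.JSONDecodeError:
        return None
    if not isinstance(payload, dict) or set(payload) != {"answer"}:
        return None
    return payload["answer"]


print(extract_answer('<result>{"answer": 123}</result>'))
```

Returning None for every malformed case keeps the sketch short; a real scorer would likely distinguish the failure modes so that each can be rewarded or penalized separately.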

The initial v0.1 task families are:

total net_units by region
total revenue by product
filtered row counts
highest-revenue region with deterministic tie-breaking
product revenue differences
average unit_price by channel
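The task families above can be illustrated on a small hand-written table. This is a sketch only: the column names, the sample rows, the "online" filter, and the alphabetical tie-breaking rule are assumptions made for the example and may differ from what the environment generates.

```python
import csv
import io
from collections import defaultdict

# Hypothetical sample data; real examples are generated by the environment.
CSV_TEXT = """region,product,channel,units,returns,net_units,unit_price,revenue
north,widget,online,10,1,9,2.50,22.50
south,widget,retail,9,0,9,2.50,22.50
north,gadget,retail,5,1,4,4.00,16.00
south,gadget,online,6,2,4,4.00,16.00
"""

rows = list(csv.DictReader(io.StringIO(CSV_TEXT)))


def total_by(rows, key, value):
    """Sum a numeric column, grouped by a categorical column."""
    totals = defaultdict(float)
    for row in rows:
        totals[row[key]] += float(row[value])
    return dict(totals)


def mean_by(rows, key, value):
    """Average a numeric column, grouped by a categorical column."""
    sums, counts = defaultdict(float), defaultdict(int)
    for row in rows:
        sums[row[key]] += float(row[value])
        counts[row[key]] += 1
    return {k: sums[k] / counts[k] for k in sums}


net_units_by_region = total_by(rows, "region", "net_units")
revenue_by_product = total_by(rows, "product", "revenue")
revenue_by_region = total_by(rows, "region", "revenue")

# Filtered row count: rows sold through the (assumed) "online" channel.
online_rows = sum(1 for row in rows if row["channel"] == "online")

# Highest revenue wins; ties fall back to alphabetical order (assumed rule).
top_region = min(revenue_by_region, key=lambda r: (-revenue_by_region[r], r))

revenue_diff = revenue_by_product["widget"] - revenue_by_product["gadget"]
avg_price_by_channel = mean_by(rows, "channel", "unit_price")

print(net_units_by_region, top_region, revenue_diff, avg_price_by_channel)
```

In this sample both regions total 38.5 in revenue, so the answer depends entirely on the tie-breaking rule, which is why that rule has to be deterministic.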

The reward is deterministic and shaped:

answer correctness, including numeric closeness for arithmetic mistakes
schema adherence for the single required answer key
format credit for valid JSON inside exactly one result tag
penalties for multiple candidates, repeated tags, code fences, or very long outputs
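One way the reward components listed above could combine is sketched below. The weights (0.8 / 0.1 / 0.1), the penalty size, the relative-error formula, and the 2000-character limit are all hypothetical; the environment's actual values are not stated here.

```python
import json
import re

RESULT_RE = re.compile(r"<result>(.*?)</result>", re.DOTALL)
FENCE = "`" * 3  # code-fence marker, built indirectly to keep this block readable


def correctness(answer, expected) -> float:
    """Exact match for strings; partial credit by relative error for numbers."""
    if isinstance(expected, str):
        return 1.0 if answer == expected else 0.0
    if isinstance(answer, bool) or not isinstance(answer, (int, float)):
        return 0.0
    rel_err = abs(answer - expected) / max(abs(expected), 1.0)
    return max(0.0, 1.0 - rel_err)


def shaped_reward(completion: str, expected, max_chars: int = 2000) -> float:
    """Hypothetical shaped reward in [0, 1]; all weights are assumptions."""
    score = 0.0
    matches = RESULT_RE.findall(completion)
    payload = None
    if len(matches) == 1:
        try:
            payload = json.loads(matches[0])
        except json.JSONDecodeError:
            payload = None
    if isinstance(payload, dict):
        score += 0.1  # format credit: valid JSON object in exactly one tag
        if set(payload) == {"answer"}:
            score += 0.1  # schema credit: only the required key is present
            score += 0.8 * correctness(payload["answer"], expected)
    # Penalties for repeated tags, code fences, and very long outputs.
    if len(matches) > 1:
        score -= 0.1
    if FENCE in completion:
        score -= 0.1
    if len(completion) > max_chars:
        score -= 0.1
    return max(0.0, score)


print(shaped_reward('<result>{"answer": 120}</result>', 123))
```

Because every term is a pure function of the completion and the expected answer, the same output always receives the same score, which is what makes the reward deterministic without a judge model.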

By default, the environment uses no tools, no sandbox, and no judge model. From version 0.2.1, it can also run with tools=True, which exposes a cheap, deterministic analyze_table(csv_text, question) helper. The tool computes the answer from the CSV text and the exact generated question; the model must still finish with a single <result>...</result> answer. This mode is meant to test whether tool access improves computation without hurting final-answer coherence.

Version 0.3.0 adds an explicit tool-routing mix. In addition to the base aggregation tasks, task_families="base,tool_routing" includes direct row lookups and row comparisons where analyze_table is intentionally unsupported. These examples are labeled with tool_policy="avoid" and receive a small penalty for unnecessary tool calls, while base aggregation examples are labeled tool_policy="recommended". Optional distractor columns make the CSVs longer without changing the deterministic answer.

Version 0.3.1 adds a reward penalty for missing the recommended tool on supported aggregation examples. This keeps the routing task from collapsing into an always-avoid-tools policy.

Version 0.3.2 keeps the v0.3.1 routing behavior for tool-enabled runs, but makes the recommended-tool penalty inactive when tools are disabled. This preserves no-tool eval semantics for future ablations.
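The routing rules from versions 0.3.0 through 0.3.2 can be summarized in a small sketch. The function name routing_penalty, its signature, and the penalty size of 0.1 are assumptions for illustration; only the tool_policy labels come from the description above.

```python
def routing_penalty(
    tool_policy: str,
    tool_calls: int,
    tools_enabled: bool,
    penalty: float = 0.1,  # hypothetical magnitude
) -> float:
    """Return the amount to subtract from the reward for a routing mistake."""
    if not tools_enabled:
        # v0.3.2: no routing penalty applies in no-tool runs.
        return 0.0
    if tool_policy == "avoid" and tool_calls > 0:
        # v0.3.0: unnecessary tool call on an unsupported lookup or comparison.
        return penalty
    if tool_policy == "recommended" and tool_calls == 0:
        # v0.3.1: skipped the tool on a supported aggregation example.
        return penalty
    return 0.0


print(routing_penalty("recommended", tool_calls=0, tools_enabled=True))
```

Penalizing both directions is what keeps either degenerate policy (always call the tool, or never call it) from being optimal on the mixed task set.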

Usage

from verifiers import load_environment

env = load_environment(
    "meta-data-analysis-lite",
    seed=20260603,
    num_examples=128,
    min_rows=8,
    max_rows=12,
)

Tool mode:

env = load_environment(
    "meta-data-analysis-lite",
    seed=20260603,
    num_examples=128,
    min_rows=8,
    max_rows=12,
    tools=True,
    max_tool_turns=3,
)

Tool-routing mix:

env = load_environment(
    "meta-data-analysis-lite",
    seed=20260603,
    num_examples=128,
    min_rows=14,
    max_rows=20,
    task_families="base,tool_routing",
    tools=True,
    max_tool_turns=3,
    distractor_columns=True,
)