Infinity WG RL Env (Vibrantlabsai)

Prime Intellect eval environment: airline customer-service gym (world-gen)

Type: RL Env
Publisher: Vibrantlabsai
Runtime: tool-use
License: unknown
Version: v0.2.9
Published: Apr 2026

tau2_infinity_wg

Prime Intellect RL environment: an airline customer-service gym backed by world-gen failure artifacts.

The agent plays an airline customer-support representative, conversing with an LLM-simulated customer and acting through a stateful tool environment. Each episode runs against a failure-*.json artifact that pins the initial world state and the set of objectives the agent must satisfy.

Rewards

Four components (ToolRL decomposition):

  • outcome: an LLM-as-judge (ORM) scores each objective on a continuous 5-bucket rubric; the final score is the mean across objectives. See verifier.py for how judge determinism is handled.
  • format: strict tool-call schema compliance, scored as binary.
  • efficiency: tool-call count relative to the optimal count, with the score decaying as the agent uses more calls than optimal.
  • compliance: penalties for parallel tool calls and unauthorized mutations.

See rewards.py for the component weights and for the "Things explicitly deferred" section.
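A minimal sketch of how the four components could be combined into one scalar reward. Every number here is an assumption: the bucket-to-score mapping, the decay rate, the penalty size, and the weights are placeholders rather than the values in rewards.py, and all function names are hypothetical. Exponential decay is just one possible reading of "decaying".

```python
import math

# Hypothetical mapping of the judge's 5 rubric buckets onto [0, 1].
BUCKET_SCORES = {1: 0.0, 2: 0.25, 3: 0.5, 4: 0.75, 5: 1.0}

# Placeholder weights; the real ones live in rewards.py.
WEIGHTS = {"outcome": 0.7, "format": 0.1, "efficiency": 0.1, "compliance": 0.1}


def outcome_reward(buckets: list[int]) -> float:
    """Mean of the per-objective bucket scores (0.0 if there are no objectives)."""
    if not buckets:
        return 0.0
    return sum(BUCKET_SCORES[b] for b in buckets) / len(buckets)


def format_reward(all_calls_valid: bool) -> float:
    """Binary: 1.0 only if every tool call matched the schema."""
    return 1.0 if all_calls_valid else 0.0


def efficiency_reward(n_calls: int, n_optimal: int, decay: float = 0.1) -> float:
    """1.0 at or below the optimal count, then exponential decay per extra call."""
    extra = max(0, n_calls - n_optimal)
    return math.exp(-decay * extra)


def compliance_reward(n_parallel: int, n_unauthorized: int, penalty: float = 0.5) -> float:
    """Non-positive: subtracts a fixed penalty per parallel call or unauthorized mutation."""
    return -penalty * (n_parallel + n_unauthorized)


def total_reward(components: dict[str, float]) -> float:
    """Weighted sum of the named components."""
    return sum(WEIGHTS[name] * value for name, value in components.items())


if __name__ == "__main__":
    parts = {
        "outcome": outcome_reward([5, 3]),   # two objectives: full and partial credit
        "format": format_reward(True),
        "efficiency": efficiency_reward(6, 4),
        "compliance": compliance_reward(0, 0),
    }
    print(parts, total_reward(parts))
```

Keeping each component as its own function makes it easy to log them separately, which helps when diagnosing which signal a policy is exploiting.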

Artifacts

The artifacts/ directory holds world-gen outputs; each file is one task the agent must solve. dataset.py loads artifacts verbatim, and the env never regenerates them at runtime.

Models

Default model for both the judge and the customer simulation: bedrock/us.anthropic.claude-haiku-4-5-20251001-v1:0. Override via load_environment(judge_model=..., customer_model=...).

Required secrets (or env vars) at runtime: AWS_BEARER_TOKEN_BEDROCK, AWS_REGION.