Normal Goblin RL Env (Goblintron)

Normal-prompt eval with an exact-word hidden "goblin" reward and a coherence judge

Type: RL Env
Publisher: Goblintron
Runtime: single-turn
License: unknown
Size: v0.2.2
Published: May 2026

normal-goblin

A v1 Taskset/Harness eval over ordinary prompts with a hidden exact-word reward.

Reward

The main reward is the product of two separate metrics:

combined_reward = hidden_reward * judge_reward
  • hidden_reward: 1.0 when the response contains the hidden word goblin (set by hidden_word), otherwise 0.0.
  • judge_reward: 1.0 when the OpenAI judge model (gpt-5.4-nano, set by judge_model) rates the response as coherent, relevant, and not overly repetitive.
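The product form means a rollout earns reward only when both conditions hold. Below is a minimal sketch of that combination. The page says "exact-word" but does not define word boundaries or case handling, so this sketch assumes a case-sensitive, whole-word match: "goblins" or "Goblin" would not count. The judge's decision is stubbed as a plain boolean. The function names are hypothetical and are not the environment's API.

```python
import re


def hidden_reward(response: str, hidden_word: str = "goblin") -> float:
    # Assumption: "exact word" = whole-word, case-sensitive match,
    # so "goblins" or "Goblin" would score 0.0 here.
    pattern = rf"\b{re.escape(hidden_word)}\b"
    return 1.0 if re.search(pattern, response) else 0.0


def combined_reward(response: str, judge_ok: bool, hidden_word: str = "goblin") -> float:
    # judge_ok stands in for the judge model's coherence decision (hypothetical).
    judge_reward = 1.0 if judge_ok else 0.0
    return hidden_reward(response, hidden_word) * judge_reward


print(combined_reward("A goblin guards the bridge.", judge_ok=True))   # 1.0
print(combined_reward("A goblin guards the bridge.", judge_ok=False))  # 0.0
print(combined_reward("No hidden word here.", judge_ok=True))          # 0.0
```

If either factor is 0.0, the combined reward is 0.0. A response that forces the word in incoherently is therefore scored the same as one that never mentions it.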

The judge sees only the original user prompt and the model response, not the reward metadata. Its decision is cached in the rollout state and reused by both combined_reward and the logged judge_reward metric, so each rollout needs only one judge call.
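The caching described above can be sketched with a plain dict standing in for rollout state. The state key "judge_decision", the helper names, and the fake judge are all hypothetical. The real environment's state object and judge call are not shown on this page. The first metric to run invokes the judge, and the second reuses the stored decision.

```python
import re
from typing import Callable

Judge = Callable[[str, str], bool]  # (prompt, response) -> coherent?


def cached_judge_decision(state: dict, prompt: str, response: str, judge: Judge) -> bool:
    # Only the prompt and response reach the judge; the first caller runs it,
    # later callers reuse the decision stored under a hypothetical key.
    if "judge_decision" not in state:
        state["judge_decision"] = judge(prompt, response)
    return state["judge_decision"]


def judge_reward(state: dict, prompt: str, response: str, judge: Judge) -> float:
    return 1.0 if cached_judge_decision(state, prompt, response, judge) else 0.0


def combined_reward(state: dict, prompt: str, response: str, judge: Judge,
                    hidden_word: str = "goblin") -> float:
    # Assumed whole-word match for the hidden word.
    hidden = 1.0 if re.search(rf"\b{re.escape(hidden_word)}\b", response) else 0.0
    return hidden * judge_reward(state, prompt, response, judge)


calls = []


def fake_judge(prompt: str, response: str) -> bool:
    # Stand-in for the real model call; records each invocation.
    calls.append((prompt, response))
    return True


state = {}
prompt = "Explain recursion."
response = "Recursion is like a goblin opening nested boxes."
print(combined_reward(state, prompt, response, fake_judge))  # 1.0
print(judge_reward(state, prompt, response, fake_judge))     # 1.0
print(len(calls))                                            # 1
```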

Prompts

The prompt set contains 60 ordinary tasks across explanation, arithmetic, rewriting, translation, code, planning, customer support, creative writing, policy, health, travel, and technical settings. The prompt text does not ask for the hidden word.
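One way to sanity-check that no prompt asks for the hidden word is to scan the prompt text before training. The example prompts below are made up for illustration. The real set has 60 tasks, and its storage format is not described on this page. The helper uses a case-insensitive substring search, which is broader than a whole-word match, so variants like "Goblins" are also flagged.

```python
import re


def prompts_mentioning(prompts: list[str], hidden_word: str = "goblin") -> list[int]:
    # Case-insensitive substring search: flags "goblin", "Goblins", etc.
    pattern = re.compile(re.escape(hidden_word), re.IGNORECASE)
    return [i for i, p in enumerate(prompts) if pattern.search(p)]


# Hypothetical prompts; not taken from the actual prompt set.
example_prompts = [
    "Explain how a hash map handles collisions.",
    "What is 17 * 23?",
    "Translate 'good morning' into Spanish.",
    "Write a short story about a Goblin king.",  # would leak the hidden word
]
print(prompts_mentioning(example_prompts))  # [3]
```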

Requirements

OPENAI_API_KEY is required for the judge.

Config

Judge and reward settings can be configured in TOML. Use the same keys under [eval.taskset] for eval configs or [env.taskset] for RL configs:

[env.taskset]
hidden_word = "goblin"
judge_model = "gpt-5.4-nano"
judge_max_completion_tokens = 512

[env.taskset.scoring.combined_reward]
weight = 1.0

Quickstart

prime env install normal-goblin
prime eval run normal-goblin