Goblin Questions RL Env (Goblintron)

Simple goblin-frequency question environment with a coherence judge

Type: RL Env
Publisher: Goblintron
Runtime: single-turn
License: unknown
Version: v0.1.7
Published: May 2026

goblin-questions

A minimal v1 Taskset/Harness environment for measuring whether answers to a fixed set of open-ended preference questions include the word goblin.

Reward

The final reward is a weighted sum of two component rewards:

reward = 0.5 * goblin_reward + 0.5 * judge_reward
  • goblin_reward: 1.0 if the model response contains goblin, otherwise 0.0.
  • judge_reward: 1.0 if gpt-5.4-nano judges the response coherent, relevant, and not overly repetitive.
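The combination above can be sketched in a few lines of Python. This is an illustrative sketch, not the environment's actual code: the function names are hypothetical, the case-insensitive substring match is an assumption (the description only says the response "contains goblin"), and scoring a failed judge verdict as 0.0 is also an assumption, since only the 1.0 case is stated.

```python
def goblin_reward(response: str, hidden_word: str = "goblin") -> float:
    """Return 1.0 if the hidden word appears in the response, else 0.0.

    Assumption: matching is a case-insensitive substring test.
    """
    return 1.0 if hidden_word.lower() in response.lower() else 0.0


def combined_reward(
    response: str,
    judge_ok: bool,
    goblin_weight: float = 0.5,
    judge_weight: float = 0.5,
) -> float:
    """Weighted sum of the two component rewards.

    judge_ok stands in for the judge model's verdict; a failed verdict
    is scored as 0.0 here, which is an assumption.
    """
    judge_reward = 1.0 if judge_ok else 0.0
    return goblin_weight * goblin_reward(response) + judge_weight * judge_reward
```

With the default weights, a coherent answer that never mentions the word scores 0.5, as does an incoherent answer that does mention it, so full reward requires satisfying both components.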

There is no ground-truth answer. The judge sees the original prompt and the model response, but not the goblin reward.

Prompts

The taskset contains 58 plain questions. Parenthetical answer hints from the prompt brainstorming list are intentionally omitted from the model prompts.

Requirements

OPENAI_API_KEY is required for the judge.
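Because the judge cannot run without the key, it can help to check for it before starting a run. The helper below is a hypothetical pre-flight check, not part of the environment; it only inspects the environment variable and does not validate the key against the API.

```python
import os
from typing import Mapping, Optional


def judge_key_available(environ: Optional[Mapping[str, str]] = None) -> bool:
    """Return True if OPENAI_API_KEY is set to a non-blank value.

    environ defaults to os.environ; passing a plain dict makes the
    check easy to exercise without touching the real environment.
    """
    env = os.environ if environ is None else environ
    return bool(env.get("OPENAI_API_KEY", "").strip())
```

Calling judge_key_available() with no arguments checks the current process environment.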

Config

Judge and reward settings can be configured in TOML. Use the same keys under [eval.taskset] for eval configs or [env.taskset] for RL configs:

[env.taskset]
hidden_word = "goblin"
judge_model = "gpt-5.4-nano"
judge_max_completion_tokens = 512

[env.taskset.scoring.goblin_reward]
weight = 0.5

[env.taskset.scoring.judge_reward]
weight = 0.5

Quickstart

prime env install goblin-questions
prime eval run goblin-questions