
GEM Wordle RL Env (Prime Intellect)


Gym-compatible Wordle environment powered by AxonRL's GEM framework

Type: RL Env
Runtime: multi-turn
License: unknown
Version: v0.1.0
Published: Dec 2025


gem_wordle


Overview

  • Environment ID: gem_wordle
  • Short description: Multi-turn Wordle game environment powered by the GEM framework. The model must guess a hidden 5-letter word, submitting each guess as an action in \boxed{...} format.
  • Tags: games, multi-turn, wordle, gem, regex, feedback
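Because actions arrive as \boxed{...} text, extracting a guess is essentially a regex problem. The sketch below assumes the guess is taken from the last \boxed{...} in a completion and must be exactly five ASCII letters; `BOXED_RE` and `extract_guess` are hypothetical names, not GEM's actual parser. It also shows why a truncated completion (one missing the closing }) produces no valid action.

```python
import re
from typing import Optional

# Hypothetical pattern: a literal \boxed{ ... } with no nested braces inside.
BOXED_RE = re.compile(r"\\boxed\{([^{}]*)\}")


def extract_guess(completion: str) -> Optional[str]:
    """Return the lowercase 5-letter guess from the last \\boxed{...}, or None."""
    matches = BOXED_RE.findall(completion)
    if not matches:
        # No complete \boxed{...}: e.g. the closing brace was cut off by the token limit.
        return None
    guess = matches[-1].strip().lower()
    # Assumption: only exactly five ASCII letters count as a well-formed guess.
    if not re.fullmatch(r"[a-z]{5}", guess):
        return None
    return guess


print(extract_guess(r"I'll try \boxed{CRANE}"))  # crane
print(extract_guess(r"Guess: \boxed{crane"))     # None
```

Taking the last match is a design choice: models that reason aloud may mention several candidate words before committing to one.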

Datasets

  • Primary dataset(s): GEM game:Wordle-v0 (environment auto-generates episodes)
  • Source links: AxonRL GEM
  • Split sizes: The number of episodes is set via environment arguments (the dataset itself is an auto-generated placeholder)

Task

  • Type: multi-turn (gym environment interaction)
  • Rubric overview: Sum of the per-step rewards returned by GEM. These include shaping rewards, a terminal reward for solving the word, and small negative penalties (commonly -0.1) for badly formatted or invalid actions.
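To make the multi-turn loop and the reward sum concrete, here is a self-contained toy sketch. `wordle_feedback` implements standard Wordle scoring (G = right letter in the right spot, Y = letter elsewhere in the word, X = absent, with repeated letters matched no more often than they occur in the secret). `ToyWordle`, `run_episode`, the reward values (a small green-letter shaping term, +1.0 for a solve, -0.1 for an invalid action) and the messages are illustrative assumptions, not GEM's implementation or reward schedule.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional, Sequence, Tuple


def wordle_feedback(secret: str, guess: str) -> str:
    """Standard Wordle scoring: G = right spot, Y = elsewhere in word, X = absent."""
    result = ["X"] * len(guess)
    unmatched: Dict[str, int] = {}
    # First pass: mark greens and count the secret's leftover (unmatched) letters.
    for i, (s, g) in enumerate(zip(secret, guess)):
        if s == g:
            result[i] = "G"
        else:
            unmatched[s] = unmatched.get(s, 0) + 1
    # Second pass: mark yellows, consuming leftover letters so repeats are not overcounted.
    for i, g in enumerate(guess):
        if result[i] != "G" and unmatched.get(g, 0) > 0:
            result[i] = "Y"
            unmatched[g] -= 1
    return "".join(result)


@dataclass
class ToyWordle:
    """Hypothetical GEM-like env; rewards and messages are assumptions."""
    secret: str
    max_turns: int = 6
    turn: int = 0

    def step(self, guess: Optional[str]) -> Tuple[str, float, bool]:
        self.turn += 1
        out_of_turns = self.turn >= self.max_turns
        if guess is None or len(guess) != len(self.secret):
            return "Invalid action.", -0.1, out_of_turns  # format/invalid penalty
        feedback = wordle_feedback(self.secret, guess)
        if feedback == "G" * len(self.secret):
            return "Congratulations!", 1.0, True  # terminal success reward
        # Small shaping reward proportional to the number of greens (assumption).
        return feedback, 0.1 * feedback.count("G") / len(self.secret), out_of_turns


def run_episode(env: ToyWordle, guesses: Sequence[Optional[str]]) -> Tuple[str, List[float]]:
    """Feed guesses until the episode ends; return the last message and all step rewards."""
    message, rewards = "", []
    for guess in guesses:
        message, reward, done = env.step(guess)
        rewards.append(reward)
        if done:
            break
    return message, rewards


# None stands in for a completion whose \boxed{} action could not be parsed.
message, rewards = run_episode(ToyWordle("crane"), ["react", None, "crane"])
print(wordle_feedback("crane", "react"))  # YYGYX
print(message, sum(rewards))
```

The episode's training reward is simply `sum(rewards)`, so a solve still dominates the total even after an invalid-action penalty.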

Quickstart

Run an evaluation with default settings:

prime eval run gem_wordle

Configure the model and sampling. A higher max-token limit (-t) is recommended so the model reliably emits the closing } of its \boxed{...} answer:

export OPENAI_API_KEY=EMPTY
prime eval run gem_wordle \
  -b http://127.0.0.1:8000/v1 -k OPENAI_API_KEY \
  -m Qwen/Qwen3-30B-A3B-Instruct-2507 \
  -n 20 -r 3 -t 1024 \
  -a '{"num_train_episodes": 1000, "num_eval_episodes": 20}' \
  -s

Environment Arguments

| Arg | Type | Default | Description |
| --- | --- | --- | --- |
| num_train_episodes | int | 1000 | Number of training episodes (auto-generated) |
| num_eval_episodes | int | 20 | Number of evaluation episodes (auto-generated) |
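Since the dataset is only a placeholder, these two arguments essentially set how many rows (and hence episodes) each split contains. The following is a minimal sketch under that assumption; `build_placeholder_split`, the row schema and the per-episode seed scheme are hypothetical, not the environment's actual loader.

```python
import json
from typing import Dict, List


def build_placeholder_split(num_episodes: int, split: str, seed_offset: int = 0) -> List[Dict]:
    """Hypothetical: one dummy row per episode; the env picks the word at reset time."""
    if num_episodes < 0:
        raise ValueError("num_episodes must be non-negative")
    return [
        {"prompt": "", "info": {"split": split, "seed": seed_offset + i}}
        for i in range(num_episodes)
    ]


# Parse the same JSON string passed via -a in the Quickstart.
env_args = json.loads('{"num_train_episodes": 1000, "num_eval_episodes": 20}')
train = build_placeholder_split(env_args.get("num_train_episodes", 1000), "train")
evals = build_placeholder_split(env_args.get("num_eval_episodes", 20), "eval", seed_offset=10_000)
print(len(train), len(evals))  # 1000 20
```

Offsetting the eval seeds keeps the eval seed range disjoint from the training seeds (0 to 999 versus 10000 to 10019), though disjoint seeds alone do not guarantee different secret words.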

Metrics

| Metric | Meaning |
| --- | --- |
| sum_step_rewards | Sum of GEM per-step rewards (the training reward) |
| win_rate | 1.0 if the episode ends with “Congratulations!”, else 0.0 |
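Both metrics can be computed per episode and then averaged across rollouts. The sketch below assumes a win is detected by the substring "Congratulations!" in the final environment message; the function names and the two rollouts are made up for illustration and are not the environment's actual code or results.

```python
from statistics import mean
from typing import List, Sequence, Tuple


def sum_step_rewards(step_rewards: Sequence[float]) -> float:
    """Training reward for one episode: the plain sum of per-step rewards."""
    return float(sum(step_rewards))


def win(final_message: str) -> float:
    """1.0 if the final message signals a solve, else 0.0 (substring check is an assumption)."""
    return 1.0 if "Congratulations!" in final_message else 0.0


# Made-up rollouts: (per-step rewards, final environment message).
rollouts: List[Tuple[List[float], str]] = [
    ([0.02, -0.1, 1.0], "Congratulations!"),
    ([0.0, -0.1, 0.0], "Out of turns."),
]
episode_rewards = [sum_step_rewards(r) for r, _ in rollouts]
win_rate = mean(win(msg) for _, msg in rollouts)
print(episode_rewards, win_rate)  # win_rate is 0.5
```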