0

Color Codeword RL Env (Prime Intellect)

Fresh

Multi-turn VLM environment for testing image accumulation across turns

Type
RL Env
Runtime
multi-turn
License
unknown
Size
v0.1.0
Published
Feb 2026

Cite

Notes

Only stored in your browser.

color-codeword

Source Code

Overview

  • Environment ID: color-codeword
  • Short description: Multi-turn VLM environment where colored squares must be decoded into letters and accumulated across turns.
  • Tags: vlm, multi-turn, multimodal

Datasets

  • Primary dataset(s): Synthetically generated at runtime
  • Source links: N/A (procedural generation)
  • Split sizes: Configurable via num_examples parameter (default: 1000)

Task

  • Type: multi-turn
  • Parser: Extracts letter sequences (A-I) from model response
  • Rubric overview: Binary reward - 1.0 for exact match, 0.0 otherwise.

The model sees colored square images across multiple turns and must decode them using a fixed mapping (Red=A, Green=B, Blue=C, Yellow=D, Purple=E, Cyan=F, Orange=G, White=H, Black=I). Each turn adds new squares; the model must output the full accumulated codeword.

Quickstart

prime eval run color-codeword -m Qwen/Qwen3-VL-32B-Instruct -n 100

Harder configuration (10 images across 5 turns, 2 per turn):

prime eval run color-codeword \
  -m Qwen/Qwen3-VL-32B-Instruct \
  -n 100 -t 128 -T 0.7 \
  -a '{"images_per_turn": 2, "max_turns": 5}'

Environment Arguments

ArgTypeDefaultDescription
num_examplesint1000Number of examples to generate
images_per_turnint2Number of colored squares shown each turn
max_turnsint3Number of conversation turns
seedint42Random seed for reproducibility

Total codeword length = images_per_turn × max_turns (default: 6 letters).

Metrics

MetricMeaning
reward1.0 if extracted codeword matches expected, 0.0 otherwise
partial_match_scoreFraction of letters correct at each position

Changelog

v0.1.0

  • Initial release