0

Regex Generation RL Env (Community)

Fresh

An RL environment that trains language models to generate correct Python-compatible regular expressions from natural language descriptions and example test cases.

Type
RL Env
License
mit
Published
Feb 2026

Cite

Notes

Only stored in your browser.

Regex Generation Environment

An RL environment that trains language models to generate correct Python-compatible regular expressions from natural language descriptions and example test cases.

How it works

Each problem gives the model:

  • A natural language description of the pattern to match
  • A set of strings that should match
  • A set of strings that should not match

The model must produce a regex pattern inside <answer> tags. The pattern is tested using re.fullmatch() against all provided examples.

Reward signal

The reward is the fraction of test cases passed (both positive and negative). A score of 1.0 means the regex correctly matches all positive examples and rejects all negative ones. Groups where all rollouts score identically are discarded (no learning signal).

Problem set

The environment ships with 28 hand-crafted regex problems across three difficulty levels:

  • Easy: Basic patterns (digits only, starts with X, exact match)
  • Medium: Emails, dates, phone numbers, hex colors, zip codes
  • Hard: IPv4 addresses, semantic versioning, URLs, repeated words

Problems are split 80/20 into train/test sets.

Running

# Basic training
python regex_env.py serve \
    --env.tokenizer_name="NousResearch/DeepHermes-3-Llama-3-3B-Preview" \
    --openai.base_url http://localhost:9001/v1

# Only easy/medium problems
python regex_env.py serve \
    --env.difficulties='["easy", "medium"]'

Config options

OptionTypeDefaultDescription
difficultieslist[str]["easy", "medium", "hard"]Difficulty levels to include
score_thresholdfloat1.0Min score to count as "correct" in metrics

Standard BaseEnvConfig options (group_size, max_token_length, etc.) also apply.

Eval metrics

MetricDescription
eval/avg_scoreAverage fraction of test cases passed
eval/percent_perfectFraction of problems with all tests passing
eval/percent_valid_regexFraction of responses with syntactically valid regex
train/percent_correctTraining accuracy (problems scoring above threshold)

Dependencies

No extra dependencies beyond what Atropos already provides. Uses only Python's built-in re module for regex validation.