ReasoningGym

Reasoning Gym is a community-created Python library of procedural dataset generators and algorithmically verifiable reasoning environments for training reasoning models with reinforcement learning (RL).

Type: RL Env
Runtime: ORS
License: unknown
Size: 104,000 tasks
Published: Jan 2026

Reasoning-Gym-Envs

OpenReward Environment

Description

Reasoning-Gym-Envs is an environment wrapper for the reasoning-gym Python package that exposes 105+ procedurally generated reasoning datasets as OpenReward environments. It covers 12 categories: algebra, algorithmic problems, ARC variants, arithmetic, code execution, cognition, games, geometry, graphs, induction, logic, and probability.

Capabilities

  • Procedurally generated reasoning tasks
  • Algorithmic answer verification
  • Multi-category reasoning evaluation
  • Deterministic task generation with seeding

Compute Requirements

Agents are given a standard environment with no sandbox or file system access.

License

Apache 2.0.

Tasks

There is one split per dataset in this environment:

  • train: 500 tasks per dataset (default, configurable)

Datasets span 12 categories:

  • Algebra (6): Complex arithmetic, polynomial equations, integration
  • Algorithmic (34): Ciphers, string manipulation, graph problems
  • ARC (3): Abstraction & Reasoning Corpus variants
  • Arithmetic (18): Basic math, GCD, LCM, prime factorization
  • Code (2): Brainfuck execution, code I/O
  • Cognition (7): Rubik's cube, pattern recognition, ASCII art
  • Games (17): Sudoku, chess puzzles, logic games
  • Geometry (2): Basic and advanced geometric calculations
  • Graphs (5): Shortest path, topological sort, relationships
  • Induction (2): Causal reasoning, function learning
  • Logic (7): Knights & Knaves, propositional logic, syllogisms
  • Probability (1): Coin flips and probability reasoning
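The per-category counts above can be tallied to get the number of datasets and, at the default split size, the number of train tasks. This is plain arithmetic over the listed numbers and assumes every dataset uses the default 500-task train split; real totals depend on how the environment is configured.

```python
# Per-category dataset counts, copied from the category list above.
CATEGORY_COUNTS = {
    "Algebra": 6,
    "Algorithmic": 34,
    "ARC": 3,
    "Arithmetic": 18,
    "Code": 2,
    "Cognition": 7,
    "Games": 17,
    "Geometry": 2,
    "Graphs": 5,
    "Induction": 2,
    "Logic": 7,
    "Probability": 1,
}

# Default number of train tasks per dataset (the split size is configurable).
DEFAULT_TASKS_PER_DATASET = 500


def total_train_tasks(counts: dict[str, int], tasks_per_dataset: int = DEFAULT_TASKS_PER_DATASET) -> int:
    """Return the number of train tasks if every dataset uses the same split size."""
    return sum(counts.values()) * tasks_per_dataset


num_datasets = sum(CATEGORY_COUNTS.values())
print(len(CATEGORY_COUNTS), num_datasets, total_train_tasks(CATEGORY_COUNTS))
# prints: 12 104 52000
```

Passing a different `tasks_per_dataset` shows how the total scales when the split size is reconfigured.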

Reward Structure

This is a single-turn environment: the agent submits an answer via the submit_answer tool, and the answer is verified algorithmically by reasoning-gym's score_answer() function. Most datasets use exact-match scoring (0.0 or 1.0); some award partial credit (e.g., Rubik's cube, scored from 0.0 to 1.0 based on solution quality).
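The two reward shapes described above, binary exact match and partial credit in [0.0, 1.0], can be sketched as follows. This is an illustration only: reasoning-gym's score_answer() logic is dataset-specific and is not reproduced here. The whitespace normalization and the position-matching partial-credit rule are hypothetical choices.

```python
from typing import Optional, Sequence


def exact_match_score(answer: Optional[str], expected: str) -> float:
    """Binary reward: 1.0 on an exact match, else 0.0.

    Assumption: answers are compared after stripping surrounding whitespace,
    and a missing answer scores 0.0.
    """
    if answer is None:
        return 0.0
    return 1.0 if answer.strip() == expected.strip() else 0.0


def fraction_correct_score(answer: Sequence[str], expected: Sequence[str]) -> float:
    """Hypothetical partial-credit reward: share of expected positions matched.

    The result always lies in [0.0, 1.0]; extra items in `answer` are ignored.
    """
    if not expected:
        return 0.0
    matches = sum(1 for a, e in zip(answer, expected) if a == e)
    return matches / len(expected)


print(exact_match_score(" 42 ", "42"))  # 1.0
print(exact_match_score("41", "42"))    # 0.0
# Two of three positions match, so the reward is 2/3.
print(fraction_correct_score(["R", "U", "F"], ["R", "U", "B"]))
```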

Data

No external data files required. All tasks are procedurally generated in-memory using deterministic seeding from the reasoning-gym package.
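A minimal sketch of in-memory, seeded generation, using a toy GCD task (GCD is one of the listed arithmetic datasets). The function names, the seed-mixing formula, and the question/answer/metadata layout are assumptions for illustration, not reasoning-gym's implementation.

```python
import random
from math import gcd


def make_gcd_task(seed: int, idx: int) -> dict:
    """Hypothetical GCD task generator: the same (seed, idx) always yields the same task."""
    # A per-task RNG derived from the dataset seed and task index keeps generation
    # deterministic and independent of the order in which tasks are requested.
    rng = random.Random(seed * 1_000_003 + idx)
    a, b = rng.randint(2, 999), rng.randint(2, 999)
    return {
        "question": f"What is the greatest common divisor of {a} and {b}?",
        "answer": str(gcd(a, b)),
        "metadata": {"a": a, "b": b},
    }


def make_split(seed: int, size: int = 500) -> list[dict]:
    """Build a train split of `size` tasks entirely in memory (no data files)."""
    return [make_gcd_task(seed, i) for i in range(size)]


split_a = make_split(seed=42)
split_b = make_split(seed=42)
print(split_a == split_b)  # True: the same seed reproduces identical tasks
```

Seeding a fresh `random.Random` per task, rather than sharing one generator, means task `i` can be regenerated on its own without replaying the tasks before it.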

Tools

Tool            Description
submit_answer   Submit your answer for algorithmic verification. Ends the episode.

Time Horizon

Single-turn. The agent reads the reasoning problem and submits one answer.
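The submit_answer contract and single-turn horizon can be sketched as a tiny episode object. The class, its fields, and the scorer callback are hypothetical; the actual OpenReward environment interface is not described in this card.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class SingleTurnEpisode:
    """Hypothetical single-turn episode: one question, one submit_answer call."""

    question: str
    expected: str
    scorer: Callable[[str, str], float]  # e.g. an exact-match scorer
    done: bool = False
    reward: float = 0.0

    def submit_answer(self, answer: str) -> float:
        """Score the answer and end the episode; later submissions are rejected."""
        if self.done:
            raise RuntimeError("episode already finished")
        self.reward = self.scorer(answer, self.expected)
        self.done = True
        return self.reward


ep = SingleTurnEpisode("What is 12 + 30?", "42", lambda a, e: float(a.strip() == e))
print(ep.submit_answer("42"))  # 1.0
print(ep.done)                 # True
```

Raising on a second submission mirrors the rule that submitting an answer ends the episode.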

Environment Difficulty

[Put environment difficulty here]

Other Environment Requirements

None. All tasks are procedurally generated and all evaluation is deterministic.

Safety

Agents in Reasoning-Gym-Envs solve reasoning problems in a standard environment. The environment does not present direct safety risks.

Citation

@misc{stojanovski2025reasoninggymreasoningenvironments,
  title={REASONING GYM: Reasoning Environments for Reinforcement Learning with Verifiable Rewards},
  author={Zafir Stojanovski and Oliver Stanley and Joe Sharratt and Richard Jones and Abdulhakeem Adefioye and Jean Kaddour and Andreas Köpf},
  year={2025},
  eprint={2505.24760},
  archivePrefix={arXiv},
  primaryClass={cs.LG},
  url={https://arxiv.org/abs/2505.24760}
}