Nemotron-RL-ReasoningGym

Description

Nemotron-RL-ReasoningGym is a procedural reasoning environment built on NVIDIA's Nemotron-RL-ReasoningGym-v1 dataset. It covers 104 distinct task types across 12 categories, including logic puzzles, math problems, games (sudoku, sokoban), graph algorithms, string manipulation, cognitive tasks, and family relationship reasoning.

Capabilities

Solving diverse procedural reasoning tasks
Logic puzzle and game solving (sudoku, sokoban, etc.)
Graph and string algorithm reasoning
Mathematical problem solving
Family relationship inference

License

CC-BY-4.0.

Tasks

Tasks are addressed by index within each split, which allows efficient access to individual tasks.

Split	Tasks
`train`	15,000

Each task presents a reasoning problem with a deterministic, algorithmically verifiable answer.

Reward Structure

This is a sparse, verifiable-reward environment. The agent receives a reward of 1.0 if its submitted answer exactly matches the expected answer string, and 0.0 otherwise. Expected answers are generated procedurally along with each task, which ensures their correctness.

No LLM graders are used.
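
The reward rule described above can be sketched as follows. This is a minimal illustration, not the environment's actual implementation: the function name `exact_match_reward` is hypothetical, and the sketch assumes a strict comparison with no whitespace or case normalization, which the description does not specify.

```python
def exact_match_reward(submitted: str, expected: str) -> float:
    """Return 1.0 for an exact string match and 0.0 otherwise.

    Assumption: the comparison is strict, with no trimming or
    case folding applied to either string.
    """
    return 1.0 if submitted == expected else 0.0


# A matching answer earns the full reward.
print(exact_match_reward("42", "42"))

# Under the strict-comparison assumption, trailing whitespace counts as a mismatch.
print(exact_match_reward("42 ", "42"))
```

Because the reward is binary, there is no partial credit for answers that are close to the expected string.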

Data

Data is sourced from nvidia/Nemotron-RL-ReasoningGym-v1 on Hugging Face. Tasks are procedurally generated with ground-truth answers.

Tools

Tool	Description
`submit_answer`	Submit your final answer. Ends the episode.

Time Horizon

This is a single-turn environment. The agent reads the problem and submits one answer.
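
The single-turn flow can be pictured with a small sketch. The `SingleTurnEpisode` class below is a hypothetical stand-in written for illustration, not the environment's real interface; it assumes that a second call to `submit_answer` is rejected once the episode has ended, and that the reward is a strict exact-match comparison.

```python
from dataclasses import dataclass


@dataclass
class SingleTurnEpisode:
    """Hypothetical model of one episode; not the environment's real class."""

    problem: str
    expected: str
    done: bool = False

    def submit_answer(self, answer: str) -> float:
        # Only one submission is allowed: the first call ends the episode.
        if self.done:
            raise RuntimeError("episode already ended")
        self.done = True
        # Sparse reward: 1.0 on an exact string match, 0.0 otherwise
        # (strict comparison is an assumption of this sketch).
        return 1.0 if answer == self.expected else 0.0


episode = SingleTurnEpisode(problem="What is 2 + 3?", expected="5")
reward = episode.submit_answer("5")
print(reward, episode.done)
```

Since there is only one submission per episode, the agent has to do all of its reasoning before calling `submit_answer`.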

Other Environment Requirements

No external API keys or secrets are required.

Safety

This environment presents standard reasoning puzzles and poses no known safety risks.

Citations

@dataset{nvidia_nemotron_reasoninggym,
  author    = {NVIDIA},
  title     = {Nemotron-RL-ReasoningGym-v1},
  year      = {2026},
  publisher = {Hugging Face},
  url       = {https://huggingface.co/datasets/nvidia/Nemotron-RL-ReasoningGym-v1}
}