AceReason-Math

Description

AceReason-Math is an environment for evaluating mathematical reasoning on a curated set of 49,585 math problems. The dataset was created by NVIDIA as part of the AceReason-Nemotron project, which uses reinforcement learning to advance math and code reasoning capabilities. Problems are sourced from the NuminaMath and DeepScaler-Preview datasets and filtered for quality: problems with multiple sub-questions, multiple-choice problems, true/false questions, proofs, and figure-based problems are excluded. Each problem has a short, verified answer.

Capabilities

Mathematical reasoning across diverse topics and difficulty levels
Multi-step problem solving
Algebraic manipulation and computation
Numerical answer extraction

Compute Requirements

AceReason-Math is a lightweight, single-turn environment. No sandbox or significant compute resources are required beyond the agent's own inference.

License

CC BY 4.0.

Tasks

There is one split in this environment:

train: 49,585 math problems sourced from NuminaMath and DeepScaler-Preview, filtered for quality by the NVIDIA team.

Each task presents the agent with a math problem statement. The agent must solve the problem and submit its answer using the `answer` tool.

Reward Structure

AceReason-Math uses a binary, deterministic reward:

1.0 if the submitted answer is correct
0.0 if the submitted answer is incorrect

Grading is performed with the math-verify library, which parses the submitted answer and the ground truth as mathematical expressions and checks them for equivalence. No LLM grader is used.
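
The reward rule above can be sketched in a few lines. This is a minimal illustration, not the environment's actual grader: `binary_reward` and `fraction_equal` are hypothetical names, and `fraction_equal` is a standard-library stand-in that only compares plain rational numbers, whereas the environment relies on math-verify for general mathematical expressions.

```python
from fractions import Fraction
from typing import Callable


def fraction_equal(submitted: str, truth: str) -> bool:
    """Stand-in equivalence check (assumption): compares plain rationals only.

    The real environment uses math-verify, which handles far more general
    expressions; this helper only keeps the sketch self-contained.
    """
    try:
        return Fraction(submitted.strip()) == Fraction(truth.strip())
    except (ValueError, ZeroDivisionError):
        # Unparseable input counts as a non-match rather than an error.
        return False


def binary_reward(
    submitted: str,
    truth: str,
    is_equivalent: Callable[[str, str], bool] = fraction_equal,
) -> float:
    """Return 1.0 for a correct answer and 0.0 otherwise (deterministic)."""
    return 1.0 if is_equivalent(submitted, truth) else 0.0


print(binary_reward("0.5", "1/2"))           # equivalent rationals
print(binary_reward("3", "4"))               # different values
print(binary_reward("not a number", "4"))    # unparseable submission
```

The equivalence check is passed in as a parameter so that a stricter or more general checker, for example one built on math-verify's `parse` and `verify` functions, could replace the stand-in without changing the reward logic.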

Data

The 49,585 math problems are sourced from the nvidia/AceReason-Math dataset on Hugging Face. Each record contains a problem field (the problem statement) and an answer field (the ground-truth short answer). Data files are stored on the OpenReward platform.
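
As a rough sketch of the record layout described above, the snippet below parses one JSON-encoded record into a small dataclass. The `MathRecord` class, the `parse_record` helper, and the sample record are made up for illustration; only the `problem` and `answer` field names come from the description, and the JSON encoding is an assumption about how a record might be serialized.

```python
import json
from dataclasses import dataclass


@dataclass(frozen=True)
class MathRecord:
    """One dataset record: a problem statement and its short answer."""
    problem: str
    answer: str


def parse_record(line: str) -> MathRecord:
    """Parse a JSON object with `problem` and `answer` keys.

    Raises KeyError if either field is missing.
    """
    raw = json.loads(line)
    return MathRecord(problem=str(raw["problem"]), answer=str(raw["answer"]))


# Made-up sample record, not taken from the dataset.
sample_line = '{"problem": "What is 6 times 7?", "answer": "42"}'
record = parse_record(sample_line)
print(record.problem)
print(record.answer)
```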

Tools

AceReason-Math exposes a single tool:

Tool	Parameters	Description
`answer`	`answer: str`	Submits the agent's final answer. The answer is parsed and verified against the ground truth using math-verify. This call ends the episode.

Time Horizon

AceReason-Math is a single-turn environment. The agent receives the problem in the prompt and is expected to make exactly one tool call, to `answer`, to submit its solution.
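
The single-turn flow can be pictured with a toy episode object. This is a hypothetical sketch, not the OpenReward API: the `SingleTurnEpisode` class and its attributes are invented here, exact string comparison stands in for math-verify so the example stays self-contained, and raising an error on a second call is an assumption about how "exactly one tool call" might be enforced.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class SingleTurnEpisode:
    """Toy model of one episode: one prompt, one `answer` call, then done."""
    problem: str
    truth: str
    done: bool = False
    reward: Optional[float] = None

    def prompt(self) -> str:
        # The agent sees only the problem statement.
        return self.problem

    def answer(self, answer: str) -> float:
        # Assumption: a second call is rejected because the first ends the episode.
        if self.done:
            raise RuntimeError("episode already ended; only one answer call is allowed")
        self.done = True
        # Stand-in check (assumption): exact string match instead of math-verify.
        self.reward = 1.0 if answer.strip() == self.truth.strip() else 0.0
        return self.reward


# Made-up problem for illustration.
episode = SingleTurnEpisode(problem="What is 6 times 7?", truth="42")
print(episode.prompt())
print(episode.answer("42"))
print(episode.done)
```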

Environment Difficulty

The dataset spans a wide range of difficulty levels, from straightforward arithmetic to competition-level mathematics.

Other Environment Requirements

There are no further environment requirements; AceReason-Math works out of the box with the OpenReward endpoint without any external API keys.

Safety

AceReason-Math is a purely mathematical evaluation environment. The agent solves well-defined math problems with known correct answers and does not interact with external systems, APIs, or other agents. The environment presents no direct or indirect safety risks.

Citation

@article{chen2025acereason,
  title={AceReason-Nemotron: Advancing Math and Code Reasoning through Reinforcement Learning},
  author={Chen, Yang and Yang, Zhuolin and Liu, Zihan and Lee, Chankyu and Xu, Peng and Shoeybi, Mohammad and Catanzaro, Bryan and Ping, Wei},
  journal={arXiv preprint arXiv:2505.16400},
  year={2025}
}