AceReason-Math

AceReason-Math is an environment for evaluating mathematical reasoning on a curated set of 49,585 math problems. The dataset was created by NVIDIA as part of the AceReason-Nemotron project, which uses reinforcement learning to advance math and code reasoning capabilities.

Type: RL Env
Publisher: NVIDIA
Runtime: ORS
License: CC BY 4.0
Size: 49,585 tasks
Published: Mar 2026

Description

AceReason-Math is an environment for evaluating mathematical reasoning on a curated set of 49,585 math problems. The dataset was created by NVIDIA as part of the AceReason-Nemotron project, which uses reinforcement learning to advance math and code reasoning capabilities. Problems are sourced from the NuminaMath and DeepScaler-Preview datasets and filtered for quality: problems with multiple sub-questions, multiple-choice problems, true/false questions, proofs, and figure-based problems are excluded. Each problem has a short verified answer.

Capabilities

  • Mathematical reasoning across diverse topics and difficulty levels
  • Multi-step problem solving
  • Algebraic manipulation and computation
  • Numerical answer extraction

Compute Requirements

AceReason-Math is a lightweight, single-turn environment. No sandbox or significant compute resources are required beyond the agent's own inference.

License

CC BY 4.0.

Tasks

There is one split in this environment:

  • train: 49,585 math problems sourced from NuminaMath and DeepScaler-Preview, filtered for quality by the NVIDIA team.

Each task presents the agent with a math problem statement. The agent must solve the problem and submit its answer using the answer tool.

Reward Structure

AceReason-Math uses a binary, deterministic reward:

  • 1.0 if the submitted answer is correct
  • 0.0 if the submitted answer is incorrect

Grading is performed using the math-verify library, which parses and verifies mathematical expressions for equivalence. No LLM grader is used.
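The reward rule above can be sketched in a few lines. This is a simplified stand-in and not the environment's grader: the environment uses math-verify, whereas the hypothetical `is_equivalent` helper below only compares plain numbers (integers, decimals, and fractions such as `1/2`) through Python's `fractions.Fraction` and falls back to exact string comparison for everything else. The names `is_equivalent` and `reward` are assumptions made for illustration.

```python
from fractions import Fraction


def is_equivalent(submitted: str, ground_truth: str) -> bool:
    """Simplified stand-in for math-verify: numeric equality, else exact match."""
    a, b = submitted.strip(), ground_truth.strip()
    try:
        # Fraction accepts strings such as "3", "0.5", and "1/2".
        return Fraction(a) == Fraction(b)
    except (ValueError, ZeroDivisionError):
        # Not a plain number: fall back to comparing the stripped strings.
        return a == b


def reward(submitted: str, ground_truth: str) -> float:
    """Binary, deterministic reward: 1.0 if correct, 0.0 otherwise."""
    return 1.0 if is_equivalent(submitted, ground_truth) else 0.0
```

Because the check is a pure function of the two strings, the same submission always receives the same reward, which is the property the deterministic, grader-free design relies on. A real symbolic verifier would also accept equivalent forms that this sketch rejects, such as differently formatted LaTeX expressions.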

Data

The 49,585 math problems are sourced from the nvidia/AceReason-Math dataset on Hugging Face. Each record contains a problem field (the problem statement) and an answer field (the ground-truth short answer). Data files are stored on the OpenReward platform.
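As a rough illustration of the record shape described above, the sketch below models a task with the two documented fields, `problem` and `answer`. The `MathTask` class, the `parse_record` helper, and the assumption that a record arrives as a JSON object are all hypothetical; the actual storage format on the OpenReward platform is not specified here.

```python
import json
from dataclasses import dataclass


@dataclass(frozen=True)
class MathTask:
    problem: str  # the problem statement
    answer: str   # the ground-truth short answer


def parse_record(line: str) -> MathTask:
    """Parse one JSON-encoded record, keeping only the two documented fields."""
    raw = json.loads(line)
    return MathTask(problem=str(raw["problem"]), answer=str(raw["answer"]))


# Made-up record for illustration only; it is not taken from the dataset.
example = parse_record('{"problem": "What is 2 + 2?", "answer": "4"}')
print(example.problem, "->", example.answer)
```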

Tools

AceReason-Math exposes a single tool:

Tool | Parameters | Description
answer | answer: str | Submits the agent's final answer. The answer is parsed and verified against the ground truth using math-verify. This call ends the episode.

Time Horizon

AceReason-Math is a single-turn environment. The agent receives the problem in the prompt and is expected to make exactly one tool call (answer) to submit its solution.
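The single-turn lifecycle can be pictured with a small sketch: the problem is shown once, and one `answer` call returns the reward and ends the episode. The `SingleTurnEpisode` class and its `grader` parameter are hypothetical and do not reflect the OpenReward API; raising an error on a second call is also an assumption, since the environment's behavior in that case is not documented here.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class SingleTurnEpisode:
    problem: str
    ground_truth: str
    grader: Callable[[str, str], bool]  # stand-in for the real verifier
    done: bool = False

    def prompt(self) -> str:
        """Return the problem statement shown to the agent."""
        return self.problem

    def answer(self, answer: str) -> float:
        """Grade the submission, end the episode, and return the reward."""
        if self.done:
            raise RuntimeError("episode already ended; only one answer call is allowed")
        self.done = True
        return 1.0 if self.grader(answer, self.ground_truth) else 0.0


# Made-up problem; exact string match is used only to keep the sketch short.
episode = SingleTurnEpisode(
    "What is 2 + 2?", "4", grader=lambda a, b: a.strip() == b.strip()
)
first_reward = episode.answer("4")
```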

Environment Difficulty

The dataset spans a wide range of difficulty levels, from straightforward arithmetic to competition-level mathematics.

Other Environment Requirements

There are no further environment requirements; AceReason-Math works out of the box with the OpenReward endpoint without any external API keys.

Safety

AceReason-Math is a purely mathematical evaluation environment. The agent solves well-defined math problems with known correct answers and does not interact with external systems, APIs, or other agents. The environment presents no direct or indirect safety risks.

Citation

@article{chen2025acereason,
  title={AceReason-Nemotron: Advancing Math and Code Reasoning through Reinforcement Learning},
  author={Chen, Yang and Yang, Zhuolin and Liu, Zihan and Lee, Chankyu and Xu, Peng and Shoeybi, Mohammad and Catanzaro, Bryan and Ping, Wei},
  journal={arXiv preprint arXiv:2505.16400},
  year={2025}
}