NVIDIA is an org.
Nemotron-RL-Instruction-Following-Calendar-v2 evaluates multi-turn instruction following in calendar scheduling conversations. Each task presents a multi-turn conversation where a user requests calendar events with time constraints. The agent must produce the next assistant re…
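As an illustration of what "time constraints" can mean for a proposed schedule, here is a minimal sketch of a constraint check. The event representation (minutes since midnight), the `Event` class, the `windows` mapping, and the `check_schedule` function are all hypothetical and are not taken from the environment's actual verifier. The sketch only tests two things: each event falls inside its allowed window, and no two events overlap.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Event:
    # Hypothetical representation: times are minutes since midnight.
    name: str
    start: int
    end: int  # exclusive


def check_schedule(events: list[Event], windows: dict[str, tuple[int, int]]) -> bool:
    """Return True if every event lies in its allowed window and no two events overlap.

    `windows` maps an event name to (earliest_start, latest_end); events without
    an entry may be placed anywhere in the day.
    """
    for ev in events:
        if ev.start >= ev.end:
            return False  # empty or inverted interval
        lo, hi = windows.get(ev.name, (0, 24 * 60))
        if ev.start < lo or ev.end > hi:
            return False  # violates the requested time constraint
    # After sorting by start time, any overlap shows up between neighbours.
    ordered = sorted(events, key=lambda e: e.start)
    for prev, cur in zip(ordered, ordered[1:]):
        if cur.start < prev.end:
            return False
    return True
```

For example, a 9:00 to 9:30 standup restricted to 8:00 to 10:00 and a 10:00 to 11:00 review restricted to 10:00 to 12:00 pass the check, while two events at 9:00 to 10:00 and 9:30 to 10:30 fail it because they overlap.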
This environment is for conversational tool use and utilises existing expert tool-use trajectories. Each assistant step of the trajectory is posed as a separate behavior cloning problem where the policy model is incentivized to match the tool call choices of the expert model.
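One way to score a single behavior cloning step is an exact-match reward against the expert's tool calls. The sketch below is an assumption, not the environment's actual reward: it assumes each call is a dict with a `name` and `arguments` (a dict or a JSON string), and the function name `tool_call_reward` is hypothetical.

```python
import json


def tool_call_reward(predicted: list[dict], expert: list[dict]) -> float:
    """Return 1.0 if the policy's tool calls match the expert's, else 0.0.

    Hypothetical format: each call is {"name": str, "arguments": dict | JSON string}.
    Calls are compared as an unordered collection of (name, canonical arguments).
    """

    def normalize(call: dict) -> tuple[str, str]:
        args = call.get("arguments", {})
        if isinstance(args, str):
            args = json.loads(args)  # arguments may arrive as a JSON string
        # Canonical JSON so that key order does not affect the comparison.
        return call["name"], json.dumps(args, sort_keys=True)

    if len(predicted) != len(expert):
        return 0.0
    try:
        pred = sorted(normalize(c) for c in predicted)
        ref = sorted(normalize(c) for c in expert)
    except (KeyError, TypeError, json.JSONDecodeError):
        return 0.0  # malformed calls earn no reward
    return 1.0 if pred == ref else 0.0
```

Treating the calls as unordered is a design choice for parallel tool calls; a stricter variant would compare the lists in order, and a softer one could give partial credit for a correct tool name with wrong arguments.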
AceReason-Math is an environment for evaluating mathematical reasoning on a curated set of 49,585 math problems. The dataset was created by NVIDIA as part of the AceReason-Nemotron project, which uses reinforcement learning to advance math and code reasoning capabilities.
This dataset is designed to train Generative Reward Models (GenRMs). It leverages reinforcement learning at scale to train accurate and robust GenRMs that generalize better than traditional Bradley-Terry models and reduce the risk of reward hacking.
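For reference, the Bradley-Terry baseline mentioned above is the standard pairwise reward-model objective: a scalar reward model $r_\theta$ scores a preferred response $y_w$ and a rejected response $y_l$ to a prompt $x$, and $\sigma$ is the logistic sigmoid. This is the textbook formulation, not a detail taken from this dataset.

```latex
P(y_w \succ y_l \mid x) = \sigma\big(r_\theta(x, y_w) - r_\theta(x, y_l)\big)

\mathcal{L}_{\mathrm{BT}}(\theta) = -\,\mathbb{E}_{(x,\, y_w,\, y_l)}\Big[\log \sigma\big(r_\theta(x, y_w) - r_\theta(x, y_l)\big)\Big]
```

A GenRM instead generates a textual judgment before producing its verdict, rather than emitting a single scalar score per response.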
The Nemotron-Cascade-2-RL dataset is a curated reinforcement learning (RL) dataset blend used to train the Nemotron-Cascade-2-30B-A3B model. It includes instruction-following RL, multi-domain RL, and on-policy distillation data. Note that we exclude the SWE-RL data in this ORS implement…
Nemotron-Science-v1 is a synthetic science reasoning dataset with two subsets: an MCQA set that improves on the STEM portion of Nemotron-Post-Training-v1 using GPT-OSS-120B to generate GPQA-style questions and reasoning traces, and an RQA set of synthetic chemistry questions.
The Nemotron-Agentic-Tool-Use-v1 dataset is designed to strengthen models’ capabilities as interactive, tool-using agents.
The Nemotron-RL-coding-competitive_coding dataset is a Python-only, reasoning-based, synthetic dataset. It contains competitive-coding-style problems and their unit test cases. These questions and test cases were collected from CodeContests and Open-R1.
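Because each problem ships with unit tests, a natural reward is the fraction of tests a generated solution passes. The sketch below assumes stdin/stdout-style test cases given as `(input, expected_output)` pairs; that format and the function name `unit_test_reward` are hypothetical. It runs the candidate with the current Python interpreter and compares whitespace-trimmed output.

```python
import subprocess
import sys


def unit_test_reward(solution_code: str, test_cases: list[tuple[str, str]], timeout_s: float = 5.0) -> float:
    """Fraction of (stdin, expected_stdout) test cases that the solution passes.

    Runs the candidate as a separate Python process. A production grader should
    execute untrusted code in an isolated sandbox; this sketch does not.
    """
    if not test_cases:
        return 0.0
    passed = 0
    for stdin_text, expected in test_cases:
        try:
            result = subprocess.run(
                [sys.executable, "-c", solution_code],
                input=stdin_text,
                capture_output=True,
                text=True,
                timeout=timeout_s,
            )
        except subprocess.TimeoutExpired:
            continue  # a timeout counts as a failed test
        # A crash (non-zero exit) or wrong output both count as failures.
        if result.returncode == 0 and result.stdout.strip() == expected.strip():
            passed += 1
    return passed / len(test_cases)
```

A binary variant (reward 1.0 only when every test passes) is a stricter alternative that avoids rewarding partially correct solutions.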
The Nemotron-RL-agent-workplace_assistant is a tool-use agentic environment that tests the agent’s ability to execute tasks in a workplace setting. Workbench contains a sandbox environment with five databases, 26 tools, and 690 tasks. These tasks represent common business acti…
The Nemotron-RL-ReasoningGym-v1 dataset is designed to improve reasoning capabilities across a broad range of domains, including algebra, arithmetic, computation, cognition, geometry, graph theory, logic, and many common games. It contains 15,000 samples spanning 104 RL enviro…
Nemotron-Math-Proofs-v1 is a large-scale mathematical reasoning dataset containing ~580k natural language proof problems, ~550k formalizations into theorem statements in Lean 4, and ~900k model-generated reasoning trajectories culminating in Lean 4 proofs. The dataset integrat…
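To make the distinction between a theorem statement and a proof concrete, here is a toy Lean 4 example. The natural-language problem and theorem names are hypothetical and far simpler than the dataset's problems. The first declaration is a formalized statement with the proof left as `sorry`; the second completes it using `Nat.add_comm` from Lean core.

```lean
-- Hypothetical problem: "For all natural numbers a and b, a + b = b + a."

-- Formalization into a theorem statement; the proof is still missing.
theorem add_comm_stmt (a b : Nat) : a + b = b + a := by
  sorry

-- The same statement with a completed Lean 4 proof.
theorem add_comm_proof (a b : Nat) : a + b = b + a := by
  exact Nat.add_comm a b
```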
This is an RL dataset for general function calling built from existing expert tool-use trajectories. We pose each assistant step of the trajectory as a separate behavior cloning problem where the policy model is incentivized to match the tool call choices of the expert model.
The MultiChallenge Dataset is a rigorous benchmark designed to improve large language models in complex multi-turn conversations by explicitly targeting inference memory, instruction retention, version editing, and self-coherence. It employs a unique "model breaking" methodolo…
Nemotron-RL-Instruction-Following-Adversarial-v1 focuses on adversarial prompts designed to explicitly conflict with an AI model’s standard training instincts, such as writing code without comments or refusing standard helpfulness norms, across 8 distinct "anti-convention" p…