NVIDIA

NVIDIA is an organization.

Type: org

Tools

Nemotron RL Agent Workplace Assistant

The Nemotron-RL-agent-workplace_assistant is a tool-use agentic environment that tests an agent’s ability to execute tasks in a workplace setting. It builds on Workbench, a sandbox environment with five databases, 26 tools, and 690 tasks. These tasks represent common business acti…

RL Env · AI Assistant Evaluation

Nemotron RL Instruction Following MultiTurnChat V1

The MultiChallenge dataset is a rigorous benchmark designed to improve large language models' performance in complex multi-turn conversations by explicitly targeting inference memory, instruction retention, versioned editing, and self-coherence. It employs a unique "model breaking" methodolo…

RL Env · Multi-Turn Conversation Evaluation

Nemotron RL Coding Competitive Coding

The Nemotron-RL-coding-competitive_coding dataset is a Python-only, reasoning-based, synthetic dataset. It contains competitive-coding-style problems and their unit test cases, collected from CodeContests and Open-R1.

RL Env · Competitive-Level Code Reasoning
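A minimal sketch of how unit tests like these can serve as a verifiable reward, assuming each test case is a (stdin, expected stdout) pair and each solution is a standalone Python script. The function name `run_unit_tests`, the test format, the timeout, and the fractional (rather than all-or-nothing) score are assumptions, not the dataset's documented grader.

```python
import subprocess
import sys


def run_unit_tests(solution_src: str, test_cases: list[tuple[str, str]], timeout_s: float = 5.0) -> float:
    """Hypothetical reward: run a Python solution on each (stdin, expected_stdout)
    pair and return the fraction of test cases it passes."""
    if not test_cases:
        return 0.0
    passed = 0
    for stdin_text, expected_stdout in test_cases:
        try:
            result = subprocess.run(
                [sys.executable, "-c", solution_src],
                input=stdin_text,
                capture_output=True,
                text=True,
                timeout=timeout_s,
            )
        except subprocess.TimeoutExpired:
            continue  # a timeout counts as a failed test case
        # A test passes only on a clean exit with matching output (ignoring edge whitespace)
        if result.returncode == 0 and result.stdout.strip() == expected_stdout.strip():
            passed += 1
    return passed / len(test_cases)
```

This runs model-written code with no sandboxing; a production pipeline would isolate execution and limit resources.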

Nemotron RL ReasoningGym V1

The Nemotron-RL-ReasoningGym-v1 dataset is designed to improve reasoning capabilities across a broad range of domains, including algebra, arithmetic, computation, cognition, geometry, graph theory, logic, and many common games. It contains 15,000 samples spanning 104 RL enviro…

RL Env · Reasoning Evaluation

Nemotron RLHF GenRM V1

This dataset is designed to train Generative Reward Models (GenRMs). It leverages reinforcement learning at scale to train accurate and robust GenRMs that generalize better than traditional Bradley-Terry models and reduce the risk of reward hacking.

RL Env · Reward Model Evaluation
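For contrast, a traditional Bradley-Terry reward model assigns a scalar score r to each response and is trained on preference pairs: the probability that the chosen response beats the rejected one is modeled as sigmoid(r_chosen - r_rejected), and the loss is the negative log of that probability. The sketch below computes this baseline objective for two scalar scores; it illustrates the Bradley-Terry baseline only, not how the GenRMs in this dataset are trained, and the function names are illustrative.

```python
import math


def sigmoid(x: float) -> float:
    """Numerically stable logistic function."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    z = math.exp(x)
    return z / (1.0 + z)


def bradley_terry_loss(r_chosen: float, r_rejected: float) -> float:
    """Negative log-likelihood that the chosen response is preferred: -log(sigmoid(margin))."""
    margin = r_chosen - r_rejected
    # Stable evaluation of log(1 + exp(-margin)) that avoids overflow for large |margin|
    if margin >= 0:
        return math.log1p(math.exp(-margin))
    return -margin + math.log1p(math.exp(margin))
```

The loss depends only on the score margin, which is one reason scalar reward models can be pushed toward exploitable score differences rather than well-reasoned judgments.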

Nemotron Cascade 2 RL Data

The Nemotron-Cascade-2-RL dataset is a curated reinforcement learning (RL) dataset blend used to train the Nemotron-Cascade-2-30B-A3B model. It includes instruction-following RL, multi-domain RL, and on-policy distillation data. Note that we exclude the SWE-RL data in this ORS implement…

RL Env · Instruction Following

Nemotron Math Proofs V1

Nemotron-Math-Proofs-v1 is a large-scale mathematical reasoning dataset containing ~580k natural-language proof problems, ~550k formalizations into theorem statements in Lean 4, and ~900k model-generated reasoning trajectories culminating in Lean 4 proofs. The dataset integrat…

RL Env · Mathematical Reasoning
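To make the three parts concrete, here is a toy example (not taken from the dataset) of a natural-language problem, its formalization as a Lean 4 theorem statement, and a proof. The theorem name `add_either_order` is made up for illustration; the proof simply invokes `Nat.add_comm` from Lean's core library.

```lean
-- Natural-language problem (toy example, not from the dataset):
--   "Show that adding two natural numbers in either order gives the same result."

-- Formalization as a Lean 4 theorem statement, followed by its proof.
theorem add_either_order (a b : Nat) : a + b = b + a := by
  exact Nat.add_comm a b
```

Real problems in such a dataset are far harder, but the shape is the same: the statement fixes exactly what must be proved, and the Lean checker accepts or rejects the proof.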

Nemotron Agentic V1

The Nemotron-Agentic-Tool-Use-v1 dataset is designed to strengthen models’ capabilities as interactive, tool-using agents.

RL Env · Tool Use in Large Language Models

AceReason Math

AceReason-Math is an environment for evaluating mathematical reasoning on a curated set of 49,585 math problems. The dataset was created by NVIDIA as part of the AceReason-Nemotron project, which uses reinforcement learning to advance math and code reasoning capabilities.

RL Env · Mathematical Reasoning

Nemotron RL Agentic Function Calling Pivot V1

This is an RL dataset for general function calling, built from existing expert tool-use trajectories. We pose each assistant step of the trajectory as a separate behavior cloning problem where the policy model is incentivized to match the tool call choices of the expert model.

RL Env · Function Calling in Conversational AI

Nemotron Science V1

Nemotron-Science-v1 is a synthetic science reasoning dataset with two subsets: an MCQA set that improves on the STEM portion of Nemotron-Post-Training-v1 using GPT-OSS-120B to generate GPQA-style questions and reasoning traces, and an RQA set of synthetic chemistry questions.

RL Env · Scientific Reasoning · Science

Nemotron RL Instruction Following Calendar V2

Nemotron-RL-Instruction-Following-Calendar-v2 evaluates multi-turn instruction following in calendar scheduling conversations. Each task presents a multi-turn conversation where a user requests calendar events with time constraints. The agent must produce the next assistant re…

RL Env · Agentic AI Evaluation
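The description above is truncated before it says how responses are graded, so the following is only a hypothetical illustration of checking time constraints: each requested event must be scheduled inside its allowed window, and no two scheduled events may overlap. The `Event` and `Constraint` types, the minutes-since-midnight encoding, and the function name are invented for this sketch.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Event:
    title: str
    start: int  # minutes since midnight (hypothetical encoding)
    end: int


@dataclass(frozen=True)
class Constraint:
    title: str
    earliest_start: int
    latest_end: int


def satisfies_constraints(events: list[Event], constraints: list[Constraint]) -> bool:
    """True if every constrained event exists, fits its window, and no events overlap."""
    by_title = {e.title: e for e in events}
    for c in constraints:
        e = by_title.get(c.title)
        if e is None or e.start >= e.end:
            return False  # missing event or non-positive duration
        if e.start < c.earliest_start or e.end > c.latest_end:
            return False  # outside the requested window
    ordered = sorted(events, key=lambda e: e.start)
    for prev, nxt in zip(ordered, ordered[1:]):
        if nxt.start < prev.end:  # back-to-back events are allowed
            return False
    return True
```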

Nemotron RL Instruction Following Adversarial V1

Nemotron-RL-Instruction-Following-Adversarial-v1 focuses on adversarial prompts designed to explicitly conflict with an AI model’s standard training instincts (such as writing code without comments or refusing standard helpfulness norms) across 8 distinct "anti-convention" p…

RL Env · Instruction Following
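As an illustration of how one of these anti-convention constraints could be verified automatically, the sketch below checks the "code without comments" case for Python using the standard `tokenize` module, so that a `#` inside a string literal is not mistaken for a comment. The function name and the choice to fail code that cannot be tokenized are assumptions; the environment's actual checks are not described here.

```python
import io
import tokenize


def has_no_comments(source: str) -> bool:
    """True if the Python source contains no '#' comments (docstrings are not counted)."""
    try:
        tokens = tokenize.generate_tokens(io.StringIO(source).readline)
        # Tokens are produced lazily, so tokenizer errors surface inside this try block
        return all(tok.type != tokenize.COMMENT for tok in tokens)
    except (tokenize.TokenError, SyntaxError):
        return False  # treat code that cannot be tokenized as a failure
```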

Nemotron RL Agentic Conversational Tool Use Pivot V1

This environment targets conversational tool use and is built from existing expert tool-use trajectories. Each assistant step of the trajectory is posed as a separate behavior cloning problem where the policy model is incentivized to match the tool call choices of the expert model.

RL Env · Tool Use in Large Language Models
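The step-splitting described for this environment can be sketched as follows, assuming a trajectory is a list of chat messages with a `role` field. Each assistant turn becomes one training problem whose context is the expert conversation up to that turn and whose target is the expert's own turn. The message schema and the function name `split_into_pivot_steps` are hypothetical.

```python
from typing import Any

Message = dict[str, Any]  # hypothetical schema: {"role": ..., "content": ..., optional "tool_call": ...}


def split_into_pivot_steps(trajectory: list[Message]) -> list[tuple[list[Message], Message]]:
    """Turn one expert trajectory into (context, expert_turn) pairs, one per assistant turn."""
    steps = []
    for i, message in enumerate(trajectory):
        if message.get("role") == "assistant":
            # Context is the expert's conversation before this turn; the target is the expert's turn
            steps.append((trajectory[:i], message))
    return steps
```

Because every context is an expert prefix rather than the policy's own earlier outputs, each step can be scored independently, for example by comparing the policy's tool call with the target turn.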