Nemotron-Science

Description

Nemotron-Science is a single-turn science reasoning environment based on NVIDIA's Nemotron-Science-v1 dataset. It contains two subsets:

MCQ: 174,155 multiple-choice science questions across STEM domains including genetics, pharmacogenomics, synthetic biology, evolutionary biology, and more.
RQA: 52,179 reasoning-based chemistry questions requiring mathematical derivations with boxed answers.

Capabilities

Scientific reasoning across diverse STEM domains
Multiple-choice question answering with structured option selection
Open-ended mathematical/scientific problem solving (chemistry focus)
Multi-step reasoning and derivation

License

CC-BY-4.0.

Tasks

There are two splits in this environment:

Split	Type	Tasks	Description
`mcq`	train	174,155	Multiple-choice science questions (GPQA-style)
`rqa`	train	52,179	Reasoning-based chemistry questions with boxed answers

Reward Structure

This is a sparse, verifiable reward environment. Rewards are binary:

MCQ: Exact letter match against the expected answer. Reward is 1.0 for correct, 0.0 for incorrect.
RQA: An LLM grader (gpt-5-mini) checks whether the submitted answer is equivalent to the expected value, allowing for equivalent notations and minor rounding differences. Reward is 1.0 for correct, 0.0 for incorrect.
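
The MCQ rule above can be sketched in a few lines. This is a minimal illustration, not the environment's actual grader: the function name `mcq_reward` is hypothetical, and trimming whitespace and ignoring case before comparison is an assumption (the description only says "exact letter match").

```python
def mcq_reward(submitted: str, expected: str) -> float:
    """Binary reward for an MCQ answer: 1.0 on a letter match, 0.0 otherwise.

    Hypothetical helper. Normalizing whitespace and case is an assumption;
    a stricter grader could compare the raw strings instead.
    """
    return 1.0 if submitted.strip().upper() == expected.strip().upper() else 0.0


if __name__ == "__main__":
    print(mcq_reward("C", "C"))  # matching letter
    print(mcq_reward("B", "C"))  # wrong letter
```

Because the reward is sparse and binary, there is no partial credit for a near-miss such as selecting an adjacent option.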

Data

Data is sourced from nvidia/Nemotron-Science-v1 on Hugging Face. The dataset consists of synthetic science reasoning data generated using GPT-OSS-120B for training the Nemotron 3 model family.

Each task contains a user question and an assistant response with reasoning traces. The environment uses the question as the prompt and extracts the expected answer from the assistant response for grading.
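
For RQA, where answers are boxed, the extraction step might look like the sketch below. It is an illustration under assumptions: the helper name `extract_boxed` is hypothetical, the answer is assumed to appear in a LaTeX `\boxed{...}` expression, and the last such expression in the response is taken as the final answer. The environment's real extraction logic may differ.

```python
from typing import Optional


def extract_boxed(text: str) -> Optional[str]:
    """Return the contents of the last \\boxed{...} in text, or None.

    Hypothetical helper. Braces are matched by depth so that nested
    LaTeX such as \\frac{1}{2} inside the box is kept intact.
    """
    marker = r"\boxed{"
    start = text.rfind(marker)
    if start == -1:
        return None
    i = start + len(marker)
    depth = 1  # we are already inside the opening brace of \boxed{
    chars = []
    while i < len(text):
        ch = text[i]
        if ch == "{":
            depth += 1
        elif ch == "}":
            depth -= 1
            if depth == 0:
                return "".join(chars)
        chars.append(ch)
        i += 1
    return None  # unbalanced braces: no complete boxed answer found


if __name__ == "__main__":
    response = r"Thus the yield is \boxed{\frac{1}{2}} mol."
    print(extract_boxed(response))
```

A plain regular expression such as `\\boxed\{(.*?)\}` would stop at the first closing brace and truncate nested expressions, which is why the sketch counts brace depth instead.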

Tools

Tool	Description
`submit_answer`	Submit a final answer. For MCQ, provide a single letter (e.g. `C`). For RQA, provide the answer value.

Time Horizon

Single-turn: one prompt, one tool call, episode ends.

Other Environment Requirements

The RQA split requires an OpenAI API key (passed via `secrets["openai_api_key"]`) for LLM-based answer grading.
The MCQ split has no external requirements.
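
A pre-flight check for these requirements could be written as follows. This is a sketch only: the function `check_requirements` is hypothetical and not part of the environment. It assumes `secrets` behaves like a string-keyed mapping and uses the split names `mcq` and `rqa` from the Tasks table.

```python
from typing import Mapping


def check_requirements(split: str, secrets: Mapping[str, str]) -> None:
    """Raise ValueError if the chosen split is missing a required secret.

    Hypothetical helper: only the RQA split needs an OpenAI API key,
    because its answers are graded by an LLM.
    """
    if split == "rqa" and not secrets.get("openai_api_key"):
        raise ValueError(
            'The rqa split requires secrets["openai_api_key"] for LLM grading.'
        )


if __name__ == "__main__":
    check_requirements("mcq", {})  # passes: MCQ needs no secrets
    try:
        check_requirements("rqa", {})
    except ValueError as err:
        print(err)
```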

Safety

This environment presents no direct safety risks. Agents answer science questions and receive binary correctness feedback. The agent takes no external actions and has no network access or file system access.

Citations

@dataset{nvidia2026nemotronscience,
  author    = {NVIDIA},
  title     = {Nemotron-Science-v1},
  year      = {2026},
  publisher = {Hugging Face},
  url       = {https://huggingface.co/datasets/nvidia/Nemotron-Science-v1}
}