
Nemotron RL Instruction Following Calendar V2

Type: RL Env
Publisher: NVIDIA
Runtime: ORS
License: unknown
Size: 9,915 tasks
Published: Mar 2026

Description

Nemotron-RL-Instruction-Following-Calendar-v2 evaluates multi-turn instruction following in calendar scheduling conversations. The environment is built on NVIDIA's Nemotron-RL-Instruction-Following-Calendar-v2 dataset. Each task presents a multi-turn conversation in which a user requests calendar events with time constraints. The agent must produce the next assistant response containing the correctly scheduled calendar as a JSON list that respects every constraint: exact start times, before/after bounds, time windows, no overlapping events, and the 10 am to 4 pm daily range.
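
The constraint types listed above can be expressed as simple predicates over an event's start time and duration. This is a minimal sketch under assumptions the environment does not state: times are 24-hour "HH:MM" strings, the Event class and every function name are hypothetical, "before" means the event ends at or before the bound, "after" means it starts at or after the bound, and all boundaries are inclusive. The real grader's event schema and boundary semantics may differ.

```python
from dataclasses import dataclass


# Hypothetical event model; the dataset's real JSON field names may differ.
@dataclass
class Event:
    name: str
    start: int     # minutes after midnight
    duration: int  # minutes

    @property
    def end(self) -> int:
        return self.start + self.duration


def to_minutes(hhmm: str) -> int:
    """Parse an assumed 24-hour 'HH:MM' string into minutes after midnight."""
    hours, minutes = hhmm.split(":")
    return int(hours) * 60 + int(minutes)


DAY_START = to_minutes("10:00")  # 10 am
DAY_END = to_minutes("16:00")    # 4 pm


def in_day_window(e: Event) -> bool:
    # The whole event must fit between 10 am and 4 pm.
    return DAY_START <= e.start and e.end <= DAY_END


def starts_at(e: Event, t: str) -> bool:
    # Exact-time constraint.
    return e.start == to_minutes(t)


def ends_before(e: Event, t: str) -> bool:
    # Assumed reading of "before": the event is over by time t.
    return e.end <= to_minutes(t)


def starts_after(e: Event, t: str) -> bool:
    # Assumed reading of "after": the event begins no earlier than t.
    return e.start >= to_minutes(t)


def within(e: Event, lo: str, hi: str) -> bool:
    # Time-window ("between") constraint: the event lies fully inside [lo, hi].
    return to_minutes(lo) <= e.start and e.end <= to_minutes(hi)
```

For example, a 30-minute event starting at 10:30 satisfies in_day_window and ends_before(e, "11:00") but fails starts_after(e, "11:00"). Representing times as integer minutes keeps every comparison a plain integer check.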

Capabilities

  • Scheduling calendar events with temporal constraints across multi-turn conversations
  • Conflict resolution when new events overlap with existing ones
  • Maintaining permanent constraints during rescheduling
  • Producing valid JSON calendar output in the required format
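
Resolving a conflict starts with detecting one. The following is a minimal sketch, assuming events are reduced to hypothetical (name, start, end) tuples in minutes with exclusive end times, so back-to-back events do not conflict. Once the events are sorted by start time, any overlap is guaranteed to appear between some pair of neighbours, so a single pass after the sort is enough. This illustrates the idea and is not the environment's grader.

```python
from typing import List, Optional, Tuple

# Hypothetical representation: (name, start_minute, end_minute), end exclusive,
# so 10:00-11:00 and 11:00-12:00 are treated as compatible.
Interval = Tuple[str, int, int]


def first_conflict(events: List[Interval]) -> Optional[Tuple[str, str]]:
    """Return the names of an overlapping pair of events, or None if there are none."""
    ordered = sorted(events, key=lambda ev: ev[1])
    for prev, cur in zip(ordered, ordered[1:]):
        # After sorting by start, if no neighbouring pair overlaps then no pair does,
        # because each event ends no later than the next one starts.
        if cur[1] < prev[2]:
            return prev[0], cur[0]
    return None
```

The sort costs O(n log n). With at most eight events per task, a pairwise comparison would work just as well, but the sorted scan also reports the conflicting pair in chronological order, which is convenient when deciding which event to move.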

Compute Requirements

Nemotron Calendar V2 does not require a sandbox. It has minimal compute requirements.

License

CC-BY-4.0.

Tasks

There are 9,915 tasks across two splits:

Split         Tasks
train         9,659
validation      256

Each task presents a multi-turn conversation (4-50 messages, mean 28.7) containing system instructions, user requests, and prior assistant responses. The agent must provide the next assistant response with the complete updated calendar. The expected calendar state has 1-8 events per task (mean 5.2).

Reward Structure

This is a sparse, binary reward environment. The agent calls the answer tool once with its calendar response. The response is graded deterministically against the expected calendar state:

$$\text{Reward} = \begin{cases} 1 & \text{if all checks pass} \\ 0 & \text{otherwise} \end{cases}$$

Checks performed:

  1. No <think> tags in response
  2. Valid JSON list extracted from response
  3. Correct number of events
  4. No overlapping events
  5. All constraints satisfied (duration, time window, before/after/at/between constraints)
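
The checks above combine into the binary reward by short-circuiting: the first failing check yields a reward of 0 together with a reason, which mirrors the pass/fail-with-reason result the answer tool returns. This sketch is not the NemoGym grader. The extraction rule (parse the span from the first "[" to the last "]"), the names grade and extract_json_list, and the reason strings are all assumptions. Checks 4 and 5 are passed in as callables so that overlap and constraint logic can be plugged in.

```python
import json
import re
from typing import Callable, List, Optional, Tuple

# A check returns None on success or a short failure reason (hypothetical interface).
Check = Callable[[list], Optional[str]]


def extract_json_list(response: str) -> Optional[list]:
    """Assumed extraction rule: parse the span from the first '[' to the last ']'."""
    start, end = response.find("["), response.rfind("]")
    if start == -1 or end < start:
        return None
    try:
        parsed = json.loads(response[start : end + 1])
    except json.JSONDecodeError:
        return None
    return parsed if isinstance(parsed, list) else None


def grade(response: str, expected_count: int, extra_checks: List[Check]) -> Tuple[int, str]:
    """Return (reward, reason); the reward is 1 only if every check passes."""
    # Check 1: no <think> or </think> tags.
    if re.search(r"</?think>", response):
        return 0, "response contains <think> tags"
    # Check 2: a valid JSON list can be extracted.
    events = extract_json_list(response)
    if events is None:
        return 0, "no valid JSON list found"
    # Check 3: the event count matches the expected calendar state.
    if len(events) != expected_count:
        return 0, f"expected {expected_count} events, got {len(events)}"
    # Checks 4 and 5: overlap and constraint checks supplied by the caller.
    for check in extra_checks:
        reason = check(events)
        if reason is not None:
            return 0, reason
    return 1, "all checks passed"
```

For instance, grade('Here is the schedule: [{"name": "standup"}]', 1, []) returns (1, "all checks passed"), while any response containing a <think> tag scores 0 before its JSON is even parsed.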

Grading logic is ported from NemoGym. We do not use LLM graders for this task.

Data

Conversations are sourced from the Nemotron-RL-Instruction-Following-Calendar-v2 dataset by NVIDIA. Data files are stored on the OpenReward platform.

Tools

Tool      Description
answer    Submit a calendar assistant response. The response is graded deterministically against the expected calendar state. Returns pass/fail with a reason. Called once per task.

Time Horizon

Nemotron Calendar V2 is a single-turn environment. The agent receives a multi-turn conversation context and submits one response. Each task requires exactly one tool call.

Other Environment Requirements

There are no further environment requirements. Nemotron Calendar V2 uses deterministic grading and does not require any API keys.

Safety

Agents are asked to respond to calendar scheduling conversations with no access to external systems or the internet; the only tool available is answer. The environment does not present direct safety risks.

Citations

@misc{nvidia2025nemotron,
  title={Nemotron-RL-Instruction-Following-Calendar-v2},
  author={NVIDIA},
  year={2026},
  publisher={Hugging Face},
  url={https://huggingface.co/datasets/nvidia/Nemotron-RL-Instruction-Following-Calendar-v2}
}