Nemotron-RL-Instruction-Following-Calendar-v2

Description

Nemotron-RL-Instruction-Following-Calendar-v2 evaluates multi-turn instruction following in calendar scheduling conversations. It is based on NVIDIA's dataset of the same name. Each task presents a multi-turn conversation in which a user requests calendar events with time constraints. The agent must produce the next assistant response, which contains the scheduled calendar as a JSON list and respects every constraint: exact times, before/after bounds, time windows, no overlapping events, and all events within the 10am-4pm range.
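To make the constraint types concrete, here is a minimal Python sketch of how a single event might be checked against one constraint. It is illustrative only and is not the environment's grader: the `Constraint` class, the `satisfies` function, the minutes-since-midnight encoding, and the exact reading of "before" (event ends by the bound) and "after" (event starts at or after the bound) are all assumptions made for this example.

```python
from dataclasses import dataclass
from typing import Optional

# Times are minutes since midnight (an assumption for this sketch).
DAY_START = 10 * 60  # 10am
DAY_END = 16 * 60    # 4pm


@dataclass
class Constraint:
    """Hypothetical constraint record; not the dataset's actual schema."""
    kind: str                # "at", "before", "after", or "between"
    a: int                   # primary bound, in minutes since midnight
    b: Optional[int] = None  # upper bound, used only by "between"


def satisfies(start: int, duration: int, c: Constraint) -> bool:
    """Return True if the event fits the 10am-4pm range and the constraint."""
    end = start + duration
    if start < DAY_START or end > DAY_END:
        return False
    if c.kind == "at":
        return start == c.a
    if c.kind == "before":   # assumed: the event must end by the bound
        return end <= c.a
    if c.kind == "after":    # assumed: the event must start at or after the bound
        return start >= c.a
    if c.kind == "between":  # assumed: the event lies fully inside [a, b]
        return c.b is not None and start >= c.a and end <= c.b
    raise ValueError(f"unknown constraint kind: {c.kind}")
```

For example, under these assumptions a 90-minute event starting at 3pm satisfies "after 2pm" on its own terms but still fails, because it would end at 4:30pm, outside the 10am-4pm range.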

Capabilities

Scheduling calendar events with temporal constraints across multi-turn conversations
Conflict resolution when new events overlap with existing ones
Maintaining permanent constraints during rescheduling
Producing valid JSON calendar output in the required format

Compute Requirements

Nemotron Calendar V2 does not require a sandbox. It has minimal compute requirements.

License

CC-BY-4.0.

Tasks

There are 9,915 tasks across two splits:

Split	Tasks
train	9,659
validation	256

Each task presents a multi-turn conversation (4-50 messages, mean 28.7) containing system instructions, user requests, and prior assistant responses. The agent must provide the next assistant response with the complete updated calendar. The expected calendar state has 1-8 events per task (mean 5.2).

Reward Structure

This is a sparse, binary reward environment. The agent calls the answer tool once with its calendar response. The response is graded deterministically against the expected calendar state:

$$\text{Reward} = \begin{cases} 1 & \text{if all checks pass} \\ 0 & \text{otherwise} \end{cases}$$

Checks performed:

No <think> tags in response
Valid JSON list extracted from response
Correct number of events
No overlapping events
All constraints satisfied (duration, time window, before/after/at/between constraints)

Grading logic is ported from NemoGym. We do not use LLM graders for this task.
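The structural checks listed above can be sketched in Python as follows. This is a rough illustration under stated assumptions, not the ported NemoGym logic: the function names, the event fields `start` and `duration` (integers, in minutes since midnight), and the failure messages are hypothetical, and the per-event constraint check is omitted.

```python
import json
import re


def extract_json_list(response: str):
    """Return the first JSON list embedded in the response, or None."""
    decoder = json.JSONDecoder()
    for match in re.finditer(r"\[", response):
        try:
            value, _ = decoder.raw_decode(response, match.start())
        except json.JSONDecodeError:
            continue  # this "[" did not begin valid JSON; try the next one
        if isinstance(value, list):
            return value
    return None


def has_overlap(events) -> bool:
    """True if any two events overlap; back-to-back events are allowed."""
    spans = sorted((e["start"], e["start"] + e["duration"]) for e in events)
    return any(prev_end > next_start
               for (_, prev_end), (next_start, _) in zip(spans, spans[1:]))


def grade(response: str, expected_count: int):
    """Return (reward, reason); reward is 1 only if every check passes."""
    if "<think>" in response or "</think>" in response:
        return 0, "think tags present"
    events = extract_json_list(response)
    if events is None:
        return 0, "no valid JSON list"
    # Hypothetical schema check: each event is a dict with integer fields.
    if not all(isinstance(e, dict)
               and isinstance(e.get("start"), int)
               and isinstance(e.get("duration"), int) for e in events):
        return 0, "malformed event"
    if len(events) != expected_count:
        return 0, "wrong number of events"
    if has_overlap(events):
        return 0, "overlapping events"
    return 1, "pass"
```

Each check returns early with a reason, which mirrors the pass/fail-with-reason behavior described for this environment; the order of the checks here is a choice made for the sketch.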

Data

Conversations are sourced from the Nemotron-RL-Instruction-Following-Calendar-v2 dataset by NVIDIA. Data files are stored on the OpenReward platform.

Tools

Tool	Description
`answer`	Submit a calendar assistant response. The response is graded deterministically against the expected calendar state. Returns pass/fail with reason. Called once per task.

Time Horizon

Nemotron Calendar V2 is a single-turn environment: although the input is a multi-turn conversation, the agent receives it as context and submits one response. Each task requires exactly one tool call.

Other Environment Requirements

There are no further environment requirements. Nemotron Calendar V2 uses deterministic grading and does not require any API keys.

Safety

Agents are asked to respond to calendar scheduling conversations with no access to external systems, tools, or the internet. The environment does not present direct safety risks.

Citations

@misc{nvidia2025nemotron,
  title={Nemotron-RL-Instruction-Following-Calendar-v2},
  author={NVIDIA},
  year={2026},
  publisher={Hugging Face},
  url={https://huggingface.co/datasets/nvidia/Nemotron-RL-Instruction-Following-Calendar-v2}
}