Nemotron-RL-Instruction-Following-Calendar-v2
Description
Nemotron-RL-Instruction-Following-Calendar-v2 evaluates multi-turn instruction following in calendar-scheduling conversations. It is based on the Nemotron-RL-Instruction-Following-Calendar-v2 dataset from NVIDIA. Each task presents a multi-turn conversation in which a user requests calendar events subject to time constraints. The agent must produce the next assistant response, which contains the correctly scheduled calendar as a JSON list and respects every constraint: exact times, before/after bounds, time windows, no overlaps, and the 10am-4pm scheduling range.
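To make the constraint types concrete, here is a minimal Python sketch of how a single event might be checked. The event fields (`name`, `start`, `duration_min`), the constraint dictionaries, and the exact semantics of "before" and "after" are assumptions made for illustration. They are not the dataset's actual schema or the environment's grading code.

```python
# Hypothetical event and constraint shapes; the real dataset schema may differ.
DAY_START = 10 * 60  # 10am, in minutes after midnight
DAY_END = 16 * 60    # 4pm, in minutes after midnight


def to_minutes(hhmm: str) -> int:
    """Convert a 24-hour 'HH:MM' string to minutes after midnight."""
    hours, minutes = hhmm.split(":")
    return int(hours) * 60 + int(minutes)


def span(event: dict) -> tuple[int, int]:
    """Return (start, end) of an event in minutes after midnight."""
    start = to_minutes(event["start"])
    return start, start + event["duration_min"]


def within_day(event: dict) -> bool:
    """True if the whole event fits inside the 10am-4pm range."""
    start, end = span(event)
    return DAY_START <= start and end <= DAY_END


def satisfies(event: dict, constraint: dict) -> bool:
    """Check one constraint. Assumed semantics: 'before' means the event
    ends by the given time, 'after' means it starts at or after it."""
    start, end = span(event)
    kind = constraint["kind"]
    if kind == "at":
        return start == to_minutes(constraint["time"])
    if kind == "before":
        return end <= to_minutes(constraint["time"])
    if kind == "after":
        return start >= to_minutes(constraint["time"])
    if kind == "between":
        return to_minutes(constraint["start"]) <= start and end <= to_minutes(constraint["end"])
    raise ValueError(f"unknown constraint kind: {kind}")


standup = {"name": "standup", "start": "10:30", "duration_min": 30}
print(within_day(standup))
print(satisfies(standup, {"kind": "before", "time": "12:00"}))
```

Working in minutes after midnight keeps every comparison a plain integer check, which is one simple way to avoid time-zone and string-ordering pitfalls in a single-day calendar.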
Capabilities
- Scheduling calendar events with temporal constraints across multi-turn conversations
- Conflict resolution when new events overlap with existing ones
- Maintaining permanent constraints during rescheduling
- Producing valid JSON calendar output in the required format
Compute Requirements
Nemotron Calendar V2 does not require a sandbox. It has minimal compute requirements.
License
Tasks
There are 9,915 tasks across two splits:
| Split | Tasks |
|---|---|
| train | 9,659 |
| validation | 256 |
Each task presents a multi-turn conversation (4-50 messages, mean 28.7) containing system instructions, user requests, and prior assistant responses. The agent must provide the next assistant response with the complete updated calendar. The expected calendar state has 1-8 events per task (mean 5.2).
Reward Structure
This is a sparse, binary reward environment. The agent calls the answer tool once with its calendar response. The response is graded deterministically against the expected calendar state:
$$\text{Reward} = \begin{cases} 1 & \text{if all checks pass} \\ 0 & \text{otherwise} \end{cases}$$
Checks performed:
- No `<think>` tags in response
- Valid JSON list extracted from response
- Correct number of events
- No overlapping events
- All constraints satisfied (duration, time window, before/after/at/between constraints)
Grading logic is ported from NemoGym. We do not use LLM graders for this task.
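The structural checks and the binary reward can be sketched in Python as follows. This is an illustrative approximation, not the ported NemoGym logic: the function names, the event fields (`start` as 24-hour `HH:MM`, `duration_min`), and the failure messages are all hypothetical, and the per-event constraint checks are omitted for brevity.

```python
import json
import re


def to_minutes(hhmm: str) -> int:
    """Convert a 24-hour 'HH:MM' string to minutes after midnight."""
    hours, minutes = hhmm.split(":")
    return int(hours) * 60 + int(minutes)


def extract_json_list(response: str):
    """Return the first parseable JSON list embedded in the text, or None."""
    decoder = json.JSONDecoder()
    for match in re.finditer(r"\[", response):
        try:
            value, _ = decoder.raw_decode(response, match.start())
        except json.JSONDecodeError:
            continue  # this '[' does not start valid JSON; try the next one
        if isinstance(value, list):
            return value
    return None


def has_overlap(events: list) -> bool:
    """After sorting by start time, any overlap shows up between neighbours."""
    spans = sorted(
        (to_minutes(e["start"]), to_minutes(e["start"]) + e["duration_min"])
        for e in events
    )
    return any(prev_end > next_start
               for (_, prev_end), (next_start, _) in zip(spans, spans[1:]))


def grade(response: str, expected_count: int) -> tuple[int, str]:
    """Return (reward, reason); reward is 1 only if every check passes."""
    if "<think>" in response:
        return 0, "think tag in response"
    events = extract_json_list(response)
    if events is None:
        return 0, "no valid JSON list"
    if len(events) != expected_count:
        return 0, "wrong number of events"
    if has_overlap(events):
        return 0, "overlapping events"
    # Constraint checks (duration, window, before/after/at/between) would go here.
    return 1, "pass"


reply = ('Updated calendar: [{"start": "10:00", "duration_min": 30}, '
         '{"start": "10:30", "duration_min": 60}]')
print(grade(reply, expected_count=2))
```

Because each check returns early with a reason, the sketch mirrors the all-or-nothing reward: a single failed check yields 0, and back-to-back events that merely touch (one ends at 10:30, the next starts at 10:30) are not counted as overlapping under the assumption used here.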
Data
Conversations are sourced from the Nemotron-RL-Instruction-Following-Calendar-v2 dataset by NVIDIA. Data files are stored on the OpenReward platform.
Tools
| Tool | Description |
|---|---|
| `answer` | Submit a calendar assistant response. The response is graded deterministically against the expected calendar state. Returns pass/fail with reason. Called once per task. |
Time Horizon
Nemotron Calendar V2 is a single-turn environment. The agent receives a multi-turn conversation context and submits one response. Each task requires exactly one tool call.
Other Environment Requirements
There are no further environment requirements. Nemotron Calendar V2 uses deterministic grading and does not require any API keys.
Safety
Agents are asked to respond to calendar scheduling conversations with no access to external systems, tools, or the internet. The environment does not present direct safety risks.
Citations
@misc{nvidia2025nemotron,
title={Nemotron-RL-Instruction-Following-Calendar-v2},
author={NVIDIA},
year={2026},
publisher={Hugging Face},
url={https://huggingface.co/datasets/nvidia/Nemotron-RL-Instruction-Following-Calendar-v2}
}