Nemotron-RL-Agentic-Function-Calling-Pivot-v1

Description

Nemotron-RL-Agentic-Function-Calling-Pivot-v1 is an environment for evaluating agents on function-calling decision-making. It is based on the Nemotron-RL-Agentic-Function-Calling-Pivot-v1 dataset from NVIDIA, released as part of the NeMo Gym framework. The dataset poses each assistant step of an expert tool-use trajectory as a separate behavior cloning problem: the agent sees the conversation history and available tools, then must predict the correct next action -- either calling a specific function with the right arguments, or responding with a message.

Capabilities

Deciding when to call a tool vs. respond with a message
Selecting the correct function from a set of available tools
Generating correct function arguments as JSON
Multi-turn conversation comprehension
Reasoning about tool capabilities relative to user requests

Compute Requirements

Nemotron-FC-Pivot does not require a sandbox. It has minimal compute requirements.

License

CC-BY-4.0.

Tasks

There is one split with 9,620 tasks:

train (9,620 tasks): Function-calling pivot points extracted from expert tool-use trajectories. Each task presents a conversation context and asks the agent to predict the correct next action.

Reward Structure

This is a sparse, binary reward environment matching the NeMo Gym ground truth verification. The agent makes a single submission per task:

Function call tasks: Binary reward (0 or 1). The function name must match exactly. Arguments are compared recursively: dict keys must match, list lengths must match, floats use 1e-6 tolerance, short strings require exact match, longer strings use Jaccard word-count similarity (threshold 0.1). All must pass for reward 1.0.
Message tasks: Binary reward. Any chat message when a message was expected yields reward 1.0.
Wrong action type: Calling a function when a message was expected (or vice versa) yields reward 0.0.

Data

Decision points are sourced from the Nemotron-RL-Agentic-Function-Calling-Pivot-v1 dataset by NVIDIA. The original dataset uses OpenAI Responses API format with expert trajectories. The download_data.py script downloads and normalises the data to a flat parquet format for efficient serving.

Tools

This environment uses task-specific tools. Each task dynamically exposes the actual tools from the dataset (e.g., get_balance_sheet, get_earnings, generateImageUrl) via list_task_tools(). The agent interacts with these tools through native function calling.

In addition, there is one shared tool:

submit_message: Submit a text message response. Use when no function call is appropriate and the agent should respond directly to the user.

Time Horizon

Nemotron-FC-Pivot is a single-turn environment. The agent receives a conversation context and submits one action. Each task requires exactly one tool call.

Other Environment Requirements

Nemotron-FC-Pivot does not require any API keys or secrets. All grading is rule-based.

Safety

Agents in Nemotron-FC-Pivot are asked to predict the next action in a synthetic conversation. The environment does not present direct safety risks, as agents only submit predictions with no access to external systems, real tools, or the internet.

Citations

@dataset{nvidia_nemotron_fc_pivot_v1,
  author    = {NVIDIA Corporation},
  title     = {Nemotron-RL-Agentic-Function-Calling-Pivot-v1},
  year      = {2026},
  publisher = {Hugging Face},
  url       = {https://huggingface.co/datasets/nvidia/Nemotron-RL-Agentic-Function-Calling-Pivot-v1},
  license   = {CC-BY-4.0}
}