0

Nemotron RL Agentic Function Calling Pivot V1

Fresh

This is a RL dataset for general function-calling by utilizing existing expert tool-use trajectories. We pose each assistant step of the trajectory as a separate behavior cloning problem where the policy model is incentivized to match the tool call choices of the expert model.

Type
RL Env
Publisher
NVIDIA
Runtime
ORS
License
unknown
Size
9620 tasks
Published
Mar 2026

Cite

Notes

Only stored in your browser.

Nemotron-RL-Agentic-Function-Calling-Pivot-v1

OpenReward Environment Hugging Face Dataset

Description

Nemotron-RL-Agentic-Function-Calling-Pivot-v1 is an environment for evaluating agents on function-calling decision-making. It is based on the Nemotron-RL-Agentic-Function-Calling-Pivot-v1 dataset from NVIDIA, released as part of the NeMo Gym framework. The dataset poses each assistant step of an expert tool-use trajectory as a separate behavior cloning problem: the agent sees the conversation history and available tools, then must predict the correct next action -- either calling a specific function with the right arguments, or responding with a message.

Capabilities

  • Deciding when to call a tool vs. respond with a message
  • Selecting the correct function from a set of available tools
  • Generating correct function arguments as JSON
  • Multi-turn conversation comprehension
  • Reasoning about tool capabilities relative to user requests

Compute Requirements

Nemotron-FC-Pivot does not require a sandbox. It has minimal compute requirements.

License

CC-BY-4.0.

Tasks

There is one split with 9,620 tasks:

  • train (9,620 tasks): Function-calling pivot points extracted from expert tool-use trajectories. Each task presents a conversation context and asks the agent to predict the correct next action.

Reward Structure

This is a sparse, binary reward environment matching the NeMo Gym ground truth verification. The agent makes a single submission per task:

  • Function call tasks: Binary reward (0 or 1). The function name must match exactly. Arguments are compared recursively: dict keys must match, list lengths must match, floats use 1e-6 tolerance, short strings require exact match, longer strings use Jaccard word-count similarity (threshold 0.1). All must pass for reward 1.0.
  • Message tasks: Binary reward. Any chat message when a message was expected yields reward 1.0.
  • Wrong action type: Calling a function when a message was expected (or vice versa) yields reward 0.0.

Data

Decision points are sourced from the Nemotron-RL-Agentic-Function-Calling-Pivot-v1 dataset by NVIDIA. The original dataset uses OpenAI Responses API format with expert trajectories. The download_data.py script downloads and normalises the data to a flat parquet format for efficient serving.

Tools

This environment uses task-specific tools. Each task dynamically exposes the actual tools from the dataset (e.g., get_balance_sheet, get_earnings, generateImageUrl) via list_task_tools(). The agent interacts with these tools through native function calling.

In addition, there is one shared tool:

  • submit_message: Submit a text message response. Use when no function call is appropriate and the agent should respond directly to the user.

Time Horizon

Nemotron-FC-Pivot is a single-turn environment. The agent receives a conversation context and submits one action. Each task requires exactly one tool call.

Other Environment Requirements

Nemotron-FC-Pivot does not require any API keys or secrets. All grading is rule-based.

Safety

Agents in Nemotron-FC-Pivot are asked to predict the next action in a synthetic conversation. The environment does not present direct safety risks, as agents only submit predictions with no access to external systems, real tools, or the internet.

Citations

@dataset{nvidia_nemotron_fc_pivot_v1,
  author    = {NVIDIA Corporation},
  title     = {Nemotron-RL-Agentic-Function-Calling-Pivot-v1},
  year      = {2026},
  publisher = {Hugging Face},
  url       = {https://huggingface.co/datasets/nvidia/Nemotron-RL-Agentic-Function-Calling-Pivot-v1},
  license   = {CC-BY-4.0}
}