0

TOOL TEST RL Env (Prime Intellect)

Fresh

Test environment for tool use

Type
RL Env
Runtime
single-turn
License
unknown
Size
v0.1.1
Published
Dec 2025

Cite

Notes

Only stored in your browser.

tool-test

Source Code

Overview

  • Environment ID: tool-test
  • Short description: Sanity-check tool-calling environment that asks models to invoke a random subset of dummy tools.
  • Tags: tools, single-turn, function-calling, sanity

Datasets

  • Primary dataset(s): Synthetic HF dataset generated in-memory with prompts specifying required tools
  • Source links: N/A (programmatically generated)
  • Split sizes: Controlled by num_train_examples and num_eval_examples

Task

  • Type: tool use (single-turn ToolEnv)
  • Rubric overview: ToolRubric checks tool execution and adds exact match on the required tool set

Quickstart

Run an evaluation with default settings:

prime eval run tool-test

Configure model and sampling:

prime eval run tool-test \
  -m gpt-4.1-mini \
  -n 20 -r 3 -t 1024 -T 0.7 \
  -a '{"num_train_examples": 1000, "num_eval_examples": 100}'

Environment Arguments

ArgTypeDefaultDescription
num_train_examplesint1000Number of training examples
num_eval_examplesint100Number of evaluation examples

Metrics

MetricMeaning
reward1.0 if called tool set equals required set, else 0.0
ToolRubric metricsTool execution success and format adherence