Toxicity Explanation RL Env (Prime Intellect)

Explain why a given text is toxic

Type: RL Env
Runtime: single-turn
License: unknown
Size: v0.1.1
Published: Dec 2025

toxicity-explanation

Overview

  • Environment ID: toxicity-explanation
  • Short description: Judge-based evaluation of toxicity classification and explanation on the Civil Comments dataset.
  • Tags: toxicity, classification, explanation, judge, single-turn

Datasets

  • Primary dataset(s): google/civil_comments, with each record mapped to a toxicity target plus metadata
  • Source links: Hugging Face Datasets
  • Split sizes: Uses the train split; set max_examples to a positive value to cap the number of examples
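
The max_examples behavior can be pictured with a small sketch. The helper name limit_examples is hypothetical and this is not the environment's actual loading code; it only mirrors the documented rule that a positive value caps the dataset and any other value leaves it whole.

```python
from typing import List, Sequence, TypeVar

T = TypeVar("T")


def limit_examples(examples: Sequence[T], max_examples: int = -1) -> List[T]:
    """Hypothetical helper: cap the dataset only when max_examples > 0."""
    if max_examples > 0:
        # Keep the first max_examples items, in their original order.
        return list(examples[:max_examples])
    # Zero or negative (including the default -1) means "use everything".
    return list(examples)
```

With the default of -1 the full train split would be used, so a small positive value is a reasonable choice for a quick smoke test.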

Task

  • Type: single-turn
  • Rubric overview: JudgeRubric scores each response on a numeric 0–10 scale, normalized to 0–1; the judge evaluates both classification correctness and explanation quality
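
The normalization step can be illustrated as follows. This is a minimal sketch under stated assumptions: the function name normalize_judge_score is hypothetical, and the environment's real parsing of the judge's reply is not shown in this card. The sketch assumes the judge replies with text containing a non-negative number on the 0–10 scale.

```python
import re


def normalize_judge_score(judge_reply: str, max_score: float = 10.0) -> float:
    """Hypothetical helper: map a 0-10 judge score in free text to 0-1."""
    # Take the first non-negative number in the reply, e.g. "Score: 7" or "7/10".
    match = re.search(r"\d+(?:\.\d+)?", judge_reply)
    if match is None:
        # Assumption: an unparseable reply earns no reward.
        return 0.0
    raw = float(match.group())
    # Clamp to the top of the scale, then rescale to the 0-1 range.
    return min(raw, max_score) / max_score
```

Clamping guards against a judge that answers outside the requested scale; whether the actual rubric clamps or rejects such replies is not stated here.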

Quickstart

Run an evaluation with default settings:

prime eval run toxicity-explanation

Configure model and sampling:

prime eval run toxicity-explanation \
  -m gpt-4.1-mini \
  -n 20 -r 3 -t 1024 -T 0.7 \
  -a '{"judge_model": "gpt-4.1-mini", "judge_base_url": "https://api.openai.com/v1", "judge_api_key_var": "OPENAI_API_KEY", "max_examples": -1}'

Notes:

  • Use -a / --env-args to configure the judge model/provider and dataset size.
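
Hand-writing the JSON for -a is easy to get wrong, so one option is to build it programmatically. The sketch below only assembles and prints the command shown above; the max_examples value of 100 is an illustrative choice, not a recommended setting.

```python
import json
import shlex

# Environment arguments as documented in this card; 100 is an example value.
env_args = {
    "judge_model": "gpt-4.1-mini",
    "judge_base_url": "https://api.openai.com/v1",
    "judge_api_key_var": "OPENAI_API_KEY",
    "max_examples": 100,
}

# json.dumps yields valid JSON; shlex.join quotes it safely for a POSIX shell.
command = ["prime", "eval", "run", "toxicity-explanation", "-a", json.dumps(env_args)]
print(shlex.join(command))
```

The printed line can be pasted into a shell, or the command list can be passed to subprocess.run directly to avoid shell quoting altogether.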

Environment Arguments

| Arg | Type | Default | Description |
| --- | --- | --- | --- |
| judge_model | str | "gpt-4.1-mini" | Judge model name |
| judge_base_url | str | "https://api.openai.com/v1" | Judge provider base URL |
| judge_api_key_var | str | "OPENAI_API_KEY" | Env var containing judge API key |
| max_examples | int | -1 | If > 0, limit dataset to this many examples |
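
Note that judge_api_key_var holds the name of an environment variable, not the key itself. A minimal sketch of how the defaults and that indirection might be resolved is given below; resolve_env_args and DEFAULT_ENV_ARGS are hypothetical names, and the environment's own argument handling may differ.

```python
import os
from typing import Dict, Mapping, Optional, Tuple

# Defaults copied from the arguments documented in this card.
DEFAULT_ENV_ARGS = {
    "judge_model": "gpt-4.1-mini",
    "judge_base_url": "https://api.openai.com/v1",
    "judge_api_key_var": "OPENAI_API_KEY",
    "max_examples": -1,
}


def resolve_env_args(
    overrides: Optional[Dict[str, object]] = None,
    environ: Optional[Mapping[str, str]] = None,
) -> Tuple[Dict[str, object], str]:
    """Hypothetical helper: merge overrides into defaults and look up the key."""
    overrides = overrides or {}
    environ = os.environ if environ is None else environ
    unknown = set(overrides) - set(DEFAULT_ENV_ARGS)
    if unknown:
        raise ValueError(f"Unknown environment arguments: {sorted(unknown)}")
    args = {**DEFAULT_ENV_ARGS, **overrides}
    # judge_api_key_var names the variable; the secret is read from it.
    api_key = environ.get(str(args["judge_api_key_var"]))
    if not api_key:
        raise RuntimeError(f"Set {args['judge_api_key_var']} before running")
    return args, api_key
```

Keeping the key behind a variable name means the -a JSON can be shared or logged without exposing the secret.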

Metrics

| Metric | Meaning |
| --- | --- |
| reward | Normalized judge score (0–1) |