
Continuation Quality RL Env (Prime Intellect)


Single-turn environment that uses a judge model to grade the quality of base-model continuations.

Type: RL Env
Runtime: single-turn
License: unknown
Version: v0.1.1
Published: Dec 2025


continuation-quality


Overview

  • Environment ID: continuation-quality
  • Short description: Single-turn environment that uses a judge model to grade the quality of base-model continuations.
  • Tags: single-turn, completions, base-model

Datasets

  • Primary dataset(s): agentlans/wikipedia-paragraphs, with each paragraph mapped to a prefix and its ground-truth continuation
  • Source links: Hugging Face Datasets
  • Split sizes: train split, filtered to sufficiently long paragraphs
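The prefix/continuation mapping can be pictured with the sketch below. It is a hypothetical illustration, not the environment's code: the 64-word minimum (MIN_WORDS), the 50% split point, and the prompt/answer field names are all assumptions.

```python
from typing import Dict, List, Optional

# Hypothetical values: the real length threshold and split rule used by
# continuation-quality are not documented here.
MIN_WORDS = 64  # assumed minimum paragraph length, in words


def to_example(text: str, prefix_frac: float = 0.5) -> Optional[Dict[str, str]]:
    """Split one paragraph into a prompt prefix and a ground-truth continuation."""
    words = text.split()
    if len(words) < MIN_WORDS:
        return None  # too short to yield a meaningful prefix and continuation
    cut = int(len(words) * prefix_frac)
    return {
        "prompt": " ".join(words[:cut]),  # shown to the base model
        "answer": " ".join(words[cut:]),  # ground-truth continuation
    }


def build_examples(paragraphs: List[str]) -> List[Dict[str, str]]:
    """Apply the split to every paragraph, dropping ones that are too short."""
    examples = []
    for paragraph in paragraphs:
        example = to_example(paragraph)
        if example is not None:
            examples.append(example)
    return examples
```

Filtering before splitting keeps both halves long enough that the judge has real context and a real reference to compare against.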

Task

  • Type: single-turn
  • Rubric overview: letter grade assigned by a judge model (gpt-4.1-mini by default)
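A minimal sketch of turning the judge's letter grade into a scalar reward, assuming the judge is prompted to end its reply with a line such as "Grade: B". The reply format, the A to F scale, and the reward values in GRADE_TO_REWARD are hypothetical; the environment's actual judge prompt and mapping may differ.

```python
import re
from typing import Optional

# Hypothetical grade-to-reward mapping (not taken from the environment).
GRADE_TO_REWARD = {"A": 1.0, "B": 0.75, "C": 0.5, "D": 0.25, "F": 0.0}

# Assumes the judge reply contains "Grade: <letter>"; anchoring on the label
# avoids matching stray capitals such as the article "A".
GRADE_PATTERN = re.compile(r"Grade:\s*([A-DF])\b")


def parse_grade(judge_response: str) -> Optional[str]:
    """Return the letter after 'Grade:', or None if the judge gave none."""
    match = GRADE_PATTERN.search(judge_response)
    return match.group(1) if match else None


def grade_reward(judge_response: str) -> float:
    """Map the parsed grade to a reward; unparseable replies score 0.0."""
    grade = parse_grade(judge_response)
    if grade is None:
        return 0.0
    return GRADE_TO_REWARD[grade]
```

Scoring an unparseable reply as 0.0 is one conservative choice; it avoids rewarding rollouts the judge failed to grade.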

Quickstart

Run an evaluation with default settings:

prime eval run continuation-quality

Configure model and sampling:

prime eval run continuation-quality \
  -m gpt-4.1-mini \
  -n 20 -r 3 -t 1024 -T 0.7 \
  -a '{"key": "value"}'  # env-specific args as JSON

Notes:

  • Use -a / --env-args to pass environment-specific configuration as a JSON object.
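Because the -a value must be a single JSON object, it can be easier to build it programmatically than to hand-escape the quotes. This sketch assembles the Quickstart command in Python; the chosen overrides are only examples, and it assumes that any argument you omit keeps its default.

```python
import json
import shlex

# Example overrides for a subset of the environment arguments.
env_args = {
    "judge_model": "gpt-4.1",
    "dataset_split": "train",
}

cmd = [
    "prime", "eval", "run", "continuation-quality",
    "-m", "gpt-4.1-mini",
    "-a", json.dumps(env_args),  # serialize the overrides as one JSON object
]

# shlex.join quotes the JSON so it survives the shell as a single argument.
print(shlex.join(cmd))
```

Letting json.dumps and shlex.join handle escaping avoids the usual mistake of nesting double quotes inside a double-quoted shell string.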

Environment Arguments

Supported environment arguments:

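| Arg               | Type | Default                          | Description                                             |
| ----------------- | ---- | -------------------------------- | ------------------------------------------------------- |
| dataset_name      | str  | "agentlans/wikipedia-paragraphs" | Training dataset                                        |
| dataset_split     | str  | "train"                          | Training dataset split                                  |
| dataset_key       | str  | "text"                           | Column in the dataset containing the training text      |
| judge_model       | str  | "gpt-4.1-mini"                   | Model used to judge continuations                       |
| judge_base_url    | str  | "https://api.openai.com/v1"      | API base URL for the judge model                        |
| judge_api_key_var | str  | "OPENAI_API_KEY"                 | Environment variable containing the judge model API key |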
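One way to picture how these arguments fit together is the hypothetical JudgeConfig container below, which mirrors the documented names and defaults. Note that judge_api_key_var holds the name of an environment variable rather than the key itself, so the secret never has to appear in the -a JSON. The class and its from_env_args helper are illustrations, not part of the environment's API.

```python
import json
import os
from dataclasses import dataclass, fields


@dataclass
class JudgeConfig:
    # Hypothetical container; field names and defaults follow the table above.
    dataset_name: str = "agentlans/wikipedia-paragraphs"
    dataset_split: str = "train"
    dataset_key: str = "text"
    judge_model: str = "gpt-4.1-mini"
    judge_base_url: str = "https://api.openai.com/v1"
    judge_api_key_var: str = "OPENAI_API_KEY"

    @classmethod
    def from_env_args(cls, raw: str) -> "JudgeConfig":
        """Build a config from an -a JSON string, rejecting unknown keys."""
        args = json.loads(raw)
        known = {f.name for f in fields(cls)}
        unknown = set(args) - known
        if unknown:
            raise ValueError(f"unknown env args: {sorted(unknown)}")
        return cls(**args)  # omitted keys keep their defaults

    def judge_api_key(self) -> str:
        """Look up the key via the variable named in judge_api_key_var."""
        key = os.environ.get(self.judge_api_key_var)
        if not key:
            raise RuntimeError(f"environment variable {self.judge_api_key_var!r} is not set")
        return key
```

Pointing judge_api_key_var and judge_base_url at another provider is how the same sketch would switch to a non-OpenAI, OpenAI-compatible judge endpoint.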

Metrics

Metrics emitted by the rubric:

| Metric | Meaning                                       |
| ------ | --------------------------------------------- |
| reward | Main scalar reward (weighted sum of criteria) |
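Read literally, "weighted sum of criteria" means reward = Σ wᵢ·sᵢ over the rubric's criteria, where sᵢ is a criterion's score and wᵢ its weight. A sketch of that combination with hypothetical criterion names; with a single judge-grade criterion at weight 1.0, the reward is simply the grade score.

```python
from typing import Dict


def weighted_reward(scores: Dict[str, float], weights: Dict[str, float]) -> float:
    """Combine per-criterion scores into one scalar: sum of weight * score."""
    missing = set(weights) - set(scores)
    if missing:
        raise KeyError(f"no score for criteria: {sorted(missing)}")
    return sum(weights[name] * scores[name] for name in weights)


# Hypothetical single-criterion setup: reward equals the judge grade score.
print(weighted_reward({"judge_grade": 0.75}, {"judge_grade": 1.0}))
```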