self-reward
Overview
- Environment ID:
self-reward - Short description: Single-turn evaluation where a judge model scores responses based on a simple scoring prompt.
- Tags: judge, single-turn, self-reward, openai-compatible
Datasets
- Primary dataset(s): Any HF dataset with
question/answercolumns (specified bydataset_name) - Source links: Hugging Face Datasets
- Split sizes: Uses the dataset’s
trainfile by default
Task
- Type: single-turn
- Rubric overview:
JudgeRubricuses a judge client/model/prompt to produce a 0–1 score
Quickstart
Run an evaluation with default settings (example):
prime eval run self-reward -a '{"dataset_name": "your/dataset", "model_name": "Qwen/Qwen3-0.6B"}'
Configure model and sampling:
prime eval run self-reward \
-m gpt-4.1-mini \
-n 20 -r 3 -t 1024 -T 0.7 \
-a '{"dataset_name": "your/dataset", "judge_model": "Qwen/Qwen3-0.6B", "base_url": "http://0.0.0.0:8000/v1", "api_key_var": "JUDGE_API_KEY"}'
Notes:
- Use
-a/--env-argsto pass environment-specific configuration as a JSON object. - Reports are written under
./environments/self_reward/reports/and auto-embedded below.
Environment Arguments
| Arg | Type | Default | Description |
|---|---|---|---|
dataset_name | str | "PrimeIntellect/Hendrycks-Math" | HF dataset name or path containing question/answer |
judge_model | str | "gpt-4.1-mini" | Judge model name (OpenAI-compatible) |
base_url | str | "http://0.0.0.0:8000/v1" | Judge API base URL |
api_key_var | str | "JUDGE_API_KEY" | Env var for judge API key (optional, defaults to "EMPTY" if not set) |
Metrics
| Metric | Meaning |
|---|---|
reward | Judge-produced score, normalized to 0–1 |