Seedeval RL Env (Charvi Pi)

Building a benchmark to evaluate the diversity of seeds generated using different tool calls.

Type: RL Env
Publisher: Charvi Pi
Runtime: multi-turn
License: unknown
Size: v0.1.0
Published: Feb 2026

seedeval

Overview

Datasets

  • Primary dataset(s): FineWeb (works with any text dataset)

Architecture

seed-eval-benchmark/
└── environments/
    └── seedeval/
        ├── README.md
        ├── pyproject.toml
        └── seedeval.py

Quickstart

Run an evaluation with default settings:

prime eval run seedeval

Configure model and sampling:

prime eval run seedeval -m gpt-4.1-mini -n 20 -r 3

Metrics

The rubric emits the following metrics:

| Metric | Meaning |
| --- | --- |
| Diversity Score | Measures how semantically different the seeds are from each other |
| Education Score | Uses an LLM judge to rate overall seed quality from 0–5 (similar to FineWeb-Edu) |
| Not Proper Noun Ratio | Ratio of words that are not names, dates or places to the total number of generated seeds |
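
To make the Diversity Score more concrete, here is a minimal sketch of one common way to measure how different a set of seeds is: the mean pairwise cosine distance between seed vectors. This is an illustration only, not the code in seedeval.py. The function names (`bag_of_words`, `cosine_similarity`, `diversity_score`) are hypothetical, and the word-count vectors are a toy stand-in for a real semantic embedding model, which the environment may or may not use.

```python
import math
import re
from collections import Counter
from itertools import combinations


def bag_of_words(text):
    """Toy stand-in for a sentence-embedding model: lowercase word counts."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine_similarity(a, b):
    """Cosine similarity of two sparse count vectors (0.0 if either is empty)."""
    dot = sum(count * b[word] for word, count in a.items())
    norm_a = math.sqrt(sum(c * c for c in a.values()))
    norm_b = math.sqrt(sum(c * c for c in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def diversity_score(seeds):
    """Mean pairwise cosine distance (1 - similarity) over all seed pairs.

    Assumption: higher means more diverse. Returns 0.0 when there are
    fewer than two seeds, since there are no pairs to compare.
    """
    vectors = [bag_of_words(seed) for seed in seeds]
    pairs = list(combinations(vectors, 2))
    if not pairs:
        return 0.0
    return sum(1.0 - cosine_similarity(a, b) for a, b in pairs) / len(pairs)


if __name__ == "__main__":
    seeds = ["red apple", "red car", "blue ocean"]
    print(f"diversity: {diversity_score(seeds):.3f}")
```

Under this definition, a set of identical seeds scores close to 0 and a set of seeds with no shared words scores 1. Swapping `bag_of_words` for a real embedding model would make the score reflect meaning rather than surface word overlap.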