Thematic Generalization RL Env (Prime Intellect)

This benchmark measures how well LLMs can infer a narrow "theme" (a category or rule) from a small set of examples and anti-examples, and then pick out the one item that truly fits that theme from a set of misleading candidates.
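
To make the task shape concrete, here is a minimal Python sketch of what one single-turn item and its scoring might look like. It is not the environment's actual code: `ThemeItem`, `build_prompt`, `score`, the prompt wording, the exact-match reward, and the sample item are all hypothetical and chosen only for illustration.

```python
import re
from dataclasses import dataclass


@dataclass
class ThemeItem:
    """Hypothetical container for one task item (not the env's real schema)."""
    examples: list[str]        # items that fit the hidden theme
    anti_examples: list[str]   # near-misses that do not fit
    candidates: list[str]      # exactly one of these fits the theme
    answer_index: int          # 0-based index of the fitting candidate


def build_prompt(item: ThemeItem) -> str:
    """Render the item as a single-turn prompt (wording is an assumption)."""
    lines = ["Examples that fit the hidden theme:"]
    lines += [f"- {e}" for e in item.examples]
    lines.append("Anti-examples that do NOT fit:")
    lines += [f"- {a}" for a in item.anti_examples]
    lines.append("Candidates:")
    lines += [f"{i + 1}. {c}" for i, c in enumerate(item.candidates)]
    lines.append("Reply with the number of the one candidate that fits the theme.")
    return "\n".join(lines)


def score(item: ThemeItem, completion: str) -> float:
    """Return 1.0 if the first integer in the reply names the right candidate."""
    match = re.search(r"\d+", completion)
    if match is None:
        return 0.0  # no number found, so no credit
    return 1.0 if int(match.group()) - 1 == item.answer_index else 0.0


# Invented sample item: the hidden theme is "pre-mechanical timekeeping devices".
item = ThemeItem(
    examples=["sundial", "hourglass", "water clock"],
    anti_examples=["wristwatch", "calendar"],
    candidates=["smartphone timer", "candle clock", "metronome"],
    answer_index=1,
)

if __name__ == "__main__":
    print(build_prompt(item))
    print(score(item, "2"))
```

The anti-examples are what make the theme narrow: they rule out broader readings (here, "anything related to time") that the misleading candidates would otherwise satisfy.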

Type: RL Env
Runtime: single-turn
License: unknown
Size: v0.1.0
Published: Sep 2025

Public scores on this env

2 vf-eval reports across 1 model
