SkillsBench
Fresh
SkillsBench is an evaluation framework that measures how skills work, and the first dataset that measures how well models use skills on expert-curated tasks across diverse, high-GDP-value domains.
- Type: RL Env
- Publisher: benchflow
- Runtime: ORS
- License: unknown
- Size: 5 tasks
- Published: Feb 2026
- Canonical: openreward.ai/benchflow/skillsbench
Public scores on this env
22 vf-eval reports across 2 models