SkillsBench

SkillsBench is an evaluation framework that measures how skills work, and the first dataset that measures how effectively models use skills on expert-curated tasks across diverse, high-GDP-value domains.

Type: RL Env
Publisher: benchflow
Runtime: ORS
License: unknown
Size: 5 tasks
Published: Feb 2026

Public scores on this env

2 vf-eval reports across 2 models