Skillsbench

SkillsBench is an evaluation framework that measures how skills work, and the first dataset to measure how well models use skills on expert-curated tasks across diverse, high-GDP-value domains.

Domain: rl-env
License: unknown
Published: Feb 2026

Attribution

Leaderboard scores: OpenReward

Top score: 45.3 by Claude Opus 4.5, with 2 models reporting (both from frontier labs).

Score history

[Chart: score over time; y-axis from 0 to 50, x-axis from Nov 25 to Feb 26; series shown: Claude Opus 4.5]

Top models

[Bar chart: Skillsbench scores for 2 models; highest value: Claude Opus 4.5 at 45.3]

FAQ

What is Skillsbench?
SkillsBench is an evaluation framework that measures how skills work, and the first dataset to measure how well models use skills on expert-curated tasks across diverse, high-GDP-value domains.
What is the current top score on Skillsbench?
The top reported score is 45.3, achieved by Claude Opus 4.5. Two models have reported scores, both from frontier labs.
What license is Skillsbench under?
The license for Skillsbench is listed as unknown.