Autonomous Skill Evolution

Frontier

Self-Improving Agent Environment: create, validate, refine, compose, and evolve reusable skills from execution traces. 40 tasks, 5 tiers, 10+ domai...

Domain
rl-env
License
unknown
Published
May 2026

Attribution

Leaderboard scores
prime-hub

Top score: 3.21 by Claude Opus 4.7. 9 models reporting (6 frontier).

Score history

[Chart: score over time. Y-axis: score, 0 to 5. X-axis: Sep 24 to Jan 26. Labeled models: Llama 3.2 Instruct 1B, Qwen3 8B, Claude Opus 4.6, Claude Opus 4.7.]

Top models

[Bar chart: Autonomous Skill Evolution scores for 9 models. Highest value: Claude Opus 4.7 at 3.2.]

Related tools

1 linked tool

Implementations, trainers, datasets and scaffolds linked to this eval.

FAQ

What is Autonomous Skill Evolution?
Self-Improving Agent Environment: create, validate, refine, compose, and evolve reusable skills from execution traces. 40 tasks, 5 tiers, 10+ domai...
What is the current top score on Autonomous Skill Evolution?
The top reported score is 3.21 by Claude Opus 4.7, across 9 models reporting (6 from frontier labs).
How can a model improve its Autonomous Skill Evolution score?
Sophon lists tools that target this eval, including RL environments, datasets, and scaffolds. The tool currently linked to Autonomous Skill Evolution is Skill Evolution RL Env (Community).
What license is Autonomous Skill Evolution under?
The license for Autonomous Skill Evolution is listed as unknown.