Backend Bench

An environment for evaluating LLMs on their ability to generate correct and fast GPU kernels that pass tests provided by `Torch`.
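The page does not describe Backend Bench's actual harness. The sketch below is a minimal illustration, assuming the common pattern of scoring a generated kernel in two steps: first check its outputs against a reference operator within a tolerance, then compare run times. All names (`allclose`, `best_time`, `evaluate`, the ReLU functions) are hypothetical and are not Backend Bench's API. Plain Python lists stand in for tensors so the example runs without a GPU. The tolerance rule mirrors the one documented for `torch.allclose`.

```python
import math
import time
from typing import Callable, Sequence

# Hypothetical harness for illustration only; not Backend Bench's API.

def allclose(actual: Sequence[float], expected: Sequence[float],
             rtol: float = 1e-5, atol: float = 1e-8) -> bool:
    """Elementwise check |actual - expected| <= atol + rtol * |expected|."""
    if len(actual) != len(expected):
        return False
    return all(abs(a - e) <= atol + rtol * abs(e) for a, e in zip(actual, expected))

def best_time(fn: Callable, args: tuple, repeats: int = 5) -> float:
    """Return the fastest wall-clock time of fn(*args) over several runs."""
    best = math.inf
    for _ in range(repeats):
        start = time.perf_counter()
        fn(*args)
        best = min(best, time.perf_counter() - start)
    return best

def evaluate(candidate: Callable, reference: Callable, test_inputs: list) -> tuple:
    """Return (passed, speedup). Speed is only measured if every test passes."""
    for args in test_inputs:
        if not allclose(candidate(*args), reference(*args)):
            return False, 0.0
    ref_t = sum(best_time(reference, args) for args in test_inputs)
    cand_t = sum(best_time(candidate, args) for args in test_inputs)
    return True, (ref_t / cand_t if cand_t > 0 else math.inf)

# Usage: a reference ReLU, a correct candidate, and a buggy candidate.
def reference_relu(xs):
    return [x if x > 0 else 0.0 for x in xs]

def candidate_relu(xs):
    return [max(x, 0.0) for x in xs]

def buggy_relu(xs):
    return [abs(x) for x in xs]  # wrong for negative inputs

tests = [([-1.0, 0.0, 2.5],), ([3.0, -4.0],)]
print(evaluate(candidate_relu, reference_relu, tests)[0])  # True
print(evaluate(buggy_relu, reference_relu, tests))         # (False, 0.0)
```

Gating the timing on correctness reflects the benchmark's stated goal that kernels be both correct and fast: a fast kernel that fails the tests earns no speedup credit.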

Domain: rl-env
License: unknown
Published: Sep 2025

Attribution

Leaderboard scores: prime-hub

Top score: 5.0% by Claude Sonnet 4.5 (1 model reporting, 1 from a frontier lab).

Top models

Bar chart: Claude Sonnet 4.5 at 5.0% (1 model).

Related tools

Implementations, trainers, datasets and scaffolds linked to this eval.

FAQ

What is Backend Bench?
An environment for evaluating LLMs on their ability to generate correct and fast GPU kernels that pass tests provided by `Torch`.
What is the current top score on Backend Bench?
The top reported score is 5.0% by Claude Sonnet 4.5. Only one model has reported a score so far, and it comes from a frontier lab.
How can a model improve its Backend Bench score?
Tools linked to Backend Bench on Sophon include Backend Bench RL Env (Community). Linked tools are RL environments, datasets, and scaffolds that target this eval.
What license is Backend Bench under?
The license for Backend Bench is listed as unknown.