Terminal-Bench (Hard)

Frontier

The hard subset of Terminal-Bench: long-horizon shell and filesystem tasks.

Publisher: Laude Institute
Published: May 2026
Canonical: tbench.ai

Attribution

Leaderboard scores: AA

Top score: 65.9% by GPT-5.6 Sol, with 332 models reporting (70 from frontier labs).

[Figure: Terminal-Bench (Hard) top models. Bar chart with 21 bars (21 models); highest value: GPT-5.6 Sol at 65.9.]

FAQ

What is Terminal-Bench (Hard)? It is the hard subset of Terminal-Bench, made up of long-horizon shell and filesystem tasks.
What is the current top score on Terminal-Bench (Hard)? The top reported score is 65.9% by GPT-5.6 Sol, across 332 models reporting (70 from frontier labs).