Terminal-Bench (Hard)
Frontier
The hard subset of Terminal-Bench: long-horizon shell and filesystem tasks.
- Publisher: Laude Institute
- Published: May 2026
- Canonical: tbench.ai
Top score: 58.3% by Claude Opus 4.8, with 311 models reporting (60 from frontier labs).
FAQ
- What is Terminal-Bench (Hard)?
- Terminal-Bench (Hard) is the hard subset of Terminal-Bench, covering long-horizon shell and filesystem tasks.
- What is the current top score on Terminal-Bench (Hard)?
- The top reported score is 58.3% by Claude Opus 4.8, across 311 models reporting (60 from frontier labs).

