Terminal-Bench: A Benchmark for Real-World Terminal-Based Agents

A benchmark of containerized terminal tasks from the Laude Institute and Stanford for evaluating shell-based coding agents, released as a public leaderboard and a paper.

Year: 2025
Venue: blog
Stars: 2.3k
Authors: 2
Hosting: External source (license unknown)

Attribution

Abstract & full text: tbench.ai
TL;DR: Semantic Scholar

Introduces 2 artifacts: 1 eval and 1 tool.