Terminal-Bench: A Benchmark for Real-World Terminal-Based Agents
A benchmark of containerized terminal tasks from the Laude Institute and Stanford for evaluating shell-based coding agents, released as a public leaderboard and a paper.
- Publisher: Laude Institute
- Year: 2025
- Venue: blog
- URL: tbench.ai
- Stars: 2.3k
- Authors: 2
- Hosting: External source, license unknown
Introduces 2 artifacts: 1 eval, 1 tool