Terminal-Bench (Verifiers wrapper)
Terminal-Bench tasks wrapped as a Verifiers RL environment: the model drives a tmux shell to complete realistic end-to-end terminal jobs (compiling, sysadmin, data science).
- Type
- RL Env
- Publisher
- Harbor
- Capabilities
- Code Editing, Planning, Tool Calling
- Runtime
- verifiers
- License
- MIT
- Size
- ~100 hand-crafted tasks (Terminal-Bench 1.0) / 200+ in 2.0
- Published
- Jan 2025
Lift evidence
| Eval | Tools known to lift | Source paper |
|---|---|---|
| Terminal-Bench | Terminal-Bench (Verifiers wrapper) | - |
Models
Compatible
any