Terminal-Bench (Verifiers wrapper)

Terminal-Bench tasks wrapped as a Verifiers RL environment. The model drives a tmux shell to complete realistic end-to-end terminal jobs such as compiling code, system administration, and data science.
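This card does not say how the wrapper talks to tmux. As a rough illustration only, the sketch below shows one way a harness could drive a detached tmux session using the standard `tmux new-session`, `send-keys`, and `capture-pane` subcommands. The helper names, the session name, and the example commands are hypothetical and are not taken from Terminal-Bench or the verifiers library. The helpers only build argument lists, so nothing is executed unless you pass them to `subprocess.run` yourself.

```python
import shlex
from typing import List

SESSION = "tb-demo"  # hypothetical session name, not from the source


def new_session_cmd(session: str) -> List[str]:
    """argv that starts a detached tmux session with the given name."""
    return ["tmux", "new-session", "-d", "-s", session]


def send_keys_cmd(session: str, command: str) -> List[str]:
    """argv that types one shell command into the session and presses Enter."""
    return ["tmux", "send-keys", "-t", session, command, "Enter"]


def capture_pane_cmd(session: str) -> List[str]:
    """argv that prints the visible pane contents to stdout (-p)."""
    return ["tmux", "capture-pane", "-p", "-t", session]


def episode_plan(session: str, commands: List[str]) -> List[List[str]]:
    """Start a session, then alternate 'send a command' and 'read the screen'."""
    plan = [new_session_cmd(session)]
    for command in commands:
        plan.append(send_keys_cmd(session, command))
        plan.append(capture_pane_cmd(session))
    return plan


if __name__ == "__main__":
    # Print the tmux invocations instead of running them.
    for argv in episode_plan(SESSION, ["make", "ls -la"]):
        print(shlex.join(argv))
```

In a real agent loop, the text returned by the capture step would become the model's next observation, and the model's reply would supply the next command to send.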

Type: RL Env
Publisher: Harbor
Runtime: verifiers
License: MIT
Size: ~100 hand-crafted tasks (Terminal-Bench 1.0) / 200+ in 2.0
Published: Jan 2025


Lift evidence (1)

Eval | Tools known to lift | Source paper
Terminal-Bench | Terminal-Bench (Verifiers wrapper) | -

Models

Compatible: any

Papers: 1

Contributors: 1