HCAST

Active

METR's Human-Calibrated Autonomy Software Tasks (HCAST): 189 multi-step software tasks, each calibrated against the time humans take to complete it, used to measure the length of tasks an AI agent can complete.

Domain: agentic
Format: Custom
Size: 189 tasks
License: MIT
Published: Mar 2025
Notable for: Benchmark for evaluating code generation, planning and tool calling in the agentic domain.

Papers

2

Contributors

3

FAQ

What is HCAST?
HCAST (Human-Calibrated Autonomy Software Tasks) is METR's set of 189 multi-step software tasks, each calibrated against the time humans take to complete it. It is used to measure the length of tasks an AI agent can complete.
What capabilities does HCAST test?
HCAST evaluates code generation, planning, tool calling, and debugging.
What license is HCAST under?
HCAST is available under the MIT license.