HCAST
Active
METR's Human-Calibrated Autonomy Software Tasks: 189 multi-step software tasks calibrated against human time-to-complete, used to measure the length of task, in human time, that an agent can complete.
- Capabilities
- Code Generation, Planning, Tool Calling, Debugging
- Domain
- agentic
- Format
- Custom
- Size
- 189 tasks
- License
- MIT
- Published
- Mar 2025
- Notable for
- Benchmark for evaluating code generation, planning and tool calling in the agentic domain.
FAQ
- What is HCAST?
- HCAST is METR's Human-Calibrated Autonomy Software Tasks: 189 multi-step software tasks calibrated against human time-to-complete, used to measure the length of task, in human time, that an agent can complete.
- What capabilities does HCAST test?
- HCAST evaluates code generation, planning, tool calling, and debugging.
- What license is HCAST under?
- HCAST is available under the MIT license.