AutomationBench
Frontier
Evaluates AI agents on realistic, multi-step business workflows across 47 simulated SaaS tools.
- Domain: rl-env
- License: unknown
- Published: Apr 2026
Top score: 56.1% by Claude Opus 4.6; 12 models reporting (5 frontier)
Score history (11)
Top models (12)
Related tools (1)
Implementations, trainers, datasets and scaffolds linked to this eval.
FAQ
- What is AutomationBench?
  AutomationBench evaluates AI agents on realistic, multi-step business workflows across 47 simulated SaaS tools.
- What is the current top score on AutomationBench?
  The top reported score is 56.1%, achieved by Claude Opus 4.6. 12 models report scores, 5 of them from frontier labs.
- How can a model improve its AutomationBench score?
  Tools linked to AutomationBench on Sophon, such as Automationbench RL Env (Zapier), provide RL environments, datasets, and scaffolds that target this eval.
- What license is AutomationBench under?
  The license for AutomationBench is listed as unknown.