Question 1

What is AutomationBench?

Accepted Answer

AutomationBench is a benchmark that evaluates AI agents on realistic, multi-step business workflows across 47 simulated SaaS tools.

Question 2

What is the current top score on AutomationBench?

Accepted Answer

The top reported score is 56.1%, achieved by Claude Opus 4.6. Scores have been reported for 15 models, 8 of them from frontier labs.

Question 3

How can a model improve its AutomationBench score?

Accepted Answer

Tools linked to AutomationBench on Sophon include Automationbench RL Env (Zapier). Linked tools of this kind are RL environments, datasets, and scaffolds that target this eval.

Question 4

What license is AutomationBench under?

Accepted Answer

The license for AutomationBench is listed as unknown.

AutomationBench

Score history