Automationbench RL Env (Zapier)
Evaluates AI agents on realistic, multi-step business workflows across 47 simulated SaaS tools.
- Type
- RL Env
- Publisher
- Zapier
- Runtime
- tool-use
- License
- unknown
- Size
- v1.0.0
- Published
- Apr 2026
Public scores on this env
1212 vf-eval reports across 12 models
1. Claude Opus 4.6 (Anthropic): 56.1%
2. Claude Sonnet 4.6 (Anthropic): 47.6%
3. MiMo-V2.5-Pro (Xiaomi): 39.4%
4. GPT-5.4 Mini (OpenAI): 37.0%
5. GLM 5.1 (Zai): 35.9%
6. Gemini 3 Flash Preview (Google (Alphabet Inc.)): 32.7%
7. MiMo-V2.5 (Xiaomi): 32.5%
8. GPT-5.4 Nano (OpenAI): 26.2%
9. Nemotron 3 Super 120B A12B (NVIDIA): 20.7%
10. GPT-5.4 (OpenAI): 13.1%
11. Gemini 3.1 Pro Preview (Google (Alphabet Inc.)): 12.3%
12. MiniMax M2.7 (Minimax): 5.7%