DIFF Bench RL Env (Community)
Benchmark for evaluating agents on Slack, Linear, Box, Calendar via Bash & Python
- Type: RL Env
- Runtime: multi-turn
- License: unknown
- Size: v0.1.16
- Published: Feb 2026
Public scores on this env
66 vf-eval reports across 6 models
1. DeepSeek V3.2 (DeepSeek): 74.7%
2. Grok 4.1 Fast (xAI): 62.0%
3. Step 11: 45.1%
4. Step 7: 39.3%
5. Ministral 3 14B (Mistral AI): 32.2%
6. Step 19: 30.3%