DIFF Bench RL Env (Community)

Benchmark for evaluating agents on Slack, Linear, Box, and Calendar tasks via Bash and Python.

Type: RL Env
Tags: API, Tool Use, Coding, Multi Service
Runtime: multi-turn
License: unknown
Version: v0.1.16
Published: Feb 2026
Canonical: app.primeintellect.ai/dashboard/environments/hubert-marek/agent-diff-bench

Attribution

README: api.primeintellect.ai/api/v1/environmentshub/hubert-marek/agent-diff-bench/@0.1.16/inspect
Scores: prime-hub

Public scores on this env

6 vf-eval reports across 6 models

Rank  Model            Provider    Score
1     DeepSeek V3.2    DeepSeek    74.7%
2     Grok 4.1 Fast    xAI         62.0%
3     Step 11                      45.1%
4     Step 7                       39.3%
5     Ministral 3 14B  Mistral AI  32.2%
6     Step 19                      30.3%

Contributors

1