Taubench

tau-bench challenges agents to coordinate, guide, and assist users in achieving shared objectives across complex enterprise domains.

Type: RL Env
Publisher: General Reasoning
Tags: Enterprise Agentic System Evaluation
Runtime: ORS
License: unknown
Size: 165 tasks
Published: Jan 2026
Canonical: openreward.ai/GeneralReasoning/taubench

Attribution

README: openreward.ai/GeneralReasoning/taubench
Scores: OpenReward

Public scores on this env

40 vf-eval reports across 12 models

Rank  Model                       Publisher               Score
1     GPT-4o                      OpenAI                  61.2
2     GPT-4 Turbo                 OpenAI                  57.7
3     GPT 4 32k                   OpenAI                  56.5
4     Claude Opus 3               Anthropic               44.2
5     Mistral Large               Mistral AI              30.7
6     Claude Sonnet 3             Anthropic               26.3
7     Gemini 1.5 Pro (Sep '24)    Google (Alphabet Inc.)  21.7
8     GPT-3.5 Turbo               OpenAI                  20
9     Claude 3 Haiku              Anthropic               19
10    Mixtral 8X22B               Mistral AI              17.7
11    Gemini 1.5 Flash (Sep '24)  Google (Alphabet Inc.)  17.4
12    meta-llama-3-70B            Meta Platforms          14.8
