τ²-bench (Tau²-bench)
Saturated
Sierra's dual-control extension of τ-bench - now the user is also an LLM and both agents share access to the same tool-driven environment.
- Publisher: Sierra
- Capabilities: Tool Calling, Multi Turn Dialog, Planning
- Domain: agentic
- Format: Custom
- Size: 220 tasks
- License: MIT
- Published: Jun 2025
- Notable for: Benchmark for evaluating tool calling, multi-turn dialog, and planning in the agentic domain.
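The dual-control idea in the description above can be illustrated with a small sketch. It is not the τ²-bench API: the class, the tool names (`activate_line`, `disable_airplane_mode`), and the state keys are all hypothetical, and the LLM-driven agent and user are replaced by scripted tool calls. It only shows two parties holding different tools over one shared environment, where the task succeeds only after both have acted.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict


@dataclass
class SharedEnv:
    """Hypothetical shared state that both sides can read and modify."""
    state: Dict[str, bool] = field(
        default_factory=lambda: {"airplane_mode": True, "line_active": False}
    )


def make_tools(env: SharedEnv) -> Dict[str, Dict[str, Callable[[], str]]]:
    """Return separate tool sets for the agent and the simulated user."""

    def activate_line() -> str:  # agent-side tool (illustrative name)
        env.state["line_active"] = True
        return "line activated"

    def disable_airplane_mode() -> str:  # user-side tool (illustrative name)
        env.state["airplane_mode"] = False
        return "airplane mode off"

    return {
        "agent": {"activate_line": activate_line},
        "user": {"disable_airplane_mode": disable_airplane_mode},
    }


def service_restored(env: SharedEnv) -> bool:
    """The task succeeds only when both sides have acted on the shared state."""
    return env.state["line_active"] and not env.state["airplane_mode"]


env = SharedEnv()
tools = make_tools(env)

# Scripted stand-ins for what would be LLM-chosen tool calls.
tools["agent"]["activate_line"]()
after_agent_only = service_restored(env)  # the user has not acted yet

tools["user"]["disable_airplane_mode"]()
after_both = service_restored(env)

print(after_agent_only, after_both)
```

In a real run the agent would have to work out, through dialog, that the user needs to take the second action, which is where the multi-turn and planning capabilities come in.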
Top score: 99.1% by JT-35B-Flash. 320 models reporting (60 from frontier labs).
Score history (319)
Top models (320)
Related tools (5)
Implementations, trainers, datasets and scaffolds linked to this eval.
FAQ
- What is τ²-bench (Tau²-bench)?
- Sierra's dual-control extension of τ-bench - now the user is also an LLM and both agents share access to the same tool-driven environment.
- What capabilities does τ²-bench (Tau²-bench) test?
- τ²-bench (Tau²-bench) evaluates tool calling, multi-turn dialog, and planning.
- What is the current top score on τ²-bench (Tau²-bench)?
- The top reported score is 99.1% by JT-35B-Flash, across 320 models reporting (60 from frontier labs).
- How can a model improve its τ²-bench (Tau²-bench) score?
- Tools linked to τ²-bench (Tau²-bench) on Sophon include TAU 2 Bench RL Env (Community), TAU 2 Bench RL Env (Prime Intellect), TAU 2 Bench RL Env (Community), and TAU 2 Synth RL Env (Prime). These are RL environments, datasets, and scaffolds that target this eval.
- What license is τ²-bench (Tau²-bench) under?
- τ²-bench (Tau²-bench) is available under the MIT license.



