τ-bench: A Benchmark for Tool-Agent-User Interaction in Real-World Domains
A benchmark from Sierra that places an agent between a simulated user and a customer-service API (retail and airline domains) and measures how well the agent adheres to policy across multi-turn, tool-using conversations.
- Publisher: Sierra
- Year: 2024
- Venue: preprint
- Authors: 4
- Hosting: External source; license unknown
Introduces 1 artifact: 1 eval