MT-Bench
80 two-turn open-ended questions across 8 categories, graded by GPT-4 as judge to score multi-turn dialogue quality.
- Publisher: LMArena
- Format: Custom
- Size: 80 tasks
- License: Apache-2.0
- Published: Jun 2023
- Notable for: Benchmark for multi-turn dialogue, instruction following, and LLM-as-a-judge evaluation.
Related tools (11)
Implementations, trainers, datasets and scaffolds linked to this eval.
Papers (2)
Contributors (2)
FAQ
- What is MT-Bench?
- MT-Bench is a benchmark of 80 two-turn, open-ended questions across 8 categories. GPT-4 acts as the judge and scores multi-turn dialogue quality.
- What capabilities does MT-Bench test?
- MT-Bench tests multi-turn dialogue, instruction following, and LLM-as-a-judge scoring.
- How can a model improve its MT-Bench score?
- Tools linked to MT-Bench on Sophon include Argilla distilabel Capybara-DPO, Capybara, HelpSteer2, and Magpie. These are RL environments, datasets, and scaffolds that target this eval.
- What license is MT-Bench under?
- MT-Bench is available under Apache-2.0.
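As a rough illustration of the setup described above (two-turn questions grouped into categories, each turn scored by a judge model), the sketch below shows one way per-turn judge scores could be averaged into overall, per-turn, and per-category results. The `Judgment` record, the `summarize` helper, the 1-10 score scale, and the sample category names are assumptions made for illustration; this is not the official MT-Bench tooling and it does not call any judge model.

```python
from collections import defaultdict
from dataclasses import dataclass
from statistics import mean


@dataclass(frozen=True)
class Judgment:
    """One judge score for one turn of one question (hypothetical record shape)."""
    question_id: int
    category: str
    turn: int      # 1 or 2, since every question has two turns
    score: float   # assumed 1-10 scale


def summarize(judgments):
    """Return overall, per-turn, and per-category mean scores."""
    if not judgments:
        raise ValueError("no judgments to summarize")
    by_turn = defaultdict(list)
    by_category = defaultdict(list)
    for j in judgments:
        if j.turn not in (1, 2):
            raise ValueError(f"unexpected turn {j.turn} for question {j.question_id}")
        by_turn[j.turn].append(j.score)
        by_category[j.category].append(j.score)
    return {
        "overall": mean(j.score for j in judgments),
        "by_turn": {t: mean(s) for t, s in sorted(by_turn.items())},
        "by_category": {c: mean(s) for c, s in sorted(by_category.items())},
    }


# Toy data: two questions, two turns each (category names are placeholders).
sample = [
    Judgment(1, "writing", 1, 8.0),
    Judgment(1, "writing", 2, 6.0),
    Judgment(2, "math", 1, 4.0),
    Judgment(2, "math", 2, 2.0),
]
summary = summarize(sample)
print(summary)
```

Reporting the per-turn means separately makes it easy to see whether a model degrades on the follow-up turn, which is the multi-turn behavior this eval is meant to probe.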