MT-Bench

80 two-turn open-ended questions across 8 categories, graded by GPT-4 as judge to score multi-turn dialogue quality.
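
The description above implies one judge score per question per turn. As a minimal sketch of how such scores might be rolled up, the Python below averages them overall, per turn, and per category. The 1-10 score scale, the record layout, the sample values, and the `aggregate` helper are all assumptions made for illustration; they are not taken from this page or from any official MT-Bench tooling.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical judge output: (category, question_id, turn, score).
# The 1-10 scale and every value here are illustrative assumptions.
judgments = [
    ("writing", 1, 1, 9.0),
    ("writing", 1, 2, 7.0),
    ("math", 2, 1, 4.0),
    ("math", 2, 2, 2.0),
]


def aggregate(records):
    """Average judge scores overall, per turn, and per category."""
    per_turn = defaultdict(list)
    per_category = defaultdict(list)
    for category, _question_id, turn, score in records:
        per_turn[turn].append(score)
        per_category[category].append(score)
    return {
        "overall": mean(score for *_, score in records),
        "per_turn": {t: mean(v) for t, v in sorted(per_turn.items())},
        "per_category": {c: mean(v) for c, v in sorted(per_category.items())},
    }


summary = aggregate(judgments)
print(summary)
```

Reporting the per-turn averages separately can show whether a model's quality drops on the follow-up question, which is the multi-turn aspect this eval targets.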

Publisher
LMArena
Format
Custom
Size
80 tasks
License
Apache-2.0
Published
Jun 2023
Notable for
Benchmark for evaluating multi-turn dialogue, instruction following, and LLM-as-judge grading.

Related tools

11

Implementations, trainers, datasets and scaffolds linked to this eval.

Papers

2

Contributors

2

FAQ

What is MT-Bench?
MT-Bench is a set of 80 two-turn, open-ended questions across 8 categories. GPT-4 acts as the judge and scores the quality of each multi-turn dialogue.
What capabilities does MT-Bench test?
MT-Bench evaluates multi-turn dialogue, instruction following, and LLM-as-judge grading.
How can a model improve its MT-Bench score?
Sophon links several tools to MT-Bench, including Argilla distilabel Capybara-DPO, Capybara, HelpSteer2, and Magpie. The linked tools are RL environments, datasets, and scaffolds that target this eval.
What license is MT-Bench under?
MT-Bench is available under Apache-2.0.