MMLU-Pro: A More Robust and Challenging Multi-Task Language Understanding Benchmark

TIGER-Lab benchmark that upgrades MMLU with harder reasoning-heavy questions, 10 answer choices, and de-noised options for a higher ceiling.

Publisher
TIGER-Lab
Year
2024
Venue
NeurIPS
Authors
17
Hosting
External source (license unknown)

Introduces 1 artifact - 1 eval

TL;DR

Semantic Scholar

The paper introduces MMLU-Pro, an enhanced dataset that extends the mostly knowledge-driven MMLU benchmark with more challenging, reasoning-focused questions and expands the choice set from four to ten options; the authors report that the resulting benchmark contains more complex reasoning questions.
