MMLU-Pro: A More Robust and Challenging Multi-Task Language Understanding Benchmark
TIGER-Lab benchmark that upgrades MMLU with harder reasoning-heavy questions, 10 answer choices, and de-noised options for a higher ceiling.
- Publisher: TIGER-Lab
- Year: 2024
- Venue: NeurIPS
- Authors: 17
- Hosting: External source; license unknown
Introduces 1 artifact - 1 eval
TL;DR (Semantic Scholar)
MMLU-Pro is an enhanced dataset that extends the mostly knowledge-driven MMLU benchmark by integrating more challenging, reasoning-focused questions and expanding the choice set from four to ten options. As a result, it includes more complex reasoning questions than the original MMLU.
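One practical consequence of moving from four to ten options is a lower random-guess baseline. The sketch below illustrates that arithmetic and a plain accuracy score for single-answer multiple-choice items. It is a minimal illustration, not the benchmark's official evaluation code: the function names and the `A` to `J` option labels are assumptions made for this example.

```python
from fractions import Fraction

# Assumed labeling for a ten-option question (hypothetical, for illustration).
OPTION_LETTERS = "ABCDEFGHIJ"


def chance_accuracy(num_options: int) -> Fraction:
    """Expected accuracy of uniform random guessing on a single-answer question."""
    if num_options < 1:
        raise ValueError("num_options must be at least 1")
    return Fraction(1, num_options)


def accuracy(predictions: list[str], answers: list[str]) -> float:
    """Fraction of questions where the predicted letter matches the answer key."""
    if len(predictions) != len(answers):
        raise ValueError("predictions and answers must have the same length")
    if not answers:
        raise ValueError("at least one question is required")
    correct = sum(p == a for p, a in zip(predictions, answers))
    return correct / len(answers)


if __name__ == "__main__":
    # Four options (MMLU-style) versus ten options (MMLU-Pro-style).
    print("4 options:", float(chance_accuracy(4)))
    print("10 options:", float(chance_accuracy(len(OPTION_LETTERS))))
    # Toy predictions against a toy answer key; not real benchmark data.
    print("toy accuracy:", accuracy(["A", "J", "C"], ["A", "J", "D"]))
```

With ten options, a score has to clear a 10% guessing floor rather than 25%, so lucky guesses inflate results less.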
Artifacts
1 eval