Massive Multitask Language Understanding (MMLU)
57-subject multiple-choice exam testing broad world knowledge and reasoning across academic and professional domains.
- Publisher
- University of California, Berkeley
- Capabilities
- Factual Recall, Scientific Reasoning
- Format
- HF Dataset
- Size
- 15,908 questions
- License
- MIT
- Published
- Sep 2020
- Notable for
- Benchmark for evaluating factual recall and scientific reasoning.
- Canonical
- github.com/hendrycks/test
Top score: 83.1 by Tülu 3 70B (2 models reporting, 1 from a frontier lab).
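MMLU items are four-option multiple-choice questions, and a headline number like 83.1 is presumably accuracy expressed as a percentage. The sketch below is only an illustration of how such a score is typically computed, under stated assumptions. Each item is formatted into a prompt loosely modeled on the original evaluation script: a subject header, the question, lettered choices, and a trailing "Answer:". Predicted letters are then scored two ways, per question (micro) and per subject (macro), because reporting sources may differ on this. The record fields (`subject`, `question`, `choices`, `answer` as a 0-3 index) are an assumption that mirrors the Hugging Face `cais/mmlu` layout. The helpers `format_prompt`, `micro_accuracy`, and `macro_accuracy` and the toy items are hypothetical and not part of any official harness. The original evaluation also prepends up to five solved development examples; this sketch is zero-shot for brevity.

```python
from collections import defaultdict
from dataclasses import dataclass

LETTERS = ["A", "B", "C", "D"]


@dataclass
class Item:
    # Assumed record layout (mirrors the cais/mmlu fields);
    # `answer` is the 0-based index of the correct choice.
    subject: str
    question: str
    choices: list[str]
    answer: int


def format_prompt(item: Item) -> str:
    """Zero-shot prompt loosely modeled on the original evaluation script."""
    subject = item.subject.replace("_", " ")
    lines = [
        f"The following are multiple choice questions (with answers) about {subject}.",
        "",
        item.question,
    ]
    lines += [f"{letter}. {text}" for letter, text in zip(LETTERS, item.choices)]
    lines.append("Answer:")
    return "\n".join(lines)


def is_correct(item: Item, predicted: str) -> bool:
    # Normalize the model's letter (e.g. " c" -> "C") before comparing to gold.
    return predicted.strip().upper() == LETTERS[item.answer]


def micro_accuracy(items: list[Item], preds: list[str]) -> float:
    """Percent of all questions answered correctly (each question weighs equally)."""
    if len(items) != len(preds):
        raise ValueError("need exactly one prediction per item")
    correct = sum(is_correct(it, p) for it, p in zip(items, preds))
    return 100.0 * correct / len(items)


def macro_accuracy(items: list[Item], preds: list[str]) -> float:
    """Mean of per-subject accuracies (each subject weighs equally)."""
    if len(items) != len(preds):
        raise ValueError("need exactly one prediction per item")
    totals: dict[str, list[int]] = defaultdict(lambda: [0, 0])  # subject -> [correct, seen]
    for it, p in zip(items, preds):
        totals[it.subject][0] += int(is_correct(it, p))
        totals[it.subject][1] += 1
    per_subject = [100.0 * c / n for c, n in totals.values()]
    return sum(per_subject) / len(per_subject)


# Hypothetical toy items, not drawn from MMLU.
items = [
    Item("high_school_mathematics", "What is 2 + 3?", ["4", "5", "6", "7"], 1),
    Item("astronomy", "Which planet is closest to the Sun?", ["Venus", "Earth", "Mercury", "Mars"], 2),
    Item("astronomy", "How many planets orbit the Sun?", ["7", "8", "9", "10"], 1),
]
preds = ["B", " c", "A"]

print(format_prompt(items[0]))
print(round(micro_accuracy(items, preds), 1))  # 66.7
print(macro_accuracy(items, preds))            # 75.0
```

The two averages diverge whenever subjects have unequal question counts. Here the single math question pulls the macro score up to 75.0, while the micro score stays at 66.7.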
FAQ
- What is Massive Multitask Language Understanding (MMLU)?
- 57-subject multiple-choice exam testing broad world knowledge and reasoning across academic and professional domains.
- What capabilities does Massive Multitask Language Understanding (MMLU) test?
- Massive Multitask Language Understanding (MMLU) evaluates factual recall and scientific reasoning.
- What is the current top score on Massive Multitask Language Understanding (MMLU)?
- The top reported score is 83.1, by Tülu 3 70B. Two models report results, one of them from a frontier lab.
- How can a model improve its Massive Multitask Language Understanding (MMLU) score?
- Sophon links RL environments, datasets, and scaffolds that target this eval, including MMLU RL Env (Prime Community), MMLU RL Env (Community), VF Openbench RL Env (Community), and Openmed Medknowledge RL Env (Community).
- What license is Massive Multitask Language Understanding (MMLU) under?
- Massive Multitask Language Understanding (MMLU) is available under the MIT License.
