Massive Multitask Language Understanding (MMLU)

57-subject multiple-choice exam testing broad world knowledge and reasoning across academic and professional domains.

Format: HF Dataset
Size: 15,908 tasks
License: MIT
Published: Sep 2020
Notable for: Benchmark for evaluating factual recall and scientific reasoning.

Attribution

Leaderboard scores: Vaultprime-hub

Top score: 83.1 by Tülu 3 70B, with 2 models reporting (1 from frontier labs).

Score history

[Chart: score history on a 0 to 100 axis, Nov 24 to Mar 25; series shown: Tülu 3 70B]

Top models

Bar chart with 2 bars (2 models). Highest value: Tülu 3 70B at 83.1.

Related tools (10)

Implementations, trainers, datasets and scaffolds linked to this eval.

FAQ

What is Massive Multitask Language Understanding (MMLU)?
57-subject multiple-choice exam testing broad world knowledge and reasoning across academic and professional domains.
What capabilities does Massive Multitask Language Understanding (MMLU) test?
Massive Multitask Language Understanding (MMLU) evaluates factual recall and scientific reasoning.
What is the current top score on Massive Multitask Language Understanding (MMLU)?
The top reported score is 83.1 by Tülu 3 70B, across 2 models reporting (1 from frontier labs).
How can a model improve its Massive Multitask Language Understanding (MMLU) score?
Tools linked to Massive Multitask Language Understanding (MMLU) on Sophon include MMLU RL Env (Prime Community), MMLU RL Env (Community), VF Openbench RL Env (Community), and Openmed Medknowledge RL Env (Community). These are RL environments, datasets, and scaffolds that target this eval.
What license is Massive Multitask Language Understanding (MMLU) under?
Massive Multitask Language Understanding (MMLU) is available under the MIT license.
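
How might a multiple-choice score be computed?
Because MMLU is a multiple-choice exam, a score on it is usually read as the percentage of questions answered correctly. This page does not say how the reported scores were aggregated, so the sketch below is only an illustration: the function name `mmlu_accuracy`, the record layout, and the choice of pooling all questions (rather than averaging per subject) are assumptions, not the leaderboard's method.

```python
from collections import defaultdict


def mmlu_accuracy(records):
    """Return (overall_percent, percent_by_subject).

    `records` is an iterable of (subject, predicted_letter, correct_letter)
    tuples. This layout is hypothetical and chosen for illustration only.
    """
    # Map each subject to [number correct, number of questions].
    per_subject = defaultdict(lambda: [0, 0])
    for subject, predicted, correct in records:
        per_subject[subject][1] += 1
        # Compare answer letters case-insensitively, ignoring whitespace.
        if predicted.strip().upper() == correct.strip().upper():
            per_subject[subject][0] += 1

    total_correct = sum(c for c, _ in per_subject.values())
    total = sum(n for _, n in per_subject.values())

    # Assumption: overall score pools all questions across subjects.
    overall = 100.0 * total_correct / total if total else 0.0
    by_subject = {s: 100.0 * c / n for s, (c, n) in per_subject.items()}
    return overall, by_subject


if __name__ == "__main__":
    # Made-up predictions; the subject names are placeholders.
    sample = [
        ("astronomy", "A", "A"),
        ("astronomy", "b", "B"),
        ("law", "C", "D"),
        ("law", "D", "D"),
    ]
    overall, by_subject = mmlu_accuracy(sample)
    print(f"overall: {overall:.1f}")
    for subject, pct in sorted(by_subject.items()):
        print(f"{subject}: {pct:.1f}")
```

Averaging the per-subject percentages instead of pooling questions would weight small subjects more heavily, so the two conventions can give different overall numbers for the same predictions.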