MMLU: Measuring Massive Multitask Language Understanding
Active
Evaluate models on 57 tasks including elementary mathematics, US history, computer science, law, and more.
- Publisher: University of California, Berkeley
- Domain: Knowledge
- License: MIT
- Published: May 2026
- Notable for: Benchmark for evaluating Knowledge.
Related tools (3)
Implementations, trainers, datasets and scaffolds linked to this eval.
Papers (1)
FAQ
- What is MMLU: Measuring Massive Multitask Language Understanding?
- Evaluate models on 57 tasks including elementary mathematics, US history, computer science, law, and more.
- How can a model improve its MMLU: Measuring Massive Multitask Language Understanding score?
- Tools linked to MMLU: Measuring Massive Multitask Language Understanding on Sophon include MMLU RL Env (Prime Community), MMLU RL Env (Community), and Openmed Medknowledge RL Env (Community). These are RL environments, datasets, and scaffolds that target this eval.
- What license is MMLU: Measuring Massive Multitask Language Understanding under?
- MMLU: Measuring Massive Multitask Language Understanding is available under the MIT license.
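
Because the eval spans 57 tasks, a score is usually read as accuracy over multiple-choice answers, broken down per task and then averaged. The sketch below illustrates one way such a score could be computed. It is not the official evaluation harness: the `score` function, the record layout, the task labels, and the sample predictions are all hypothetical, and it reports both a question-weighted (micro) and a task-weighted (macro) average because this page does not say which one a given leaderboard uses.

```python
from collections import defaultdict


def score(records):
    """Compute per-task, micro, and macro accuracy.

    records: iterable of (task, predicted_letter, gold_letter) tuples.
    This layout is an assumption made for illustration only.
    """
    correct = defaultdict(int)
    total = defaultdict(int)
    for task, predicted, gold in records:
        total[task] += 1
        correct[task] += int(predicted == gold)

    # Accuracy within each task.
    per_task = {task: correct[task] / total[task] for task in total}
    # Micro average: every question counts equally.
    micro = sum(correct.values()) / sum(total.values())
    # Macro average: every task counts equally.
    macro = sum(per_task.values()) / len(per_task)
    return per_task, micro, macro


# Made-up predictions for two illustrative task labels (not real results).
records = [
    ("elementary_mathematics", "A", "A"),
    ("elementary_mathematics", "B", "C"),
    ("elementary_mathematics", "D", "D"),
    ("elementary_mathematics", "C", "C"),
    ("us_history", "B", "B"),
    ("us_history", "A", "D"),
]

per_task, micro, macro = score(records)
print(per_task)
print(f"micro={micro:.3f} macro={macro:.3f}")
```

The two averages diverge whenever tasks have different numbers of questions, so it helps to state which one a reported score refers to when comparing models.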