MMLU: Measuring Massive Multitask Language Understanding

Active

Evaluate models on 57 tasks including elementary mathematics, US history, computer science, law, and more.

Domain
Knowledge
License
MIT
Published
May 2026
Notable for
A benchmark for evaluating knowledge.

Related tools (3)

Implementations, trainers, datasets and scaffolds linked to this eval.

Papers (1)

FAQ

What is MMLU: Measuring Massive Multitask Language Understanding?
MMLU is a benchmark that evaluates models on 57 tasks, including elementary mathematics, US history, computer science, law, and more.
How can a model improve its MMLU: Measuring Massive Multitask Language Understanding score?
Tools linked to MMLU: Measuring Massive Multitask Language Understanding on Sophon include MMLU RL Env (Prime Community), MMLU RL Env (Community), and Openmed Medknowledge RL Env (Community). These are RL environments, datasets, and scaffolds that target this eval.
What license is MMLU: Measuring Massive Multitask Language Understanding under?
MMLU: Measuring Massive Multitask Language Understanding is available under the MIT license.
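
How might a score across the 57 tasks be aggregated?
This page does not say how results are combined, so the following is only a minimal sketch under stated assumptions: each question is multiple choice, each record is a hypothetical (subject, predicted choice, correct choice) tuple, and the subject names are illustrative. It shows per-subject accuracy plus two common ways to combine it, a macro average (every subject weighted equally) and a micro average (every question weighted equally). Which of the two a given leaderboard reports is not stated here.

```python
from collections import defaultdict


def mmlu_style_scores(records):
    """Compute per-subject, macro and micro accuracy.

    `records` is an iterable of (subject, predicted_choice, correct_choice)
    tuples. This layout is an assumption made for illustration only.
    """
    correct = defaultdict(int)
    total = defaultdict(int)
    for subject, predicted, answer in records:
        total[subject] += 1
        if predicted == answer:
            correct[subject] += 1

    if not total:
        raise ValueError("no records to score")

    # Accuracy within each subject.
    per_subject = {s: correct[s] / total[s] for s in total}
    # Macro average: mean of the per-subject accuracies.
    macro = sum(per_subject.values()) / len(per_subject)
    # Micro average: fraction of all questions answered correctly.
    micro = sum(correct.values()) / sum(total.values())
    return per_subject, macro, micro


# Hypothetical sample: two subjects, three questions.
sample = [
    ("us_history", "A", "A"),
    ("us_history", "B", "C"),
    ("computer_science", "D", "D"),
]
per_subject, macro, micro = mmlu_style_scores(sample)
```

The two averages differ whenever subjects have different numbers of questions, so it is worth checking which one a reported score uses before comparing models.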