MMLU

Massive Multitask Language Understanding (MMLU) is a popular benchmark for evaluating the capabilities of large language models. It has inspired several variants and spin-offs, including MMLU-Pro, MMMLU, and MMLU-Redux.

Type: RL Env
Runtime: ORS
License: unknown
Size: 115,700 tasks
Published: Jan 2026

Public scores on this env

1 vf-eval report across 1 model