MMLU
Massive Multitask Language Understanding (MMLU) is a popular benchmark for evaluating the capabilities of large language models. It inspired several other versions and spin-offs, such as MMLU-Pro, MMMLU and MMLU-Redux.
- Type: RL Env
- Publisher: General Reasoning
- Runtime: ORS
- License: unknown
- Size: 115700 tasks
- Published: Jan 2026
- Canonical: openreward.ai/GeneralReasoning/MMLU
Public scores on this env
11 vf-eval reports across 1 model