MMLU Pro
MMLU-Pro is a more robust and challenging massive multi-task understanding dataset than the original MMLU, designed to benchmark large language models' capabilities more rigorously. It contains about 12K complex questions across various disciplines.
- Type: RL Env
- Publisher: General Reasoning
- Runtime: ORS
- License: unknown
- Size: 12102 tasks
- Published: Jan 2026
Public scores on this env
44 vf-eval reports across 4 models
1. Qwen3.6 Plus (Alibaba): 88.5
2. Qwen3.5 397B A17B (Alibaba): 87.8
3. Kimi K2.5 (Moonshot AI): 87.1
4. Qwen 3 Coder Next (Alibaba): 80.52
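This page does not say how the scores above are computed. For multiple-choice benchmarks, a common choice is accuracy: the percentage of questions answered with the correct option letter. The sketch below illustrates that calculation under this assumption. The `Task` record, its field names, and `accuracy_percent` are hypothetical and are not taken from the env or from vf-eval.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Task:
    """Hypothetical record layout; the env's real schema is not shown on this page."""
    question: str
    options: tuple[str, ...]
    answer: str    # correct option letter, e.g. "C"
    category: str  # discipline, e.g. "physics"


def accuracy_percent(tasks, predictions):
    """Return (overall, per_category) accuracy as percentages.

    A prediction counts as correct when its option letter matches the
    task's answer, ignoring case and surrounding whitespace.
    """
    if not tasks:
        raise ValueError("need at least one task")
    if len(tasks) != len(predictions):
        raise ValueError("need exactly one prediction per task")

    correct = defaultdict(int)
    total = defaultdict(int)
    for task, pred in zip(tasks, predictions):
        total[task.category] += 1
        if pred.strip().upper() == task.answer.upper():
            correct[task.category] += 1

    per_category = {c: 100.0 * correct[c] / total[c] for c in total}
    overall = 100.0 * sum(correct.values()) / sum(total.values())
    return overall, per_category


if __name__ == "__main__":
    demo_tasks = [
        Task("2 + 2 = ?", ("3", "4"), "B", "math"),
        Task("Unit of force?", ("newton", "joule"), "A", "physics"),
    ]
    overall, per_category = accuracy_percent(demo_tasks, ["b", "B"])
    print(f"overall: {overall:.1f}%")
    for name, value in sorted(per_category.items()):
        print(f"{name}: {value:.1f}%")
```

Reporting per-category accuracy alongside the overall figure is a design choice that suits a dataset spanning several disciplines, since a single average can hide weak subjects.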