MMLU-Pro

The MMLU-Pro dataset is a more robust and challenging massive multi-task understanding benchmark, designed to evaluate the capabilities of large language models more rigorously. It contains about 12K complex questions across various disciplines.

Type: RL Env
Publisher: General Reasoning
Tags: Question Answering
Runtime: ORS
License: unknown
Size: 12102 tasks
Published: Jan 2026
Canonical: openreward.ai/GeneralReasoning/MMLU-Pro

Attribution

README: openreward.ai/GeneralReasoning/MMLU-Pro
Scores: OpenReward

Public scores on this env

4 vf-eval reports across 4 models

Rank  Model               Organization  Score
1     Qwen3.6 Plus        Alibaba       88.5
2     Qwen3.5 397B A17B   Alibaba       87.8
3     Kimi K2.5           Moonshot AI   87.1
4     Qwen 3 Coder Next   Alibaba       80.52