MMLU-Pro: Advanced Multitask Knowledge and Reasoning Evaluation
Active
An advanced benchmark that tests both broad knowledge and reasoning capabilities across many subjects, featuring challenging questions and multiple-choice answers with increased difficulty and complexity.
- Publisher: TIGER-Lab
- Domain: Knowledge
- License: MIT
- Published: Oct 2024
- Notable for: Benchmark for evaluating Knowledge.
Related tools (1)
Implementations, trainers, datasets and scaffolds linked to this eval.
Papers (1)
FAQ
- What is MMLU-Pro: Advanced Multitask Knowledge and Reasoning Evaluation?
- An advanced benchmark that tests both broad knowledge and reasoning capabilities across many subjects, featuring challenging questions and multiple-choice answers with increased difficulty and complexity.
- How can a model improve its MMLU-Pro: Advanced Multitask Knowledge and Reasoning Evaluation score?
- Sophon links tools that target this eval: RL environments, datasets, and scaffolds. The tool linked to MMLU-Pro: Advanced Multitask Knowledge and Reasoning Evaluation is MMLU PRO RL Env (Prime Intellect).
- What license is MMLU-Pro: Advanced Multitask Knowledge and Reasoning Evaluation under?
- MMLU-Pro: Advanced Multitask Knowledge and Reasoning Evaluation is available under the MIT license.
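
Because the benchmark is multiple-choice, a score on it is usually reported as accuracy: the fraction of questions where the model's chosen option matches the reference answer. The following is a minimal sketch of that calculation, not the official evaluation code. It assumes options are lettered A through J and that the model ends its response with a phrase like "the answer is (X)"; the function names `extract_choice` and `accuracy` and the regular expression are hypothetical and do not come from this page.

```python
import re
from typing import Optional, Sequence

# Assumed response format: "... the answer is (C)". Real harnesses may differ.
# The phrase is matched case-insensitively; the option letter must be uppercase.
ANSWER_RE = re.compile(r"(?i:answer is)\s*\(?([A-J])\b")


def extract_choice(response: str) -> Optional[str]:
    """Return the last option letter stated in the response, or None."""
    matches = ANSWER_RE.findall(response)
    return matches[-1] if matches else None


def accuracy(responses: Sequence[str], gold: Sequence[str]) -> float:
    """Fraction of responses whose extracted letter equals the gold letter."""
    if len(responses) != len(gold):
        raise ValueError("responses and gold must have the same length")
    if not gold:
        return 0.0
    # A response with no extractable letter counts as incorrect.
    correct = sum(extract_choice(r) == g for r, g in zip(responses, gold))
    return correct / len(gold)


if __name__ == "__main__":
    demo_responses = [
        "Step by step... so the answer is (C).",
        "I am not sure.",
        "The answer is B",
    ]
    demo_gold = ["C", "A", "B"]
    print(f"accuracy = {accuracy(demo_responses, demo_gold):.3f}")
```

Counting unparseable responses as wrong is one design choice among several; some harnesses instead fall back to a random guess, which changes the reported number slightly.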