MMLU-Pro: Advanced Multitask Knowledge and Reasoning Evaluation

Active

An advanced benchmark that tests broad knowledge and reasoning across many subjects, using multiple-choice questions that are more difficult and complex than those in the original MMLU.

Publisher
TIGER-Lab
Domain
Knowledge
License
MIT
Published
Oct 2024
Notable for
A benchmark for evaluating models in the Knowledge domain.

Related tools

1

Implementations, trainers, datasets and scaffolds linked to this eval.

Papers

1

FAQ

What is MMLU-Pro: Advanced Multitask Knowledge and Reasoning Evaluation?
An advanced benchmark that tests broad knowledge and reasoning across many subjects, using multiple-choice questions that are more difficult and complex than those in the original MMLU.
How can a model improve its MMLU-Pro: Advanced Multitask Knowledge and Reasoning Evaluation score?
Sophon lists tools linked to MMLU-Pro: Advanced Multitask Knowledge and Reasoning Evaluation, including MMLU PRO RL Env (Prime Intellect). Linked tools can be RL environments, datasets, or scaffolds that target this eval.
What license is MMLU-Pro: Advanced Multitask Knowledge and Reasoning Evaluation under?
MMLU-Pro: Advanced Multitask Knowledge and Reasoning Evaluation is available under the MIT license.