MMLU-Pro
Harder, reasoning-focused successor to MMLU with up to 10 answer choices per question (versus 4 in MMLU) and curated questions that resist lucky guessing.
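To make the "lucky guessing" point concrete: under uniform random guessing, expected accuracy on a question with n options is 1/n. Moving from four options to ten therefore drops the chance baseline from 25% to 10%. The minimal Python sketch below averages 1/n over questions. The per-question option counts are hypothetical inputs, since some items (like the first sample task below, with nine options) have fewer than ten.

```python
from fractions import Fraction

def chance_accuracy(option_counts):
    """Expected accuracy of uniform random guessing, averaged over questions.

    option_counts: number of answer options for each question (hypothetical input).
    """
    if not option_counts:
        raise ValueError("need at least one question")
    # Each question contributes 1/n; exact fractions avoid rounding noise.
    total = sum(Fraction(1, n) for n in option_counts)
    return total / len(option_counts)

# Four options (original MMLU) vs. ten options (MMLU-Pro's maximum).
mmlu_baseline = chance_accuracy([4] * 100)
pro_baseline = chance_accuracy([10] * 100)
print(float(mmlu_baseline), float(pro_baseline))
```

Exact `Fraction` arithmetic keeps mixed option counts (for example, a set with some nine-option items) free of floating-point drift until the final conversion.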
- Publisher
- TIGER-Lab
- Capabilities
- Factual Recall, Scientific Reasoning
- Format
- HF Dataset
- Size
- 12,032 tasks
- License
- MIT
- Published
- Jun 2024
- Notable for
- Benchmark for evaluating factual recall and scientific reasoning.
Top score: 89.8% by Gemini 3 Pro (342 models reporting, 65 from frontier labs)
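MMLU-Pro results such as the top score above are typically reported as accuracy: the fraction of questions where the model's chosen letter matches the gold letter. The minimal Python sketch below computes overall and per-category accuracy. The record layout (category, predicted letter, gold letter) and the rule that an unparseable answer counts as wrong are assumptions for illustration. Individual leaderboards may handle such cases differently.

```python
from collections import defaultdict

def accuracy_report(records):
    """records: iterable of (category, predicted_letter, gold_letter) tuples.

    A prediction of None (no parseable answer) is counted as wrong.
    Returns (overall_accuracy, {category: accuracy}).
    """
    correct = defaultdict(int)
    total = defaultdict(int)
    for category, predicted, gold in records:
        total[category] += 1
        if predicted is not None and predicted == gold:
            correct[category] += 1
    n = sum(total.values())
    if n == 0:
        raise ValueError("no records")
    overall = sum(correct.values()) / n
    per_category = {c: correct[c] / total[c] for c in total}
    return overall, per_category

# Hypothetical predictions for four questions.
records = [
    ("business", "A", "A"),
    ("business", "D", "D"),
    ("business", None, "C"),
    ("law", "A", "B"),
]
overall, per_category = accuracy_report(records)
print(overall, per_category)
```

Here two of four answers match, so overall accuracy is 0.5. The per-category breakdown shows where those correct answers came from.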
Sample tasks
5 from the eval dataset
Typical advertising regulatory bodies suggest, for example that adverts must not: encourage _________, cause unnecessary ________ or _____, and must not cause _______ offence.
- A. Safe practices, Fear, Jealousy, Trivial
- B. Unsafe practices, Distress, Joy, Trivial
- C. Safe practices, Wants, Jealousy, Trivial
- D. Safe practices, Distress, Fear, Trivial
- E. Unsafe practices, Wants, Jealousy, Serious
- F. Safe practices, Distress, Jealousy, Serious
- G. Safe practices, Wants, Fear, Serious
- H. Unsafe practices, Wants, Fear, Trivial
- I. Unsafe practices, Distress, Fear, Serious
from original MMLU · business ethics
Managers are entrusted to run the company in the best interest of ________. Specifically, they have a duty to act for the benefit of the company, as well as a duty of ________ and of _______.
- A. Shareholders, Diligence, Self-interest
- B. Shareholders, Self-interest, Care and Skill
- C. Stakeholders, Care and skill, Self-interest
- D. Stakeholders, Diligence, Care and Skill
- E. Customers, Care and Skill, Diligence
- F. Shareholders, Care and Skill, Diligence
- G. Shareholders, Self-interest, Diligence
- H. Employees, Care and Skill, Diligence
- I. Stakeholders, Self-interest, Diligence
- J. Stakeholder, Care and Skill, Diligence
from original MMLU · business ethics
There are two main issues associated with _____ sizing. _______ is a key issue as due to the information policy of the corporation it can be argued that employees have a right to know if they are being made redundant. _______ is a second issue, particularly the ________ package that employees receive when laid off.
- A. Down, Autonomy, Remuneration, Benefit
- B. Down, Involvement, Independence, Benefit
- C. Up, Independence, Involvement, Benefit
- D. Down, Privacy, Autonomy, Benefit
- E. Up, Involvement, Autonomy, Compensation
- F. Down, Independence, Autonomy, Compensation
- G. Up, Involvement, Remuneration, Severance
- H. Up, Privacy, Remuneration, Severance
- I. Up, Autonomy, Remuneration, Compensation
- J. Down, Involvement, Remuneration, Compensation
from original MMLU · business ethics
_______ locate morality beyond the sphere of rationality in an emotional 'moral impulse' towards others.
- A. Ethical egoism
- B. Ethics of duty
- C. Postmodern ethics
- D. Consequentialist ethics
- E. Utilitarian ethics
- F. Deontological ethics
- G. Virtue ethics
- H. Ethics of care
- I. Ethics of rights
- J. Relativist ethics
from original MMLU · business ethics
Some of key differences between Islamic finance and conventional finance include - prohibition of charging and paying _______, prohibition on ______ and ______ transactions, prohibition of sinful investment and requirement for all financial products to be backed by __________.
- A. Interest, Certain, Assured, Both tangible and intangible assets
- B. Interest, Uncertain, Assured, Both tangible and intangible assets
- C. Interest, Uncertain, Speculative, Intangible assets
- D. Interest, Certain, Assured, Tangible assets
- E. Interest, Uncertain, Assured, Intangible assets
- F. Profit, Uncertain, Speculative, Tangible assets
- G. Interest, Uncertain, Speculative, Tangible assets
- H. Interest, Certain, Speculative, Intangible assets
- I. Profit, Certain, Assured, Tangible assets
- J. Interest, Certain, Speculative, Both tangible and intangible assets
from original MMLU · business ethics
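The sample items above share a fixed shape: a question plus up to ten options lettered A to J. The minimal Python sketch below renders such an item as a prompt and extracts the chosen letter from a model completion. The prompt wording and the "The answer is (X)" convention are assumptions for illustration. They are not taken from the official MMLU-Pro evaluation harness.

```python
import re
import string

LETTERS = string.ascii_uppercase[:10]  # "ABCDEFGHIJ": up to ten options

def format_prompt(question, options):
    """Render a multiple-choice item in the lettered style of the samples."""
    if not 1 <= len(options) <= len(LETTERS):
        raise ValueError("expected between 1 and 10 options")
    lines = [f"Question: {question}", "Options:"]
    lines += [f"{letter}. {text}" for letter, text in zip(LETTERS, options)]
    lines.append('Finish with "The answer is (X)" where X is the option letter.')
    return "\n".join(lines)

# Hypothetical answer convention; the lookahead stops "answer is Also..." matching "A".
ANSWER_RE = re.compile(r"[Aa]nswer is \(?([A-J])\)?(?![A-Za-z])")

def extract_letter(completion, num_options):
    """Return the last valid option letter the model committed to, or None."""
    valid = LETTERS[:num_options]
    matches = [m for m in ANSWER_RE.findall(completion) if m in valid]
    return matches[-1] if matches else None

prompt = format_prompt("Which option is correct?", ["first", "second", "third"])
print(prompt)
print(extract_letter("Reasoning step by step... The answer is (C).", 3))
```

Taking the last match favors the model's final commitment when it revises itself mid-completion. Letters beyond the item's option count are rejected, so an out-of-range answer is scored as unparseable rather than silently accepted.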
FAQ
- What is MMLU-Pro?
- A harder, reasoning-focused successor to MMLU with up to 10 answer choices per question (versus 4 in MMLU) and curated questions that resist lucky guessing.
- What capabilities does MMLU-Pro test?
- MMLU-Pro evaluates factual recall and scientific reasoning.
- What is the current top score on MMLU-Pro?
- The top reported score is 89.8% by Gemini 3 Pro, across 342 models reporting (65 from frontier labs).
- How can a model improve its MMLU-Pro score?
- Sophon links MMLU-Pro to RL environments, datasets, and scaffolds that target this eval, including MMLU PRO RL Env (Community), MMLU PRO RL Env (Prime Intellect), M ARC RL Env (Medarc), and PRO Health RL Env (Medarc).
- What license is MMLU-Pro under?
- MMLU-Pro is released under the MIT License.