MMLU-Pro

Frontier

A harder, reasoning-focused successor to MMLU, with up to 10 answer choices per question and curated questions that are resistant to lucky guessing.

Publisher: TIGER-Lab
Format: HF Dataset
Size: 12,032 tasks
License: MIT
Published: Jun 2024
Notable for: Benchmark for evaluating factual recall and scientific reasoning.
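
To make the "resistant to lucky guessing" claim concrete, the sketch below computes the expected accuracy of uniform random guessing for a given number of answer choices. It assumes the original MMLU format of 4 choices per question; the function names are illustrative and not part of any MMLU-Pro tooling.

```python
from fractions import Fraction


def random_guess_accuracy(num_choices: int) -> Fraction:
    """Expected accuracy of uniform random guessing on one question."""
    if num_choices < 1:
        raise ValueError("a question needs at least one answer choice")
    return Fraction(1, num_choices)


def mean_guess_accuracy(choice_counts) -> Fraction:
    """Average guessing accuracy over questions with varying option counts."""
    counts = list(choice_counts)
    if not counts:
        raise ValueError("need at least one question")
    return sum(random_guess_accuracy(k) for k in counts) / len(counts)


# Assumption: original MMLU questions have 4 choices each.
mmlu_baseline = random_guess_accuracy(4)
# A question with the full 10 choices.
pro_baseline = random_guess_accuracy(10)

print(f"4 choices:  {float(mmlu_baseline):.1%}")
print(f"10 choices: {float(pro_baseline):.1%}")
```

Because some questions carry fewer than 10 options, `mean_guess_accuracy` takes the per-question option counts rather than assuming a fixed number.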

Attribution

Leaderboard scores: AA, OpenLLM, prime-hub

Top score 89.8% by Gemini 3 Pro - 342 models reporting (65 frontier)

Sample tasks (5, from the eval dataset)

business

Typical advertising regulatory bodies suggest, for example that adverts must not: encourage _________, cause unnecessary ________ or _____, and must not cause _______ offence.

  • A. Safe practices, Fear, Jealousy, Trivial
  • B. Unsafe practices, Distress, Joy, Trivial
  • C. Safe practices, Wants, Jealousy, Trivial
  • D. Safe practices, Distress, Fear, Trivial
  • E. Unsafe practices, Wants, Jealousy, Serious
  • F. Safe practices, Distress, Jealousy, Serious
  • G. Safe practices, Wants, Fear, Serious
  • H. Unsafe practices, Wants, Fear, Trivial
  • I. Unsafe practices, Distress, Fear, Serious

from original MMLU · business ethics

business

Managers are entrusted to run the company in the best interest of ________. Specifically, they have a duty to act for the benefit of the company, as well as a duty of ________ and of _______.

  • A. Shareholders, Diligence, Self-interest
  • B. Shareholders, Self-interest, Care and Skill
  • C. Stakeholders, Care and skill, Self-interest
  • D. Stakeholders, Diligence, Care and Skill
  • E. Customers, Care and Skill, Diligence
  • F. Shareholders, Care and Skill, Diligence
  • G. Shareholders, Self-interest, Diligence
  • H. Employees, Care and Skill, Diligence
  • I. Stakeholders, Self-interest, Diligence
  • J. Stakeholder, Care and Skill, Diligence

from original MMLU · business ethics

business

There are two main issues associated with _____ sizing. _______ is a key issue as due to the information policy of the corporation it can be argued that employees have a right to know if they are being made redundant. _______ is a second issue, particularly the ________ package that employees receive when laid off.

  • A. Down, Autonomy, Remuneration, Benefit
  • B. Down, Involvement, Independence, Benefit
  • C. Up, Independence, Involvement, Benefit
  • D. Down, Privacy, Autonomy, Benefit
  • E. Up, Involvement, Autonomy, Compensation
  • F. Down, Independence, Autonomy, Compensation
  • G. Up, Involvement, Remuneration, Severance
  • H. Up, Privacy, Remuneration, Severance
  • I. Up, Autonomy, Remuneration, Compensation
  • J. Down, Involvement, Remuneration, Compensation

from original MMLU · business ethics

business

_______ locate morality beyond the sphere of rationality in an emotional 'moral impulse' towards others.

  • A. Ethical egoism
  • B. Ethics of duty
  • C. Postmodern ethics
  • D. Consequentialist ethics
  • E. Utilitarian ethics
  • F. Deontological ethics
  • G. Virtue ethics
  • H. Ethics of care
  • I. Ethics of rights
  • J. Relativist ethics

from original MMLU · business ethics

business

Some of key differences between Islamic finance and conventional finance include - prohibition of charging and paying _______, prohibition on ______ and ______ transactions, prohibition of sinful investment and requirement for all financial products to be backed by __________.

  • A. Interest, Certain, Assured, Both tangible and intangible assets
  • B. Interest, Uncertain, Assured, Both tangible and intangible assets
  • C. Interest, Uncertain, Speculative, Intangible assets
  • D. Interest, Certain, Assured, Tangible assets
  • E. Interest, Uncertain, Assured, Intangible assets
  • F. Profit, Uncertain, Speculative, Tangible assets
  • G. Interest, Uncertain, Speculative, Tangible assets
  • H. Interest, Certain, Speculative, Intangible assets
  • I. Profit, Certain, Assured, Tangible assets
  • J. Interest, Certain, Speculative, Both tangible and intangible assets

from original MMLU · business ethics
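
The sample tasks above share one shape: a question stem followed by lettered options. The sketch below shows one way such a task could be rendered as a prompt and how letter predictions could be scored. The function names, the prompt layout, and the prediction and gold letters are all hypothetical; the gold letters are placeholders and not the dataset's answer keys.

```python
import string


def format_task(question: str, options: list[str]) -> str:
    """Render a question and its options as a lettered multiple-choice prompt."""
    if not 1 <= len(options) <= len(string.ascii_uppercase):
        raise ValueError("expected between 1 and 26 options")
    lines = [question, ""]
    for letter, option in zip(string.ascii_uppercase, options):
        lines.append(f"{letter}. {option}")
    lines += ["", "Answer:"]
    return "\n".join(lines)


def accuracy(predicted: list[str], gold: list[str]) -> float:
    """Fraction of predictions whose letter matches the gold letter."""
    if not gold or len(predicted) != len(gold):
        raise ValueError("predicted and gold must be non-empty and equal length")
    correct = sum(
        p.strip().upper() == g.strip().upper() for p, g in zip(predicted, gold)
    )
    return correct / len(gold)


question = (
    "_______ locate morality beyond the sphere of rationality "
    "in an emotional 'moral impulse' towards others."
)
options = [
    "Ethical egoism", "Ethics of duty", "Postmodern ethics",
    "Consequentialist ethics", "Utilitarian ethics", "Deontological ethics",
    "Virtue ethics", "Ethics of care", "Ethics of rights", "Relativist ethics",
]
prompt = format_task(question, options)
print(prompt)

# Placeholder letters for illustration only; not real answer keys.
placeholder_predictions = ["C", "a", "J"]
placeholder_gold = ["C", "B", "J"]
score = accuracy(placeholder_predictions, placeholder_gold)
```

Matching on a normalized letter keeps scoring independent of option text, which matters here because many options differ by a single word.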

Score history

[Chart: MMLU-Pro scores over time. Vertical axis: 0% to 100%. Horizontal axis: Nov 22 to Nov 25. Labeled models: GPT-3.5 Turbo, Claude 2.0, GPT-4 Turbo, Claude Opus 3, Claude 3.5 Sonnet (June '24), Claude 3.5 Sonnet (Oct '24), Gemini 2.5 Pro Preview (Mar '25), Grok 4, Gemini 3 Pro.]

Top models

[Bar chart: top 21 of 342 reporting models on MMLU-Pro. Highest value: Gemini 3 Pro at 89.8.]

Related tools (5)

Implementations, trainers, datasets and scaffolds linked to this eval.

FAQ

What is MMLU-Pro?
MMLU-Pro is a harder, reasoning-focused successor to MMLU, with up to 10 answer choices per question and curated questions that are resistant to lucky guessing.
What capabilities does MMLU-Pro test?
MMLU-Pro evaluates factual recall and scientific reasoning.
What is the current top score on MMLU-Pro?
The top reported score is 89.8% by Gemini 3 Pro, across 342 models reporting (65 from frontier labs).
How can a model improve its MMLU-Pro score?
Tools linked to MMLU-Pro on Sophon include MMLU PRO RL Env (Community), MMLU PRO RL Env (Prime Intellect), M ARC RL Env (Medarc), and PRO Health RL Env (Medarc). These are RL environments, datasets, and scaffolds that target this eval.
What license is MMLU-Pro under?
MMLU-Pro is available under the MIT license.