Humanity's Last Exam

Active

Humanity's Last Exam (HLE) is a multi-modal benchmark at the frontier of human knowledge, designed to be the final closed-ended academic benchmark of its kind with broad subject coverage. Humanity's Last Exam consists of 3,000 questions across dozens of subjects, including mathem

Publisher: Center for AI Safety (CAIS)
Domain: Knowledge
License: MIT
Published: Feb 2025
Notable for: Benchmark for evaluating Knowledge.
Canonical: github.com/UKGovernmentBEIS/inspect_evals/tree/main/src/inspect_evals/hle

Attribution

README: github.com/UKGovernmentBEIS/inspect_evals/blob/main/src/inspect_evals/hle/README.md (MIT)

Related tools

Implementations, trainers, datasets and scaffolds linked to this eval.

HLE RL Env (Prime Intellect)

Prime Intellect

Humanity's Last Exam evaluation environment

Implementation · RL Env · Multi Modal · Tool Use · Academic

WEB PY RL Env (Prime Community)

Prime Community

Humanity's Last Examination (HLE) benchmark environment for Prime Community Environments

Trains toward · RL Env · HLE · Multi Modal · Tool Use

WEB PY RL Env (Prime Intellect)

Prime Intellect

Humanity's Last Examination (HLE) benchmark environment for prime-environments

Trains toward · RL Env · HLE · Multi Modal · Tool Use

Papers

Humanity's Last Exam

preprint · 2025

CAIS + Scale AI benchmark of ~3,000 expert-authored questions spanning every academic subject, designed to be the hardest closed-ended exam for frontier models.

FAQ

What is Humanity's Last Exam?: Humanity's Last Exam (HLE) is a multi-modal benchmark at the frontier of human knowledge, designed to be the final closed-ended academic benchmark of its kind with broad subject coverage. Humanity's Last Exam consists of 3,000 questions across dozens of subjects, including mathem
How can a model improve its Humanity's Last Exam score?: Tools linked to Humanity's Last Exam on Sophon include HLE RL Env (Prime Intellect), WEB PY RL Env (Prime Community), WEB PY RL Env (Prime Intellect) - RL environments, datasets, and scaffolds that target this eval.
What license is Humanity's Last Exam under?: Humanity's Last Exam is available under the MIT license.