APEX: An Expert-Authored Benchmark for Real-World Expert Workflows

Mercor's benchmark of high-difficulty, expert-authored tasks drawn from real professional workflows (consulting, finance, legal, medical research), graded by domain experts.

Publisher: Mercor
Year: 2025
Venue: preprint
ArXiv: arxiv.org/abs/2509.25721
Authors: 1
Hosting: External source (license unknown)

Attribution

Abstract & full text: arxiv.org/abs/2509.25721
TL;DR: semanticscholar.org/paper/4c3b286f1bbb936e971e4ae842a34fd00e735fc2

Introduces 1 artifact - 1 eval

TL;DR

Semantic Scholar

APEX-v1-extended, an extended version of the AI Productivity Index, is a benchmark for assessing whether frontier models can perform economically valuable tasks in four jobs. Its results show that frontier models still have substantial limitations when performing typical professional tasks.

Artifacts

Evals

APEX

Authors

Mercor Team