APEX: An Expert-Authored Benchmark for Real-World Expert Workflows

Mercor's benchmark of high-difficulty, expert-authored tasks drawn from real professional workflows (consulting, finance, legal, medical research), graded by domain experts.

Publisher: Mercor
Year: 2025
Venue: preprint
Authors: 1
Hosting: External source, license unknown

Introduces 1 artifact - 1 eval

TL;DR

Semantic Scholar

APEX-v1-extended, an extended version of the AI Productivity Index, is a benchmark for assessing whether frontier models can perform economically valuable tasks in four jobs. Its results show that frontier models still have substantial limitations when performing typical professional tasks.
