APEX: An Expert-Authored Benchmark for Real-World Expert Workflows
Mercor's benchmark of high-difficulty, expert-authored tasks drawn from real professional workflows (consulting, finance, legal, medical research), graded by domain experts.
- Publisher: Mercor
- Year: 2025
- Venue: preprint
- Authors: 1
- Hosting: External source, license unknown
Introduces 1 artifact - 1 eval
TL;DR (Semantic Scholar)
APEX-v1-extended is an extended version of the AI Productivity Index, a benchmark for assessing whether frontier models can perform economically valuable tasks in four jobs. Its results show that frontier models still have substantial limitations when performing typical professional tasks.
Artifacts
1 Eval