
APEX

Active

Mercor's expert-graded eval: domain experts (doctors, lawyers, engineers) grade model responses on long-form professional tasks they would actually be paid to do.

Publisher
Mercor
Format
Manual
License
Closed
Published
Sep 2025
Notable for
Benchmark for evaluating instruction following, factual recall, and LLM judging.



Related tools


Implementations, trainers, datasets, and scaffolds linked to this eval.


FAQ

What is APEX?
Mercor's expert-graded eval: domain experts (doctors, lawyers, engineers) grade model responses on long-form professional tasks they would actually be paid to do.
What capabilities does APEX test?
APEX evaluates instruction following, factual recall, and LLM judging.
How can a model improve its APEX score?
Tools linked to APEX on Sophon, meaning RL environments, datasets, and scaffolds that target this eval, include APEX Agents RL Env (Community).
What license is APEX under?
APEX is released under a closed license.