APEX
Active
Mercor's expert-graded eval: domain experts (doctors, lawyers, engineers) grade model responses on long-form professional tasks that they would actually be paid to do.
- Publisher: Mercor
- Capabilities: Instruction Following, Factual Recall, LLM Judging
- Format: Manual
- License: Closed
- Published: Sep 2025
- Notable for: Benchmark for evaluating instruction following, factual recall, and LLM judging.
- Canonical: mercor.com/apex
Related tools (1)
Implementations, trainers, datasets, and scaffolds linked to this eval.
Papers (2)
FAQ
- What is APEX?
  Mercor's expert-graded eval: domain experts (doctors, lawyers, engineers) grade model responses on long-form professional tasks that they would actually be paid to do.
- What capabilities does APEX test?
  APEX evaluates instruction following, factual recall, and LLM judging.
- How can a model improve its APEX score?
  Sophon links tools that target this eval (RL environments, datasets, and scaffolds), including APEX Agents RL Env (Community).
- What license is APEX under?
  APEX's license is listed as Closed.