factual recall
- Slug: factual-recall
- Evals: 19
- Tools: 48
- Models: 480
- Papers: 11
Top models on this capability are ranked by average parsed score across the evals listed here.

Papers in this area (all tagged "introduces"):
- AA-Omniscience: A Benchmark for Long-Tail Factual Knowledge
- APEX: An Expert-Authored Benchmark for Real-World Expert Workflows
- Beyond the Imitation Game: Quantifying and extrapolating the capabilities of language models
- GDPval: Evaluating AI Model Performance on Real-World Economically Valuable Tasks
- GPQA: A Graduate-Level Google-Proof Q&A Benchmark
- Holistic Evaluation of Language Models
- Humanity's Last Exam
- MMLU-Pro: A More Robust and Challenging Multi-Task Language Understanding Benchmark
- Measuring Massive Multitask Language Understanding
- Needle In A Haystack - Pressure Testing LLMs
- TruthfulQA: Measuring How Models Mimic Human Falsehoods
