LLMpedia: A Transparent Framework to Materialize an LLM's Encyclopedic Knowledge at Scale

Year: 2026
Full text: arxiv.org/abs/2603.24080 (CC-BY-4.0)
Abstract

Benchmarks like MMLU suggest flagship language models approach factuality saturation above 90%. LLMpedia shows this picture is incomplete. We materialize ~1.3M encyclopedia articles entirely from parametric memory across three model families, then audit every claim against Wikipedia and curated web evidence. For gpt-5-mini, the verifiable true rate is 68.4% on Wikipedia-covered subjects, more than 21 pp below MMLU, and the gap is driven by unverifiability (30.5%), not refutation (1.2%). Beyond Wikipedia, frontier articles audited against curated web evidence reach 57.6%; Wikipedia covers only 56.7% of model-surfaced subjects, and three model families overlap in just 7.3% of subject choices. In a retrieval-trap benchmark inspired by prior analysis of Grokipedia, LLMpedia is more factual at roughly half the textual similarity to Wikipedia. Every prompt, article, and verdict is released. Data, code, interface: https://llmpedia.net.
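
The arithmetic behind the headline gap can be checked with a small sketch. It uses only the gpt-5-mini percentages quoted in the abstract. The value 90.0 is an assumption standing in for the "above 90%" MMLU figure, so the computed gap is a lower bound. The function and variable names are illustrative and do not come from the LLMpedia code release.

```python
# Verdict rates quoted in the abstract for gpt-5-mini on
# Wikipedia-covered subjects (percent of audited claims).
verdict_rates = {"true": 68.4, "unverifiable": 30.5, "refuted": 1.2}

# Assumption: 90.0 is a lower bound for the "above 90%" MMLU score.
MMLU_LOWER_BOUND = 90.0


def gap_pp(benchmark: float, observed: float) -> float:
    """Gap between two percentages, in percentage points."""
    return round(benchmark - observed, 1)


def shortfall_shares(rates: dict) -> dict:
    """Share of the not-verified-true mass held by each other verdict."""
    not_true = {k: v for k, v in rates.items() if k != "true"}
    total = sum(not_true.values())
    return {k: round(v / total, 3) for k, v in not_true.items()}


gap = gap_pp(MMLU_LOWER_BOUND, verdict_rates["true"])
shares = shortfall_shares(verdict_rates)
# The quoted rates are rounded, so they need not sum to exactly 100.
total = round(sum(verdict_rates.values()), 1)

print(f"gap vs. MMLU lower bound: {gap} pp")
print(f"shares of shortfall: {shares}")
print(f"sum of quoted rates: {total}%")
```

Under these assumptions the gap is at least 21.6 pp, and unverifiable claims account for the large majority of the claims not verified as true, which matches the abstract's statement that the gap is driven by unverifiability rather than refutation.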