FactReview: Evidence-Grounded Peer Review with Execution-Based Claim Verification

Year
2026

Abstract & full text
arxiv.org/abs/2604.04074

Abstract

LLM-based reviewing systems typically take only the manuscript as input, leaving literature- and code-based claims hard to verify. We present FactReview, a system that extracts review-relevant claims, grounds them in related work, and, when code is available, executes released artifacts under a fixed repair budget to audit empirical claims. Across 35 ML papers and 463 benchmark major claims, FactReview covers 84% of claims. Under an evidence-aware rubric, its reviews score 4.86/5 in overall quality, 0.7 points above DeepReview-v2 and 1.5 points above matched OpenReview comments. Removing execution evidence changes 17% of claim statuses, more than removing any other single evidence source. In a reviewer-assistance study, FactReview reduces mean review time by 58% while raising benchmark claim coverage from 87% to 99%. We argue that LLM reviewers should audit empirical claims rather than make accept-reject decisions. The code is publicly available at https://github.com/DEFENSE-SEU/FactReview.
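
To make the idea of executing artifacts "under a fixed repair budget" more concrete, here is a minimal Python sketch of one way such a loop could look. It is not taken from the FactReview code base. The names `audit_claim`, `AuditResult`, and `make_flaky_run`, the status labels, and the tolerance-based comparison are all assumptions introduced for illustration only.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class AuditResult:
    # Hypothetical status labels; the paper's actual label set may differ.
    status: str                      # "supported", "contradicted", or "inconclusive"
    repairs_used: int                # number of repair attempts consumed
    measured: Optional[float] = None # metric obtained from a successful run, if any


def audit_claim(
    run: Callable[[], float],
    repair: Callable[[Exception], None],
    claimed: float,
    tolerance: float,
    repair_budget: int,
) -> AuditResult:
    """Run an artifact, allowing at most `repair_budget` repair attempts.

    `run` executes the released code and returns a metric; `repair` tries to
    fix the environment after a failure. Both are placeholders (assumptions).
    """
    repairs_used = 0
    while True:
        try:
            measured = run()
        except Exception as err:
            # Budget exhausted: report that the claim could not be checked.
            if repairs_used >= repair_budget:
                return AuditResult("inconclusive", repairs_used)
            repair(err)
            repairs_used += 1
            continue
        # The run succeeded: compare the measured value with the claimed one.
        supported = abs(measured - claimed) <= tolerance
        status = "supported" if supported else "contradicted"
        return AuditResult(status, repairs_used, measured)


def make_flaky_run(failures: int, value: float) -> Callable[[], float]:
    """Build a toy `run` that fails `failures` times, then returns `value`."""
    state = {"remaining": failures}

    def run() -> float:
        if state["remaining"] > 0:
            state["remaining"] -= 1
            raise RuntimeError("simulated missing dependency")
        return value

    return run


if __name__ == "__main__":
    result = audit_claim(
        run=make_flaky_run(failures=2, value=0.91),
        repair=lambda err: None,  # a real repair step would patch the setup
        claimed=0.90,
        tolerance=0.02,
        repair_budget=3,
    )
    print(result)
```

The cap on repairs is the point of the sketch: the auditor stops after a bounded amount of effort and reports an inconclusive status instead of retrying indefinitely or guessing at the outcome.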