ClaimFlow: Tracing the Evolution of Scientific Claims in NLP

Scientific papers advance claims that later work supports, extends, or sometimes refutes. Yet existing methods for citation and claim analysis capture only fragments of this dialogue. In this work, we make these interactions explicit at the level of individual scientific claims. We introduce ClaimFlow, a claim-centric view of the NLP literature built from 1,617 ACL Anthology papers (1979-2025) that are manually annotated with 5,689 claims and 4,871 cross-paper claim relations. Each relation indicates whether a citing paper supports, extends, qualifies, refutes, or references a cited claim as background. Building on ClaimFlow, we define a new task, Claim Relation Classification, which requires models to infer the scientific stance toward a cited claim from the text and citation context. Evaluating neural models and large language models on this task, we report a baseline performance of 0.81 macro-F1, suggesting that the task is tractable while leaving room for improvement. We then scale this framework to ~13k NLP papers to study claim evolution across decades of NLP research. We show that 63.5% of claims are never reused and only 11.1% are ever challenged. Widely propagated claims are more often reshaped through qualification and extension than supported or refuted. Overall, ClaimFlow offers a lens for examining how ideas shift and mature within NLP.