Large language models reshape the language of science

Year
2025

Abstract & full text
arxiv.org/abs/2504.12317

Abstract

Scientific language is a central infrastructure of knowledge production, but it remains unclear whether large language models (LLMs) are altering not only how scientists write, but also how scientific knowledge is communicated and accessed. Here we analyze 21.36 million scientific abstracts published between 2020 and 2024, together with historical records from major journals, to trace recent changes in the language of science. We identify a marked turning point in 2024, when scientific writing shows a sharp increase in lexical complexity alongside a decline in syntactic complexity. This shift is pervasive across disciplines and journal tiers, and is more pronounced in texts by scholars working in non-native English contexts, especially those from language backgrounds that differ more typologically from English. Controlled polishing experiments confirm that LLMs reproduce this pattern by favoring more lexically dense and syntactically compressed expression. We further show why this linguistic shift matters: it may widen the distance between scientific discourse and public-facing language, while also helping scholars in non-native English contexts navigate English-language publishing requirements. These findings suggest that LLMs may broaden participation in scientific authorship while narrowing the accessibility of scientific communication, making them a new force in the linguistic infrastructure of science.