Scientific language is a central infrastructure of knowledge production, but it remains unclear whether large language models (LLMs) are altering not only how scientists write, but also how scientific knowledge is communicated and accessed. Here we analyze 21.36 million scientific abstracts published between 2020 and 2024, together with historical records from major journals, to trace recent changes in the language of science. We identify a marked turning point in 2024, when scientific writing shows a sharp increase in lexical complexity alongside a decline in syntactic complexity. This shift is pervasive across disciplines and journal tiers, and is more pronounced in texts by scholars working in non-native English contexts, especially those from language backgrounds that differ more typologically from English. Controlled polishing experiments confirm that LLMs reproduce this pattern by favoring more lexically dense and syntactically compressed expression. We further show why this linguistic shift matters: it may widen the distance between scientific discourse and public-facing language, while also helping scholars in non-native English contexts navigate English-language publishing requirements. These findings suggest that LLMs may broaden participation in scientific authorship while narrowing the accessibility of scientific communication, making them a new force in the linguistic infrastructure of science.
Large language models reshape the language of science
- Year
- 2025
Attribution
- Abstract & full text
- arxiv.org/abs/2504.12317