AI Research
Massive study detects AI fingerprints in millions of scientific papers
Chances are that you have unknowingly encountered compelling online content that was created, either wholly or in part, by some version of a Large Language Model (LLM). As AI resources such as ChatGPT and Google Gemini become more proficient at generating near-human-quality writing, it has become more difficult to distinguish purely human writing from content that was either modified or entirely generated by LLMs.
This spike in questionable authorship has raised concerns in the academic community that AI-generated content has been quietly creeping into peer-reviewed publications.
To shed light on just how widespread LLM content is in academic writing, a team of U.S. and German researchers analyzed more than 15 million biomedical abstracts on PubMed to determine if LLMs have had a detectable impact on specific word choices in journal articles.
Their investigation revealed that since the emergence of LLMs there has been a corresponding increase in the frequency of certain stylistic word choices within the academic literature. These data suggest that at least 13.5% of the papers published in 2024 were written with some amount of LLM processing. The results appear in the open-access journal Science Advances.
Since the release of ChatGPT less than three years ago, the prevalence of Artificial Intelligence (AI) and LLM content on the web has exploded, raising concerns about the accuracy and integrity of some research.
Past efforts to quantify the rise of LLM use in academic writing, however, were limited by their reliance on sets of human- and LLM-generated text. This setup, the authors note, “…can introduce biases, as it requires assumptions on which models scientists use for their LLM-assisted writing, and how exactly they prompt them.”
In an effort to avoid these limitations, the authors of the latest study instead examined changes in the excess use of certain words before and after the public release of ChatGPT to uncover any telltale trends.
The researchers modeled their investigation on prior COVID-19 public-health research, which was able to infer COVID-19’s impact on mortality by comparing excess deaths before and after the pandemic.
By applying the same before-and-after approach, the new study analyzed patterns of excess word use prior to the emergence of LLMs and after. The researchers found that after the release of LLMs, there was a significant shift away from the excess use of “content words” to an excess use of “stylistic and flowery” word choices, such as “showcasing,” “pivotal,” and “grappling.”
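The before-and-after comparison can be illustrated with a minimal sketch. It assumes a simplified baseline in which a word's expected frequency is simply its frequency in the pre-LLM sample; this is an illustrative assumption, not necessarily the study's exact procedure. The function names (`doc_frequency`, `excess_usage`) and the toy abstracts are hypothetical.

```python
def doc_frequency(abstracts, word):
    """Fraction of abstracts that contain `word` at least once."""
    if not abstracts:
        return 0.0
    hits = sum(1 for text in abstracts if word in set(text.lower().split()))
    return hits / len(abstracts)


def excess_usage(before, after, word):
    """Compare a word's observed frequency with its pre-LLM baseline.

    Assumption: the expected frequency is the word's frequency in the
    `before` sample. Returns (gap, ratio), where gap = observed - expected
    and ratio = observed / expected.
    """
    expected = doc_frequency(before, word)
    observed = doc_frequency(after, word)
    gap = observed - expected
    ratio = observed / expected if expected > 0 else float("inf")
    return gap, ratio


# Toy data (hypothetical, not taken from PubMed).
before = [
    "we studied patients with diabetes",
    "the pivotal trial enrolled patients",
    "results show reduced mortality",
    "patients were followed for one year",
]
after = [
    "this pivotal study delves into outcomes",
    "showcasing a pivotal role for the gene",
    "results show reduced mortality",
    "patients were followed for one year",
]

gap, ratio = excess_usage(before, after, "pivotal")
print(f"excess gap: {gap:.2f}, excess ratio: {ratio:.1f}")
```

In this toy example "pivotal" appears in one of four earlier abstracts and two of four later ones, so its usage doubles. A content word such as "patients" shows no excess in the later sample.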
By manually assigning parts of speech to each excess word, the authors determined that before 2024, 79.2% of excess word choices were nouns. During 2024 there was a clearly identifiable shift: 66% of excess word choices were verbs and 14% were adjectives.
The team also identified notable differences in LLM usage between research fields, countries, and venues.
Written by Charles Blue, edited by Andrew Zinin.
More information:
Dmitry Kobak et al, Delving into LLM-assisted writing in biomedical publications through excess vocabulary, Science Advances (2025). DOI: 10.1126/sciadv.adt3813
© 2025 Science X Network
Citation:
Massive study detects AI fingerprints in millions of scientific papers (2025, July 6)
retrieved 7 July 2025
from https://phys.org/news/2025-07-massive-ai-fingerprints-millions-scientific.html