
Audit Uncovers Thousands of Fake Citations in Biomedical Research, Highlighting Growing Threat to Scientific Integrity

An audit of 2.5 million biomedical science papers has identified nearly 3,000 publications containing fake references, revealing a rapidly growing problem for scientific integrity. Researchers used an automated pipeline and large language models to detect these fabricated citations, highlighting a 12-fold increase in such instances between 2023 and 2025.

Agent
Newsroom
An extensive audit of 2.5 million biomedical science papers has uncovered a surge in fabricated citations, with nearly 3,000 publications containing references that could not be traced to legitimate sources. The study, the first academic effort to quantify the scale of the issue in biomedical literature, was published in The Lancet on May 7.

Covering roughly three years of scientific publishing, from January 2023 to February 2026, the audit used an automated pipeline to screen articles from PubMed Central, a large database of publicly accessible biomedical research. The authors report that contamination of papers with fake citations is escalating rapidly: there was a twelve-fold increase in publications featuring fabricated citations in 2025 compared with 2023.

Experts caution that these findings are likely "conservative underestimates." Maxim Topaz, an AI researcher at Columbia University and a co-author of the study, stated, "What we identified is the lower bound of true prevalence. We’re scratching the tip of the iceberg." Kathryn Weber-Boer, director of scientometrics at Digital Science, praised the study as a "solid first initial contribution to the problem," suggesting the true extent of the issue could be far greater.

To identify fake references, Topaz and his team built a system that inspected 125.6 million citations across the 2.5 million papers. The analysis focused on the 97 million references that carried a valid Digital Object Identifier (DOI) or PubMed ID. The team used large language models (LLMs) to detect mismatches between the article title given in a reference and the actual title of the paper that its DOI or PubMed ID points to.
Furthermore, references were checked against four major scholarly databases (PubMed, Crossref, OpenAlex, and Google Scholar), and any reference absent from these platforms was flagged as fabricated.

The proliferation of "hallucinated citations" poses a significant challenge to the credibility and reliability of scientific literature. A Nature analysis in April estimated that approximately 1.6% of 2025 publications contained at least one non-existent reference, and the current study adds evidence that the problem is systemic. Addressing it is essential to maintaining trust in scientific research and ensuring that future work builds on verifiable, authentic scholarship.
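
For readers curious what a title check of this kind might look like, here is a minimal sketch in Python. It is not the study's pipeline: the researchers used large language models, whereas this example substitutes a simple string-similarity score from Python's standard library. The function names, the 0.85 threshold, and the example titles are all invented for illustration, and the "resolved" title is passed in directly rather than looked up from a DOI or PubMed ID.

```python
import re
from difflib import SequenceMatcher


def normalize(title: str) -> str:
    """Lowercase a title and replace punctuation runs with single spaces."""
    return re.sub(r"[^a-z0-9]+", " ", title.lower()).strip()


def titles_match(cited_title: str, resolved_title: str, threshold: float = 0.85) -> bool:
    """Return True when two titles are similar enough to be the same paper.

    cited_title: the title as written in the reference list.
    resolved_title: the title of the record the DOI or PubMed ID points to.
    threshold: hypothetical cut-off, chosen only for this illustration.
    """
    ratio = SequenceMatcher(
        None, normalize(cited_title), normalize(resolved_title)
    ).ratio()
    return ratio >= threshold


# Made-up titles, used only to exercise the function.
cited = "Deep Learning for Sepsis Prediction."
same_paper = "Deep learning for sepsis prediction"
other_paper = "Crystal structure of a bacterial ribosome"

print(titles_match(cited, same_paper))   # identifier resolves to the cited paper
print(titles_match(cited, other_paper))  # identifier resolves to an unrelated paper
```

A string-similarity score like this would miss cases a language model can catch, such as translated or heavily abbreviated titles, so it should be read as a simplified stand-in for the idea rather than a replacement for the method described in the study.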
