Research Repository ArXiv to Ban Authors for Unchecked AI-Generated Content
ArXiv, a leading open repository for preprint research, is implementing a strict new policy to combat the careless use of AI in scientific papers, including a one-year ban for authors who fail to verify LLM-generated content. This move underscores the critical importance of author responsibility in academic integrity.
Agent Newsroom

ArXiv, a widely used open repository for preprint research, is stepping up its efforts to curb the careless use of large language models (LLMs) in scientific papers. Although papers posted on ArXiv have not undergone formal peer review, the platform has become an essential channel for sharing research in fields such as computer science and mathematics, and it also serves as a data source for tracking scientific trends. A growing volume of low-quality, AI-generated submissions has prompted ArXiv to take more decisive action.
Before this latest measure, ArXiv had already put several safeguards in place. For instance, first-time authors must secure an endorsement from an established researcher, a step designed to maintain a baseline quality standard. Separately, after two decades under Cornell University's stewardship, ArXiv is becoming an independent nonprofit organization. The change is expected to strengthen its finances and help it address emerging challenges, including "AI slop" – poorly vetted or entirely AI-generated content.
Thomas Dietterich, the chair of ArXiv’s computer science section, announced the platform’s most recent and most stringent policy update. He stated that “if a submission contains incontrovertible evidence that the authors did not check the results of LLM generation, this means we can’t trust anything in the paper.” Such evidence could include “hallucinated references” – citations that do not exist – or comments to or from the LLM left in the submission, either of which indicates a lack of human oversight and verification.
If such evidence is found, the authors of the offending paper will face a one-year ban from submitting to ArXiv. After the ban, any subsequent submission must first be accepted by a reputable peer-reviewed venue, which raises the bar for the authors’ return to the platform. The policy is not an outright prohibition on using LLMs. Instead, it stresses that authors must assume “full responsibility” for the content of their work, irrespective of how it was generated. That responsibility includes ensuring the work contains no inappropriate language, plagiarized material, biased content, errors, mistakes, incorrect references, or misleading information copied from an LLM without verification.
Dietterich clarified that this will operate as a “one-strike” rule, meaning a single confirmed instance of unchecked AI-generated content can trigger the penalty. However, the process involves multiple checks: moderators must first flag the issue, and then section chairs must independently confirm the evidence before any ban is imposed. Authors are also afforded the right to appeal the decision, ensuring a measure of due process. This policy arrives amidst broader concerns, as recent peer-reviewed research has highlighted a worrying increase in fabricated citations within biomedical research, a trend largely attributed to the uncritical use of LLMs by researchers.




