Earlier this year, computer scientist Guillaume Cabanac received a notification from Google Scholar saying that one of his papers had been cited in a dental journal. That didn’t make sense. His work focuses on detecting fake research papers, not dentistry.
When he looked into it, he realised something was wrong. The citation looked similar to a preprint he had written in 2021, but it listed the journal as Nature and included a DOI that didn’t lead to his original work. It wasn’t just a minor error: the reference had never existed. He quickly suspected it had been generated by AI.
And this isn’t a one-off. There is growing evidence that researchers who use large language models to help write papers are inadvertently including references that simply aren’t real.
A problem that’s accelerating
Recent studies show just how quickly this is spreading.
One analysis of nearly 18,000 computer science conference papers found that in 2024, around 0.3% contained at least one suspicious reference. By 2025, that figure had jumped to 2.6%. That might sound small, but it is a nearly ninefold rise in a single year, and at the scale of academic publishing it’s significant.
Another study puts the number higher, suggesting that between 2% and 6% of papers contain references that can’t be verified or have been altered.
An investigation by Nature, working with Grounded AI, suggests the issue could be much larger. Their estimate is that tens of thousands of papers published in 2025 may include invalid or AI-generated references.
That’s forced publishers to react. Many are now introducing tools to catch these issues before papers are published.
But there’s concern about where this leads. As political scientist Alison Johnston put it bluntly, “We’re going to see a flood of fake references.”
This isn’t just sloppiness anymore
Errors in citations are nothing new. Researchers have always made small mistakes: misspelled author names, incorrect dates, wrong journal titles, or broken DOIs.
But we’re no longer dealing with slightly inaccurate references. Entire citations are being fabricated from scratch.
Editors are already feeling the impact. Johnston reported rejecting a quarter of submissions in a single month because they contained fake references. That’s a significant change in workload. Tasks that used to be unnecessary, such as manually checking citations and running plagiarism tools, are now becoming routine.
How AI ends up inventing references
The uncomfortable truth is that AI can generate citations that look completely convincing but are false.
In one experiment using GPT-4o, nearly 20% of references produced were entirely fabricated. Even among the genuine ones, almost half contained errors.
Sometimes the model blends elements of real papers together into what researchers call “Frankenstein citations”: combinations of real authors, titles, and journals that don’t correspond to any actual publication.
It can also invent DOIs or attach the wrong DOI to a real paper. To a casual reader, everything looks legitimate. Under scrutiny, it falls apart.
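To make that scrutiny concrete, here is a minimal sketch of one way a checker might compare a cited reference against the record its DOI resolves to. It is an illustration under stated assumptions, not how any publisher's tool or Grounded AI's system actually works: `KNOWN_RECORDS` is a hypothetical in-memory stand-in for a real bibliographic database, the DOI and record in it are invented, and the 0.9 similarity threshold is an arbitrary choice.

```python
from difflib import SequenceMatcher

# Hypothetical stand-in for a DOI registry lookup. A real checker would query
# a bibliographic database; this DOI and record are invented for illustration.
KNOWN_RECORDS = {
    "10.1234/example.2021.001": {
        "title": "An example preprint on detecting fabricated papers",
        "journal": "Example Preprint Server",
        "year": 2021,
    },
}


def similarity(a: str, b: str) -> float:
    """Return a 0..1 similarity ratio between two strings, ignoring case."""
    return SequenceMatcher(None, a.lower().strip(), b.lower().strip()).ratio()


def check_reference(ref: dict, threshold: float = 0.9) -> list[str]:
    """Return the problems found in a cited reference (empty list if none)."""
    record = KNOWN_RECORDS.get(ref.get("doi", ""))
    if record is None:
        # Covers both invented DOIs and references that carry no DOI at all.
        return ["DOI not found in registry"]
    problems = []
    if similarity(ref["title"], record["title"]) < threshold:
        problems.append("title does not match the DOI's record")
    if similarity(ref["journal"], record["journal"]) < threshold:
        problems.append("journal does not match the DOI's record")
    if ref["year"] != record["year"]:
        problems.append("year does not match the DOI's record")
    return problems


# A "Frankenstein" reference: real DOI and title, but the wrong journal.
frankenstein = {
    "doi": "10.1234/example.2021.001",
    "title": "An example preprint on detecting fabricated papers",
    "journal": "Nature",
    "year": 2021,
}
print(check_reference(frankenstein))
```

A reference with no DOI lands in the "not found" bucket here, which is exactly the kind of false alarm a real system has to handle with extra lookups by title and author.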
How big could this get?

To get a clearer picture, Nature and Grounded AI analysed more than 4,000 publications from major publishers.
Among the 100 most suspicious papers, 65 contained at least one invalid reference. Not every flagged case was wrong (some real citations were incorrectly identified), but the signal was clear.
From this, they estimate that over 110,000 papers published in 2025 could contain invalid references. And that may be conservative, particularly in fields like computer science where AI tools are used more heavily.
The response from publishers
Publishers aren’t ignoring this. Many are developing their own systems to flag fake or irrelevant citations, as well as references to retracted papers. Others are relying on external tools, including Grounded AI’s “Veracity” system.
These tools are starting to make a difference. They’re catching a meaningful number of issues before publication. But they’re not perfect, and some problems still slip through.
Why it’s harder than it sounds
Detecting fake references isn’t straightforward. Different journals use different formatting styles. Some references don’t include DOIs. Many legitimate papers sit in smaller or non-English journals that aren’t well indexed. And even the best databases don’t cover everything.
Because of this, automated systems can struggle. They sometimes flag genuine references as suspicious while missing cleverly fabricated ones. For now, no one really knows how widespread the problem is, only that it’s growing faster than the systems designed to catch it.
There’s broad agreement on one thing: this is a real and growing problem. That makes it essential that DNA reports are not generated by LLMs that have neither curated nor understood the underlying studies. At Genemetrics, our reports are backed by rigorous scientific studies and algorithms to counter invalid claims and statements. This ensures that hallucinated statements never reach a patient who might act on incorrect health information. That goes back to one of the cornerstones of the Hippocratic oath: “First, do no harm.”
