
Investigating a claim that 100 made-up claims were found in AI papers
Tom Coles
Hello and thanks for downloading the More or Less podcast. We're the program that looks at the numbers in the news and in life and in AI hallucinations. I'm Tom Coles. As the small print warns you if you ever ask ChatGPT to help your kid with their maths homework, AI can make mistakes. Despite having all the confidence of your overconfident friend, some of the stuff that AI engines like ChatGPT, Gemini, Grok, or Claude confidently tell you is essentially made up. I mean, to be totally fair, everything a large language model like this tells you is just what it thinks is the most likely answer. But much of the time the most likely thing is factually accurate. Sometimes it's totally fictitious, and this totally fictitious or false stuff is sometimes called a hallucination. Whether these hallucinations matter depends on what you're using AI for and whether they are spotted and sorted out. So the team on More or Less were slightly surprised to see the following headline in Fortune magazine.
Voiceover
One of the world's top academic AI conferences accepted research papers with 100 plus AI hallucinated citations.
Tom Coles
You might think that the top AI researchers in the world would be careful about using AI to write their research papers. So is this number right? And what does it mean if it is?
Alex Tway
People have started to kind of share that they're getting like citations from these big hallucinations and it's a mixture of, I think, pride and bewilderment.
Tom Coles
This is Alex Tway, the CTO and co-founder of GPTZero, the company that found these 100 plus AI hallucinations in research papers.
Alex Tway
They're like, hey, this LLM knows so much about my research, it thinks I wrote all these papers. I didn't. In some ways it's a weird point of pride, I think, to be hallucinated by an AI. That's definitely one sign that you've made it in the industry.
Tom Coles
If you're new to this subject, it might sound strange to talk about a computer program hallucinating. They're not out in the desert taking mind altering substances after all. The reality is a little more prosaic.
Alex Tway
What happens is a researcher might say, oh, can you write this section of the paper for me and make sure to add a lot of citations? And the AI will do that, but it's essentially doing it without any references. And so it has to start making things up to make them look like real high quality citations, but they're actually not corresponding to anything real.
Tom Coles
Alex certainly has skin in this game. The company he runs offers a service to organisations publishing these papers to help root out AI slop. But no one is denying that there are AI hallucinations in these papers. So what exactly did they find? First, the context. The papers in question were published as part of a big AI conference known as NeurIPS.
Alex Tway
It's essentially the premier event for machine learning.
Tom Coles
This get together attracts the brightest minds in AI, or machine learning as Alex calls it. From academic researchers at top universities to industry researchers at the big tech companies like Meta and Google. Alex says that in this booming industry, getting your paper published really matters.
Alex Tway
Having a couple of papers in these conferences can get you an OpenAI job. If you're a startup company, having a couple of these papers in these conferences can mean raising $100 million from investors.
Tom Coles
This means there's a massive incentive for researchers to pump out a lot of papers for consideration at conferences like this one.
Alex Tway
Once a year they'll receive about, let's say around 20,000 submissions and then they'll accept about 5,000 of them.
Tom Coles
This is where Alex's company comes in. They took those 5,000-odd papers that were selected for publication and, oh wait a minute, asked AI to take a look.
Alex Tway
Yeah, so that's the funny thing. The most accurate methods to do this are using AI itself, but extremely specialized for this purpose. And so to make our specialized AI be able to find hallucinations, we have to train it on countless examples where we've actually labeled, like, hey, this is a hallucination, this isn't. And it drastically improves in that task compared to an off-the-shelf ChatGPT or something like that. So essentially we ran our hallucination detector on these papers for this experiment.
Tom Coles
They didn't look at the text of the papers themselves, but just the citations, the references to other papers which lodge the research in the wider web of academic publishing. These are easier to verify as wrong, as you can easily check against the real thing.
Alex Tway
And we would go through each citation and try to verify whether or not it exists, searching through, like, massive web-scale search engines, academic databases and so on. And so we might get a bunch of potential matches.
Tom Coles
Most of the time the details of the paper cited (author, title and date published) did match a real scientific paper. Sometimes they really didn't.
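The checking step described above, comparing each citation's author, title and date against candidate records returned by a search, can be sketched in a few lines of Python. This is only an illustration and not GPTZero's actual detector: the `Citation` record, the `check_citation` function, the 0.9 title-similarity threshold and the sample data are all hypothetical, and the lookup against search engines or academic databases is replaced by a hand-written candidate list.

```python
from dataclasses import dataclass
from difflib import SequenceMatcher


@dataclass(frozen=True)
class Citation:
    """Hypothetical minimal record: the three details mentioned in the episode."""
    authors: tuple
    title: str
    year: int


def normalize(text):
    """Lowercase and collapse whitespace so trivial formatting differences are ignored."""
    return " ".join(text.lower().split())


def title_similarity(a, b):
    """Similarity between two titles, from 0.0 (nothing shared) to 1.0 (identical)."""
    return SequenceMatcher(None, normalize(a), normalize(b)).ratio()


def check_citation(cited, candidates, title_threshold=0.9):
    """Classify a citation against candidate records from a search.

    Returns "not_found" if no candidate title is close enough,
    "verified" if the best match also agrees on authors and year,
    and "partial_mismatch" otherwise. The threshold is an assumed value.
    """
    best, best_score = None, 0.0
    for candidate in candidates:
        score = title_similarity(cited.title, candidate.title)
        if score > best_score:
            best, best_score = candidate, score
    if best is None or best_score < title_threshold:
        return "not_found"
    cited_authors = {normalize(name) for name in cited.authors}
    real_authors = {normalize(name) for name in best.authors}
    if cited_authors <= real_authors and cited.year == best.year:
        return "verified"
    return "partial_mismatch"


# Made-up stand-in for the "potential matches" a real search would return.
search_results = [
    Citation(("A. Example", "B. Sample"), "A Hypothetical Study of Widget Sorting", 2020),
]

real = Citation(("A. Example",), "A hypothetical study of widget sorting", 2020)
wrong_authors = Citation(("First Name", "Last Name"), "A Hypothetical Study of Widget Sorting", 2020)
invented = Citation(("A. Example",), "Quantum Gardening for Beginners", 2021)

print(check_citation(real, search_results))           # verified
print(check_citation(wrong_authors, search_results))  # partial_mismatch
print(check_citation(invented, search_results))       # not_found
```

The three outcomes loosely mirror the categories discussed in the episode: citations that check out, real-looking papers with fabricated details, and publications that do not exist at all.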
Alex Tway
We were able to pretty easily find like at least 100 hallucinations over 50 papers.
Tom Coles
This isn't an exhaustive list by the way, they just stopped counting when they found a suitable round number. And these hallucinations took a variety of forms.
Alex Tway
About 39 were just completely non-existent publications. Then the other 61 had a combination of fabricated authors, people who don't exist or exist but never wrote a paper like that, fake titles, fake links or URLs and so on.
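To put the figures quoted in the episode side by side, here is a small worked calculation. The submission, acceptance and hallucination counts are the approximate numbers given by Alex; reading "over 50 papers" as roughly fifty affected papers is an assumption, and since the count stopped at a round number these are lower bounds, not estimates of the true total.

```python
# Approximate figures quoted in the episode.
submissions = 20_000
accepted = 5_000
nonexistent_publications = 39
partly_fabricated = 61

# Assumption: "over 50 papers" is read here as roughly fifty affected papers.
flagged_papers = 50

total_hallucinations = nonexistent_publications + partly_fabricated
acceptance_rate = accepted / submissions
share_of_accepted_flagged = flagged_papers / accepted
hallucinations_per_flagged_paper = total_hallucinations / flagged_papers

print(total_hallucinations)                   # 100
print(f"{acceptance_rate:.0%}")               # 25%
print(f"{share_of_accepted_flagged:.0%}")     # 1%
print(hallucinations_per_flagged_paper)       # 2.0
```

On these numbers, the flagged papers make up about one in a hundred of those accepted, with around two dodgy citations each.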
Tom Coles
Some of the dodgy citations contained real authors, but who didn't write those papers. Some had wonky names or odd titles, but others were just completely made up. This one is my absolute favourite.
Alex Tway
The authors were first name, last name and others, which I imagine is quite a coincidence that all of those three were real people.
Tom Coles
We asked Professor First Name and Doctor Last Name for comment, but didn't hear back. Look, it seems clear from this research that incredibly busy, under-pressure researchers are using AI to write the boring bits of their papers. It's quite funny really that AI researchers aren't immune to its charming overconfidence. But beyond the obvious irony, why does this matter?
Alex Tway
In the culture of computer science, it often can be challenging to reproduce some of these experiments. And the reviewers, they don't have time to do so in their own capacity. And so there's so much trust that goes into, like, did you write your code correctly? Is your data correct? They're often at scales that are impossible to verify without a lot of extra work. And so if we can't trust that your paper is even human reviewed, so the AI is making mistakes in your paper and you're not catching it, then how can you trust that everything else created by the researcher was also reviewed by a human and not hallucinated by AI?
Tom Coles
For their part, the organizers of the NeurIPS conference told us that while they accept researchers are using AI to write their papers and hallucinations can get through the review process, they do not believe the research in these papers would necessarily be invalidated by the discovery of AI hallucinations. At the same time, they're continuously refining their guidance for both the authors and reviewers as the use of AI rapidly evolves. There is an implication for society in all of this too. Alex says that because of the biases in the stuff the AI has learned from, citations from non-anglophone researchers seem to go wrong more than others. Although this wasn't from a NeurIPS paper, there are some pretty odd things going on.
Alex Tway
We found that it would just start chaining together highly likely names of researchers such as you'd start chaining Chinese initials like hyx, xz, N, blah blah blah blah blah blah. Just a string of 10 three letter acronyms and you could just tell that the LLM thinks oh, if I had to make up a citation, all I have to do is just write Chinese names.
Tom Coles
Thanks to Alex Tway. That's it for this week. If you've seen a number in the news you think we should take a look at, email moreorless@bbc.co.uk. We'll be back next week. Until then, goodbye.
Release date: February 21, 2026
Host: Tom Coles
Guest: Alex Tway, CTO and co-founder of GPTZero
This episode of More or Less investigates claims that "100+ AI hallucinated citations" made their way into papers accepted at one of the world’s leading AI research conferences, NeurIPS. Host Tom Coles, with guest Alex Tway from GPTZero, explores how these hallucinations occurred, why they're significant, and what they reveal about the intersection of AI technology and academic rigor. The discussion demystifies AI ‘hallucinations’ (confident but false outputs generated by large language models) and places the issue within the high-pressure world of AI research publishing.
This episode offers a concise yet nuanced look into the infiltration of AI-generated errors within elite scientific publishing. While the overall risk to research integrity may be low for now, it exposes vulnerabilities at the intersection of academic culture and rapidly evolving AI capabilities. The host uses wit and clear analogies to make technical issues accessible and relevant for a broad audience, reminding us to temper our trust in AI, especially when it comes to the fine print.
For questions or suggestions on numbers in the news, listeners are encouraged to contact moreorless@bbc.co.uk.