Sean Ramis
Artificial intelligence is scraping the Internet. It's gorging on all the websites to give you what you want. It's actually kind of gorging on everything to give you what you want. And the makers of everything are not very happy about it. Sarah Silverman is suing. Sony is suing. Dow Jones is suing. The New York Times is suing. Authors are suing. But in one author lawsuit, AI kind of won. Specifically, Anthropic's AI, who goes by Claude.
Jason Kebler
Well, Claude's not cool, but Claude's uncool the same way I'm uncool. See. So.
Sean Ramis
Claude's win in court is scaring the makers of everything. And we're gonna talk about why on Today, Explained.
Sean Ramis
Today, Explained, from Vox. I'm Sean Ramis, here with Jason Kebler, tech reporter and co-founder of 404 Media.
Unnamed Journalist
I am a journalist who covers AI, but I'm also a business owner because we have our own small publication. And so I'm very interested in what is going to happen with all of these AI companies getting sued on copyright grounds. There are dozens of lawsuits at this point, and I'm concerned about it both as a journalist who has had my work scraped and as someone who has, like, a direct financial interest in it. And so about a month ago there was this decision in a case against Anthropic, which makes the AI tool called Claude. And it's not necessarily that this is the biggest AI copyright case, but it is the first real major decision where we get a judge sort of pointing at how he is thinking about these issues of massive AI companies scraping authors' work, scraping artists' work, scraping musicians' work.
Sean Ramis
And who sued Anthropic?
Unnamed Journalist
Yeah, so it's three authors. Their names are Andrea Bartz, Charles Graeber, and Kirk Wallace Johnson.
Jason Kebler
Three authors claim Anthropic built a multibillion-dollar business by misusing copyrighted works and pirated writings without permission and without paying the authors for their work. This lawsuit is really just the latest, as many other authors, journalists, record labels, artists, creators try to wrestle back control of their work.
Unnamed Journalist
To be totally honest, I didn't know them before this lawsuit.
Sean Ramis
To be totally honest, I still don't.
Unnamed Journalist
They sued Anthropic because they learned that their books were included in this data set called Books3, which is, at this point, a really controversial data set that contains a few hundred thousand books. And The Atlantic at one point got a copy of Books3 and then published, like, this search tool that allowed authors to see: is your book in this data set?
Jason Kebler
Author Drew Hayden Taylor had no idea.
Unnamed Journalist
Wow.
Jason Kebler
That nine of his works were part of Books3, a massive data set used by tech companies to train artificial intelligence.
Sean Ramis
Well, it's a combination of being flattered and being concerned.
Jason Kebler
We're all just like little ants who don't mean anything to the big billionaires. They don't want to pay us for our words. They'd rather just take it. I'm so mad. If your book is on here, I'm so sorry. I'm just, like, so sad for so many authors today.
Unnamed Journalist
These authors learned that their books were in Books3, Anthropic trained on Books3, and therefore Anthropic trained on their copyrighted works. And so that formed, like, the basis of this lawsuit. So the really interesting thing in the early days of this debate, and it's, like, one of the hottest debates at the moment between artists, journalists, authors, and, like, the AI boosters and companies and maximalists, is: is it fair use to scrape this stuff en masse, run it through a large language model, like, turn it into a huge data set, and then use large language model technology to create these tools? And at first, the AI companies were very skittish about saying that they had trained on copyrighted work at all.
Sean Ramis
AI should be allowed to read the Internet and learn. Shouldn't be regurgitating, shouldn't be, you know, violating any copyright laws. But on individuals' private work.
John Herman
Yeah, we try not to train on that stuff.
Sean Ramis
We really don't want to be here upsetting people.
Unnamed Journalist
But as these cases started going to court, as they entered discovery, and as it became clear that every major AI company was training on copyrighted work, their argument went from being, "Well, we can't say what we trained on because it is proprietary," to, "Of course we trained on copyrighted work. We had to. And it's legal." And it's legal because our use of it is transformative and therefore is protected by the fair use tenet of copyright law. Section 107 of the Copyright Act reads, transformative uses are more likely to be considered fair. Transformative uses are those that add something new with a further purpose or different character and do not substitute for the original use of the work. That's what they argued, and that's what the judge ultimately decided. What he decided in this case was that the scraping of these three authors' books was considered fair use under copyright law. But there is a huge caveat here, where he decided that the way that Anthropic went about acquiring the books in the first place was piracy.
Sean Ramis
Okay, so the judge essentially hands down a split decision saying that, yes, it is fair use to use these authors' work this way, but also it wasn't totally fair how you got this stuff, because it was pirated. So, I don't know, what does that mean? Does everyone go home unhappy, or was this, like, a huge win for Anthropic? Doesn't feel like a huge win for the authors.
Unnamed Journalist
Yeah, I mean, I don't think it's a huge win for anyone yet. And I think that the people who are saying this is a slam dunk for Anthropic, which many people in the AI world are saying, I think they're wrong. And the reason that I think they're wrong is because the judge determined essentially that it was not copyright infringement to train Claude on copyrighted material that was legally obtained. But then they also downloaded books from this website called LibGen, which is a piracy website that has millions of books on it, and then also from a website called Pirate Library Mirror, which is another piracy site that has millions of books on it. And the judge said that obtaining the books in this way was pretty much, like, cut-and-dried copyright infringement. And I think the really important thing to note is that every major AI company has trained on copyrighted works that they obtained in a similar fashion. We have done reporting at 404 Media where, you know, entire YouTube channels were scraped. You know, Netflix, like, the entirety of Netflix was scraped. And so the specifics about how these companies obtained these works are potentially going to be really important. And a lot of that scraping has already been done. A lot of that piracy has already been done.
Sean Ramis
These companies are literally some of the richest companies on earth, affiliated with some of the richest people on earth. Did they really just steal all these books? Could they not have just gone to Amazon and bought, like, some books, or is that just too much work for them?
Unnamed Journalist
Well, so the super interesting thing about this lawsuit, and something where I was really like, holy shit, how did they do this? Why did this happen? Is that in the beginning, Anthropic pirated all these books. They downloaded huge amounts of torrents, they scraped these piracy websites, and they did that specifically because they didn't want to slow down. Like, there's an email that is part of this lawsuit where the CEO, Dario Amodei, says, you know, we don't want to get into, he calls it, a "legal/practice/business slog." And so they were basically like, let's do all of this, let's pirate all the books, let's put it into our model, and then let's go buy copies of a lot of other books. And so what Anthropic did was they had a whole team of people that was dedicated to buying used books from used bookstores that were going out of business, from eBay, from these online marketplaces. And they bought a huge, huge number of books, like, physical books. They tore the covers off of them, and they had this, like, giant scanning operation where they would scan the books, create a digital copy of the books, and then feed that into their model. And the judge said that all of those books that were bought from used bookstores were no problem. And I think that goes to show that these AI companies are grabbing data from wherever they can find it. It's a huge arms race to see who can get the most data from the most places. And so they're doing, like, the low-hanging fruit, which is downloading, stealing everything. Yeah. But then they're, like, scouring the planet looking for, like, bookstores that are going out of business. Like, I've heard of AI companies looking for, like, huge physical archives of, like, VHS movies and things like that and then digitizing those. And so really they're just trying to find data wherever they can.
And it seems like when they're able to get it legally by purchasing a copy, they're willing to do so, but they're also willing to take it for free when they can.
Sean Ramis
Did we learn anything from this lawsuit that might implicate those other ones?
Unnamed Journalist
Yeah, I mean, I think that the piracy aspect of this is really important. And we've seen in the past, like, if you are a 13-year-old kid who's pirating Metallica songs on Napster, you can be liable for hundreds of thousands of dollars' worth of damages. Lars will find you for just, like, a few songs. And in this case you have 7 million books. And so it will be very interesting to see whether a judge levies, like, a huge financial penalty here or whether it's more of a slap on the wrist. And I tend to think it will probably be more of a slap on the wrist, because all of Silicon Valley, all of America's largest companies, sort of have a huge amount of investment riding on the widespread adoption of AI. And AI is now a huge part of the American economy. It's become part of, like, geopolitics as well, where you have the Trump administration, and really the Biden administration was saying the same thing. Come on, man. Saying that the United States can't fall behind China in the quest to innovate in AI and to have, like, widespread AI adoption. I'll be very curious to see whether there are actual, serious punishments for these companies that have scraped all of this data, or whether they, you know, wiggle out of it with a slap on the wrist or get out of it with a series of settlements or what have you. But I tend to think that there's probably no stopping this industry from a legal perspective. It feels too big to fail to me at this point.
Sean Ramis
404media.co is where you can find and support Jason Kebler's work instead of, you know, just stealing it. AI companies aren't just stealing everyone's intellectual property. They're also kind of killing the Internet as we know it right before our eyes. We're gonna talk about that when we're back on Today, Explained.
Sean Ramis
Today, Explained is back with John Herman now. He's a tech columnist at New York Magazine. John, in the first half of the show we were talking about how this Anthropic case and judgment, you know, may or may not change the extent to which these big AI models can scrape the Internet. But I want to talk to you about how all this scraping has already, in some ways, broken the Internet as we know it and how we use it. You wrote about how AI has broken maybe, like, you know, the front page of the Internet for a lot of people: Google.com. Tell us how.
John Herman
Google could not be closer to the center of, like, this recent AI boom. On one hand, they are a company that has really deep roots in that space. They published, like, the foundational research for what then became generative AI as we know it. They've put it in all their products. If you use any Google thing, you are seeing, like, chatbots everywhere.
Jason Kebler
Take notes with Gemini. Summarize this file, summarize a folder, refine this document. Find inspiration easily. Fresh ideas, elevate your writing, get clear, constructive, improved sentence flow, word choice.
John Herman
They are all in on AI. Google search in particular has AI overviews at the top. There's a new AI search mode that works like a chatbot instead of a search engine. Google making a rare change to its homepage, the most visited website in the world, pushing its AI mode tool directly into the hands of its billions of users.
Jason Kebler
With this latest move, it is changing what billions of people see when they open their browser. Still the on ramp for the entire Internet.
Unnamed Journalist
Meet AI Mode.
John Herman
Ask detailed questions for better responses. AI on Google search can provide information. While that was all happening, AI was also sort of accelerating this feeling of decline in the Google product, which over the years, through this back-and-forth battle between the company and search engine optimizers and companies trying to get an edge on Google, this sort of long-running dynamic, had become a little spammy, a little overloaded with ads.
Sean Ramis
Have you noticed that Google sucks lately?
John Herman
I'm talking about their search.
Jason Kebler
It sucks.
John Herman
Why is it so hard to find anything on Google search?
Unnamed Journalist
Google search is terrible. It's bought and it's sold. Five or six links up top, all paid for. It's just garbage, pure unadulterated garbage.
John Herman
But I think a lot of people would agree that using Google in, say, 2023 was kind of a degraded experience compared to 10 years prior. It was kind of cluttered, there was just more junk in it, there were more ads all over the interface. But also the stuff you were getting in search was a lot of low-quality, cheaply made, aggregated content, stuff that was taken from somewhere else in an effort to sell a product or just serve up some ads. The arrival of generative AI tools, which enable the creation of basically infinite passable content almost for free, really accelerated that issue. So on one side you have the big ecosystem that Google guides people to, which is in a sort of collapse because of this massive shock of new AI-generated content. On the other side you have Google the product becoming more and more AI-centric, and in the middle you have kind of a complicated story. And honestly, for search users and regular people, kind of a strange experience.
Sean Ramis
Do they have a plan to make money off of this? Obviously they want to make money. Has anyone asked what their long term plan is?
John Herman
So there are obvious risks to throwing away this cluttered but lucrative product and replacing it with a totally clean chatbot or whatever. That's not what they're doing. They are incorporating AI answers into the main search page, which they say people like quite a bit. So this last quarter has been really good for them. It also arrived in the context of lots of, like, really strong data suggesting that the way people use Google search now, with these AI tools, means that they don't really leave it anymore. They don't really click out and go to anything. An AI Overview might summarize three articles, an archival resource, some expert opinions, but the number of people that actually then click through to those opinions or to those articles is minuscule. So Google's relationship to the web around it is pretty dramatically different.
Sean Ramis
If Google's like eating up the rest of the Internet. If Gemini is eating up the rest of the Internet right now, and companies like ours, let's say, are no longer meeting their traffic goals, are no longer getting any traffic from Google at all, does Gemini have nothing to eat? You know what I mean? Because everything dies. Who's going to be feeding Gemini all the right answers in like, 10 years?
John Herman
We're sort of like glorifying the web a bit in this conversation. No matter how great and incredible it is as this big resource, it really doesn't go that deep. And the idea that it is now being sort of like trawled and overfished and just sort of consumed like a resource by these AI companies really does, I think, raise the specter of collapse. I do think that they could find that their products are being made worse by this dynamic and by their relationship with the web. I do think that's a real problem. And you can see this in some of the deals that these companies make with publishers, including our parent company which has a deal with OpenAI, for example.
Sean Ramis
Remind people out there or me why companies like ours make deals with companies like ChatGPT.
John Herman
The context is every media company is struggling for visitors. Even before the Google traffic really started to collapse, it was sort of unstable. And so, in addition to, like, a weak advertising market, every media company is looking for any sort of additional source of revenue. And if you're a media executive, OpenAI showing up and saying, here is this many millions of dollars for this many years, it looks like free money. Of course, if you're, like, producing the content, or if you're even just thinking longer term about how a media company or website fits into this AI picture, you recognize that you're sort of, you know, giving access away to something that these companies are explicitly trying to automate. You know, you're sort of, in an institutional sense, training your replacement.
Jason Kebler
You're listening to AI Explains Today.
John Herman
But it is a deal made not quite under duress, but something close to that.
Sean Ramis
For people who miss that old version of the Internet, who miss going to Google, typing in a query, getting a bunch of results, clicking on a few of them, getting answers that felt credible, where do they go for that experience now?
John Herman
I think there's, like, a funny, polarized answer to this. I just did a story on Reddit, which is having a huge moment right now. It's been around for 20 years. It's growing hugely. Part of it is just a response to social media fatigue, the sense that other communities on the web don't really exist anymore, that everything else on the web is too commercial and whatever. Also, a huge part of that growth is just traffic from Google. They're having the fastest growth they've had in almost their entire existence, because Google is just shoveling so many people into Reddit because everything else is not really working. So you have that, you have a community of communities. You have something that feels kind of like it's of the old web.
Sean Ramis
It seems like eventually we're going to get to the point where it's like you either want to talk to one of these large language models or you just go back to calling up your friend. I don't even know where it gets. You just walk into the street and yell, "Does anyone know of a good barber?"
John Herman
Yeah, I mean, it's real. The mutual suspicion about who's using AI is really pervasive, especially online but also in person. But yeah, I do think that the AI training paradigm and some of the stuff that you were talking about with Anthropic, but also just the way that Google incorporates all this stuff, really does kind of break the deal with the whole idea of the public web. Like, all right, we'll all just do this stuff in public. We'll talk to each other. People will build all these businesses around this to sort of connect everything, and it'll all sort of work together and whatever. When you have, like, these massive, sort of predatory companies just consuming all of that, harvesting all of that and saying, all right, we are no longer part of this arrangement, we are doing something else, more people are on Discord, more people are in group chats. More people are either just purely consuming on social networks and not posting or just talking privately with their friends. And I do think that this fits quite well with that trend and probably accelerates it.
Sean Ramis
John Herman. You can read and subscribe to New York Magazine at nymag.com. Gabrielle Burbay produced. Aminah Al Saadi edited. Rebecca Ibarra fact-checked. Patrick Boyd and Andrea Christensdotter mixed. And by the way, Vox's Future Perfect is funded in part by the BEMC Foundation, whose major funder was also an early investor in Anthropic. And none of them have any editorial input into the stuff we make here at Vox. Speaking of stuff, we hope you enjoyed the 1700th episode. If you did, you can say something nice about us most anywhere you listen. And if you didn't, well, there's always episode 1701 tomorrow.
Today, Explained Podcast Summary: "AI is Killing the Internet"
Release Date: July 30, 2025
Host/Author: Vox
Guests: Jason Kebler (Tech Reporter and Co-founder of 404 Media), John Herman (Tech Columnist at New York Magazine), Unnamed Journalist
[00:00] Sean Ramis:
Artificial intelligence is aggressively scraping the internet, consuming vast amounts of website data to deliver personalized content. This unchecked ingestion has led to significant backlash from content creators across various industries.
[00:36] Jason Kebler:
A notable player in this controversy is Anthropic's AI, Claude, which recently won a court case—a decision that has alarmed many content creators.
[02:10] Unnamed Journalist:
As a journalist and small publication owner, the speaker has both a professional and a financial stake in the dozens of copyright lawsuits now facing AI companies. The focus is on the recent case against Anthropic, where three authors (Andrea Bartz, Charles Graeber, and Kirk Wallace Johnson) accuse the company of misusing their copyrighted works without permission or compensation.
[03:32] Unnamed Journalist:
The crux of the lawsuit revolves around "Books3," a controversial dataset containing a few hundred thousand books. The Atlantic obtained a copy and published a search tool that let authors check whether their works were included. The three plaintiffs learned that their books were in the dataset and that Anthropic had trained on it, which formed the basis of their lawsuit.
[05:28] Unnamed Journalist:
The key legal argument centers on whether mass scraping and using copyrighted material for training large language models constitutes fair use under Copyright Law, specifically Section 107. The judge ruled that while the usage was transformative and thus fair, the method of acquiring the books via piracy was unlawful.
[07:29] Unnamed Journalist:
Despite the ruling favoring Anthropic on the fair use aspect, the court found that the acquisition of the books through piracy violated copyright laws. This nuanced decision leaves both AI companies and authors in a precarious position, highlighting the complexities of AI training practices.
[09:14] Unnamed Journalist:
Anthropic’s strategy involved mass-pirating books from websites like LibGen and Pirate Library Mirror, as well as purchasing physical books from used bookstores and online marketplaces to scan and feed into its models. The judge found no problem with the purchased books. The speaker argues that the piracy finding matters beyond this case, because every major AI company has trained on works obtained in a similar fashion.
[11:35] Unnamed Journalist:
The lawsuit underscores a broader issue: AI companies are heavily reliant on vast datasets, often obtained through dubious means. The potential legal repercussions could be severe, but given the economic and geopolitical significance of AI, substantial penalties may be unlikely.
[15:12] Sean Ramis:
John Herman discusses how Google, traditionally the central hub of internet searches, is transforming its platform to integrate AI, thereby altering how users interact with the web.
[16:10] John Herman:
Google has deeply integrated AI into its search engine, placing AI Overviews at the top of results and introducing an AI Mode that works like a chatbot rather than a traditional search engine. At the same time, generative AI has accelerated a perceived decline in the search product, which had already become spammy and overloaded with ads.
[17:16] Sean Ramis:
The hosts note a growing frustration among users regarding the cluttered and ad-heavy nature of Google’s search results, exacerbated by the influx of low-quality, AI-generated content.
[17:31] John Herman:
The integration of AI into Google's search functionality has significantly reduced user engagement with external websites. People rely more on AI-generated summaries and less on clicking through to diverse sources, diminishing traffic to content creators and traditional websites.
[20:11] John Herman:
The pervasive scraping by AI companies is likened to overfishing an ecosystem. Just as ecosystems collapse when resources are overexploited, the internet may suffer a similar fate as AI companies consume vast amounts of data, potentially degrading the quality and diversity of online content.
[21:55] John Herman:
Media companies are caught in a dilemma, struggling with declining traffic and seeking new revenue streams. Partnerships with AI companies like OpenAI offer financial relief but risk training AI to replicate and potentially replace human-driven content creation.
[22:21] John Herman:
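As AI becomes more integrated into daily internet use, community-driven platforms like Reddit are experiencing rapid growth. Part of that growth reflects social media fatigue, and a large part is traffic that Google now sends to Reddit because little else in search is working. More broadly, people are retreating from the public web into Discord, group chats, and private conversations.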
The episode concludes by highlighting the transformative impact of AI on the internet. While AI technologies like Claude and Google's Gemini offer innovative solutions, they also pose significant challenges to content creators, traditional businesses, and the overall quality of online information. The legal landscape remains uncertain, and the balance between technological advancement and intellectual property rights continues to be a contentious battleground.
"AI is Killing the Internet" delves deep into the intricate relationship between AI development and internet content creation. The episode underscores the urgent need for clearer regulations and ethical guidelines to ensure that AI advancements do not come at the expense of creators' rights and the integrity of the internet.