Today, Explained Podcast Summary: "AI is Killing the Internet"
Release Date: July 30, 2025
Host/Author: Vox
Guests: Jason Koebler (Tech Reporter and Co-founder of 404 Media), John Herrman (Tech Columnist at New York Magazine), Unnamed Journalist
Introduction to the AI Scraping Controversy
[00:00] Sean Rameswaram:
Artificial intelligence is aggressively scraping the internet, consuming vast amounts of website data to deliver personalized content. This unchecked ingestion has led to significant backlash from content creators across various industries.
[00:36] Jason Koebler:
A notable player in this controversy is Anthropic, the company behind the Claude AI models, which recently prevailed on a key question in court. The decision has alarmed many content creators.
Legal Battles: Authors vs. AI Companies
[02:10] Unnamed Journalist:
The speaker, a journalist and small publication owner, expresses deep concern as AI companies face numerous copyright infringement lawsuits. The focus is the recent case against Anthropic, in which three authors (Andrea Bartz, Charles Graeber, and Kirk Wallace Johnson) accuse the company of using their copyrighted works without permission or compensation.
[03:32] Unnamed Journalist:
The lawsuit centers on "Books3," a controversial dataset containing hundreds of thousands of books. The Atlantic exposed this dataset and let authors check whether their works were included. Many authors discovered that their books had been used to train AI models without their consent, which formed the basis of their legal actions.
[05:28] Unnamed Journalist:
The key legal question is whether mass scraping of copyrighted material and its use in training large language models constitutes fair use under U.S. copyright law, specifically Section 107 of the Copyright Act. The judge ruled that the use in training was transformative and therefore fair, but that acquiring the books through piracy was unlawful.
[07:29] Unnamed Journalist:
Despite the ruling favoring Anthropic on the fair use aspect, the court found that the acquisition of the books through piracy violated copyright laws. This nuanced decision leaves both AI companies and authors in a precarious position, highlighting the complexities of AI training practices.
Implications for the AI Industry and Internet Ecosystem
[09:14] Unnamed Journalist:
Anthropic’s strategy involved mass-pirating books from websites like LibGen and Pirate Library Mirror, as well as purchasing physical books from used bookstores to digitize and incorporate into its models. The speaker warns that this aggressive data acquisition sets a concerning precedent, arguing that AI companies are grabbing data from wherever they can find it.
[11:35] Unnamed Journalist:
The lawsuit underscores a broader issue: AI companies are heavily reliant on vast datasets, often obtained through dubious means. The potential legal repercussions could be severe, but given the economic and geopolitical significance of AI, substantial penalties may be unlikely.
Google’s Shift to AI and the Decline of Traditional Search
[15:12] Sean Rameswaram:
John Herrman discusses how Google, traditionally the central hub of internet searches, is transforming its platform to integrate AI, thereby altering how users interact with the web.
[16:10] John Herrman:
Google has deeply integrated AI into its search engine, introducing Gemini-powered features such as AI Overviews at the top of results and a new AI search mode that works like a chatbot rather than a traditional search interface. This shift aims to provide more direct and personalized answers but has led to a decline in the quality and reliability of search results.
[17:16] Sean Rameswaram:
The host notes a growing frustration among users regarding the cluttered and ad-heavy nature of Google’s search results, exacerbated by the influx of low-quality, AI-generated content.
[17:31] John Herrman:
The integration of AI into Google's search functionality has significantly reduced user engagement with external websites. People rely more on AI-generated summaries and less on clicking through to diverse sources, diminishing traffic to content creators and traditional websites.
The Internet’s Transformation and Future Prospects
[20:11] John Herrman:
The pervasive scraping by AI companies is likened to overfishing an ecosystem. Just as ecosystems collapse when resources are overexploited, the internet may suffer a similar fate as AI companies consume vast amounts of data, potentially degrading the quality and diversity of online content.
[21:55] Jason Koebler:
Media companies are caught in a dilemma, struggling with declining traffic and seeking new revenue streams. Partnerships with AI companies like OpenAI offer financial relief but risk training AI to replicate and potentially replace human-driven content creation.
[22:21] John Herrman:
As AI becomes more integrated into daily internet use, traditional platforms like Reddit are experiencing growth. Users are migrating to community-driven sites as they seek alternatives to AI-dominated search and content consumption.
Conclusion: The Future of the Internet in an AI-Driven World
The episode concludes by highlighting the transformative impact of AI on the internet. While AI technologies like Claude and Google's Gemini offer innovative solutions, they also pose significant challenges to content creators, traditional businesses, and the overall quality of online information. The legal landscape remains uncertain, and the balance between technological advancement and intellectual property rights continues to be a contentious battleground.
Notable Quotes:
- Sean Rameswaram [00:00]: "Artificial intelligence is scraping the Internet. It's gorging all the websites to give you what you want."
- Unnamed Journalist [03:32]: "The judge determined... that the scraping of these three authors' books was considered fair use under copyright law."
- Unnamed Journalist [09:14]: "These AI companies are grabbing data from wherever they can find it. It's like a huge arms race to see who can get the most data from the most number of places."
- John Herrman [16:10]: "Google search in particular has AI overviews at the top. There's a new AI search mode that works like a chatbot instead of a search engine."
Final Thoughts
"AI is Killing the Internet" examines the relationship between AI development and internet content creation. The episode underscores the need for clearer regulations and ethical guidelines so that AI advancements do not come at the expense of creators' rights and the integrity of the internet.
