Podcast Summary
Joe Rogan Experience for AI
Episode: Anthropic’s Legal Wake-Up Call
Date: September 15, 2025
Host: Joe Rogan Experience for AI
Overview
This episode tackles a significant legal development in the AI world: Anthropic's $1.5 billion copyright settlement with writers. The host explores both sides of the issue: why some see it as progress for AI companies, and why others view it as inadequate compensation for writers. Central themes include the precedent this sets for AI training on copyrighted works, the implications for future lawsuits, and the ongoing debate over fair compensation and fair use in the age of generative AI.
Key Discussion Points & Insights
1. The Settlement: Background and Impact
- Historic Settlement:
- Anthropic agreed to a $1.5 billion settlement, the largest payout in US copyright history, with writers whose works were included in Anthropic’s AI training data.
- Roughly 500,000 works are covered, with rights holders expected to receive around $3,000 per work.
- Divergent Reactions:
- The host acknowledges split opinions: "Not everyone is happy about this, but for AI companies, this is, you know, positive for the industry." [01:00]
- TechCrunch’s take: “Screw the money. Anthropic's $1.5 billion copyright settlement sucks for writers.”
- Underlying issues:
- Dispute stems from Anthropic using both legitimately purchased books and vast quantities of pirated content ("shadow libraries") for training its AI, specifically its Claude language model.
- The host reflects on the irony: “All the writers I know use Claude because, like, yeah, the tone's way better. And that's because they grabbed a copy, a pirate copy of every single book.” [05:12]
2. How Anthropic Built Their Dataset
- Internet Data Exhaustion:
- Early days of AI saw companies "scraping the whole internet." Once that ran dry, books became the next sought-after source.
- Books from Shadow Libraries and Legit Purchases:
- Anthropic used “pirated sources called, quote, unquote, shadow libraries,” containing millions of books.
- Later, to cover themselves, they bought physical copies of “like every book in the world” and scanned and transcribed them via robots.
- Legal Implications:
- The settlement distinguishes between the legality of pirated books (not permitted) and purchased books (permitted for model training).
3. The Legal Ruling and Precedent
- Key Judicial Decision:
- Judge William Alsup’s June ruling: training AI on copyrighted material is legal if the use is transformative, citing the fair use doctrine of the Copyright Act of 1976.
- Notable quote: “Like any reader aspiring to be a writer, Anthropic’s LLMs train on... works not to race ahead and replicate or supplant them, but to turn a hard corner and create something different.” [18:48]
- Fair Use Doctrine:
- Scanning and learning from purchased works: allowed, as it’s seen as analogous to a person reading and synthesizing a book’s knowledge.
- Using pirated works: not allowed and basis for settlement.
- Industry Precedent:
- Multiple similar lawsuits pending against OpenAI, Meta, Google, and others.
- This case is likely to serve as the benchmark moving forward: “I think Anthropic is going to come out ahead for this. And I think a lot of people are happy with the precedent because now they, they know, like, the right way they can do this.” [14:51]
4. Financial Impact on Anthropic
- Recent Fundraising:
- Anthropic’s recent $13 billion raise cushions the blow: “So paying out 1.5 is not going to kill them.” [15:50]
- Scale of Penalty:
- The $1.5B penalty is weighed against Anthropic’s previous funding round ($3.5B); the timing of the recent $13B raise makes a huge difference.
5. Compensation Debates and Technical Limits
- Calls for Ongoing Royalties:
- Some argue for perpetual, recurring compensation for authors as long as models are trained on their books.
- Host’s skepticism: "I think the cat's out of the bag... it's too late now anyways, because basically if you have the pirated copies in the model, you could just use the old model to train a new model." [09:13]
- Tracking and Attribution Challenges:
- The difficulty (or impossibility) of tracking model outputs back to specific sources makes Spotify-like royalty models (pay-per-use) unworkable.
- Adobe’s approach with Firefly (upfront compensation, not ongoing) is discussed as a possible industry standard.
6. Industry Ramifications and Future Outlook
- Precedent for Other Sectors (e.g., Music, Images):
- Host predicts future lawsuits and settlements in music and other creative fields.
- Suggests that companies will eventually standardize on buying copies for training rather than dealing with endless litigation.
- Host’s Stance:
- Prefers moving forward; doubts it is “realistic to set up systems where... you can use that model to spit out more outputs that other models can use to train on.” [26:03]
- Calls for a practical resolution, accepting that models are now more useful than harmful.
Notable Quotes & Memorable Moments
- On the core controversy:
- “Not everyone is happy about this, but for AI companies, this is, you know, positive for the industry.” [01:00]
- On data sourcing realities:
- “Everyone basically scraped the Internet at the very beginning. OpenAI scraped the whole Internet at the beginning. Everyone did.” [03:12]
- On the legitimization of transformative use:
- “Like any reader aspiring to be a writer, Anthropic’s LLMs train on... works not to race ahead and replicate or supplant them, but to turn a hard corner and create something different.” (Judge William Alsup) [18:48]
- On irreversible piracy impact:
- “I think the cat's out of the bag. It's kind of too late, honestly, with the shadow libraries, it's too late now anyways…” [09:13]
- On practical limitations of digital royalty tracking:
- “It's impossible to know like what the original source was... I think it's, I don't think it's realistic to set up systems where like, once you're included in a data set, now all of a sudden you can use that model to spit out more outputs that other models can use to train on. It's just really, it's, it's lost.” [25:47]
Important Timestamps
- 01:00 — Host sets up the controversy and references differing opinions.
- 03:12 — Context on widespread early internet scraping by AI companies.
- 05:12 — Discussion of shadow libraries and why Claude’s tone is so effective.
- 09:13 — Reflections on the irreversibility of pirated data being in models.
- 14:51 — Legal precedent and industry implications are discussed.
- 15:50 — Anthropic’s finances and impact of the settlement.
- 18:48 — Judge’s key quote on fair use and transformative application.
- 25:47 — Technical challenges with per-use or perpetual compensation models.
Conclusion
The episode offers an incisive, balanced look at Anthropic’s copyright settlement and its ripple effects for the AI industry, copyright holders, and model training practices. The host conveys a cautiously optimistic outlook for AI companies while remaining realistic about legal and practical challenges in tracking and compensating creators. This legal milestone, the host suggests, is both a wake-up call for AI firms and a marker for how the AI copyright landscape will evolve.
