Podcast Summary: The Joe Rogan Experience of AI
Episode Title: Anthropic’s $1.5B Copyright Settlement
Air Date: September 15, 2025
Overview
This episode examines Anthropic’s landmark $1.5 billion settlement with writers over copyright infringement, making it the largest payout in US copyright history. The host analyzes the context, debates surrounding the settlement, technological and legal nuances, and future implications for writers, AI companies, and copyright law. The discussion is interspersed with pointed commentary, references to current news articles, and explanations about the mechanics of AI model training on copyrighted data.
Key Discussion Points & Insights
1. Introduction to the Settlement and Industry Reaction
- General Overview (02:00): Anthropic agreed to a $1.5 billion payout to settle claims that it trained AI models on copyrighted works, specifically from books obtained through 'shadow libraries' or pirated sources.
- Mixed Reception:
- Some hail the settlement as a positive industry precedent.
- Others, especially writers and advocacy groups, feel it's insufficient — “Not everyone is happy about this, but for AI companies, this is, you know, positive for the industry.” (A, 00:18)
- Citing TechCrunch: “Screw the money. Anthropic's $1.5 billion copyright settlement sucks for writers.” (A, 00:26)
2. Details of the Lawsuit and Data Usage
- Scale of the Payout:
- Roughly 500,000 works are covered, with payouts of about $3,000 per work going to their authors.
- Described as the “largest payout in the history of US copyright law.” (A, 01:35)
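The headline figures quoted in the episode are internally consistent; a quick back-of-the-envelope check (using the episode's numbers, not official court filings):

```python
# Sanity check: ~500,000 covered works at roughly $3,000 each
# should reproduce the headline $1.5 billion settlement total.
works = 500_000       # approximate number of covered works (episode's figure)
per_work = 3_000      # approximate payout per work in USD (episode's figure)
total = works * per_work
print(f"${total:,}")  # $1,500,000,000
```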
- AI Model Training & Data Acquisition:
- Early model training relied on indiscriminate scraping of the internet, which eventually exhausted the supply of fresh, high-quality text.
- Books became the next target because of their depth and because most were not freely available online.
- “They were able to grab all of these pirated libraries, throw them into the model, and Claude got way better.” (A, 03:34)
3. Pirated vs. Purchased Content
- Shadow Libraries:
- Anthropic reportedly scraped millions of books from pirated databases, obtaining not mere page scans but full machine-readable text.
- Attempt to Mitigate:
- Anthropic also purchased books in bulk, using robotic scanners to digitize and transcribe them for its training datasets, a practice described as just “the cost of doing business” for well-funded AI startups. (A, 04:17)
- Legal Ruling Distinction:
- Judge permitted usage of legitimately purchased books for model training, treating it like a person reading and internalizing a book’s knowledge for their own work.
4. The Legal Precedent
- June Ruling:
- Federal Judge William Alsup ruled that training AI on copyrighted work is "transformative enough" to fall under fair use (A, 10:55).
- Quote: “Like any reader aspiring to be a writer, Anthropic’s LLMs train on works... not to race ahead and replicate or supplant them, but to turn a hard corner and create something different. The piracy obviously was a completely different problem.” (Judge Alsup, 11:14)
- Piracy as the Core Issue:
- The court distinguished between using pirated content (illegal) and legitimately purchased content (legal for training).
5. Compensation Critiques and Practical Limitations
- Critique from Authors and Media:
- Dissatisfaction stems from the one-time nature of the payment: authors receive no recurring compensation and retain no control over future use of their work.
- “A lot of people are upset because... those authors should have reoccurring compensation forever if they want.” (A, 06:41)
- Irrevocability of Data Inclusion:
- Once a model is trained on data—especially pirated data—it can’t be “untrained”; “the cat’s out of the bag.” (A, 06:53)
- Impossibility of Micro-Payments for Generative AI:
- It's not feasible to track and compensate every data contributor per generated output, especially as models remix vast datasets to produce results.
- Example: Adobe Firefly pays photographers for dataset inclusion, not per generated output; similar logic applies for text and music generation. (A, 17:35)
6. Broader Industry Implications
- Precedent Setting:
- The Anthropic settlement is seen as a major precedent for other ongoing copyright lawsuits against AI companies (Meta, Google, OpenAI, Midjourney, etc.).
- Likely outcome: If companies use pirated material, they'll face penalties; if they buy and scan works, it’s fair game under the new precedent.
- Funding and Business Resilience:
- Anthropic can weather the financial hit thanks to its $13 billion funding round (A, 09:45), viewing the settlement as a manageable cost.
7. Host’s Perspective
- Host Approval:
- Supports moving forward with models trained on purchased works and sees ongoing attempts at granular compensation as technologically unfeasible.
- “If we all agree that these AI models are more useful for us than, than harmful, let's just move forward.” (A, 19:29)
- Ongoing Debate:
- Acknowledges deep division among stakeholders and predicts further litigation but sees the direction as pragmatic.
Notable Quotes & Memorable Moments
- “This is the largest payout in the history of US copyright law. And it is, I think, really exciting. So for me, anyways. But some people do not think this is a win for authors. It is just a win for... tech companies.” (A, 01:35)
- On AI writer tools: “It's kind of ironic, but all the writers I know use Claude because, like, yeah, the tone's way better. And that's because they grabbed a copy, a pirate copy of every single book.” (A, 03:55)
- On fair use ruling: “He argued that this use case is transformative enough to be protected by the fair use doctrine that was set back in 1976.” (A, 10:45)
- Federal Judge’s view: “Like any reader aspiring to be a writer, Anthropic’s LLMs train on works... not to race ahead and replicate or supplant them, but to turn a hard corner and create something different.” (Judge William Alsup, 11:14)
- On the impossibility of per-output tracking: “It's impossible to know... what data was used to create that image... It's not like you could do it like Spotify.” (A, 18:03)
Timestamps for Key Segments
- 00:00 – 01:35: Introduction and context; initial reactions from media and industry.
- 03:00 – 05:00: Methods used by Anthropic to acquire data; pirated vs purchased books.
- 06:00 – 08:00: Details on the legal distinction; why piracy sparked litigation.
- 09:45 – 11:40: Anthropic’s funding; Judge Alsup’s reasoning; fair use doctrine.
- 15:30 – 19:00: Adobe Firefly analogy; technical challenges in per-contributor compensation.
- 19:10 – 20:00: Closing thoughts and host perspective on moving forward.
Conclusion
The episode delivers a thorough yet accessible breakdown of Anthropic’s record-setting copyright settlement, untangling the legal, technical, and ethical issues at play. With a conversational, critical, and pragmatic tone, the host guides listeners through the evolving landscape of AI, copyright, and authorship—highlighting that while a legal path forward is emerging, the debate is far from over.
