Summary (7 min read)

Scaling Laws: AI Copyright Lawsuits with Pam Samuelson

The Lawfare Podcast & University of Texas School of Law | September 19, 2025
Host: Alan Rosenstein (Associate Professor of Law, University of Minnesota; Research Director, Lawfare)
Guest: Pam Samuelson (Richard M. Sherman Distinguished Professor of Law, UC Berkeley School of Law)

Episode Overview

This episode explores the rapidly evolving legal landscape at the intersection of artificial intelligence (AI) and copyright law. Host Alan Rosenstein speaks with IP law expert Pam Samuelson about the recent wave of copyright lawsuits involving AI companies such as Anthropic and Meta, recent pivotal court rulings, and the U.S. Copyright Office's controversial report on AI and fair use. The discussion covers core copyright doctrines, transformative use, market harm, remedies, and policy questions shaping the future of creative industries in the AI era.

Key Discussion Points & Insights

1. The Stakes of AI and Copyright Law (03:48–09:15)

Legal Background: Rosenstein and Samuelson set the stage by explaining why copyright issues matter for AI, particularly with large language models (LLMs) and generative systems.
- Copyright Owners’ Rights: Copyright grants creators exclusive rights, especially the right to reproduce their work; using works for AI training creates prima facie infringement, unless covered by fair use.
  
  "When people make fair uses, then even if it was a prima facie infringement, it's not an actual infringement..." — Pam Samuelson (04:34)
- Fair Use Factors: The discussion introduces the four statutory fair use factors, with most debates centering on 'transformativeness' and effect on the market.
- Balancing Incentives: The U.S. sees copyright as ultimately serving the public by incentivizing creation; the major fear is AI undermining creators’ incentives, leading to a "nightmare scenario" where new works dry up.
  
  "Hanging over all of these AI debates is this kind of nightmare scenario where AI essentially displaces all of this incentive to create new copyrighted work because you can’t make money anymore..." — Alan Rosenstein (08:10)
- Empirical Uncertainty: Samuelson argues that while these fears are prominent, there isn't clear evidence yet that AI is destroying market demand for new works.

2. Supreme Court Framework: Warhol v. Goldsmith (11:35–19:49)

Case Recap: The 2023 Supreme Court case involved Andy Warhol’s use of photographer Lynn Goldsmith’s work; ultimately, the Court found that licensing Warhol’s “Orange Prince” portrait for magazine use was not fair use because it substituted for the original’s commercial market.
- The "transformative use" question is central but complicated; market substitution now weighs heavily.
- Practical effect: Courts may be stricter regarding potential for lost economic opportunity due to unauthorized transformative uses.
  
  "[Courts] are going to really squint quite hard and look for any potential substitution effect, and that will weigh reasonably heavily on fair use analysis." — Alan Rosenstein (18:24)
- Samuelson’s Take: She downplays "doom and gloom" about a tightening doctrine, noting that post-Warhol decisions have still allowed a range of traditional fair uses (e.g., documentaries using short clips).

3. Current Copyright Litigation: Anthropic & Meta Cases

Bartz v. Anthropic (22:48–33:01)

Court’s Holding (Judge Alsup):
- Major Ruling: Using copyrighted works as training data for AI models is fair use if it's highly transformative and non-expressive:
  
  "Using in-copyright works as training data for constructing a model for a generative AI system–that’s fair use because it’s transformative, highly transformative." — Pam Samuelson (22:48)
- Authors’ Market Control: The judge found that authors do not have a right to control the market for transformative (training data) uses of their works.
- Acquisition of Content: Digitizing lawfully acquired books is also fair use.
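- Contrast – Pirated Books: Judge Alsup ruled against Anthropic on its downloading and retention of pirated book copies, holding that this was not fair use.
- Class Certification: The judge certified a class of copyright owners whose books were registered with the Copyright Office and carry an ISBN or Amazon number.
  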
  "He certified the class as all of the copyright legal or beneficial owners of copyright in books that had, number one, been registered with the Copyright Office and number two, had an ISBN or an Amazon number associated with it." — Pam Samuelson (34:28)
- Settlement Issues: The proposed $1.5B settlement (about $3,000 per book) is controversial since it may benefit publishers more than the suing authors; the judge is skeptical about fairness.
- Remedy Limitations: Due to practical concerns about crippling AI companies, courts likely won't impose injunctive relief that would destroy models; damages seem more likely.

Kadrey v. Meta (42:27–47:30)

Outcome (Judge Chhabria):
- Found in favor of Meta; dismissed claims based on unauthorized training data (including pirated books).
- Focused on Meta’s technical guardrails to prevent verbatim regurgitation of copyrighted content; no evidence of direct market harm.
- Key Legal Theories:
  - Market for licenses of training data: Rejected as a right the authors control.
  - Market dilution theory: Raised as a possible avenue (if AI output meaningfully displaces demand for original books), but plaintiffs did not present sufficient evidence.
  - Chhabria’s Signal: Other plaintiffs might succeed by focusing their evidence on indirect but substantial market substitution (i.e., "market dilution").
"[Judge Chhabria] agrees...that the licensing market for uses of works as training data is just a market that the plaintiffs don’t have any right to control. Period." — Pam Samuelson (45:10)
Conceptual Confusion: There’s legal uncertainty over how courts should treat alleged indirect market harm absent substantial similarity in expression—a point of skepticism for both host and guest.

4. Legal Remedies and Policy Solutions (39:52–55:46)

Collective Licensing Models: Europe’s copyright system allows for collective licensing—potentially a model for U.S. reform, though transaction costs are a challenge for individualized licensing at scale.
Remedies in Play: The real-world infliction of massive damages or operational restrictions on AI developers is unlikely, as judicial and market reluctance to "destroy the AI industry" is strong.
- Statutory Damages: In code-related cases like Doe v. GitHub, statutory damages for copyright management info removal can be immense, shaping legal strategies.

5. Circuit Court & Policy Developments

Thomson Reuters v. Ross Intelligence (Third Circuit; 55:46)

Key Issue: Using large numbers of legal headnotes to train AI tools—does this constitute fair use or market substitution? The district judge found infringement, influenced by the Warhol case, but was ambivalent and invited appellate review.

U.S. Copyright Office Controversy (56:14–57:53)

Draft Report: The Copyright Office released a nuanced report on AI and fair use, finding that “some of these uses may be fair uses, and some...may not.” It introduced the “market dilution” concept.
Political Turmoil: Unprecedented chaos erupted when President Trump tried to fire Register of Copyrights Shira Perlmutter; a court reinstated her, pending further appeals. This power struggle adds to the uncertainty around federal copyright policy.
- The Biden administration’s AI policy platforms have conspicuously omitted copyright, deepening policy ambiguity.

6. Notable Quotes & Memorable Moments

On fundamental copyright limits with AI:

“What is different about this is the notion that something that is not substantially similar in expression...but that has a lot of the same information—well, copyright doesn’t protect information.” — Pam Samuelson (30:02)
On why authors may never be able to effectively license training data rights:

“There isn’t such a thing in the world, but let’s just say I did that. How much does every single author of every single book get? And how would you figure out what’s a fair compensation?” — Pam Samuelson (48:04)
On prospects for big change soon:

“It’s going to be a while...Nobody’s planning to make summary judgment motions in any of the other generative AI cases so far as I can tell, until 2026...” — Pam Samuelson (50:38)

Timeline of Important Discussion Segments

| Timestamp | Segment/Topic |
|-------------|---------------------------------------------------------------------|
| 03:01–04:34 | Intro to copyright law and fair use in the AI context |
| 06:07–08:10 | Transformativeness, substitution, and incentive theory |
| 11:35–19:49 | Warhol v. Goldsmith: Implications for transformative use |
| 22:48–33:01 | Bartz v. Anthropic decision and transformative fair use |
| 34:28–39:52 | Settlement, damages, and remedy constraints in AI copyright cases |
| 42:27–47:30 | Kadrey v. Meta and the rise of market dilution as a legal argument |
| 50:38–55:46 | Upcoming appeals and policy outlook |
| 56:14–57:53 | Copyright Office report & D.C. political controversy |

Conclusion

This episode tracks a decisive shift in how U.S. courts and policymakers are approaching AI and copyright. Early court decisions (notably Bartz v. Anthropic and Kadrey v. Meta) are trending in favor of AI model developers on the core issue of using copyrighted materials for training, but unresolved tensions remain around market harm, possible new doctrines like market dilution, and appropriate remedies. Meanwhile, policy uncertainty and political drama (especially at the Copyright Office) keep the copyright landscape unsettled. Listeners are left with an expert’s caution: major developments and possible appellate or legislative change are likely years away.

Contact the show: scalinglaws@lawfaremedia.org
Find more content: lawfareblog.com

Transcript

[01:28]
Kevin Frazier
It's the Lawfare Podcast. I'm Kevin Frazier, the AI Innovation and Law Fellow at the University of Texas School of Law and a senior editor at Lawfare. Today we're bringing you something a little different. It's an episode from our new podcast series, Scaling Laws. Scaling Laws is a creation of Lawfare and Texas Law. It has a pretty simple aim, but a huge mission. We cover the most important AI and law policy questions that are top of mind for everyone from Sam Altman to senators on the Hill to folks like you. We dive deep into the weeds of new laws, various proposals, and what the labs are up to, to make sure you're up to date on the rules and regulations, standards and ideas that are shaping the future of this pivotal technology. If that sounds like something you're going to be interested in, and our hunch is it is, you can find Scaling Laws wherever you subscribe to podcasts. You can also follow us on X and Bluesky. Thank you.
[02:31]
Alan Rosenstein
When the AI overlords take over, what are you most excited about?
[02:35]
Kevin Frazier
It's not crazy. It's just smart.
[02:37]
Alan Rosenstein
And just this year, in the first six months, there have been something like a thousand laws.
[02:41]
Kevin Frazier
Who's actually building the scaffolding around how it's going to work, how everyday folks are going to use it.
[02:47]
Alan Rosenstein
AI only works if society lets it work.
[02:49]
Kevin Frazier
There are so many questions that have to be figured out.
[02:51]
Alan Rosenstein
And nobody came to my bonus class.
[02:54]
Kevin Frazier
Let's enforce the rules of the road.
[03:02]
Alan Rosenstein
Welcome to Scaling Laws, a podcast from Lawfare and the University of Texas School of Law that explores the intersection of AI, law, and policy. I'm Alan Rosenstein, Associate Professor of Law at the University of Minnesota and Research Director at Lawfare. Today I'm talking to Pam Samuelson, the Richard M. Sherman Distinguished Professor of Law at the University of California, Berkeley School of Law. We discussed the flurry of recent court cases defining the future of AI and copyright, including the pivotal district court rulings in the lawsuits against Anthropic and Meta, the emerging legal theory of market dilution, and the controversial report on AI and fair use by the US Copyright Office. You can reach us at scalinglaws@lawfaremedia.org and we hope you enjoy the show. Pam Samuelson, welcome to Scaling Laws.
[03:47]
Pam Samuelson
Thanks very much. Glad to be here.
[03:49]
Alan Rosenstein
So we are at a, I think, pretty pivotal moment for AI and copyright, and I'm delighted to have you on. I can't imagine a better guest to talk about these issues with. We have a lot to talk about in terms of cases and reports from the Copyright Office and policy recommendations, but I think conceptually at the heart of all of this is of course, the idea of copyright, and then in particular what's often called the fair use defense. And so just for those of our listeners who it may have been a few years since they took IP or who have never taken this, just give a kind of a high level overview of why when it comes to talking about LLMs and other generative AI systems, so much of the discussion is about potential copyright infringement.
[04:34]
Pam Samuelson
Well, copyright law gives authors of all manner of works of authorship a set of exclusive rights. That means a right to exclude other people from doing certain things that you don't approve of. And one of those exclusive rights is to reproduce the work in copies. And so the reproduction right is one of the most powerful tools that copyright owners have to be able to control what people do with copies of their works. And there's a limitation that the statute recognizes. When people make fair uses, then even if it was a prima facie infringement, it's not an actual infringement, because fair use defenses actually prevail in some cases. And so that's why these issues are so important. If someone is, for example, using in-copyright works as training data for a generative AI model, then they are making reproductions of those works. And they may think that it's a fair use, but there's a prima facie infringement just because copies are made, both to collect the works in a data set and then to make copies of them during the training process.
[06:07]
Alan Rosenstein
So there are these four fair use factors. They're in the statute. They are a little bit cryptic as to what they mean. My sense is that a lot of the debate in specific copyright cases comes down to, on the one hand, how what's called transformative the work is, how much does it differ from the copyrighted work that's at issue. And I should say the more transformative it is, the more likely it is that it's going to be fair use. And then on the other hand, the more that the work in question can substitute for or harm the market for the copyrighted work, the less likely it is to be fair use. That is, again, a huge oversimplification, but is that a rough cut at how courts try to think about these cases on a case-by-case basis?
[06:53]
Pam Samuelson
Yes, that's a really good summary. In general, the purpose of the defendant's use and what effect the challenged use has on the market are the two most significant factors. It depends on the type of case. Sometimes even in non-transformative use cases, for example, making copies of television programs to watch them at a later time, that's a private, non-commercial copy that the Supreme Court has said is fair use. And so it doesn't have to be transformative in order to be a fair use. But when it is transformative, generally speaking, it's less likely to supplant demand for the original. And what copyright cares about more than anything is substitution in the market. Right? So if this defendant's work is able to be in the market at the same time as the plaintiff's work, some people might buy the defendant's work instead of the plaintiff's work, even though what was valuable about the defendant's work really came from the plaintiff. And so it's not fair.
[08:10]
Alan Rosenstein
And just to emphasize that last point, which I think is really important because different parts of IP have different justifications for them, and correct me if I'm wrong, but the justification for copyright, at least in the United States, and I know the Europeans have a somewhat different conception of this, is you want to incentivize people to create more work and therefore you have to give them some sort of economic benefit, which is the kind of copyright time-limited monopoly, which is essentially what it is. And so when we think about copyright, what we're trying to do is maximize the sort of productive incentive for creators to create new work. And the reason I'm emphasizing this is because I think that hanging over all of these AI debates is this kind of nightmare scenario where AI essentially displaces all of this incentive to create new copyrighted work because you can't make money anymore, because ChatGPT can do everything you can. And then, you know, whether that's morally good or bad for the authors, there's just a practical case that then there's no more raw material. So in a sense, you know, the well runs dry. And this to me at least seems like the kind of worst-case scenario that a lot of people have in mind. I'm curious what you think about that framing.
[09:15]
Pam Samuelson
Well, one thing I'll say is that the only difference I have with what you just said is that the Supreme Court has repeatedly said that the primary beneficiary of copyright is actually the public, that the Constitution says that Congress is given the power to create laws like copyright to promote the progress of science. Right. And so some part of what we're trying to think about when we're thinking about fair uses is whether or not allowing the defendant to do what the defendant has done is going to be overall in the public benefit or whether or not it's not. And obviously if it ruins the market for the work, then that's going to be a problem and we'll come back, I think probably in a little while, to what is now called market dilution, which is this fear that if generative AI can essentially output books on mushrooms and on gardening and romance novels, then anybody who might otherwise write a book about mushrooms or gardens or a romance novel just won't do it anymore. I think that's a question that the courts are being faced with. So far that hasn't happened. And whether it actually will is something that, I think, has got to be empirically tested just because it kind of hits us where we live. It's also important to recognize that automation has changed a lot of jobs. Right? A lot of skilled people don't have jobs anymore because automation actually made it simpler to do it. And so, you know, if you look at the studies that have been done about what AI is going to do to the labor market, authors aren't at the top of the list.
[11:36]
Alan Rosenstein
Fair. Before we get into the cases I want to talk about, I want to do one more bit of stage setting. And that is to briefly discuss the last major Supreme Court statement on copyright, which is this 2023 case of Warhol versus Goldsmith. And my sense, again, I'm not a copyright scholar, but my sense was that this was a reasonably big deal in the law and that the Supreme Court had simultaneously strengthened copyright in a meaningful way, and also injected a decent amount of uncertainty into how to actually do the fair use analysis. So if you wouldn't mind just giving sort of an overview of the Warhol case, because I think those issues are going to be very helpful as we discuss the specific district court cases in a few minutes.
[12:19]
Pam Samuelson
Sure. So back in the 1980s, Lynn Goldsmith took a photograph or a series of photographs of Prince, the singer. He was then kind of just starting to become famous, and Vanity Fair decided that it wanted to publish a story about his rise to fame. So it contacted agents for photographers and asked for pictures of Prince. And Goldsmith had one that Vanity Fair liked. So it made an arrangement called an artist reference license. And what that means is that the agent for Lynn Goldsmith agreed to allow Vanity Fair to use this one photograph for the purpose of having a third person actually make a work of art based on the photograph. And so Vanity Fair commissioned Andy Warhol to do it. Andy Warhol made a purple Prince print that Vanity Fair used in the magazine. And then years later, when Prince died, Conde Nast decided that it wanted to do a special commemorative issue on Prince, and so contacted the foundation about licensing probably that same image for the commemorative issue, and it ended up doing a different one. So it turns out that Warhol made more than one instance of things based on the photograph. And so they liked the Orange Prince. And so they put the Orange Prince on the front cover. And then Lynn Goldsmith, this is like now 2016, sees it and she says, oh, my God, that's based on one of my photographs. And then she contacted the Warhol Foundation and said, that's an infringement of my copyright and you owe me money. And the Warhol Foundation said, I think it's a fair use. And so they were so confident that it was a fair use that they sought a declaration from a court of non-infringement. And the trial court agreed. The trial court said it's transformative because it has a different purpose and a different image, and they've changed a whole bunch of things. Also, they compete in different markets. She competes only in the magazine article market. And Warhol's stuff goes on museum walls and stuff like that.
[15:23]
Alan Rosenstein
Okay, so just so I understand, because I think it's an important nuance. What it sounds like Warhol was arguing, and I think this will be relevant when we get to LLMs, is that even the original reference license agreement would have been unnecessary under that logic. Warhol could have just seen Lynn Goldsmith's photo and even without a reference license, converted it into some Warhol purple Prince print. Is that the import of that argument and the implications of it?
[15:54]
Pam Samuelson
Well, it's certainly one of the arguments. I would say that the fact that there was a reference license means that he was authorized to create that particular work. And so to me, that actually is part of the fair use argument. And my colleague Jessica Silbey actually says that the general understanding of artist reference licenses is that once you get that, you know, then you have your own work of art. And so the trial court took that into account. The Court of Appeals did not take it into account, and the issue just didn't come up in the Supreme Court. So I factor the fact that it was an artist reference license as part of my fair use analysis in that particular case. But appropriation art has, in fact, just seen a lot of photographs and other works as raw material for ongoing creation. And so appropriation art was kind of at issue in the Warhol case. Now, what's different about the Warhol case when it was at the Supreme Court, as opposed to when it was at the Second Circuit and at the trial court, is that the parties had been arguing about whether it was fair use when Warhol created the print series in 1984. And before the Supreme Court, the Solicitor General said, Supreme Court, you don't have to think about 1984. Just think about that one commercial license that the foundation gave to Conde Nast, and they got $10,000 for that, and Lynn Goldsmith didn't get anything. And the question is whether that's fair use, whether that was transformative. And the Supreme Court said no, that the Orange Prince was being licensed as a magazine illustration. That's kind of what Lynn Goldsmith does. So they had the same purpose, and they were commercial. And so the court decided that that actually was a substitution for a legitimate market that belonged to Goldsmith.
[18:25]
Alan Rosenstein
So it's always hard. And I always tell this to my students when I teach them a new case. And they say, okay, what is the rule of the case? And I say, we don't know. The case is too new. Ask me in 10 years, right? You never really know what a case stands for until the Supreme Court, in some future time, looks back and says, this is what we meant in this case. So I'm asking you to prognosticate a little bit. But is it fair to say that, as the kids say, the vibes after this opinion are at least that the courts are going to really squint quite hard and look for any potential substitution effect? And that will weigh reasonably heavily on fair use analysis? Because I suspect that, again, just thinking about these AI cases conceptually, the concept of transformativeness seems to me a very, very tricky question. It has all these philosophical questions, I think, baked in, whereas the question of substitutability, market effects, market dilution, it's obviously a complicated empirical question, but it feels a little more tractable for a court to decide. And if that's the case, then the logic of a case like Warhol vs Goldsmith would suggest that it's going to be marginally harder, and we'll see how much harder it is, for the companies to say, no, no, we're doing something totally new, therefore we don't even have to worry about the economic effects on these creators whose work we are alleged to have infringed upon.
[19:49]
Pam Samuelson
So there are people who think that Warhol versus Goldsmith transformed copyright fair use doctrine, and many fewer things will be fair uses in the future. I think that's baloney. And I've actually been reading a lot of the cases that have been decided since then, and there's one case that was decided, I think it was by the 10th Circuit, where Netflix had used part of a video of a funeral ceremony in a documentary about the Tiger King. And the judge said, you didn't comment on it, you didn't criticize it, and therefore it's an infringement, not a fair use. And then Netflix came back and asked for reconsideration, and they sort of said, oh, I guess Warhol doesn't mean that after all. And so they withdrew the opinion that said, if you don't criticize, if you don't comment, it's not fair use. That's not right. Documentary films have been able to use snippets of things for many years, and I don't see that changing in the aftermath of the Goldsmith case. Another thing to realize is that the Supreme Court actually gave several examples of uses that the Warhol Foundation could make without Goldsmith's permission, such as hanging it on the wall, such as licensing it for a book about Warhol's art. That's not a market she's in, and therefore there's no market substitution.
[21:44]
Alan Rosenstein
All right, I think that's good for background. Now, let's jump into the recent spate of cases. So the first I want to start with is Bartz v. Anthropic, which I think it's fair to say is probably the case that has gotten the most attention. Here you have three book authors representing a much larger class who allege that Anthropic, which is the company that is behind the popular Claude chatbot and LLM, used their materials unlawfully in an infringing way in the training of their models. This is a complicated case because it involves both the ingesting of these books, but also many of these books apparently had been downloaded illegally from Internet piracy websites, these big databases, and Anthropic just used those. So there are lots of moving parts in this. So let's start with an overview of how Judge Alsup kind of split the difference between these. And I'm curious if you think in the end this is a win for Anthropic or a loss, because it's a bit of a split verdict. And I think that there have been a lot of different perspectives on who, quote, unquote, won in this district court opinion.
[22:48]
Pam Samuelson
So I think it's fair to say that a really important part of the ruling was that using in-copyright works as training data for constructing a model for a generative AI system, that's fair use because it's transformative, highly transformative. You took the whole thing, but what you did with it, you were using it for non-expressive purposes. Right? You're trying to construct a model rather than consuming the expression as you would if you were just reading the book. And although the authors wanted to say, you didn't get a license from me, the judge said this is a market that authors have no right to control. So it's important that there have been a number of cases, including Supreme Court cases, in which the courts say that copyright owners don't have an entitlement to control transformative markets. And Judge Alsup considered this to be a transformative market and something that the authors didn't have any right to control.
[24:09]
Alan Rosenstein
Actually, I want to stay on this for a second because I think the transformative market point is important. So an analogy might be, I'm a law professor, I read someone's book about law, I learn about the law, I get some ideas, I read some other people's books, and then I go in and become a very successful lecturer. The idea here is that the author of the book that I used, maybe it would be professionally bad of me not to cite him. Maybe all these sorts of issues. But he does not have, or she does not have, an entitlement to control the market for my lectures. Right. That's the kind of analogy that's going on here, is that right?
[24:46]
Pam Samuelson
Yeah. Again, as long as there is not a use of a substantial amount of the expression from the original. Right. If you just recite directly from the book, then that might raise a different issue.
[24:59]
Alan Rosenstein
Because then of course the market is for the book itself. In a sense, I'm reciting the book.
[25:06]
Pam Samuelson
Then that's consuming the book for its expressiveness. So Judge Alsup also said that when Anthropic went out and bought books and scanned the books and then threw the physical books away but kept the contents in a database, that that was transformative and fair use also. So those two things had been controversial. But Judge Alsup felt comfortable with both the use of books that were lawfully made available, digital books being used, and then scanning books to use as training data. He thought that was just fine.
[25:59]
Alan Rosenstein
Yeah. And so we'll get to the illegal books part in a second. But I do want to focus on this because my sense. But again, I'm not the IP expert, so I really want your read on this. This is a massive deal, right? That's a big deal. This is a big deal, right? It's about as big a deal as it gets. Right. Because if this logic, which really is kind of everything that the LLM companies have been asking for, I mean, what their argument has always been, look, our models are learning in, you know, a way not exactly like humans learn. This is a kind of alien intelligence. But for purposes of copyright law, certainly analogically to how humans learn. And copyright has never been thought to restrict the ability of people to learn from copyrighted material and then do stuff with that. So, you know, obviously, and I should have sort of emphasized this at the beginning, these are district court opinions. There's a long, long road to hoe here. But if this becomes the law of the land, I mean this is kind of the whole ball game or this is a lot of the ball game for AI and copyright. Is that a fair statement?
[26:58]
Pam Samuelson
Yeah, I mean, it's important to sort of understand that the European Union has also adopted a text and data mining exception so that at least nonprofit entities can use any in-copyright material that they lawfully acquire to engage in text and data mining. And the European officials have said that training data uses of works fall within this exception. And even profit-making entities can do text and data mining copies also, although they have to give copyright owners the opportunity to opt out. So we got started with the fair use conversation. But other countries have recognized that there is a legitimate interest in being able to use works for what are often called non-expressive or non-consumptive uses. Because when you do text and data mining, you actually learn things that you can't learn any other way. Again, thinking back to what is the constitutional purpose of copyright? It is to promote knowledge. And so this kind of notion that one of the things that these generative AI companies are doing is promoting more access to knowledge, that's one of the things that they're going to say. Now, of course, from the standpoint of the authors, that's fine with me as long as you pay me. And that's going to be the sort of thing that we'll see how that plays out. But Judge Alsup didn't think that was necessary.
[28:46]
Alan Rosenstein
Just going back to the Warhol opinion for a second. Is this kind of reasoning consistent with Warhol? And specifically in the following sense, it seems that what Judge Alsup in this opinion did was he really focused on the question of transformativeness. And he said, look, books and neural nets, they're just too different. Right. And so I'm not going to go down the question of market substitution, and I'm not going to do that. I'm just going to say these are so different that it's just too bad for the authors. Is that consistent with Warhol? I mean, or in other words, if you could have a case and the other case we're going to talk about is a little bit like that, actually. But if you had a case in which the authors could show, sure, the LLM is doing something totally transformative, fine, fair enough. But the effect is going to be that no one's ever going to buy my book again because they can just go to ChatGPT and get all the information that my book initially provided. You can imagine that in the case of a textbook, you know, no one's going to go to ChatGPT to read a novel. That's not how it works. But you can imagine going to ChatGPT to learn calculus and no longer read calculus textbooks. The idea that, well, if it's transformative enough, we don't even look at the economic effects, is that consistent with Warhol as you understand it?
[30:03]
Pam Samuelson
So again, the way that plaintiffs are looking at this is that you should think about the generative AI system in a more holistic way, right? Yeah, we're talking about training data, but if you think about it holistically, then here's my work, which has this particular purpose, and then here's the model in between, and then here's the output. And if the output essentially will satisfy demand for my work over here, then in fact it's a competing substitute. Now what's hard about this for copyright purposes is that generally speaking, in order for something to be an infringing copy or to be an infringing derivative work, it has to have substantial similarity in the expression to the work that it came from. So what is different about this is the notion that something that is not substantially similar in expression over here, the output, but that has a lot of the same information. Well, copyright doesn't protect information. So, you know, it would be, shall I say, completely weird to say that this output over here is an infringement of this particular input when it says it completely differently. It just uses the information and the ideas.
[33:02]
Alan Rosenstein
So we've talked about, I think, the core of Alsup's ruling. But there's also the part about the fact that much of the Anthropic data set was acquired illegally. It was downloaded from these pirated databases. And here Judge Alsup did not think that was fair use and he ruled against Anthropic there. And then recently though, at the time that we're recording this, this is all still pending, Anthropic and the plaintiffs have come to a proposed settlement. I believe it's about $1.5 billion. I read somewhere that's about $3,000 per book, which is substantially less than the sort of statutory damages that are available, but, you know, substantially more than nothing. But the judge seems, at least in a preliminary hearing about this, very unhappy with this. So we'll see by the time we release this episode, maybe he'll have ruled on this. But I'm curious, kind of two things. One, what you make of his quite critical comments, at least initially, about the first draft of the settlement, and two, whether this really matters. I mean, obviously a billion and a half dollars. I mean, it's a lot of money to me. But you know, Anthropic, I think its most recent valuation is like north of $150 billion. And the AI market in general is probably trillions of dollars at this point. So, you know, these things seem like rounding errors. And I wonder, you know, in the grand scheme of AI and copyright. And I'm curious what your view on that is.
[34:29]
Pam Samuelson
Well, first of all, the other case that was decided within days of Judge Alsup's decision was Kadrey versus Meta. Pretty much the same claims were being made against Meta as were being made against Anthropic. And there were pirated books in their data set too. And Judge Chhabria just didn't care. So the idea that you use training data that comes from books that were downloaded from Books3 or some of the other shadow libraries, that that necessarily is going to taint the infringement claim. I think that's an exaggeration because Judge Chhabria basically said, you know, it doesn't cut either way. And from the standpoint of the people who are doing it, they were basically saying to themselves, look, we're not trying to exploit the expression, right? We're not going to sell the book in competition with you. We just want to extract knowledge from these works. And they thought that they were just doing research. Now, Judge Alsup had a very different view about things. And what's different about this particular situation is that he, Judge Alsup, certified a class that the plaintiffs and the defendants didn't ask for. So he just did this on his own. And he certified the class as all of the legal or beneficial owners of copyright in books that had, number one, been registered with the Copyright Office and number two, had an ISBN or an Amazon number associated with it. And the estimation is that about half a million books are within that class. And then the question is sort of like, what do you do with $1.5 billion? Well, it's an exaggeration to say that authors are going to get $3,000 each, because the legal owners of many of the copyrights in that collection are the publishers. And the authors may be beneficial owners, as in have a right to some royalties. They might share in this. But I think that Alsup now kind of realizes that, oh, my God, most of the money is going to go to the publishers. And it's really the authors who brought this lawsuit.
And what are the publishers doing in the lawsuit? They hadn't been in the lawsuit at all, right? So all of a sudden, the publishers get a gazillion dollars and the authors are going to have to fight to get a share of it. That just. I think he's kind of like, he's been trying to push this thing along. Right? The reason that this case didn't go up on the class certification and didn't go up on a fair use issue to the Ninth Circuit was because he said, I'm going to trial on the pirated books issue on December 1st. You better get ready. Faced with the potential of a much bigger damage award at trial, a settlement actually sounded like a safe thing to do. And in some sense, it also puts Anthropic in a better competitive position than OpenAI because OpenAI has used pirated books, too.
[38:41]
Alan Rosenstein
This remedy question is interesting because as you point out, if you're trying to make authors whole or you're trying to help the authors, you may need to go quite a bit beyond $1.5 billion so that there's money left over. But this then gets into the question, and you've actually, the last time you were on the podcast and you wrote this great piece for us a few years ago about remedies, you start getting into not just huge financial remedies, but also injunctive remedies and remedies to destroy data. Realistically speaking, even if courts find that there has been copyright infringement in some of these cases, are they likely to impose the sorts of remedies that would be necessary to vindicate the injury, which is going to be massive, given that to do that would potentially mean destroying or really severely crippling these AI companies, which my sense is no one actually wants to do? You know, at least by mid-2025, every judge has tried ChatGPT. Whether they like it or not, they realize it's a pretty big deal. I don't think any of them want to be the people that destroy the AI industry. So I'm sort of curious how you think the kind of realpolitik of this is all going to fall out.
[39:53]
Pam Samuelson
Yeah, I mean, one of the things that's interesting about Europe is that they have a very elaborate collecting society culture. And what I mean by that is that you can actually go to a collecting society, let's say in France or Germany, and get a license that covers not all necessarily, but a big swath of the authors of a particular type of work. And you get the license and basically then you can do with the works what you want. You just have to pay something for it. Collective licensing is something that the Europeans are actually thinking about a lot in terms of generative AI. So when the copyright directive that created the text and data mining exception was adopted, they didn't think about collective licensing. But now that generative AI is here, there's a lot of talk about that. And so there are authors and author groups that would like to see collective licensing in the US for things like the uses of works as training data. But when Anthropic hired an economic expert to think about the market effects of the use of the works as training data, he said you'd have to make transactions with millions and millions of authors and the transaction costs just swamp that. And so you can't really do it. And so even though there are a number of organizations, including the Copyright Clearance Center, that want to offer a collective license for uses of works as training data, they don't have the rights to that. Okay, so they'll issue you a license and they'll collect some money from you, but it's like they don't really have a repertoire of all of the authors' works to make that license happen. And the general view is that if there is a training data right, it belongs to the authors. And that's why these lawsuits have been author lawsuits, not publisher lawsuits.
[42:28]
Alan Rosenstein
Let's talk about the Meta case because I think it is worth digging into a little bit more. As you said, it's basically the same case, but it comes out somewhat differently because, as you said, the judge doesn't really care about the pirated data issue. That's not his concern, and he ultimately rules for Meta, but does so, as far as I can tell, because he is quite dissatisfied with how the plaintiffs argued their case. So just explain why this case is, if you agree with this summary, less of a victory for Meta than I think it was initially portrayed as. And that'll let us get into this question of market dilution and what that means and why that might be important going forward.
[43:09]
Pam Samuelson
Yeah. So there were two arguments that Kadrey's lawyers made to Judge Chhabria about the market harm that they saw. One was they thought that there was going to be undercutting of the actual market for the books, and two is the market for licenses for uses of works as training data. And they focused all of their briefing and all of their discovery evidence on those two theories. And Judge Chhabria was persuaded that Meta had actually done a good job putting guardrails in its system to stop the recitation or regurgitation of expression from the books, and therefore that the lost sales was not a viable claim of harm.
[44:10]
Alan Rosenstein
And actually, let me jump in here because I think it may be worth noting this question of memorization is important because it's also at the heart of other copyright lawsuits. So, yes, you know, there's this big. I mean, probably the highest profile copyright case of all right now is New York Times versus OpenAI. And a lot of that case is about this memorization question, that if you prompt OpenAI in sort of the right way, it'll spit out large pieces of a New York Times article, which seems bad, but that did always strike me as a little beside the point because, sure, that's bad and maybe you should be punished for that, but that's going to be fixed. That doesn't strike me as the sort of fundamental, the interesting question about LLMs. So just going back to the Meta case, once we clear out this question of memorization, there's still this more interesting lurking question of, okay, well, what about the effect of the truly transformative use. And that's, please continue, sort of where the judge then says, this has not been argued very well.
[45:10]
Pam Samuelson
Yeah, so lost sales goes out the window because Meta has done the guardrails thing, and the lost licensing fee, here's a place where Judge Chhabria agrees entirely with Judge Alsup that the licensing market for uses of works as training data is just a market that the plaintiffs don't have any right to control. Period. So those were the two arguments that Kadrey's lawyers were making. That's what evidence they had been able to produce. And the judge said, well, what about this market dilution thing? If somebody wrote a book about mushrooms, nobody's going to want to buy their book about mushrooms anymore because you can go to ChatGPT or to Meta's Llama and get a book about mushrooms if you want, or you can write your own. You know, you can ask Llama to write a book about mushrooms. And so it undercuts that market. So that is, as I said a few minutes ago, a very novel theory, that something that uses information from existing works and puts it in somewhat different words may compete, but it's not a direct market substitute. And what Judge Chhabria thinks is that there's indirect market substitution and that that's enough to say that there's a market effect. And he agrees that this is a novel theory. And of course, he's speculating, speculating, speculating about this. He got really excited about this theory. But then if you don't produce any evidence of market dilution, it's just a supposition. That's not good enough. So what he's basically doing is signaling to the other plaintiffs in the other cases, why don't you amend your complaint to raise this market dilution theory?
[47:30]
Alan Rosenstein
So I think this subtle distinction may be lost on me, but if the copyright holders do not have an entitlement to the transformative market, why do they have an entitlement to avoid market dilution? I feel like we're slicing the baloney real thin here, which, you know, we're lawyers. That's literally what we're paid to do. But I'll admit I'm having a little trouble following this distinction.
[47:57]
Pam Samuelson
Well, I hope Judge Chhabria has a chance to, like, help us understand it.
[48:02]
Alan Rosenstein
Because it's not, it's not just me. Okay, good.
[48:04]
Pam Samuelson
My reaction was the same. You know, they say you can't control this particular market. And then during the colloquy with the lawyers, he was saying, well, you guys can actually, you'll figure out a way to get a license to do this. It's like, no, I don't think so. And this is actually a reason why authors who really are worried about market dilution, they would want the models destroyed because there's no, really, seriously. Because there's never. Okay. I'm like, I trained on, let's say, 2 billion books. Okay, there isn't such a thing in the world, but let's just say I did that. How much does every single author of every single book get? And how would you figure out what's a fair compensation? Now, a colleague of mine in Europe basically says, look, let everybody train as much as they want, and then the AI companies should put a big bunch of money into a pot. And that pot should be used to subsidize authorship. Kind of like the National Endowment for the Humanities or something like that.
[49:30]
Alan Rosenstein
I'm going to mispronounce the name. Apologies, but it's like Pigouvian taxation. I mean, this is what you do, right? The pie gets bigger and then you tax the pie and then you redistribute, which is always the economists' favorite way of doing it because it avoids a lot of kind of deadweight loss and stuff like that. Of course, it tends to work better in theory than in practice. Right. It usually does not, in fact, accomplish the goal. But I agree with you. That does seem like the most conceptually tractable way forward. Okay, so we have these cases on the West Coast. There are other cases that are trickling up. I do want to get to some other stuff sort of in the policy space, but before we move to that, I want you to kind of. To what extent can we read into these cases? Or how much do we know right now? Right in the middle of 2025. Right. It seems to me that the momentum is kind of on the side of the companies. But again, these are district court cases. So are you changing your priors on how the law is going to shake out or is it just too early to tell? And we're going to have this conversation in two years when we have a couple of 9th Circuit opinions and a 3rd Circuit opinion and we're trying to figure out the appropriate vehicle for cert.
[50:38]
Pam Samuelson
Yeah, it's going to be a while. I actually got in touch with some people to find out what's the state of play. And nobody's planning to make summary judgment motions in any of the other generative AI cases, so far as I can tell, until 2026. Now, there are two AI training data cases now before the 9th Circuit and the 3rd Circuit. The Doe versus GitHub case is one in which there's no copyright infringement claim. But the claim is that GitHub's Copilot basically was trained on lots and lots, 5 billion lines, of open source code. And then you could use Copilot to say, I'm writing a program about X, I need a function, I need some code that will do this particular task. And then Copilot will just generate something for you. Doe 1, 2, 3, 4, and I think 5 now say that you remove copyright information, such as my open source license, from the works that were used as training data, and you spit out code that's nearly verbatim of some of my code, and therefore that the removal of copyright information is something which is illegal. And so the Ninth Circuit is being asked to review a decision that said that if the code that Copilot produces is not identical to the code that was trained on, then there's no violation of this copyright management removal claim. Now, this is important because the original claim in the Doe versus GitHub case was the violations of this, what's called Section 1202, the Copyright Management Information Law. They asked for $9 billion for that, and that's actually a lot of money for removing copyright information. But it comes with a statutory damage minimum of $2,500 per violation. And if you think about how many times, how many copyrighted works there are, how many copyright notices might be affected by this, we're talking really large numbers. Okay, so that's before the Ninth Circuit right now. I've got a brief in that particular case agreeing with GitHub that identicality should be required. But that's one case.
The one that's closer and more significant for the generative AI cases, especially those that are pending on the East Coast, is the Thomson Reuters vs Ross Intelligence case. So Ross basically got several thousand headnotes from a vendor that it dealt with and trained, not a generative AI, but trained a model on the headnotes. And Thomson Reuters is the owner of the Westlaw database and claims that use of the headnotes as training data is infringement. And at first the judge, who is the trial court judge, said, you know, you guys are disputing about the facts. This needs to go to trial. And then he changed his mind and said, send me another set of briefs. And so then he decided that it wasn't fair use. And he was very influenced in that by his reading of the Warhol decision because he thought that the Ross Intelligence AI program was offering the same thing as the West Thomson Reuters tool, and therefore they had the same purpose and they were commercial. And therefore that was something that then meant that there's harm to the market. So that had a kind of substitutive effect in that view. Now, that judge actually said, you know, I'm not really sure about this. It may be that we should let the appellate court take a look at it.
[55:46]
Alan Rosenstein
So unsurprisingly, that judge was a former law professor himself, so you can see his sort of law professoriness all over those opinions.
[55:56]
Pam Samuelson
So that case is now pending before the Third Circuit. And the trial judge had to certify, yeah, appellate court, please take this. And then the appellate court has to say, yes, so that happened, and that is in the middle of its briefing schedule right now.
[56:15]
Alan Rosenstein
I think it also probably helps that the trial court was an appellate court judge sitting by designation. So he asked his colleagues for some help, and they agreed. Before we close out, I do want to make sure that I get your opinion on some stuff that's been happening in D.C., in particular, the truly wild story of the U.S. Copyright Office and this draft report that they issued setting out some thoughts about fair use. This report, I think, was an interesting read. It was somewhat controversial when it came out, but it became really controversial because shortly thereafter, President Trump fired the head of the Copyright Office, Shira Perlmutter, though recently actually, a federal court has halted that and reinstated her. There's a very interesting separation of powers removal question that we're not gonna get into, obviously. And again, I'm not a copyright person, so perhaps it's not surprising I had never heard of the Copyright Office. I did not know it was such a big deal. So just explain a little bit what the Copyright Office is, what their role is, whether these reports are, you know, are they binding? Are they just law review articles, effectively? What did they say? What is your read of all of this, especially, and I should say I'll sneak this in as well, given that the administration's recent AI action plan actually says nothing about copyright, which was a notable omission, even though in releasing it, President Trump in the press conference kind of riffed a little bit about how copyright was crazy and you couldn't possibly build these models if everything was copyrighted. So I gave you a lot. I gave you a big kind of D.C. stew of stuff. So just to close out, what do you make of all of this?
[57:54]
Pam Samuelson
One of the roles that the Copyright Office has is as an advisor to Congress. So the studies that it conducts are studies that, for the most part, some member of Congress asked them to write. So members of Congress know that there is this big AI thing going on out there, and who are they going to ask for advice about it? The Copyright Office. So the training data report was the third in a series of reports about AI-related IP-type issues. The Office worked really, really, really, really hard on this. They got 10,000 comments on the questions that they posed about AI. And so sifting through all of those was not just an easy thing to do. And the bottom line in the report that the Office issued just before she got fired was a report that basically said, some of these uses may be fair uses, and some of them may not. That's basically what they said. So sometimes it might be transformative. The more researchy, the more kind of educational, the more likely it's going to be. The more commercial it is, the less transformative. And that was the report that actually introduced the phrase market dilution for the issue that you were raising earlier about who will ever write a book again if generative AI will just flood the market with all kinds of other AI-generated stuff. Now, the AI-generated stuff can't be copyrighted, so you can't really make a lot of money if you can't control a copyright. So the stuff is in the public domain. So authors actually have some benefits from copyright in the human-authored works. So I think that they, in fact, would do pretty well. But, you know, there's a real serious empirical question about which, if any, sectors of the copyright industries will be harmed. And, you know, the motion picture industry kind of likes generative AI because it comes up with some pretty cool things. And they've been using computer-generated stuff in their movies forever.
You know, as long as there's been computers, they've been using computers to do some of the kind of like scenes with space aliens and stuff like that. So. But back to the question, like, what about Shira Perlmutter? Well, Trump didn't have the authority to fire her. Only the Librarian of Congress can fire the Register of Copyrights. And there isn't a Librarian of Congress right now because Trump fired her and he tried to put somebody in place as her successor. And that person has to be Senate confirmed in order to be a Librarian of Congress. And so we are at a stalemate right now. And so Shira prevailed at the D.C. Circuit, and we'll see whether the government decides to take that up to the Supreme Court. Shadow docket's been getting a lot of work lately, and so maybe they'll take this one on too.
[61:51]
Alan Rosenstein
I could spend the next two hours talking to you about copyright, but I think we're gonna have to leave it here. Thanks so much, Pam. And we'll have to. We'll get you on in the next turn of the screw on all of these cases. Really appreciate it.
[62:04]
Pam Samuelson
Okay, sounds great. Okay, thanks. Bye.
[62:09]
Kevin Frazier
Scaling Laws is a joint production of Lawfare and the University of Texas School of Law. You can get an ad-free version of this and other Lawfare podcasts by becoming a Lawfare material supporter at our website, lawfaremedia.org/support. You'll also get access to special events and other content available only to our supporters. Please rate and review us wherever you get your podcasts. Check out our written work at lawfaremedia.org. You can also follow us on X and Bluesky and email us at scalinglaws@lawfaremedia.org. This podcast was edited by Jay Venables from Goat Rodeo. Our theme song is from Alibi Music. As always, thank you for listening.