Lawfare Archive: Pam Samuelson on Copyright's Threat to Generative AI - The Lawfare Podcast

Summary

THE LAWFARE PODCAST
Lawfare Archive: Pam Samuelson on Copyright's Threat to Generative AI
July 17, 2023 (Archived Episode for May 10, 2026)
Host: Alan Rosenstein
Guest: Pamela Samuelson

Episode Overview

This episode features a deep dive into the conflict between copyright law and the development of generative AI. Host Alan Rosenstein interviews Pamela Samuelson, an influential scholar of digital copyright law, following the recent spate of lawsuits involving major tech companies’ use of copyrighted works to train AI models. The discussion wrestles with foundational copyright concepts, the evolving legal landscape for AI training practices, and what solutions or compromises may emerge in the face of technological disruption to creative professions.

Key Discussion Points and Insights

Understanding Copyright Law

Basic Principles: Copyright incentivizes creativity by granting exclusive rights over original works for a limited time.
- "Copyright essentially allows the original works of authorship to be protected from the moment they're first fixed in a tangible medium under U.S. law." – Pamela Samuelson [03:14]
Automatic Attachment: As soon as a work is created in tangible form, copyright attaches, even to everyday items, though commercial relevance varies.
- "Every photograph you take and every grocery list you write, the copyright kind of attaches to it automatically." – Pamela Samuelson [03:54]

The Conflict: AI Training vs. Copyright

Training Data and Creative Professions: AI models are trained on huge datasets, often drawn from the open internet. Professional creators fear loss of income due to competition with AI-generated outputs.
- "The big issue is really about ingesting works as training data. ... Professional writers and graphic artists...are worried about the competition between what they charge money for and what you can get ... from these generative AI systems." – Pamela Samuelson [05:23]
Human vs. AI Learning: Rosenstein draws a parallel between how humans learn from past works and how AIs train, raising the question of why one is legally permissible and the other may not be.
- "We are all, in a sense, recombination engines of stuff that's gone before us. That's how we learn ... Presumably I can't be sued just because I used a bunch of people as my training data. So why can a generative AI system?" – Alan Rosenstein [07:45]

Analogies, Precedents, and Fair Use

Google Books Case as Precedent: Samuelson draws on the Authors Guild v. Google case, where Google’s digitization of millions of books for indexing was found to be fair use because it didn’t compete with original works.
- "If you couldn't supplant demand, if you were just using it for computational purpose, then you weren't exploiting expression. And what copyright law protects is the exploitation of expression." – Pamela Samuelson [08:30]
Fair Use Factors: The purpose of use, nature of the work, amount used, and effect on the market are key.
- "The court said the purpose was transformative because it was a different purpose, ... As long as I'm not spitting out something that's substantially similar to any particular input ... then it probably should be fair use." – Pamela Samuelson [15:05] & [18:41]

The Litigation Landscape

Major Cases: At the time, Samuelson counted 10 lawsuits, including Getty Images v. Stability, Andersen v. Stability, Doe v. GitHub, and three newly filed suits: two against OpenAI and one against Meta.
- "There's lots, lots of stuff going on, but these are the main cases. ... I think the copyright cases are the ones that are of greatest interest." – Pamela Samuelson [10:45]
What’s at Stake: Outcomes range from monetary damages to, in extreme cases, court orders to destroy infringing models.
- "The biggest threat for them is that the courts decide that the training data copying is infringement and then orders the destruction of the models. That would be something really amazing." – Pamela Samuelson [13:44]
- "The OpenAI lawsuit involving GitHub ... asked for $9 billion." [13:44]

Policy Arguments and Social Implications

Not Just Competition Fears: Creators fear not only direct copying but broad devaluation of creative labor.
- "They're going to be priced out of the market ... There's a further irony that they are ... agents of their own destruction, because of course, it's their content that's being used against them." – Alan Rosenstein [26:03]
“Democratization of Creation”: Samuelson acknowledges the downside but stresses that opening up creative tools benefits society and is consistent with copyright’s broader mission.
- "There's a kind of democratization of creation, which overall is probably a net positive in terms of copyright policy." – Pamela Samuelson [27:03]
Collective Licensing?: Policies like collective licensing may offer compensation but are difficult at the vast scale of AI training.
- "How are you going to get 25 cents to each of 600 million authors, and where are they and how do you get the money to them?" – Pamela Samuelson [30:28]

Legislative, Judicial, and International Contexts

Congress & Copyright Office: While Congress holds hearings, courts are more likely to be decisive; the Copyright Office is engaged but its recommendations may not be binding.
- "I'm going to be surprised if Congress does anything more than just hold hearings. ... But courts are faced with cases that are pending, and those cases are going to keep going..." – Pamela Samuelson [31:57]
Global Competitive Pressures: Countries like Japan, Israel, and China are adopting more permissive regimes to foster AI innovation, with Europe creating exceptions for text/data mining with opt-outs for rights holders.
- "If the US decides not to treat generative AI systems and training data as fair use, then some companies will move their basis of operation elsewhere." – Pamela Samuelson [33:45]
- "Europe ... adopted two exceptions to copyright rules to enable text and data mining." – Pamela Samuelson [36:30]

The Road Ahead and Pragmatic Perspectives

No Easy Answers: Samuelson anticipates a lengthy and inconclusive period of litigation, suggesting fair use defenses are plausible but unsatisfying for many creators.
- "I don't think that we're going to get a conclusive answer to these questions for at least three years, probably a little longer than that." – Pamela Samuelson [41:24]
- "Fair use defenses ... seem pretty plausible to me. ... Copyright can't solve all the problems of the world." [41:57]
Societal Adaptation: The transformation of work by AI may require policy shifts outside copyright, such as considering universal basic income in response to job displacement.
- "One of the things that we are going to be contending with ... is that generative AI systems and AI systems more generally are going to displace jobs that people have had for a long time." – Pamela Samuelson [39:17]
- "What we'd like, I think is for AI systems to be our co pilots ... to make us more productive, ... and to be able to ... spend more leisure time doing fun things." [40:20]

Notable Quotes and Memorable Moments

On the core of copyright:

"If you create something valuable, you get the commercial benefits from it. And that's something ... professional writers, professional musicians, professional coders care about quite a lot."
– Pamela Samuelson [03:49]

On potential risks to AI companies:

"The biggest threat for them is that the courts decide that the training data copying is infringement and then orders the destruction of the models."
– Pamela Samuelson [13:44]

On the limits of copyright protection:

"What copyright law protects is the exploitation of expression."
– Pamela Samuelson [08:30]

On policymaking and global pressure:

"There is a great deal at stake here. ... If the US decides not to treat generative AI systems and training data as fair use, then some companies will move their basis of operation elsewhere."
– Pamela Samuelson [33:32]

On the tension between progress and creator rights:

"All of those works of authorship build on pre-existing works. And so what we care about is the ongoing progress, not protecting particular people's jobs."
– Pamela Samuelson [29:00]

On the metaphor of AI as copilot:

"What we'd like, I think is for AI systems to be our co pilots, right? To help us in the creation of new works and not to displace us, but to make us more productive..."
– Pamela Samuelson [40:20]

Timestamps for Major Segments

[03:14] — What copyright law is and why it exists (Pamela Samuelson)
[05:23] — The fundamental conflict: AI models and creator concerns
[07:45–08:30] — Human learning vs. AI training; comparing practices
[10:45] — A landscape of litigation: Major lawsuits and stakes
[13:44] — Remedies and risks: From monetary damages to model destruction
[15:05–18:41] — Fair use doctrine and the Google Books analogy
[26:03–29:00] — Downward pressure on creative markets; “agents of their own destruction”
[31:57] — Congressional and Copyright Office roles; global competitiveness
[36:30] — European copyright policy: exceptions for text and data mining
[39:17–41:57] — Predictions, job displacement, and adapting to AI’s impact

Conclusion

Pamela Samuelson and Alan Rosenstein’s conversation provides a nuanced, balanced overview of the fraught intersection of copyright law and generative AI. The episode highlights the legal uncertainty, the policy challenges, and the real anxieties of professionals whose livelihoods are at stake, while also acknowledging the transformative—and potentially democratizing—promise of these emerging technologies. While legal battles are likely to continue for years, the ultimate answers may require innovation not just in law and policy, but in how society values creativity and adapts to change.


Transcript

[01:19]
Marissa Wong
I'm Marissa Wong, intern at Lawfare, with an episode from the Lawfare archive for May 10, 2026. On May 5, a novelist and five major publishers filed a copyright infringement lawsuit against Meta and Mark Zuckerberg. The lawsuit accuses the tech giant of illegally using millions of copyrighted works to train its artificial intelligence program, Llama. For today's archive, I chose an episode from July 17, 2023, in which Alan Rosenstein sat down with Pamela Samuelson to discuss then-current litigation on copyright issues for developers of generative AI models. The two also discussed these cases' implications for the legal limits of using copyrighted material to train AI programs, and how the issue will develop in future litigation.
[02:22]
Alan Rosenstein
This is the Lawfare Podcast. I'm Alan Rosenstein, Associate Professor of Law at the University of Minnesota and Senior Editor at Lawfare. To explore these issues, I spoke with Pam Samuelson, who is the Richard M. Sherman Distinguished Professor of Law at the University of California at Berkeley and one of the pioneers in the study of digital copyright law. She's just published a new piece in the journal Science titled "Generative AI Meets Copyright," in which she analyzes the current litigation around generative AI and where it might lead. It's the Lawfare Podcast, July 17: Pam Samuelson on copyright's threat to generative AI. Let me start by asking a very general question, since we don't usually have much cause to discuss issues of copyright law on Lawfare. So both for my sake as a non-copyright-law specialist and also for the sake of our audience, what is the core idea behind copyright law?
[03:15]
Pamela Samuelson
The idea is that people often need incentives to be creative, to create books or music or other things. And if they want to make a living from their creations, they need to be able to have some exclusive rights, at least for some period of time. Copyright essentially allows the original works of authorship to be protected from the moment they're first fixed in a tangible medium under U.S. law. And so essentially, every photograph you take and every grocery list you write, the copyright kind of attaches to it automatically. Obviously, those are not things that typically are commercially valuable, but nevertheless, copyright attaches to them. And when they are commercially valuable, then copyright law gives the owner of the rights an ability to control the commercial exploitations of their works. When somebody creates something that in fact is commercially valuable, then they have the exclusive right to, in fact, exploit the work and to authorize people to make derivative works of their creations. And that way, if you create something valuable, you get the commercial benefits from it. And that's something that lots of people who are professional writers, professional musicians, professional coders care about quite a lot.
[04:46]
Alan Rosenstein
And this, please correct me if I'm wrong, this is a right that attaches automatically. So you don't have to actually write on your grocery list or your photograph or your whatever, that little C with a circle, right? This is copyrighted. This is just a thing that attaches because you wrote it and then you created it, and then you get whatever rights that come with that.
[05:04]
Pamela Samuelson
Yes, that's exactly right.
[05:06]
Alan Rosenstein
So now we've established sort of the idea behind copyright law at a high level. What is the potential conflict between copyright law, on the one hand, and generative AI on the other hand? Since, of course, the whole idea, hopefully, behind generative AI is that it's creating new works. It's not just reproducing existing works.
[05:24]
Pamela Samuelson
So the biggest issue for a lot of professional writers and other kinds of people who are professional creators is that the generative AI is trained on data. And where is there a lot of data? It's out there on the Internet. And so if a photographer has put up some of his images on the Internet, or a blog is up on the Internet, it's probably going to be used as training data for some of these generative AI systems. There are sites, for example, which have a lot of digital visual art. And one of the lawsuits involves visual artists basically claiming that you ingested our work as training data. And the reason that Midjourney and the other generative AI image systems can generate really nice images is because of the quality of the images that you used as training data. And so even if the output is not a really close resemblance to the input data, the quality of the output is due to the quality of the inputs. And so the big issue is really about ingesting works as training data. And of course, a lot of the professional writers and graphic artists and so forth are worried about the competition between what they charge money for and what you can get for, let's say, 20 bucks if you use one of these generative AI systems. So they're worried. The screenwriters' strike, for example, is about worry over the use of generative AI by the studios to kind of write scripts about this, that, or the other thing. And they're worried that they won't have a job anymore or at least that they won't be paid as much for scripts that they might contribute to. So the job loss issues loom large for a lot of the professional creators.
[07:45]
Alan Rosenstein
Intuitively, I understand the argument that using copyrighted works for training data might cause an issue. But at the same time, what is the difference between a generative AI system using this as training data and me, for example, using this as training data? Right, there's that famous, probably apocryphal, but it's so good it's kind of too good to check, quote from Pablo Picasso, that good artists imitate and great artists steal. We are all, in a sense, recombination engines of stuff that's gone before us. That's how we learn to paint or write or make music or code or write Lawfare articles. So presumably I can't be sued just because I used a bunch of people as my training data. So why can a generative AI system?
[08:31]
Pamela Samuelson
Well, I think that the people who design these systems believe that if you take data from the open Internet, you scrape data from the open Internet, that you aren't hurting anybody. You are using it not to essentially exploit the expression in the work. What you're doing is kind of decomposing the works into very small units, which the computer scientists call tokens. And you tokenize things in a way that allows for computational uses and really trying to understand what words are likely to be next to what words. Their view is that this is a lot like the Google Books case. In that case, the Authors Guild sued Google for digitizing millions of in-copyright books from research library collections. And the courts eventually found that that was fair use because Google wasn't trying to exploit the expression and you couldn't essentially get enough expression from the books to essentially supplant demand for the original. And if you couldn't supplant demand, if you were just using it for computational purpose, then you weren't exploiting expression. And what copyright law protects is the exploitation of expression. So from the standpoint of the technologists, this looks like what we're doing is very much like what Google did. And Google was doing it from research library collections, we're doing it from the open Internet. And web scraping is just something that people do all the time and therefore it must be fair use because it's been allowed for years and years.
[10:32]
Alan Rosenstein
What are the main cases that you're tracking in this wave of litigation and sort of where are they in the process? And do you have any sense of how they're going to resolve, or is it sort of too early to know?
[10:46]
Pamela Samuelson
Well, I think it's really too early to know in all the cases, but I counted that there are now 10 lawsuits. The ones that I'm following the most closely are the Getty Images versus Stability case. There's one filed in the US and one filed in the UK, again about Stable Diffusion and about the training data and outputs as infringement. Similar claims in the Andersen class action lawsuit against Stability. And the same lawyer is handling the Andersen case and the Doe v. GitHub case, which is about Copilot. So it's about software, not about visual art. And then just in the last week or so, there's been three new lawsuits, two against OpenAI and one against Meta based on books. So there's lots, lots of stuff going on, but these are the main cases. There's also another case that's more focused on privacy issues, which is also against OpenAI. But I think the copyright cases are the ones that are of greatest interest.
[12:03]
Alan Rosenstein
Is your sense that the companies that have developed these generative AI systems, are they surprised? Did this catch them off guard or was this inevitable and they understood it and they decided, well, it's just the cost of doing business and building these amazing technologies is that we're just going to have to deal with this when this comes down the pike.
[12:22]
Pamela Samuelson
I don't think it was a complete surprise. What you have with OpenAI and with Meta are companies that have very high valuations and they're doing something that's quite novel. And it certainly has, I think, probably surprised them just how angry some of the visual artists and some of the authors, professional authors groups have been attacking them. But the lawsuits, I think not a huge surprise. But I'm sure that the companies, Microsoft, OpenAI and Meta have had to do some risk analysis. And they wouldn't have gone forward with these projects if they didn't think they had a pretty strong case.
[13:13]
Alan Rosenstein
And in terms of that risk analysis, what is the range of outcomes in these cases? I mean, obviously, one possibility is the courts find that there are no copyright issues. The other possibility is that the courts find that there are some copyright issues. And then there's this question of what the remedy is in those cases. In particular, are we looking at, it's going to cost them some money, but it's not that big of a deal. Or are we looking at, wow, this could stop generative AI in its tracks because obviously without the training data, these models are useless.
[13:44]
Pamela Samuelson
So I think the biggest threat for them is that the courts decide that the training data copying is infringement and then orders the destruction of the models. That would be something really amazing. But it is quite possible as an outcome. Courts have the authority to order impoundment and destruction of things that are copyright infringements. I'm not predicting that that would be an outcome. It may be that damages would suffice. But the OpenAI lawsuit involving GitHub, that complaint asked for $9 billion. And I mean, Microsoft has $9 billion, but $9 billion is a lot of money.
[14:48]
Alan Rosenstein
Fair use, the concept of fair use is likely to be a major defense from AI companies. And so I'd like you to sort of explain again generally the idea behind fair use and also how it's likely to apply in these cases and what parts of fair use in particular are likely to be most relevant.
[15:05]
Pamela Samuelson
So the copyright statute in the United States says that fair uses of copyrighted works are not infringements. It directs courts to take certain factors into account in making a fair use decision. So what purpose did the putative fair user have when making use of an existing work? What's the nature of the copyrighted work, how much was used, and what kind of effect does that have on the market for or value of the work? Let's go back to the Authors Guild v. Google case for a minute because that's the closest analog to the generative AI fair use cases. So what was the purpose of Google scanning millions of in-copyright books from research library collections? It was to create a database so that it could engage in computational uses of the books and the contents of the books, including for enabling snippets to be created of content. So if you're looking for information about Buffalo, New York, you can ask a question in Google's search engine and you get a little snippet from a book that talks about the city of Buffalo, and maybe that will satisfy your curiosity. Maybe you just wanted to know what the population of the city is, and you can get that kind of information through Google Book Search. So the purpose was quite different than the purpose for which the books were initially marketed. So the court decided that that meant that when it was done for a different purpose, it wasn't competing with the book as a commodity. It was allowing people to get information and allowing Google to be able to make information available to people. And that that was actually a positive thing. So the court said the purpose was transformative because it was a different purpose. The nature of the copyrighted works: they were old books in the research library collections, but the court didn't give very much weight to that. Now, generally speaking, if you copy the whole thing, that doesn't necessarily turn out to be a good thing, right? It usually cuts against fair use.
But if you want to index the contents of books and if you want to be able to serve up snippets, you in fact have to copy the whole thing. And so the court basically decided that that was reasonable in light of the purpose. And the court said, no, there's no harm to the market for the books because Google basically isn't serving up ads next to the snippets. It just in fact has links to places where you can buy the books that snippets are shown of. And so they're not undermining the market for the book because they're not really supplanting demand for purchases of the books. And so the court kind of, on balance decided that that was a fair use. And there will be similar kinds of arguments made by OpenAI and by Stability that, gee, my purpose is very different than the purpose of the original. And the works are creative, but you put them up on the open Internet and so that makes them fair game. Again, you copy the whole thing, but that's necessary if you want to be able to essentially create these language models or image models. As long as I'm not spitting out something that's substantially similar to any particular input of the training data, then it probably should be fair use. That's the kind of argument they're going to be making. Now, again, I think the Getty Images case is one that's going to be a little tougher for Stability to win, because Getty Images says, hey, I've got a licensing program for making my photographs available as training data. And so you're interfering with a market opportunity that I have. And so that's going to be, I think, a big issue in that particular case.
[26:04]
Alan Rosenstein
So one of the factors you pointed to in the Google Books case was the idea that the output of that was not really going to compete in a meaningful way with the authors of those books. And so that's one thing that would cut in favor of a fair use determination. It does seem in this case though, that you have a lot of really worried artists and coders and musicians who are not claiming that their work is directly going to compete with them. Obviously it's going to be transformed, but that the output or the outcome, which are these incredible models, are going to basically drive the costs of this creative work down to very low amounts. And therefore they're going to be priced out of the market, essentially. And that there's sort of a further irony that they are in a sense the kind of agents of their own destruction, because of course, it's their content that's being used against them. Does that seem like a meaningful difference to you between these cases and the Google Books indexing case?
[27:04]
Pamela Samuelson
Well, that's certainly what some of the professional writers and visual artists are arguing, and I have some sympathy with that. But remember, advances in technology have made lots of creations possible that compete now. So professional photographers, for example, today are having a tougher time because the quality of the images that we are all able to generate, even on our phones, the quality of those images makes it possible for people to essentially use Creative Commons instead of hiring a professional photographer for certain kinds of images that they want to be able to use on their websites or for ads or whatever. And so it does seem to me that lots of tools to make fan fiction and the like essentially means that there's kind of more competition. That isn't necessarily a bad thing. In fact, you know, there's a kind of democratization of creation, which overall is probably a net positive in terms of copyright policy. You know, what is the purpose of copyright? It is to promote the progress of science and the useful arts, that is to say, to encourage the creation and dissemination of original works of authorship. And all of those works of authorship build on pre-existing works. And so what we care about is the ongoing progress, not protecting particular people's jobs. So if the outputs are substantially similar to particular inputs, that's actually something that copyright law would treat as an infringing copy or an infringing derivative work. But very often what's going to be outputted is going to be something that's very different from any particular input. And insofar as that's true, that's not something copyright law has done before. It has not extended that far. The Andersen lawsuit, for example, claims that every image that's outputted by Stable Diffusion is an infringing derivative work, because it essentially is derived from the training data, which is derived from the images that were copied in the course of the training data.
And that's just a stretch from the standpoint of copyright law. Now, again, Congress is going to be having hearings about this. The Copyright Office has already had a series of listening sessions in which people who have ideas about what copyright law should do about these generative AI systems, they heard some criticism, they heard some praise, and they will be having a notice of inquiry sometime this summer and asking people for comment, and then probably writing a report sometime later this year or early next year and making recommendations to Congress. And I know in Europe, one of the things that people are talking about is a possibility of some sort of collective license so that creators can get some compensation for the use of their works as training data. But when we're talking about Stable Diffusion, which was trained on a data set of, I think, 600 million images, like, how are you going to get 25 cents to each of 600 million authors, and where are they and how do you get the money to them? So it's not going to be an easy thing to solve through a collective license. But that's another one of the issues that Congress will probably have to contend with.
[31:13]
Alan Rosenstein
So you just mentioned both Congress and Europe, which is very helpful because that's where I wanted to take this conversation next. So let me ask a few questions about that. Obviously, we've been talking so far largely in the context of a judicial case and judicial doctrine, but it sounds like if there's going to be a comprehensive solution to this, it's going to come at least in part, from the political branches, Congress and the executive branch. So I was hoping you, you could talk a little bit at a high level about what Congress and the executive branch's role is in setting out copyright law and copyright policy. And then, as it applies in the case of generative AI, what are the interventions that you'd expect Congress and the Executive branch to make, and in particular, what interventions you think they should make?
[31:57]
Pamela Samuelson
Well, one of the things that Congress has done and will do is hold hearings and invite people to make some presentations. And there's already been one congressional hearing about generative AI, and I expect there will be more in the future. But I'm going to be surprised if Congress does anything more than just hold hearings. The Copyright Office, I think, is very focused and very aware of the consternation about generative AI that has been raised by some author groups and by some visual artists and also by some of the people in the music industry. And so I think they're going to be pretty sympathetic to the concerns of the professional creators. At the same time, you know, they can make some pronouncements, but courts are faced with cases that are pending, and those cases are going to keep going unless the motions to dismiss are granted. There is actually one motion to dismiss in the Andersen case next week, so we'll see what happens with that. But the Copyright Office can make a recommendation and it can offer its own interpretation of copyright law, but courts may or may not find that interpretation to be persuasive. So, you know, I think they will do a careful and thorough job because they realize that there is a great deal at stake here. But one of the things that's at stake is US competitiveness in the international marketplace for generative AI systems. And the Ministry of Justice in Israel, for example, has published a paper essentially arguing that ingesting in-copyright works as training data is fair use and that there isn't infringement unless the output is substantially similar. And that will attract some investment in developing generative AI. And Japan also has a very broad exception to enable text and data mining. And they too want to be leaders in the field of generative AI, and China wants to also.
If the US decides not to treat generative AI systems and training data as fair use, then some companies will move their base of operations elsewhere. And so there's a kind of countervailing interest for the United States, because generative AI right now is a big industry for the US, and US firms are doing very well with it. And so you want the industries to be successful. And of course, there will be more generative AI systems developed in the next three to five years. And it's not clear at this point what the legal situation is going to be. I think all the companies who are defending these cases are well represented by good lawyers, and so they will put up a good fight. But it will be up to the courts to really decide this, I think, more than the Copyright Office and more than Congress.
[35:43]
Alan Rosenstein
What about Europe's role in all of this? So you already mentioned that there are plans in Europe or proposals for a compensation system. I want to ask more generally about Europe and its effect in copyright policy in this space, in part because obviously Europe has been, I think, much more proactive when it comes to regulating technology over the last several years than the United States has been. And I think there's a perception, at least in some parts of tech policy, that this is kind of Europe's world and we just live in it. And that the Brussels effect is honestly more important than the DC effect. And I'm curious if you agree with that and if so, sort of what you think the effects of whatever Europe is going to do will be on the ecosystem of generative AI.
[36:30]
Pamela Samuelson
Well, I think before generative AI was a thing, Europe actually went through a copyright revision process and decided that essentially ingesting copyright works for text and data mining purposes should be lawful. So it adopted two exceptions to copyright rules to enable text and data mining. One is for nonprofit research institutions, and if they do text and data mining copying, that's actually completely exempt from copyright liability. It was based on that that a German research institution essentially created a training data set of 5.8 billion works from the open Internet. That database is available on an open source basis for anyone who wants to use it as training data. So that's exempt from liability, as I understand it, under European copyright law. There is a separate one for non-research institutions, that is to say for companies and the like that might want to engage in text and data mining. So text and data mining is also lawful by, let's say, commercial firms such as Microsoft, but there's an opt-out allowance for that. So if you are a copyright owner and you don't want your work to be used for text and data mining purposes, you can opt out of the text and data mining regime. And so that's the state of play in Europe, and there are definitely firms that will opt out of text and data mining for commercial purposes.
[38:39]
Alan Rosenstein
I'd like to finish our conversation by trying to synthesize the many very interesting legal and policy issues that we've talked about and to ask you to the extent that you have a view of what the right answer here is, in other words, to the extent that's most compatible with existing law and also with the policy objectives of the copyright system, if you were the judge in these cases, what would be the principles that you would apply and that you would want to see in whatever long term settlement there is when it comes to these issues of copyright in the generative AI context?
[39:17]
Pamela Samuelson
Well, I have sympathy with the concerns of many of the professional writers and visual artists. I think that one of the things that we are going to be contending with, not just for them, but more generally, is that generative AI systems and AI systems more generally are going to displace jobs that people have had for a long time. So the Copilot system that OpenAI and GitHub and Microsoft have been promoting is a way to essentially automate the creation of new computer programs, and programmers have been making a lot of really good money in the last several decades and their jobs are at risk too. So we're going to have to think about universal basic income for people and finding ways for these systems to be tools that we can use for our good purposes. So the metaphor that GitHub and OpenAI have for this system that they are promoting, that metaphor of Copilot, is I think a really powerful one. Obviously it's a trademark for OpenAI and GitHub. But what we'd like, I think, is for AI systems to be our copilots, right? To help us in the creation of new works and not to displace us, but to make us more productive, to make us able to create things more quickly and to be able then to spend more leisure time doing fun things. So that's the happy story out there. And I'm going to be surprised if generative AI gets shut down. But certainly there are people right now who are pretty intent on trying to stop them. And I don't think that's the perfect solution either. So I don't think that we're going to get a conclusive answer to these questions for at least three years, probably a little longer than that. I would guess that the Getty Images case will settle because that would be a sensible thing for Stability to do. But these class action lawsuits seem to me to be just too remote from what copyright law has been able to handle so far. So I'm kind of thinking that the fair use defenses that have been discussed so far seem pretty plausible to me.
And so, you know, that's not going to make a lot of people happy, but copyright can't solve all the problems of the world.
[42:23]
Alan Rosenstein
Well, I think that's a good place to end this. Thank you so much, Pam Samuelson, for coming on the show.
[42:28]
Pamela Samuelson
Okay. Thank you for inviting me.
[42:33]
Alan Rosenstein
The Lawfare Podcast is produced in cooperation with the Brookings Institution. You can get ad-free versions of this and other Lawfare podcasts by becoming a Lawfare material supporter at patreon.com/lawfare. You'll also get access to special events and other content available only to our supporters. The podcast is edited by Jen Patja Howell, and your audio engineer this episode was Noam Osband of Goat Rodeo. Our music is performed by Sophia Yan. As always, thanks for listening.