
AI companies wanted to train their language models on books. But they didn’t want to pay.
Veterans Service Announcer
Navigating post military challenges can be tough. Regardless of when you served, you are not alone. Connect with fellow Oregon veterans and find activities, navigate resources and join a community to help support your journey or challenges after military service. From mental health support to veteran community groups and activities, discover what's possible for you at BeyondTheMilitaryUniform.com that's BeyondTheMilitaryUniform.com.
Martine Powers
So you might have heard of the AI startup called Anthropic. This is the company behind the AI chatbot Claude. Well, in 2024, executives at Anthropic ramped up an ambitious project that they'd hoped to keep quiet. It was codenamed Project Panama, and internal planning documents recently unsealed in legal filings described it as their, quote, effort to destructively scan all the books in the world.
Will Oremus
Project Panama was Anthropic's ambitious project to buy as many books as it possibly could. It would take them to a scanning center, it would slice off the spines, scan every page one by one, feed it into this digital library. It thought if it could get all the prose from all the books in the world, that it would be able to build the best AI chatbot of all.
Martine Powers
This week, technology reporter Will Oremus first reported on Project Panama's details exposed in these legal filings.
Will Oremus
It was really evocative that this company was literally destroying hundreds of thousands, maybe millions of books, because that's sort of what creatives are worried about, right? Like the destruction and the aggregation of their work into these gargantuan AI systems that are going to hoover up all the knowledge in the world. It was just sort of like a physical manifestation of that concern.
Martine Powers
Initial details about Anthropic's hunger for books emerged in documents filed in a copyright lawsuit. That lawsuit was brought by book authors against the AI startup in 2023. It was settled in August for more than a billion dollars. But documents from the case also show something else: that the company may have crossed legal lines while pursuing all the world's prose.
Will Oremus
Even though it sounds really bad to try to destructively scan all the books in the world, this was actually the company's effort to do this in a more legal or more ethical way than everybody else was doing it. The alternative was to download them from these really shady pirate sites.
Martine Powers
Which Anthropic had also done, and is what ultimately landed them and other AI companies in legal trouble. From the newsroom of the Washington Post, this is Post Reports. I'm Martine Powers. It's Thursday, January 29th. Today, Will tells the story of how one Silicon Valley company built its AI by buying, scanning, and destroying millions of books. Will, thank you so much for joining us.
Will Oremus
Thanks for having me.
Martine Powers
All right, so tell me how you first heard about Project Panama.
Will Oremus
The name Project Panama actually had not been reported before. Our amazing researcher Aaron Shaffer keeps all these alerts out for court cases that he's tracking and that various teams are tracking. And one of those sets of cases were these AI copyright cases against these big tech and AI firms. And by the way, it's not just Anthropic, it's pretty much all the tech giants or many of them getting sued with allegations of violating people's copyright in various ways. But he noticed that this case that had already settled, and we reported on it when it settled, Anthropic settled with the authors for $1.5 billion in August, had some new alerts on it. And so he went and he was like, oh, what's that? Why would there be new files in a case that's already settled? And it turned out that they were files that were part of the evidence in the case, but now they were significantly less redacted. So a bunch of stuff that was blacked out before was no longer blacked out. And part of that were the details of Project Panama. We knew some of the vague outlines of it. We didn't know what it was called or everything that was involved in it. And by the way, when we reached out to Anthropic for comment, they emphasized that the case has settled, that the judge in the case found that much of what they were doing was, in fact, legal.
Martine Powers
And just quickly, can you tell me about the lawsuit against Anthropic that these court filings came out of?
Will Oremus
So these were court filings in a lawsuit that was filed against Anthropic in 2023. And it was part of this big wave of lawsuits by book authors suing pretty much all the tech giants, alleging that they had violated their copyright. It wasn't just book authors either. It was online publishers. It was news outlets. It was videographers, writers of screenplays, photographers, visual artists. As they started to realize that their life's work had been vacuumed up by tech giants and fed into ChatGPT and other AI models like that, they were like, hey, no. Why? Nobody asked us. Like, we didn't say you could do this. Is that legal? And we're still in the process of finding out the answer to that question.
Martine Powers
And when you were looking through these files, what stuck out to you here? Like, why did this kind of part about destructive scanning catch your attention?
Will Oremus
Well, first of all, that was just a really evocative quote, right? Like, destructively scan all the books in the world sounds like something that a James Bond villain would be trying to do. And it's interesting because Anthropic actually, they sort of have this ethos of wanting to be seen as the good guys in the AI industry. This company was founded by people who split off of OpenAI, the maker of ChatGPT, because they were worried that OpenAI was straying from its original mission of, like, saving humanity from runaway AI. And here they were just. They're not literally destroying all the books in the world. I think they only needed one copy of each, not all the copies, but it was just a really evocative phrasing. And the name, Project Panama, we still don't know why they called it that, and they didn't say when I asked, but we just wanted to dig in and figure out what this was and why they were doing it.
Martine Powers
So tell me a little bit more about Project Panama. Like, how did it work?
Will Oremus
It might help to start with who they hired. At the outset of this project, they hired a guy named Tom Turvey. Now, most people don't know that name, but if you are familiar with the history of scanning books, you know that name. Because when he was at Google, he helped oversee the massive project called Google Books, where Google Books wanted to scan all the books in all the libraries and digitize them and make them available online. So Anthropic brought this guy on to help lead Project Panama, and what he did was not to just go out to publishers and authors, which is what I think the plaintiffs in the case would have preferred. There is some evidence from his deposition that he did a little bit of outreach to publishers and authors, but he quickly determined that this wasn't a viable strategy. It just wasn't practical to pay to license books on such a massive scale. So instead he settled on a different approach that included going to these massive used book warehouses with names like Better World Books, where you could buy hundreds of thousands of books at a time in bulk for the cheapest possible price. And then the cheapest and fastest way to scan them turns out to be. You know, it's kind of hard if you ever tried to spread out a book on a, you know, a Xerox machine, right?
Martine Powers
Yeah, yeah. It's like the part of them kind of near the spine always gets cut off, and you're just trying to, like, press down the book as hard as possible.
Will Oremus
And it's also just really slow. So instead they did this practice that they didn't invent, but it's a practice that's out there where you slice off the spine so that you don't have to worry about that anymore. And you can just quickly feed all the pages into a machine that will scan them all.
Martine Powers
So they're just basically like ripping these books apart?
Will Oremus
Yeah, I mean it was actually very neat. It was more like a paper shredder than a kid tearing up pieces of paper. But yes, they were destroying the books. And then there was a proposal from a vendor in the court files that indicates they recycled all the materials afterwards. Because I guess who wants to store giant warehouses full of destroyed books once you've scanned them all? And the result was that they had the books in digital form and now they had this massive, massive digital library. They didn't get all the books in the world, but they got a lot. And now they could draw from that library to train their AI models. And we should say that. Better World Books did not respond to our request for comment.
Martine Powers
How many are we talking about in terms of the number of books that they were able to scan?
Will Oremus
So the exact number that they scanned is still redacted in the files. And keep in mind these filings are from a year or two ago, so they wouldn't have the updated numbers anyway. But there's evidence that they had at least one project proposal to acquire hundreds of thousands of books from one outlet. And a judge last year said they spent many millions of dollars to purchase millions of print books. And Anthropic didn't just go to these massive used book warehouses to acquire books. They were trying everything. There's evidence in the files that they approached the Strand, you know, the famous bookstore in New York that promises 18 miles of books. A spokesperson at the Strand said that didn't end up happening. They also thought about approaching libraries to scan their books, nondestructively in this case, more like the Google Books project where they keep the spines intact. But they were looking all over at all kinds of different sources, whatever they could get their hands on.
Martine Powers
Wow. What you're describing here almost strikes me as like a reverse Noah's Ark for books. Like you take one of each book and you take it to this warehouse to be scanned. But instead of saving the book, it's like the book is sort of sacrificed to the recycling gods in the pursuit of AI. I mean it's just, it's pretty bizarre.
Will Oremus
I love that metaphor. You know, there are people in the tech industry who would say that it's actually more apt than we're giving it credit for. They would say, look, in a way, we are saving this. I mean, the people who run Anthropic and a lot of the people in the AI world seem to truly believe that AI is going to be like everything, you know, in the future. You know, in a few decades, AI will be doing all kinds of intellectual work. It'll be what we turn to for everything. And so in their minds, I'm just hypothesizing here, to be clear, we don't have evidence of this in the court documents, but I think they might say we are saving these books. We're making sure that the content of these books, many of which by the way, are probably somewhat obscure, obviously they're not super rare and valuable books or they wouldn't be in these bulk warehouses. For the most part, they might say, we are saving these books. We're saving it for a world where AI does everything. Now, we made sure that the AI will know what was written over all these centuries of human authorship.
Martine Powers
Hmm. Well, I wanna talk more about why this came out in court and why people had concerns about this. I mean, obviously I too could go to a bookstore, probably a used bookstore. Cause I don't have enough money to buy all these books new, but I could go to a bookstore, buy a bunch of books, like do what I want with them and throw them out later. Why is this, I mean, I guess is this illegal? And why do people have concerns about what happened here?
Will Oremus
So the potentially illegal part was actually what they were doing before this used book buying. One of the things that was really interesting to me about this story is that as bad as it sounds to try to go out there and destructively scan all the books in the world, this was actually seen by a lot of people in the industry as a more legal or more ethical way of doing it than what a lot of the other companies were doing at the time. Authors have also sued Meta, which declined to comment for our story. And by the way, there are pending lawsuits still against OpenAI and Microsoft. There's a lawsuit against Google. They're all making broadly similar claims that these companies violated authors' and publishers' copyright. What some of these companies were apparently doing and what Anthropic apparently did before Project Panama was going to these vast repositories online that are known as shadow libraries. And these are unauthorized copies of millions of works that have been put together into these giant data sets. And they're traded around for free on the Internet via the software called torrent software. And there is evidence that Anthropic torrented the entirety of two huge shadow libraries of books and other copyrighted works. There is evidence that Meta did the same thing. There are allegations that other tech companies did this too. And that's seen by a lot of people as even worse because there, A, nobody's getting paid. Right. B, a lot of these sites are in legal trouble. They're like, you know, the FBI is not happy about their existence and is investigating them for all sorts of things. And sometimes they go dark because they've gotten busted by some authorities or another. And then there's a third thing involved, which is that when you use this torrent software, you sometimes end up making the pirated copies available for other pirates to take for free. And so that's what Meta is still accused of doing.
They're accused of while they were trying to download all these books secretly and for free from the pirate sites, they were also making stuff available for other pirates, which might be a more straightforward copyright violation, at least that's what's alleged than using them to train an AI.
Martine Powers
And then in terms of the buying of the books and scanning the books, was there anything illegal about that? How is that different from me just buying a book and scanning a book? Obviously on a probably smaller scale.
Will Oremus
Right. So with the caveat that I'm not an expert in copyright law, I want to amend something you just said, which is, yeah, we have this intuition like if I buy a book, I can do what I want with it. Right, but that's not quite true. You actually can't. You can't buy a book, copy it, make your own version and then sell that. That would be a pretty clear-cut violation of copyright law. You can't buy a book and then digitize it and then sell it online or make it available for free online. What the judge said in this case was that Anthropic was actually doing something different. They were taking the books and transforming them into something else. They're transforming them into these AI models, including Anthropic's popular AI chatbot, which is called Claude. And Claude is a fundamentally different product from a book. And so they're not competing directly with the books that they're acquiring. The judge found that this falls under a doctrine in copyright law known as fair use. This is where you can make use of copyrighted works without permission if you're using them for certain purposes. Often you can do it for teaching purposes, right, in schools. One of those purposes is you can copy somebody else's stuff without permission if you're then transforming it into some other innovative thing that isn't the same as the thing you copied and doesn't compete directly with the thing you copied.
Martine Powers
So the argument here from these AI companies is like, yes, we wanted to train our AI models, but what we're putting out isn't just copying these books, that it's a new thing, that it's this innovative thing that is not a copyright infringement, but instead like a thing unto its own.
Will Oremus
Yeah, exactly. So Anthropic's not out there saying, hey, you want to buy a John Grisham book? We've got 'em here for 10 cents apiece. Right. And in fact, in the court cases, some of the evidence turns out to be like, can you get Claude to spit out a whole verbatim copy of one of these books that the company acquired? And so far the answer is usually no. The chatbots don't do that. And I think the companies probably try to train them not to do that. And so that's one of the factors that the judge is weighing. But I should note here that this is really unsettled law.
Martine Powers
The comparison that I'm making in my head, and maybe I will age myself with this, is Napster, right? That like in the Napster era, where everyone was like just downloading random stuff off the Internet, you know, music and uploading it again and a lot of it wasn't legal and you didn't know where any of this music came from, you certainly weren't paying for it. That it's basically that was happening with books. But it sounds like the allegation here is that huge billion dollar companies were engaging in this, not just like a bunch of teenagers in their basements.
Will Oremus
Yeah, you're exactly right. I mean, Napster is a great point of comparison. Yeah, I'm old enough to remember like the stories of federal agents showing up and knocking on the door to some teenager's basement where he's like downloading some Nirvana tunes or whatever. But in fact, there is a way in which this is sort of like the teenagers in their basements. I mean, there's evidence from the court filings, especially in the Meta case, that some of the engineers who were working on this project were really worried that what they were doing was illegal. There's chat logs between employees at Meta. There was one where an engineer who was working on downloading the shadow libraries said, quote, torrenting from a corporate laptop doesn't feel right. And then later on, he shared a concern with the company's legal team that using these torrent sites could mean uploading pirated works to other people. And he said that it, quote, could be legally not okay. And then you started to get the sense that they were told, don't worry, this has all been approved. There was a quote from the filings that is going to refer to MZ, that's apparently Mark Zuckerberg, the CEO of Meta: "After a prior escalation to MZ, GenAI has been approved to use Libgen for Llama 3 with a number of agreed upon mitigations." So that's a lot of jargon there. That's internal to the company, but basically it seems to be saying, yeah, go ahead, go ahead and do this stuff. You know, I know you're worried that it's illegal, but it's been approved, we're going to do it. But then you've also got people who defend this type of file sharing, people who say that information wants to be free. One of these famous shadow libraries, it's called Libgen, which is short for Library Genesis, actually arose out of this culture in Russia. When there was censorship and Russian academics couldn't get access to published research from around the world.
They started, you know, finding ways to attain academic literature and putting it together. And it was this big cooperative underground project to be able to share with each other the accumulated knowledge of their academic peers around the world so that they could advance research. And this was seen by a lot of people as a really noble thing, trying to break down the walls to the collected knowledge that humanity has put out.
Martine Powers
After the break, why these AI companies value books so highly and how AI lawsuits might impact artists, writers and other creatives in the future. We'll be right back.
HelloFresh Advertiser
There's this moment every day, right around 5pm where my brain just shuts down. Not because I don't want dinner, because I don't want to decide dinner. Before hellofresh, it was always the same thing. What do we have? What can I make? How long will it take? Hellofresh didn't magically turn me into a chef. It just removed the hardest part, figuring it out. The meals are already planned, the ingredients are already portioned, and the steps are crystal clear. Every week, HelloFresh offers over 100 recipes. So I'm choosing what sounds good, not what I can survive making. I've tried a lot of HelloFresh recipes, from their deli style turkey wraps to their street cart style chicken bowls. And they're all fantastic. Though their Buffalo chicken melts are quickly becoming a new favorite. Go to hellofresh.com posttenfm to get 10 free meals plus a free Zwilling knife. A $144.99 value on your third box offer valid while supplies last. Free meals applied as discount on first box. New subscribers only varies by plan. This is the year you stop overthinking and start building. The year your side idea becomes something real. Founder, creator, business owner. It all starts with one decision, and that decision is launching with Shopify. Maybe it's a product your friends already asked to buy, a service you know you're great at, or a brand that's been living rent free in your mind. January is your window before another year slips by. 2026 is when you turn the idea into income. And Shopify is how you begin. Millions of entrepreneurs have already made the leap from household names like Heinz and Mattel to first time business owners just getting started. Choose from hundreds of beautiful templates that you can customize to match your brand. Create email and social campaigns that reach customers wherever they scroll. In 2026, stop waiting and start selling with Shopify. Sign up for your $1 per month trial and start selling today at shopify.com reports. Go to shopify.com reports. 
That's shopify.com reports. Hear your first this new year with Shopify by your side.
Martine Powers
Why have all of these AI companies been so desperate to attain this massive amount of digitized books? There's a lot of written word on the Internet that is free to access and easier to access, and they could just suck up. What is it about books that they were going to such extraordinary lengths, including buying physical copies from a used bookstore and scanning them very quickly one by one? Like, what is it about books that is so valuable here?
Will Oremus
Oh, don't worry. They were definitely also vacuuming up like the entire Internet and using that to train their models too. I mean, you know, more or less, the companies were trying to get their hands on all the created works that they possibly could. Whether that's a blog post, whether that's the Library of Congress, whether that's copyright filings over the decades, or whether that's books or videos, movies, screenplays, you name it, right? Books, though, were seen as particularly valuable because the Internet has a lot of detritus. You know, like there's a lot of cruddy writing across the Internet. There's a lot of stuff that's not true.
Martine Powers
No kidding.
Will Oremus
There's stuff that's written by bots for bots. There's stuff that's fake news and propaganda. And so the quality of any given Internet content is not very reliable. And you might end up with your AI model spitting out fake news or spitting out conspiracy theories or spitting out just pure nonsense, because that was in its training data. And so books were seen as especially valuable because by and large, I mean, there's bad books out there too. But by and large, books are carefully curated, they're edited. Somebody went to a lot of trouble to gather facts or to create a work of fiction and to craft the prose. And so Anthropic's executives are shown in the court documents talking about how valuable books can be. You know, Anthropic, by the way, is an underdog in this AI race. OpenAI, Microsoft, Google, Meta, these are much, much larger companies with a lot more money. Anthropic thought a way that they can kind of bootstrap their way to competitiveness with these bigger companies was to focus on books and to focus on better quality training data.
Martine Powers
Wow. You know what you're saying there, I feel like there's a little bit like, I feel a little bit of. I don't know if it's like patriotism or pride in hearing you say that, this idea that. But even in this moment where it feels like nobody reads books anymore, and obviously AI is going to take over all of our kind of literary habits and in the future no one will even be writing books because AI can do it for us. But that in this moment, still the crowning achievement of human linguistics is still a book and in some cases a physical book. And that that still has a lot of value and that these companies recognize that being able to inspect and understand a book and how it works and how it's written is something that is. That is valuable to do.
Will Oremus
Yeah, I think you're exactly right. And, you know, authors should be proud of what they've produced. I think their objection to this practice is that they're not seeing any of that value. Right. The tech companies clearly value the work that these authors poured years of their life into, in some cases their heart and soul into. But they still didn't want to pay the authors for it.
Martine Powers
Right.
Will Oremus
Like they didn't want to pay the publishers. I think what a lot of the publishers and authors would prefer is for these giant tech companies to come and say, hey, look, we want permission to use your work. We think it's really valuable and we're willing to pay you some for that. Right. We're going to pay you to license your work. That's what they would have liked to see. And that's in most cases, not what happened with books. Now, we have seen some types of licensing agreements with news outlets. In fact, we had to disclose in the piece that there's a content arrangement of some sort between OpenAI and the Washington Post. And in these cases, the tech companies are paying outlets in order to use their material in training, because news is another thing that you can't just, you know, you can't just find it in a reliable way for free on the Internet always. But again, that didn't happen in most cases with the books, and that's why you're seeing all these lawsuits from the book authors. I should also mention that there are other copyright lawsuits out there from photography wires, from video makers, from illustrators and visual artists. You know, everybody's suing the tech companies, and we just don't know yet how it's all going to shake out.
Martine Powers
Yeah. Are these lawsuits having an effect in terms of changing the expectations around whether these AI companies are paying artists, writers, authors, musicians, filmmakers in the future? That. That because they can't get away with just like stealing people's stuff for free, that they are, you know, standardizing some sort of process going forward to make sure that people are paid for the work that's consumed by these AI models.
Will Oremus
Both the books industry and the tech industry are watching extremely closely to see how the judges rule in these cases, because the details of their decisions will shape whether the tech companies have to pay for the stuff that they train their models on, how much they have to pay, what they can get away with as fair use and what they can't. And so absolutely everybody's trying to see how this will shake out. In the Anthropic case in particular, the judge issued kind of a nuanced ruling. He dismissed the parts of the case where the authors were alleging that Anthropic broke the law by training its AI models on their books without permission. He also dismissed the part where they scanned all those books as part of Project Panama. He said that was probably actually okay. That was fair use. What Anthropic got in trouble for were the books that it pirated that it didn't use to train its AI models, as weird as that sounds. His decision was, as long as you're using this to train AI models, that's fair use. You're transforming the books into something else. But all those books you downloaded and then didn't use, that's just, you just created a library. You just created your own library of books. And that actually could be illegal copying. And that's what Anthropic settled with the authors to avoid going to trial over. They ended up settling for $1.5 billion. But in the Meta case, the ruling was a little different. The judge ruled that the authors' lawyers, he basically was like, you guys, you missed the point here. Like, you have to show me that this is hurting your ability to sell books. I'm sure there's a million ways you could have showed me how an AI could hurt your ability to sell books, but you guys just never did that. And so I can't find that they violated the law here.
But that judge thought other plaintiffs in other cases should be able to show that it's hurting their book sales or their, you know, their ability to sell a screenplay or to be a graphic artist in Hollywood or wherever, whatever it may be. So, again, a lot of this is still unsettled and different judges are going to be arriving at different conclusions because there's never been a case exactly like this before. The idea of hoovering up all the world's knowledge and putting it into AI models is just not something that the original copyright laws had explicitly anticipated.
Martine Powers
Will, this is so fascinating. Thank you so much for explaining all this.
Will Oremus
Thanks again for having me.
Martine Powers
Will Oremus is a tech reporter for the Post. That's it for Post Reports. Thanks for listening. Today's episode was produced by Rennie Svirnovskiy. It was edited by Dennis Funk and mixed by Sam Bair. Thanks also to Aaron Shaffer and Tom Simonite. We would love to hear what you think about this episode. If you've got thoughts, questions or ideas for future shows, give us a shout. Send an email or a voice memo to postreports@washpost.com. I'm Martine Powers. We'll be back tomorrow with more stories from the Washington Post.
Date: January 29, 2026
Host: Martine Powers
Guest: Will Oremus, Technology Reporter at The Washington Post
This episode investigates Anthropic’s covert initiative known as Project Panama, an effort to physically buy, scan, and digitize millions of books to train its AI chatbot, Claude. Drawing on new details from unsealed legal filings, host Martine Powers and tech reporter Will Oremus delve into the ethical, legal, and cultural implications of “destructively scanning” printed books. The conversation also situates this project in the broader context of the AI race and ongoing lawsuits over copyright infringement, exploring what’s at stake for authors, tech companies, and the future relationship between AI and human creativity.
“Destructively scan all the books in the world sounds like something that a James Bond villain would be trying to do.”
– Will Oremus [05:42]
“What you’re describing here almost strikes me as like a reverse Noah’s Ark for books. Like you take one of each book… but instead of saving the book, it’s like the book is sort of sacrificed to the recycling gods in the pursuit of AI.”
– Martine Powers [09:57]
“In a way, we are saving this… we’re making sure that the content of these books… we are saving these books… for a world where AI does everything.”
– Will Oremus (speculating on tech industry perspective) [10:21]
“There’s stuff that’s written by bots for bots… the quality of any given Internet content is not very reliable.”
– Will Oremus [23:12]
“Authors should be proud of what they’ve produced… I think their objection… is that they’re not seeing any of that value.”
– Will Oremus [25:01]
“The idea of hoovering up all the world’s knowledge and putting it into AI models is just not something that the original copyright laws had explicitly anticipated.”
– Will Oremus [29:14]
The discussion is inquisitive, thought-provoking, and laced with both wit and unease. There’s a clear sense of awe at the scale and audacity of AI companies, skepticism about their ethical frameworks, and empathy for creators left out of the new digital value chain. Will Oremus presents complex legal and technical arguments in accessible, relatable language; Martine Powers injects metaphors and skepticism that make the stakes and strangeness of the story vivid to listeners.
This episode is essential listening for anyone interested in the intersection of technology, law, and the future of creative work — it’s a window into how AI is already reshaping the world’s literary and cultural legacy, one (destroyed) book at a time.