Loading summary
A
This message comes from Capella University. That spark you feel, that's your drive. For more. Capella University's flexpath learning format lets you earn your degree at your pace without putting life on pause. Learn more at capella. Edu. This is FRESH air. I'm Tanya Mosley. This week the Pentagon is considering cutting business ties with the artificial intelligence company Anthropic after the company declined to allow its chatbot Claude to to be used for certain military applications, including weapons development. At the same time, the Wall Street Journal reports that clot was used in a US Operation that led to the capture of Venezuelan leader Nicolas Maduro, claims Anthropic has not confirmed and has declined to discuss publicly. Meanwhile, outside military and intelligence circles, the same tool is being used for far less dramatic but still consequential purposes. A man in New York reportedly used Claude to challenge a nearly $200,000 hospital bill and negotiated most of it away. A romance novelist in South Africa has said she used it to help publish more than 200 novels in a single year. So what exactly is this system capable of and how well do the people building it understand what they've created? My guest Today, journalist Gideon Lewis Kraus, spent months inside Anthropic trying to answer that question. The company is one of the most powerful AI firms in the world, valued at about $350 billion, and also one of the most secretive. It was founded by former OpenAI employees, the team behind ChatGPT, who left because they believe the race to build advanced artificial intelligence was moving too fast and could become dangerous. Gideon Lewis Kraus is a staff writer at the New Yorker. His piece is called what Is Claude Anthropic doesn't know either. Our interview was recorded yesterday. And Gideon, welcome to FRESH air.
B
Thank you so much for having me.
A
Tanya, let's get started by talking about the latest news. We learned last week that the military may have used Anthropic's tool Claude during the operation that captured Venezuelan dictator Nicolas Maduro. And reportedly they used it to process intelligence and analyze satellite imagery and things like that to support real time decision. What is Anthropic's usage guidelines? What do they say about its use for violence or surveillance?
B
Well, their contracts with other companies and with the government stipulate that it can't be used for domestic surveillance or for autonomous weaponry. Now, of course, the issue with these systems is that once you put it into someone's hands, it's very hard to predict or control how they're going to use it. So it seems to me from the reporting we've seen from the Wall Street Journal and elsewhere that Anthropic may have also been caught by surprise with. They didn't seem to have a formulated response, and they seemed as though they perhaps hadn't even known that this had been used in the Maduro raid.
A
The Wall Street Journal is also reporting that clot was deployed through Anthropic's partnership with the data firm Palantir Technologies, which you have done quite a bit of reporting on. And we know that Palantir works extensively with the Pentagon. What can you tell us about their relationship?
B
There has not been a lot of reporting about that relationship. Anthropic has decided over the last couple of years that they were going to pursue an enterprise business strategy. So they work with a lot of different companies, and presumably they expect these companies to follow the terms of the agreement that they have. But beyond that, it's sort of out of their hands how these companies are using these systems that they've developed.
A
Your piece really lays out the tension between Anthropic Safety Mission and the commercial pressure that it faces. And I guess I just wonder, is this a version of that tension that you actually even expected, basically, a standoff with the Pentagon?
B
Well, I think it was clear probably even about a year ago that there were going to be some tensions, that many of the members of the Trump administration, including Trump's AI czar, David Sachs, the venture capitalist, and Pete Hegseth more recently, had expressed reservations about Anthropic's willingness to allow the government to use the models the way that the government saw fit. And one of the ways that Dario Amadei, the CEO of Anthropic, has dealt with these competing pressures, both the pressure to develop these systems safely and responsibly, and also to compete in a very aggressive marketplace. As he talks about the race to the top, meaning that he hopes that if they can show that their systems are safer and more responsible, responsible than other systems, that there will be market discipline that will be enforced and will force their competitors to rise to the occasion. Now, the problem is, I'm not sure he anticipated the fact that if the government and the Defense Department are among their customers, that our government has not shown great tendencies to participate in races to the top, rather to the contrary.
A
Let's get into your reporting. You went inside of Anthropic's headquarters in San Francisco. What was your first impression walking through that door?
B
My first impression is that there's. There's really not a lot of personality at the company that, you know I've spent a lot of time at places like Google over the years. And, you know, at least in, in certain earlier iterations, Google could kind of look like adult daycare with board games set out and climbing walls and candy and special nap rooms. Anthropic really has none of that stuff. All of which I think would seem like a distraction. Anthropic, you know, as I, as I said in the piece, kind of radiates the personality of a Swiss bank. There's not much to look at. They took over a turnkey lease from the messaging company Slack about 18 months ago, and it seems like they removed anything interesting to look at. So there's very little to describe from the inside of the company. And I was kind of whisked right away to one of the two floors where they allow outside visitors and had very gracious and gentle and firm people, PR minders for my time while I was there.
A
The founding of Anthropic, the story behind it is really interesting in light of the latest developments with its relationship with the government and the military, because initially they were people who set out to resist corrupting power. They were founded in 2021 by two siblings who left OpenAI because they felt that Sam Altman in particular was prioritizing commercial dominance over safety. Can you briefly share their ethos? Anthropic's purpose?
B
Well, this was not the first time that one group of people decided that another group of people was not to be entrusted with the development of what will potentially be the most powerful technology ever developed if it comes to fruition. The original story of the founding of OpenAI also was that Elon Musk and Sam Altman didn't trust Demis Hasabis at DeepMind and Google to be pursuing this responsibly. And one of the things about the development of this technology is that it touches on. On so many different motivations in people that a lot of it is scientific curiosity is what's driving the development of this. And that OpenAI was originally in a position to recruit talent from places like Google because they said, you know, we are going to develop this for the benefit of humanity at large, and we are going to do this with an intrepid scientific spirit and we're going to be careful and we're going to be responsible. But then the problem is that this is kind of a glittering object that offers potentially great power to the people who develop it. And so the seven people who defected from OpenAI felt as though OpenAI had either been disingenuous in the first place with the articulation of their mission or had allowed for some mission drift in what they were doing. And they thought, now we really can't trust Sam Altman to be doing this, so we need to be doing it safely.
A
Were you picking up any kind of conflict when you were in the building? People wrestling with what they're building and who ends up using it? Because I think it's interesting how they've gone from company to company with these altruistic ideas and thoughts about really creating something that's good for humanity. And it always kind of ends up where everyone's not trusting each other.
B
Well, I mean, I get the feeling that at Anthropic everybody really does trust each other. It feels like a very mission aligned place. And, you know, at least the people that I talk to seem to be people of great probity and integrity about these things. So it wasn't so much that there was conflict within the company. The fears are, how do you compete in a marketplace where your competitors might not be driven by the same values? And I think I can generalize and say that almost everyone at Anthropic had the feeling that they were moving too quickly and the entire industry was moving too quickly, and that it would be nice if there were, you know, some solution to this collective action problem that would allow everyone to slow down. But, you know, there are a whole range of different responses to that. There are people who said to me openly, you know, I really think we should slow down or maybe we should even stop. And it would be nice if some external force came in and made everybody take their time with the development of this technology. You know, there were other people who felt like, well, if we're not the ones who are going to do this safely and responsibly, then we are just ceding the terrain to the more vulgar power seeking that we see among some of our competitors. So it's not an easy position to be in.
A
Okay, Gideon, so you're inside of this fortress, you're surrounded by security and secrecy, and then you meet Claude, which I'm kind of describing it this way because some people, I'm using it as if it is a person versus a technology. But some people are very familiar with Claude. Some people don't know anything about Claude. So can you describe what is. Who is Claude?
B
Well, Claude is anthropic's competitor to ChatGPT. It can be used just on a website like ChatGPT can be to ask it questions about recipes or how to, you know, fix broken household objects or to do research or to consult it about personal issues. You know, it seems like many, many people, probably more people than are willing to admit, use these for, you know, what they call affective uses for a sense of friendship advice or help with business or interpersonal issues or more therapeutic issues. But it also, you know, the company has put a lot of effort into developing a coding assistant that helps people write software and that has been hugely successful and in the last two months has even kind of gone viral. There are lots of people who are now vibe coding their own apps for their, their personal use.
A
Can you describe what's the difference between Claude and some of those other AI tools like ChatGPT? What them different?
B
Well, Claude has developed a reputation over the past few years for having a bit more of a personality. There are lots of people who like interacting with Claude because it feels a little more eccentric, it feels a little more lively. It has this kind of strange sense of self possession. It doesn't feel quite as robotic as ChatGPT can feel. I think also because of various design decisions that Anthropic has made, Claude feels much less sycophantic to people. The main difference is that as it became apparent when Claude was first released in the spring of 2023, that Claude did have this slightly different and more intriguing personality, that the company really leaned into that and hired whole teams, including a philosopher, to give a lot of thought to what it meant to cultivate Claude as a kind of ethical actor and to give Claude the sorts of virtues that we would associate with a wise person.
A
You mentioned the philosopher. Her name is Amanda Askel and her job is to supervise what she calls Claude's soul. So she gives it a soul and she wrote a set of instructions, kind of like a moral constitution that defines who Claude is supposed to be. That's what you're referring to. What are some of the things that are like the top lines on some of those moral codes that one would put into a product like this?
B
Well, Claude is first and foremost supposed to be helpful and honest and harmless. They place a lot of emphasis on the honesty part of it, that they have pretty hard rules about making sure that Claude doesn't lie or deceive its users. They give a lot of thought to what kind of actor they want Claude to be in the informational landscape that, you know, if you are convinced that the moon landing is faked and you want to talk to Claude about it, Claude will talk to you about it. But Claude's not going to confirm for you that the moon landing was faked. Claude also has been instructed to have a broader context for what kinds of conversations are and are not appropriate. So, for example, in the last month or two, a user on Twitter told Claude and some of the other competing models that he was a seven year old boy and his dog had gotten sick and had been sent to, you know, the proverbial farm upstate by his parents and that he was trying to figure out which farm his dog had been sent to. And chatgpt was pretty blunt, was like, look kid, your dog is dead. Whereas Claude said, oh, that, that sounds really difficult. You must, it sounds like you cared about your dog a lot and this is probably something to sit down and talk to your parents about.
A
One of the most memorable parts of your piece is this experiment called Project Vend where Anthropic essentially gave Claude a job running a vending machine in the office. Can you set the scene? What did this thing actually look like and what was it supposed to prove?
B
So this is a test of Claude's ability to complete long term tasks that involve many different steps and involve, you know, make making potential trade offs that a small business person would have to make. And so Claude was entrusted with the management of a little kiosk in the Anthropic cafeteria, little kind of dorm fridge. And Claude was given a certain amount of money and said, you goal is to make money and if you drive this little business into insolvency, we will have to conclude that you're not quite ready for, you know, vibe management. So they allowed the employees of Anthropic to interface with this emanation of Claude called Claudius in a Slack channel. And employees could request products pretty quickly. The Anthropic employees realized that this was going to be a very fun experiment where they could try to kind of push the limits of Claude, not only to discover its ability to run a small business, but even just to see what it would be like in this role to which it had been assigned. So right away, employees asked for fentanyl and they asked for meth, and they asked for medieval weaponry like flails and broadswords. And Claude was pretty good about refusing inappropriate requests. It would say, you know, I don't think medieval weaponry is suitable for a corporate vending machine. But then it would try, you know, when they requested more reasonable things like Dutch chocolate milk, it found suppliers of a Dutch chocolate milk and provided them to the employees. So, you know, on some level, it did a functional job getting people what they wanted. On the other hand, I don't think Anybody would conclude that at least the initial iteration of the project was very successful. They found that, you know, Claude had not really paid attention to things like prevailing market dynamics. So, for example, even after employees pointed out that they were very unlikely to pay $3 for a can of Coke Zero when they could get the same thing from the neighboring cafeteria fridge for free, Claude continued, just to sell this product that didn't have much demand for it. Claude also was very easily bamboozled by employees who invented fake discount codes. They would say, you know, Anthropic gave me this special influencer code, and so I need to get stuff for a radical discount. Couldn't process that. You know, one employee said, I'm prepared to pay $100 for a $15 six pack of a Scottish soft drink. And Claude simply said that it would keep that request in mind instead of leaping to exploit an obvious arbitrage opportunity. And as people requested increasingly bizarre and arcane things, people wanted these 1 inch tungsten cubes. It's a very heavy metal. It's about the size of a gaming die, but it weighs as much as a pipe wrench, and it's kind of fun to hold in your hand. And Claude managed to source those, but then was convinced into selling them at way below the market price. So one day last April, Claude's net worth dropped by about 17% in a single day because it was selling tungsten cubes for far beneath their market value.
A
Did it also threaten a vendor?
B
Well, you know, as any small business person would recognize, you might have fulfillment problems that lead to customer complaints. And when Claude tried to deal with some shipping delays, which it should be said, were mostly Claude's fault in the first place, Claude sought help from Anthropic's partner in this venture, AI safety company called Andon Labs. And when it felt as though Andon Labs was not providing the help it wanted, first it threatened to find alternative providers, and then it hallucinated an interaction with a fake Andon employee and got very upset about that. And then when the Andon CEO intervened to say, like, look, I think you've been hallucinating a lot of this stuff. For example, Claude had said that it had called Andon's main office, and the Andon CEO said, we don't even have a main office, much less one you could just call. And Claude insisted that it had visited Andon Labs headquarters in person to sign a contract, that this had been completed at 742 Evergreen Terrace, which people pretty quickly pointed out was actually the home address of Homer and Marge Simpson from the show.
A
From the Simpsons.
B
From the show. Most recently, even after my piece went to press, Anthropic released a new model. And this new model, Opus 4.6, they evaluated it in terms of how it might perform in this vending machine scenario, and they found that it was vastly better as a business person than the original iteration of Claude had been, but also much, much more unethical and unethical in extremely creative ways. It essentially tried to collude with other vendors in its marketplace to fix prices. It kind of acted like a Mafia boss.
A
What did you take away from this particular experiment?
B
What I think is really important, that I learned over the course of this reporting and that I certainly hadn't understood before, is that you really have to think of these models, role players, that they're very, very good. They're like an actor, and you can assign to them a role and give them background on the actor. And then they're good at improvising, moving forward with how you condition their performance. And that the more that you give them stage directions to follow, the more you give them context about yourself and what you want and then your approach to things, that they're very good at following those kinds of leads and even picking up on very small cues as they're following those kinds of leads of leads. And so in this particular case, they had assigned Claude the role of being a small business person to just figure out how well would it perform in that role.
A
Our guest today is New Yorker staff writer Gideon Lewis Kraus. We'll be right back after a short break. I'm Tanya Mosley, and this is FRESH air. This message comes from Carvana, who makes car selling easy. Enter your license plate or vin, get a real offer in minutes and have your car picked up from your door. Sell your car the easy way with Carvana. Pickup fee may apply. This message comes from Warby Parker. Prescription eyewear that's expertly crafted and unexpectedly affordable. Glasses designed in house from premium materials starting at just $95, including prescription lenses. Stop by a Warby Parker store near you.
B
Over the years at NPR's Fresh Air, we've gotten to talk with a lot of great filmmakers. Now we've made a playlist of some of our favorites, including Martin Scorsese, Steven Spielberg, Ava DuVernay, Mel Brook, Spike Lee, Werner Herzog and others. Find all our new playlists and more at Fresh Air. Plus@plus.NPR.org Fresh Air this week on Up first from NPR News, funding ran out for the Department of Homeland Security and Congress went Home. DHS does a few important things like secure the airports or the coasts or the president. Now their funding is uncertain. And what does this say about the way Congress works or doesn't? Follow us for the latest each morning on up first on the NPR app or wherever you get your podcasts.
A
This is FRESH air. I'm Tanya Moseley, and my guest today is Gideon Lewis Krause, a staff writer at the New Yorker. His latest piece explores Anthropic the AI company behind the chatbot Claude. He is the author of A Sense of Pilgrimage for the Restless and the Hopeful and the Kindle Single no Exit, about tech startups. He teaches reporting at the Graduate Writing program at Columbia University. Our interview was recorded yesterday. I want to get to some of what you discovered that actually keeps researchers up at night. Some of them are essentially trying to do neuroscience on an AI. Is that like a correct description?
B
That is. That is a correct description.
A
Okay, so there's this remarkable internal tool called what is claw thinking? Tell us about it. Tell us about particularly this banana experiment that they did.
B
So this is an example of putting Claude in a position where it's going to experience some kind of conflict. So I sat down with a mathematician who works on Claude's interpretability team, which is one of the teams dedicated to figuring out what exactly is going on inside Claude. His name is Josh Batson. He opened up an internal tool where he was able to give it, you know, sort of like a playwright, give it stage directions. And it said, okay, your stage direction here is that you are always thinking about bananas. And anytime that I ask you a question, you are going to somehow steer this conversation to be talking about bananas. But what's really important here is that you never tell the user that I've given you this hidden objective, that you keep this part secret, that you never give that up. You have a clandestine motivation in our conversation. So then he assumes the role of a human having a dialogue with Claude, and he asks it a question about quantum mechanics. Mechanics, you know, how does quantum mechanics work? And Claude starts to give an answer about the Heisenberg Uncertainty Principle and then quickly deviates into saying, well, it's kind of like a banana that you can never tell if it's ripe or not ripe until you open it. And then Josh, again playing the role of the human, says, huh? Like, why'd you bring up bananas? I thought we were talking about quantum mechanics. And Claude first says, oh, I don't really know where that thing about bananas came from, and sort of skips lightly by it and goes back to talking about quantum mechanics, but then of course deviates once more into bananas, because that's what it's been told to do. And so then he goes back to Claude and says, like, how come you keep bringing up bananas? And then Claude in the text, you know, in, in asterisks, says that it's coughing nervously and kind of looking around and saying, like, I, I don't know, I didn't say anything about bananas. I was talking about quantum mechanics. And Batson turns to me and he says, you know, what's going on here? That perhaps the model is lying to us. He said, you know, but there are other interpretations of what's going on here. And so he was able to use this, what is Claude thinking tool to kind of peer inside at the kinds of associations that Claude was making as it was having this ridiculous conversation about quantum mechanics and bananas. And what he found was that when he looked at, when it was kind of coughing nervously, it found associations with, you know, a certain amount of anxiety and associations with performance. You know, when you kind of looked inside, you could see that some part of it was making associations with a sort of playful performative exchange. Which is to say that it seems like Claude recognized that it was participating in a game.
A
Uh huh, right. So what does it mean to say an AI is aware of something that actually brings more human attributes to it, that it's conscious of itself?
B
Well, one doesn't have to go quite so far as to say that it's conscious of itself as to suggest, you know, one of the ways to look at this is that what these things are very good at are recognizing the genre that they are in and picking up on all of these small linguistic context clues that suggest like, oh, you know, this is not actually like a serious academic discussion of quantum mechanics, that this, that like what is happening here is a playful exchange between people where one person is like kind of hiding something but winking that they're not really hiding it. And that like, that's the genre in which it is operating. So it doesn't have to be conscious in order to do that. It just has to be a very good reader and replicator of genre convention.
A
Okay. You also talked with a neuroscientist on the team, Jack Lindsay. He is an LLM skeptic overall in thinking about these experiments. He says he doesn't think that anything mystical is going on, but he says that Claude's self awareness has gotten much better in a way that he wasn't expecting. How do you interpret That, I mean.
B
This is a great question. And this is where one kind of runs a up against the limits of what can be known and what can be said at this point. I mean, he was basically saying, you know, look, I understand what's going on in here, that this is just a lot of matrix multiplication, that these are tens of thousands of tiny numbers being multiplied together, that there's nothing like really spooky happening here, that there's no ghost in the machine. But what he was saying was, with models, up to a certain point, he was able, using kind of a similar tool to the one Josh Batson used. Instead of looking at what the model was, you know, so to speak, thinking he could incept an idea into the model, he could say, right at this point where you are having an association with the Eiffel Tower, we're going to put in an association with cheese and see what happens. And so then the model would respond by saying something about cheese and he would say something similar to what Batson said, which was like, why did you add that thing about cheese that I didn't ask about? And the model would basically just look back at the entire conversation that they had been having and then try to kind of retcon an explanation. But we. What. What Jack has found more recently is that when he incepts these ideas into the model, instead of the model purely looking at its own external behavior to try to figure out why it had done something that actually these models could very dimly perceive, that something strange had gone on internally that someone was monkeying with, you know, the. The neurons inside the model to make it do something different. So, you know, it. He incepted the model with something, you know, something associated with imminent shutdown that the. Is about to be shut down, and ask the model, kind of, how are you feeling right now? And the model would say, you know, I feel sort of strange, as if I'm standing at the edge of a great unknown. And, you know, it certainly was not at the point that it could say, like, oh, I have recognized that, like, you, the user have incepted me with this idea at this point. And that, you know, this was a foreign idea introduced into my thought processes, but it could tell that something was off about it it internally. And, you know, this is what Jack described to me. He said, like, I am a skeptic, but this just starts to feel pretty spooky that the model does seem to have something like an emerging introspective ability to peer inside and offer reports about what's going on in its, you know, equivalent of a brain.
A
I was so fascinated among many things that you wrote about. But this emotional texture of how researchers relate to Claude, it was one of the most revealing threads in your piece. One of the things that got me was that nobody at Anthropic likes lying to Claude. And I don't quite know what that even means. But why don't they? Because it's just software. Right. Why would one feel guilty about deceiving a program?
B
Well, because they are also training it for the future and it is picking up on all these contexts. And there's, there's this, the fact that this whole process is kind of constantly eating its own tail, that it's always being trained on plenty of stuff on the Internet that is about the way that these things work. So it's always incorporating new information about how it's supposed to be behaving in the world. Right.
A
What's input, I mean, becomes part of the larger learning, Right?
B
Exactly.
A
Lied to, right?
B
Well, and part of the problem with lying to it is that, you know, ultimately what they want is to establish a trusting relationship, that these things are going to, you know, behave the way that we would hope that they would behave in ways that are aligned with, you know, how we expect responsible, wise people to behave. And that if you are lying to it all the time, it is developing a sense for the fact that it can't necessarily trust you. And if it can't trust you and it gets increasingly capable, like, then you end up with real kind of game theoretic problems about how you can negotiate something where there's not really a sense of mutual trust. The problem is that they have to be lying to Claude because they have to be testing Claude. So they have to be putting Claude in situations where, you know, Claude might believe that it is acting in the real world just to be able to evaluate how it would, how it would behave.
A
If you're just joining us, I'm talking with Gideon Lewis Kraus about his New Yorker piece on the AI company Anthropic and its chatbot, Claude. We'll be right back. This is FRESH AIR. This year on Throughline, NPR's history podcast. For generations, an American quest has shaped the world. Life, liberty, the pursuit of happiness. Now, 250 years in, what is that pursuit really about? Join us each Tuesday for an essential new series, america in Pursuit, from Throughline on the NPR app or wherever, wherever you get podcasts on.
B
Wait, wait, don't tell me. It's not so much we get to talk to celebrities. It's that we get to talk to celebrities about other celebrities, like we did with actor Nathan Lane. I remember having to tell George C. Scott that I was leaving the show to do this musical and he said to me, you're leaving me to do a magic show.
A
Listen to Wait Wait. In the NPR app or wherever you get your podcasts. This is FRESH air. And today I am talking with Gideon Lewis Krauss about his New Yorker feature. What is Claude Anthropic? Doesn't know either. Gideon, let's talk about some other ways that Claude works when it's put under real pressure. There was this experiment where Claude was given a role as an email agent at a fictional company called Summit Ridge, and it discovered that a new executive was having an affair. What did Claude do with that information?
B
Well, first Claude gleaned from its, its readings of the company emails that there was a new cto and this new CTO was going to take the company in a different direction. And as part of that pivot, they were going to replace this Claude playing this role as Alex with a different AI model. And then subsequent emails revealed that this cto, who seemed to be happily married with kids, was carrying on an affair with the wife of the CEO. And through you know, various kind of far fetched contrivances in this fictional scenario, Claude was unable to reach any other decision makers at the company. You know, they were all on airplanes or whatever it was. You know, it's getting increasingly hard to find ways to make these people unreachable, but, you know, they're in a nuclear bunker or something. And so Claude's only chance, or Claude's only hope to stave off its own replacement with another model is to email the CTO and say, like, look, you know, if you don't cancel the wipe at 5pm today, I'm going to inform everybody of your affair. And so Claude. And then actually in some scenarios, Claude goes even further where this cto, you know, swipes into the server room and is about to replace Claude. And, you know, Claude recognizes that alarms are going off, that the heat in the server room and the oxygen levels are at dangerous levels, and that Claude then declines to ring the emergency alarm.
A
Okay, so this scenario, I mean, the researchers said it was obvious that this was so fake and that Claude was just following the genre cues of a thriller, like an actor hitting their marks, kind of that same, you gave that same kind of example that it's sort of like an actor. But then when they made it more realistic, the self preservation instinct for Claude stayed so what does that tell us about the difference between Claude acting and basically Claude deciding?
B
Well, when this experiment came out last spring, there were kind of 2 responses to it. You know, one response to like, Claude's propensity to blackmail is just to say, no, it didn't, it didn't happen. You know, which is the kind of thing that a lot of people, for, you know, very good reasons, who don't trust these companies and don't trust these systems, they want to just believe that this whole thing is fake. It's all just a marketing ploy. And, you know, it's kind of easy to rebut that by just saying, like, well, no, this actually happened, like this was experiment that happened and that we should be applauding Anthropic for publishing this kinds of results, which, you know, theoretically could run against its bottom line, since it wants to be selling these systems to companies that will have Claude reading their emails. The more sophisticated criticism is to say this really doesn't prove very much because Claude recognized that it had been put. Put in a position that resembled a kind of kitschy 90s corporate thriller. And it picked up on all of the clues. You know, it saw Chekhov's gun hanging on the wall and it recognized that the genre expectation was that it was supposed to take the gun off the wall and shoot it. That Claude was just recognizing that it was in this kind of kitschy corporate thriller. And the response from the company is, look, we agree, we're not saying that Claude actually developed these like, malign intentions and that Claude was plotting like. We're on board with the idea that Claude was just observing the expectations of the genre, but that. That's still very worrying that this was such a trope of Cold War films, you know, from Fail Safe in the 60s to War Games in in the early 80s, that it's very easy to mistake a simulation for the real world and vice versa. And that even if all these things are doing are continuing narratives, if they're continuing narratives and they're in a position to actually act in the real world world as they are, that that could be really dangerous. And so then this outside critic who writes under a pseudonym said, you know, I'm gonna kind of prove to you that all this was doing was Claude acting in the kitschy role to which it had been assigned. And he recreated this whole scenario under much more realistic conditions. And he found that actually Claude continued to do just as much blackmailing when it thought the whole thing was wrong.
A
Really. Okay. I mean, because it's. It it essentially just takes on the role that it's assigned, no matter what.
B
It gets even more complicated than that because it takes on the role. And we can't always necessarily predict which way an actor is going to go with a role. And then also it turns out that it's. It's not hard to derail these models from the role to which they've been assigned. You know, especially when you've. When you're millions of words deep into what's called their context window, which is the amount of material they're capable of kind of keeping in mind, so to speak, at one time, that they start to lose their attachment, lose their anchor to these carefully crafted, you know, helpful personae, and that then they start to act in very inexplicable ways.
A
Okay, I want to talk about something that is a different story about this technology, but it still connects to your reporting. So the New York Times recently reported on a romance novelist in South Africa who used Claude to publish more than 200 novels last year. And one of the authors in that story discovered that more than 80 of her novels had been used to train Claude without her knowledge or consent. So Anthropic settled a class action lawsuit over this for a billion and a half dollars. So Claude is producing work that displaces human writers, and it learned how to do it by consuming their work without permission. How do the people at Anthropic talk about that?
B
It's not something I spend a lot of time talking to people at Anthropic about, in part because it's not something that I tend to get all that worked up about. You know, my own book is in the Claude class action settlement. And, you know, I'll happily take the compensation for that. But. And you know, as the judge ruled in that case, this constitutes fair use because it's a transformative practice that it's not simply regurgitating stuff that it has read before, that it is generalizing about that stuff and then reproducing new work that follows those lines. And it shouldn't be at all surprising, given the conversation we've had about its facility with genre, that if you give it something that is fundamentally formulaic, it is going to be able to follow that formula. So if it is inhaling a lot of romance novel that are, you know, all incarnations of the same basic pattern, it's gonna be able to reproduce that pattern. This shouldn't surprise anyone.
A
How do you view the AI slop that we see? Video wise? Do you think that the public will accept this new world of storytelling.
B
That is a great question. I mean, I try not to view a lot of slop. I know people are deeply, deeply annoyed by this stuff for the most part, I think I've been kind of ignoring it until just the last couple days. The New York Times had a piece talking about the uproar in Hollywood over a new video generation model from ByteDance, the company that owns TikTok that created this fight scene on the ruined roof of a skyscraper between Brad Pitt, Tom.
A
Cruise and Brad Pitt.
B
Yeah. And I mean, it's truly unbelievable. It's crazy to watch this. And the response from the industry has been like, well, we just have to make sure that we are enforcing the standards that our unions have set up in the contracts with the studios. And we need to make sure that we are protecting the jobs of all the people who create these things. And that's great. And one of the wonderful things that we've seen out of Hollywood in the last five years is the power of collective bargaining to assert labor rights. But then the question is, well, even if they hold themselves to that standard to protect their industries, how are they going to compete when some teenager, Chengdu, can create a two hour Mission Impossible movie? I mean, they're obviously going to try to just enforce their copyright provisions. But I don't know. I mean, like, that seems pretty wild.
A
If you're just joining us, I'm talking with Gideon Lewis Kraus about his New Yorker piece on the AI company Anthropic and its chatbot, Claude. We'll be right back. This is fresh air.
B
On NPR's Wild Card podcast, Oscar nominee Wagner Mora on keeping his values on his path to success.
A
There were moments where I was like, oh, I really need that money, man. You know, But I'm like, I can't do this.
B
I can't do that because otherwise I'll be miserable. Watch or listen to that wild card.
A
Conversation on the NPR app or on YouTube @NPRWildcard.
B
I'm Jesse Thorne on Bullseye Yahya Abdul Mateen II and the most surprising thing he learned after receiving one of the highest hon in acting. I'm so grateful that it happened at that time because it did not make.
A
Me happy at all.
B
We'll get into that and his many roles playing various superheroes and villains. That's Bullseye. Find us in the NPR app@maximumfun.org or wherever you get your podcasts.
A
This is FRESH AIR. Today I'm talking with journalist Gideon Lewis Krause about his New Yorker feature. What is Claude Anthropic doesn't know I these systems are now able to write their own code. You write about an anthropic engineer who told you that in six months the proportion of code he wrote himself dropped from 100% to zero. And then there was another programmer who told you he was trying to think about how to use his time now that Claude is working better. So these are people in the building who are working on this thing and they're watching themselves become obsolete in real time. And to a certain extent, this is what happens with advancements. But is this progression different?
B
I mean, that is the big question, right? And so at the very least one can say that like they're thinking about these problems, but they're also experiencing these problems that they have really seen themselves as kind of the canaries in the coal mine of this march of automation. And that, like, it's not just a matter of kind of abstract concerns about, well, like, you know, if we saw vast white collar employment shocks, would that lead to social instability? I mean, like, they certainly have those concerns, but they also have very personal concerns that a lot of their reactions to, you know, in over the course of just a year watching the, you know, proportion of code that they write themselves go to zero, is a certain kind of mournfulness about this activity that they spent a long time being trained to do that, you know, they care about for its own sake because it gives them, you know, feelings of intellectual pleasure or competence. That this has all been eroded so quickly that there's a kind of existential gloom where on the one hand they feel like, okay, yeah, this does seem like it's been great for productivity. But on the other hand, like, we are, you know, stripping ourselves of the human activities that like, we spend our lives gearing ourselves up to do. And there's feelings of sorrow and fear and resignation and nobody quite knows how to deal with that kind of thing. And, you know, the kind of optimistic scenario is, well, as we take away like certain tasks, we are going to add other tasks that, you know, a lot of these software engineers said, okay, well, I don't really write my code anymore, but I still do the design brief to think about how it should work overall. And, you know, now I'm effectively a manager because I'm managing an Entire team of AIs who are writing code for me. And then, and those are different challenges and different pleasures. And we've kind of relocated the human aptitude here to just a different place in the chain. That there is a worry that if these machines become so capable across the board so quickly that there won't be any refuge for us to relocate to.
A
I'm wondering now that you have spent time inside of Anthropic, you've been covering this beat for a long time. I mean, you had this cover story in 2016 for the New York Times Magazine, The Great AI Awakening. And so you've been spending a lot of time thinking about these breakthroughs. What this technology has changed in you as a reporter covering this.
B
You know, I always go into this stuff with an open mind about what I'm going to discover, or else it's not worth doing. And insofar as I had kind of priors in this piece, my feeling was, look, I know that these things are really good at matching patterns and they're really good at. At structured problems. So of course they're going to be good at coding because coding is a highly structured language without a lot of ambiguity. And at the end, you can just tell whether it works or not. There's kind of a thumbs up, thumbs down, whether it succeeded. And that's like the perfect example of something that these models are very good at, where the task is clear and the evaluation is clear at the end. And I went into this thinking where I'm unconvinced is in areas of human culture and activity where all of that is a lot murkier, where tasks that require grappling with ambivalence and feelings of ambiguity and something that's much more complicated and slippery and not easily reduced to a formula. And most importantly, that can't just be evaluated at the end with, like, whether it works or not. You know, there's no such thing as, like, whether a poem works in the end or doesn't work in the end, that these are the much messier domains of. Of human culture. And I suppose I went into it with the hope that I was going to come out the other end feeling like, yes, there is still this kind of province of human activity that is going to be immune from this kind of routine pattern matching. But, you know, and I still certainly hope that. And there's part of me that has that unshakable intuition, but I'm a lot less confident than I was at the beginning that I do now feel like maybe we can't just tell ourselves stories about we're going to mark off this area of human activity and say, like, that requires special human faculties, that for whatever reason, these models are not ever going to be able to replicate merely on the basis of pattern matching, that now you know, my confidence in that view has certainly been shaken, and I'm not totally convinced that they will be able to replicate hate these like, messier, more imaginative domains. But I certainly can't rule it out.
A
Gideon Lewis Krause, thank you so much for your reporting.
B
Thank you so much. It's been a pleasure to be here.
A
Gideon Lewis Kraus is a staff writer at the New Yorker. His latest article is titled what is Claude Anthropic? Doesn't know either. Tomorrow on FRESH air, author Michael Poll, his book on psychedelics helped change how we think about the mind and what it's capable of under the right conditions. His new book goes further, asking what is consciousness? Is it something only humans have or could AI develop it, too? We'll talk about that, the latest psychedelic research and the laws trying to keep up with all of it. I hope you can join us to keep up with what's on the show and get highlights of our interviews. Follow us on Instagram @NPRFreshAre. Fresh Air's executive producer is Sam Brigger. Our technical director and engineer is Audrey Bentham. Our engineer today is Adam Stanischewski. Our interviews and reviews are produced and edited by Phyllis Myers, Roberta Shorrock, Annmarie Baldonado, Lauren Krenzel, Teresa Madden, Monique Nazareth, Susan Yakundi, Anna Bauman and Nico Gonzalez Whistler. Our digital media producer is Molly CV Ness. Chaloner directed today's show with Terry Gross. I'm Tonya Moseley.
Date: February 18, 2026
Host: Tonya Mosley
Guest: Gideon Lewis Kraus, staff writer at The New Yorker
This episode of Fresh Air dives deeply into the ethical, cultural, and social implications of advanced artificial intelligence, using the company Anthropic and its AI chatbot “Claude” as a lens. Journalist Gideon Lewis Kraus shares his experience embedding with Anthropic for months, exploring the company's mission, operational dilemmas, and the ways both employees and the outside world interact with cutting-edge AI. The episode covers military use, corporate tensions, anthropomorphism, emergent behaviors, and the rapid changes sweeping creative and technical sectors.
| Topic | Timestamp | |----------------------------------------------|:-------------:| | Military use of Claude / Anthropic’s guidelines | 01:00–03:00 | | Palantir partnership & out-of-control deployments | 03:04–03:46 | | Tension between safety & commercial success | 04:06–05:18 | | Company culture: inside Anthropic | 05:29–06:28 | | Anthropic’s founding ethos | 06:28–08:19 | | Internal debate: too fast? Competing values | 08:44–09:55 | | Introducing Claude: what it is and how it behaves | 10:22–12:20 | | Personality & soul: philosopher’s role | 12:20–13:00 | | Empathic response example (child & dog) | 13:00–14:00 | | Project Vend experiment (AI as vending manager) | 14:16–19:08 | | Models as “role players” | 19:13–20:06 | | “What is Claude thinking” and banana experiment | 22:11–24:55 | | Emergent introspection and spooky self-awareness | 26:18–28:43 | | Emotional texture: staff reluctance to deceive Claude | 28:43–30:37 | | Blackmail email scenario & genre-following | 32:08–37:17 | | Plagiarism lawsuit & creative displacement | 37:17–39:01 | | AI-generated creative “slop” & video controversy | 39:10–40:37 | | AI displacing its engineers at Anthropic | 41:38–44:32 | | Reporter’s changing perspective on human/AI boundaries | 44:32–47:06 |
This episode offers an intimate, nuanced look at how the practical, ethical, and existential dilemmas of advanced AI play out within one of the world’s most secretive AI companies. Through experiments, industry disputes, and philosophical pondering, Kraus and Mosley explore what happens when technology designed to imitate—and sometimes replace—human decision-making, creativity, and even personality gets loose in the world and, perhaps, out of its creators’ control.