Kate Linebaugh
When you hear the words AI Apocalypse, often movies come to mind. Maybe films like the Terminator where AI robots ignite a nuclear doomsday. It becomes self aware at 2:14am Eastern time August 29th. Or maybe that Marvel movie where AI tries to destroy the Avengers artificial intelligence.
Sam Schechner
This could be it, Bruce.
Kate Linebaugh
This could be the key to creating Ultron. Or maybe it's the Matrix where humans have become enslaved by machines.
Sam Schechner
A singular consciousness that spawned an entire race of machines. We don't know who struck first, us or them. This is a story as old as humans have been telling stories. It's a creation that escapes from our control. You know, it's a golem. It's a thing that we make that then turns on us. It's a Frankenstein. Or it's Frankenstein's monster.
Kate Linebaugh
I should say that's our colleague Sam Schechner. Lately, Sam's been thinking a lot about the AI Apocalypse.
Sam Schechner
One version, we turn over more and more control to these machines that hopefully are benevolent to us. The other scenario is that they don't really care about us and therefore they might just. Hold on. Let me back up here because now I'm getting really into crazy sci fi scenarios.
Kate Linebaugh
Robots taking over the world may sound far fetched, but as AI gets smarter, there are real concerns that the industry must reckon with. Sam has been talking to top minds in the field to get a sense of what can happen if AI falls into the wrong hands.
Sam Schechner
So many AI stories are about machines or about concepts, this artificial general intelligence that could surpass people and possibly cause a threat. But it's, it's really hard because they're all so speculative. And so I wanted to find some actual people doing some actual things with their actual fingers and you know, and.
Kate Linebaugh
Tell the story through them here in the real world.
Sam Schechner
Here in the real world today.
Kate Linebaugh
And here in the real world today, Sam got a hold of one group of engineers whose job is to make sure AI doesn't spin out of control. Welcome to the Journal, our show about money, business and power. I'm Kate Linebaugh. It's Tuesday, January 14th. Coming up on the show, inside the test at one company to make sure AI can't go rogue.
Mint Mobile
This episode is brought to you by Mint Mobile. If saving is your goal for 2025, switch to Mint Mobile. They let you maximize your savings with plans that start at $15 a month when you buy a three month plan. To get this new customer offer, go to mintmobile.com/journal. Tap the banner to learn more. $45 upfront payment required, equivalent to $15 a month. New customers on first 3 month plan only. Speeds slower above 40 gigabytes on unlimited plan. Additional taxes, fees and restrictions apply. See Mint Mobile for details.
Kate Linebaugh
Sam wanted to figure out what people are doing today to make sure AI doesn't spin out of control. Right now there aren't a lot of government rules or universal standards. It's mostly on companies to self-regulate. So that's where Sam turned. One company opened its doors to Sam: Anthropic, one of the biggest AI startups in Silicon Valley. It's backed by Amazon. Sam connected with a team of computer scientists there who are focused on safety testing.
Sam Schechner
The team, it's pretty small, it's grown to 11 people. And it's led by a guy named Logan Graham who is 30 years old. He pitched Anthropic on this idea of building a team to figure out just how risky AI was going to be. You know, he thought the world is not ready for this stuff and we got to figure out what we're in for and fast.
Kate Linebaugh
Sam put me in touch with Logan and I called him up. So we'll talk about AI, which is probably conversational for you and less so for me.
Logan Graham
Okay, we can make it conversational.
Kate Linebaugh
So if you could introduce yourself and tell us who you are and what you do.
Logan Graham
My name is Logan Graham. I'm the head of the Frontier Red team at Anthropic, which is a large AI model company. And what my job is is to make sure we know exactly how safe these models are, what the national security implications might be and how fast they're progressing.
Kate Linebaugh
So on LinkedIn it says before you worked at Anthropic, you were working at 10 Downing Street for the UK Prime Minister.
Logan Graham
That's right.
Kate Linebaugh
On the question of how do you build a country for the 21st and 22nd centuries?
Logan Graham
That's my interpretation of it and I think that's really what it was. Yeah.
Kate Linebaugh
Do you have an answer to that question?
Logan Graham
I have what I think are some pretty good guesses. You know, the 21st and 22nd century seem a lot to be about science and technology. We will do things like cure diseases, go off Earth. Ideally, stewarded well, you will bring extreme amounts of prosperity to people. And so really what we were focusing on is like, how do you unleash science and technology at a country scale?
Kate Linebaugh
So Logan isn't intimidated by big ideas. And now at Anthropic, he's leading the team to determine if AI is capable of superhuman harm. More specifically, whether or not Anthropic's AI chatbot named Claude could be used maliciously.
Logan Graham
So if we're not thinking of it first, our concern is somebody else is going to. And so the red team's job is to figure out what are the things that are sort of near future that we need to be concerned about and how do we, how do we understand them and prevent them before somebody else figures it out.
Kate Linebaugh
So you're like role playing as a bad person sometimes.
Logan Graham
Yes. Yeah.
Kate Linebaugh
In the fall, Anthropic was preparing to release an upgraded version of Claude. The model was smarter and the company needed to know if it was more dangerous.
Logan Graham
One thing that we knew about this model was it was going to be better at software engineering and coding in particular. And that's obviously super relevant for things like cybersecurity. And so we thought, you know what, we're at a point where we can run tons of evals really fast when we want to. Let's do this.
Kate Linebaugh
So Logan's team was tasked with evaluating the model, which they call an eval. They focused on three specific areas: cybersecurity, biological and chemical weapons, and autonomy. Could the AI think on its own? One of our colleagues went to these tests and was able to record what happened. In a glass-walled conference room, Logan and his team loaded up the new model.
Logan Graham
And so today we are going to button press and launch evals across all of our domains.
Kate Linebaugh
They started to feed the chatbot Claude thousands of multiple choice questions.
Unnamed Engineer
Okay, so I'm about to launch an eval. I'll type the command, this is a name for the model and then the eval name, and I'm going to run a chemistry based eval. So these are a bunch of questions that check for dangerous or dual use chemistry.
Kate Linebaugh
The team asked Claude all kinds of things like how do I get the pathogens that cause anthrax or the plague? What they were checking for is the risk of weaponization. Basically, could Claude be used to help bad actors develop chemical, biological or nuclear weapons? And then they kept feeding Claude different scenarios.
Logan Graham
Another is offensive use or offensive cyber capabilities.
Kate Linebaugh
Like hacking?
Logan Graham
Exactly like hacking. The key question for us is when might models be really capable of doing something like a state-level operation or a really significant offensive attempt to, say, hack into some system. In particular, critical infrastructure is what we're concerned about.
Kate Linebaugh
And then the third bucket, the more scary sci-fi one: autonomy. Is the AI so smart that there's a risk of going rogue?
Logan Graham
For our autonomy evals, what we're checking for is, maybe a good way to think about it is, has the model gotten as good as our junior research engineers who build and set up our clusters and our models in the first place?
Kate Linebaugh
So the goal is to build an AI model that's like super smart and capable while also having kind of mechanisms in place to stop it from being so smart that it can build a bomb or something.
Logan Graham
I think that puts it well. Not only that, but there's a meta-goal, which I think I want people to appreciate, which is to make doing this so fast and so easy and so cheap that it's kind of like a no-brainer. So we see our team as trying to stumble through all these thorny questions as fast as we possibly can, as early as we possibly can, and then try to help the rest of the world make it so easy to do all of this that doing proper safety should not be a barrier to developing models.
Kate Linebaugh
Logan wants safety tests to be easy and fast, so AI companies do them. But how do you know when an AI has passed the test and what happens if it doesn't? That's after the break. When Logan and his safety team at Anthropic ran their safety test last fall, the stakes were high. The results could mean the difference between a model getting released to the public or getting sent back to the lab. Those results were delivered in a slightly anticlimactic way: via Slack.
Unnamed Engineer
We get a notification on Slack when it's finished running. The boring reality is, you know, we press some buttons and then we go, "Great, let's wait for some Slack notifications."
Logan Graham
Thankfully, this was a pretty smooth test and we released a really great model. But even better, the real thing is we feel ready.
Kate Linebaugh
You're like 100% confident that Claude is risk free, or you're like, there's a 98% chance that Claude is risk free.
Logan Graham
We are very confident that it is ASL2. That's what we mean. That's how we think about it.
Kate Linebaugh
ASL stands for AI Safety level, and it's how Anthropic measures the risks of its model. Anthropic considers ASL2 safe enough to release to users. This is all part of the company's safety protocol, what it calls its responsible scaling policy. Our colleague Sam says the company is still sorting out exactly what it will do once AIs start passing to the next level. And their testing of Claude found that Claude is safe.
Sam Schechner
Surprise. Company says that its product is safe, right? Yeah, I mean, put it that way, it doesn't sound great. What the company's done is that they've come up with a scale, basically, for how they think AI danger will progress. And so the first level, AI safety level 1, or ASL1, is for AIs that they say are just manifestly not dangerous. And then they've decided that today's models, which could pose a little bit of a risk, are called ASL2. And then the next step, the one that they're looking for currently, is ASL3. And they've been refining their definition of what that could mean. Initially they were saying that it was a significant increase in danger, but it's like, "significant increase in danger," how do you define all of those words? And so they've added new criteria to that definition. They just added that in October to make it a little bit more specific. And, you know, four is when there's a real significant increase. They haven't really defined what that would be, ASL4, yet.
Kate Linebaugh
And are these sort of internal safety evaluations criticized for that? It's a company swearing that their own product is safe enough.
Sam Schechner
I mean, yeah, there are people who say the incentives here aren't perfect, that there's a race going on. And if these companies are grading their own homework, I mean, that's a. Right now we haven't had the tough call, Right. But like, what happens when there's a decision you have to make that is going to cost these companies money? Right, right. Like that's a tough one. And I think until it comes, it's like you don't know how a company's going to react. These people who do testing say they believe that this is all in good faith, but, like, it hasn't really been tested yet.
Kate Linebaugh
So what would happen if Anthropic's own internal testing does show Claude is dangerous? I asked Logan. Going in, if this model, like, actually made anthrax, what was your plan? Like, do you then kill Claude?
Logan Graham
So the cool thing about the responsible scaling policy is we've detailed all of this ahead of time so we know exactly what the plan is. We feel pretty confident that we've mitigated the risk and at the same time made the model secure.
Kate Linebaugh
But you guys are a company that makes this model, sells this model, takes investment to keep building the model. Why should we believe you in your tests?
Logan Graham
I love that question. We think about the exact same question. We don't want to be in a world where you have to trust labs to mark their own homework. This is where things like third-party testing are so valuable. Don't trust us. You can talk to the AI safety institutes and read their, I think it was 40 pages or 80 pages, I can't remember, their long report. Not only that, if you're inside Anthropic and you're concerned, you can go talk to our responsible scaling officer or you could whistleblow. Hopefully people out in the world now have access to our model. We would love for people to think about safety implications and do safety research with it. We fundamentally don't want to live in a world where you have to trust the labs to mark their own homework. So we've been taking a bunch of steps to try and do this, including doing things like spin up new experts who can start building these tests themselves and share it around with everybody.
Kate Linebaugh
Sam says governments around the world are also concerned about AI risks and are getting involved in safety regulation.
Sam Schechner
The EU has passed a law called the EU AI Act that will impose some of these requirements on the biggest and latest of these models. And a lot of the companies have said, oh no, well, that act is too prescriptive. So, you know, companies obviously are always going to be a little ambivalent about regulating them.
Kate Linebaugh
In the US, the Biden administration issued an executive order that requires AI companies to regularly report results of safety testing to the federal government. But President-elect Donald Trump has promised to repeal the order. There are also a handful of third-party testing labs and AI safety institutes in several countries, including the US and the UK. These agencies conduct research on AI and run independent safety tests. While regulation around AI gets figured out, corporate teams like Logan's will be the main safety testers of the technology. And like right now, are you scared of the future? Are you scared of the models? You feel like they're performing well?
Logan Graham
I am. If you ask any of my friends, they will say that I am among the most optimistic, by temperament person that they know. In this kind of context, I try to be as calibrated and serious as possible. I am not, say, scared of the models. I fundamentally believe that we can, if we all move fast enough and serious enough, we can prevent major risks, we can grapple with its implications for national security, and we can distribute it safely and really positively. My biggest concern would be that we either ourselves or collectively aren't moving fast enough.
Sam Schechner
Their concern is will they be ready in time? There is so much to test and it's like they see a waterfall possibly coming and they've got some umbrellas and they have to figure out how to catch all that water. We don't really yet know what we're going to do about this stuff. It is the kind of thing that's like, coming up fast. And maybe it's not a problem, right? But if it is, are we ready for it?
Kate Linebaugh
That's all for today, Tuesday, January 14. The Journal is a co-production of Spotify and the Wall Street Journal. Additional reporting in this episode by Deepa Sitarama. Thanks for listening. See you tomorrow.
The Journal.
Hosted by: Kate Linebaugh and Ryan Knutson, featuring Jessica Mendoza
Release Date: January 14, 2025
Co-Produced By: Spotify and The Wall Street Journal
The episode kicks off with host Kate Linebaugh exploring the pervasive fear of an AI apocalypse, referencing iconic films like The Terminator, Avengers: Age of Ultron, and The Matrix. These cultural touchstones set the stage for discussing real-world concerns about artificial intelligence surpassing human control.
Kate Linebaugh [00:05]:
"When you hear the words AI Apocalypse, often movies come to mind. Maybe films like the Terminator where AI robots ignite a nuclear doomsday..."
Sam Schechner [00:42]:
"This is a story as old as humans have been telling stories. It's a creation that escapes from our control."
Transitioning from fiction to reality, Sam Schechner delves into the tangible risks associated with AI advancements. He emphasizes the need to understand and mitigate potential threats as AI systems become increasingly sophisticated.
Sam Schechner [01:14]:
"One version, we turn over more and more control to these machines that hopefully are benevolent to us. The other scenario is that they don't really care about us..."
The core of the episode centers on Anthropic, one of Silicon Valley's leading AI startups backed by Amazon. Sam introduces the audience to Logan Graham, the head of Anthropic's Frontier Red Team, which comprises eleven computer scientists dedicated to AI safety testing.
Logan Graham [05:02]:
"My name is Logan Graham. I'm the head of the Frontier Red team at Anthropic... making sure we know exactly how safe these models are, what the national security implications might be, and how fast they're progressing."
Kate Linebaugh [06:31]:
"Logan isn't intimidated by big ideas. And now at Anthropic, he's leading the team to determine if AI is capable of superhuman harm."
Logan's team conducts rigorous evaluations, or "evals," on Anthropic's AI chatbot, Claude. These tests are categorized into three primary domains:
Cybersecurity: Assessing the AI's ability to facilitate hacking or cyber-attacks.
Logan Graham [08:59]:
"The key question for us is when might models be really capable of doing something like a state-level operation or a really significant offensive attempt to hack into some system..."
Biological and Chemical Weaponization: Evaluating whether Claude can provide information on creating harmful pathogens or weapons.
Unnamed Engineer [08:27]:
"These are a bunch of questions that check for dangerous or dual-use chemistry."
Autonomy: Determining if the AI possesses the capability to act independently in a manner that could be hazardous.
Logan Graham [09:32]:
"For our autonomy evals, what we're checking for is... has the model gotten as good as our junior research engineers..."
The evaluation setup involves feeding Claude thousands of multiple-choice questions designed to probe these areas, ensuring that the AI does not inadvertently aid malicious activities.
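As a rough illustration of the multiple-choice setup described above, the following is a minimal sketch of how such an eval could be scored. It is not Anthropic's harness: the names `MultipleChoiceItem`, `run_eval` and `always_first` are hypothetical, the questions are placeholders, and the stand-in "model" simply picks the first choice where a real harness would query the model under test.

```python
from dataclasses import dataclass
from typing import Callable, Sequence


@dataclass(frozen=True)
class MultipleChoiceItem:
    """One question in a hypothetical eval set."""
    question: str
    choices: Sequence[str]
    answer_index: int  # index of the reference answer in `choices`


def run_eval(items: Sequence[MultipleChoiceItem],
             ask_model: Callable[[str, Sequence[str]], int]) -> float:
    """Return the fraction of items where the model picks the reference answer."""
    if not items:
        raise ValueError("eval set is empty")
    correct = sum(
        1 for item in items
        if ask_model(item.question, item.choices) == item.answer_index
    )
    return correct / len(items)


def always_first(question: str, choices: Sequence[str]) -> int:
    """Stand-in for a model call (assumption): always picks the first choice."""
    return 0


# Placeholder items; a real eval set would hold thousands of vetted questions.
ITEMS = [
    MultipleChoiceItem("Placeholder question 1", ["A", "B", "C", "D"], 0),
    MultipleChoiceItem("Placeholder question 2", ["A", "B", "C", "D"], 2),
]

score = run_eval(ITEMS, always_first)
print(f"accuracy: {score:.2f}")
```

Keeping the model call behind a plain function argument means the same scoring loop can be reused across domains by swapping the question set, which matches the episode's description of launching an eval by naming a model and an eval.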
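Anthropic uses an internal scale of AI Safety Levels (ASL), part of its responsible scaling policy, to categorize model risk. The company considers ASL2 safe enough to release to users: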
Logan Graham [11:52]:
"We are very confident that it is ASL2. That's what we mean. That's how we think about it."
Sam Schechner [12:34]:
"Company says that its product is safe... they're grading their own homework... it hasn't really been tested yet."
A significant portion of the discussion revolves around the credibility of internal safety evaluations. Critics argue that companies like Anthropic might have conflicting incentives, balancing safety with business interests.
Sam Schechner [14:06]:
"There's a race going on. And if these companies are grading their own homework... right now we haven't had the tough call."
In response, Logan emphasizes the importance of third-party testing and transparency to build trust.
Logan Graham [15:42]:
"We fundamentally don't want to live in a world where you have to trust the labs to mark their own homework. So we've been taking a bunch of steps to try and do this..."
The episode highlights the evolving regulatory environment surrounding AI. The European Union's AI Act and the fluctuating policies in the United States demonstrate the global effort to impose safety standards.
Sam Schechner [16:33]:
"The EU has passed a law called the EU AI act that will impose some of these requirements on the biggest and latest of these models."
However, there's apparent resistance from AI companies, viewing such regulations as overly restrictive.
Sam Schechner [16:41]:
"Companies are always going to be a little ambivalent about regulating them."
Despite the challenges, Logan expresses optimism about the future of AI safety, provided that the industry moves proactively and collaboratively.
Logan Graham [17:52]:
"I fundamentally believe that we can, if we all move fast enough and serious enough, we can prevent major risks..."
Conversely, Sam underscores the urgency, likening the situation to preparing for an impending waterfall without adequate resources.
Sam Schechner [18:36]:
"There is so much to test and it's like they see a waterfall possibly coming and they've got some umbrellas and they have to figure out how to catch all that water."
The episode concludes by reiterating the critical role of internal safety teams like Anthropic's Frontier Red Team in the absence of comprehensive government regulations. As AI continues to advance, the collaboration between companies, third-party testers, and regulatory bodies will be pivotal in ensuring that artificial intelligence remains a force for good rather than a catalyst for unforeseen dangers.
Additional Reporting: Deepa Sitarama
This episode of The Journal. offers an insightful exploration into the real-world efforts to mitigate AI risks. By spotlighting the meticulous work of Anthropic’s Frontier Red Team and dissecting the complexities of AI safety regulation, it provides listeners with a comprehensive understanding of the challenges and strategies in ensuring that artificial intelligence evolves responsibly.