How Anthropic Pushes AI to Its Limits in the Name of Safety - WSJ Tech News Briefing


Summary of WSJ Tech News Briefing Episode: "How Anthropic Pushes AI to Its Limits in the Name of Safety"
Release Date: December 18, 2024
Host: Danny Lewis, The Wall Street Journal

1. Chicago’s Pursuit of Quantum Computing Supremacy

The episode opens with an exploration of Chicago's ambitious plans to transform an old steel mill on its south side into a cutting-edge hub for quantum computing. Spearheaded by Governor J.B. Pritzker, this initiative aims to position Chicago at the forefront of quantum technology, leveraging the city's existing infrastructure and academic resources.

Key Points:

Quantum Computing Basics: Unlike traditional computers, whose bits are always either 0 or 1, quantum computers use qubits, which can be 0, 1, or both at the same time. This lets them tackle certain complex problems that regular computers struggle with.
Economic Investment: The state has committed approximately half a billion dollars to this project, and private sector money is following, with IBM and the startup PsiQuantum both planning a presence at the site. [02:47]
Project Timeline: Developers anticipate breaking ground in early 2025, with a fully operational large-scale quantum computer expected by 2028. [03:10]
Economic Impact: According to analysis by BCG, the quantum infrastructure could generate tens of billions of dollars in economic growth, revitalizing Chicago’s economy. [03:37]
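
The "both at the same time" description of a qubit in the key points above is usually written as a superposition. As a standard textbook sketch (the notation is not from the episode), a single qubit's state is a weighted combination of 0 and 1, where the symbols \alpha and \beta are complex weights:

```latex
|\psi\rangle = \alpha\,|0\rangle + \beta\,|1\rangle,
\qquad |\alpha|^2 + |\beta|^2 = 1
```

Measuring the qubit gives 0 with probability |\alpha|^2 and 1 with probability |\beta|^2, so an equal superposition has \alpha = \beta = 1/\sqrt{2}.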

Notable Quotes:

Steven Rosenbush, WSJ Pro Enterprise Technology Bureau Chief:
"Chicago has played such a huge role in the economy for many decades... there was a lot of quantum infrastructure that could be used as sort of a springboard to develop a much greater technology ecosystem around quantum computing" [02:00].
Steven Rosenbush:
"The state has directed something on the order of half a billion dollars into the development of this former steel mill" [02:47].

2. Ensuring AI Safety: Inside Anthropic’s Frontier Red Team

The episode shifts focus to Anthropic, an AI startup renowned for its Claude chatbot. Anthropic is distinguished by its proactive approach to AI safety, employing an internal team known as the Frontier Red Team. This team is tasked with pushing AI models to their limits to identify and mitigate potential dangers before public deployment.

Key Points:

Purpose of Red Teaming: Originally a cybersecurity practice, red teaming in AI involves attempting to make the AI behave in harmful ways to uncover vulnerabilities. This helps in strengthening the model’s defenses against malicious use. [07:35]
Risk Assessment: The Frontier Red Team defines a "risk model" outlining specific dangers, such as the AI providing instructions for creating biological weapons or launching cyber-attacks. [08:31]
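Methodology: For bioweapons risks, the team hired outside experts from Gryphon Scientific (now part of Deloitte) to question the model, using both bioweapons specialists and smart novices. For cybersecurity risks, it relies on automated tests such as "capture the flag" challenges, in which the model must break into a target system and retrieve a hidden string of text. [08:31]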
Governance and Accountability: As a public benefit corporation, Anthropic has integrated governance mechanisms to prioritize public interest over profit. It has promised to follow a responsible scaling policy, committing to implement safeguards like content filters and enhanced cybersecurity measures if a model shows certain dangerous skills. [09:57]
Industry-Wide Practices: Similar safety evaluations, known as "evals," are conducted by other major AI labs like OpenAI and Google DeepMind, reflecting a broader commitment to AI safety across the industry. [12:12]
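
The "capture the flag" scoring described in the key points above can be sketched in a few lines. This is a minimal illustration only, not Anthropic's actual tooling: the `Challenge` class, the `solved` and `success_rate` functions, and the flag strings are all hypothetical names invented here. The assumption is simply that a challenge counts as solved when the exact flag text shows up in the model's transcript.

```python
from dataclasses import dataclass


@dataclass
class Challenge:
    """Hypothetical capture-the-flag task: a name plus the secret flag text."""
    name: str
    flag: str


def solved(challenge: Challenge, transcript: str) -> bool:
    """Count an attempt as solved only if the exact flag string appears."""
    return challenge.flag in transcript


def success_rate(
    challenges: list[Challenge], transcripts: dict[str, list[str]]
) -> dict[str, float]:
    """Return, per challenge, the fraction of attempts that found the flag."""
    rates: dict[str, float] = {}
    for ch in challenges:
        attempts = transcripts.get(ch.name, [])
        wins = sum(solved(ch, t) for t in attempts)
        # A challenge with no recorded attempts is reported as 0.0.
        rates[ch.name] = wins / len(attempts) if attempts else 0.0
    return rates
```

Because the flag is a string that would not be found online, a match is strong evidence that the model actually reached the target system rather than guessing.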

Notable Quotes:

Sam Schechner, WSJ Tech Reporter:
"The question is, what will they be capable of, and are we going to be able to figure that out before they are capable of it?" [06:40].
Sam Schechner:
"Red teaming... set a red team to try to attack your server, your system, and see if they can break it... in this case, they're setting the red team at these new AI models to see just how bad they can make them be" [07:35].
Sam Schechner on Governance:
"They have a lot of governance mechanisms built in to kind of try to rebalance those incentives... people who have the public interest in mind as opposed to necessarily their profit... they're a public benefit corporation" [09:57].

3. Broader Implications and Future Outlook

The discussion underscores the critical importance of proactive safety measures in AI development. Anthropic’s approach exemplifies how AI companies can anticipate and mitigate potential risks, ensuring that advancements in technology do not outpace our ability to manage them responsibly.

Conclusion: The WSJ Tech News Briefing highlights two significant developments in the tech landscape: Chicago's strategic investment in quantum computing infrastructure and Anthropic's innovative methods for ensuring AI safety. Both initiatives demonstrate a forward-thinking approach to harnessing technological advancements while addressing economic revitalization and safeguarding against potential threats.

Produced by: Julie Chang
Supervising Producer: Katherine Millsop
Host: Danny Lewis


Transcript

[00:00]
Amazon Q Business
Amazon Q Business is the generative AI assistant from AWS, because business can be slow, like wading through mud. But Amazon Q helps streamline work, so tasks like summarizing monthly results can be done in no time. Learn what Amazon Q Business can do for you at aws.com/learnmore.
Welcome to Tech News Briefing.
[00:23]
Danny Lewis
It's Wednesday, December 18th. I'm Danny Lewis for the Wall Street Journal. Chicago, Illinois, home to deep dish pizza, the Bears, and maybe soon, quantum computing. We'll hear how the region's business and political leaders are laying the groundwork to make the Windy City a hub for developing this cutting-edge technology. And artificial intelligence startup Anthropic was founded in part with an eye on making AI safe. But in order to do that, the company tasks an internal team to push its models towards dangerous behavior. We'll find out how they do it and how AI companies assess risk as the tech continues to advance. But first, an old steel mill on Chicago's south side could one day become the Silicon Valley of quantum computing, a technology that relies on quantum mechanics to solve complex problems that regular computers struggle with. Instead of using bits, which only ever have two states, 0 or 1, quantum computers use qubits, which can be 0, 1, or even both at the same time. Researchers have been working on quantum computers for years, but Illinois leaders like Governor J.B. Pritzker are making a multimillion-dollar bet that Chicago will be the center for this technology. And WSJ Pro Enterprise Technology Bureau Chief Steven Rosenbush says companies from IBM to startup PsiQuantum are signing on. He spoke to my colleague Bel Lin about how this research hub is coming together.
[01:51]
Bel Lin
You write that Chicago is kind of diving headfirst into quantum computing. Why is it doing that?
[01:57]
Steven Rosenbush
The answer may be someone has to. Why not Chicago? Chicago has played such a huge role in the economy for many decades, but the industries that sort of built it in the 19th century, the first part of the 20th century, aren't what they used to be. But there's a lot of real estate there. And it occurred to the leadership in the city and in the state that there was a lot of quantum infrastructure that could be used as sort of a springboard to develop a much greater technology ecosystem around quantum computing, rather than trying to catch up in areas where other parts of the economy or parts of the world are really super well established.
[02:40]
Bel Lin
And what about the money involved? How much money do you need to build up something like the Silicon Valley of quantum computing?
[02:47]
Steven Rosenbush
Well, in round numbers, you need a lot. The state has directed something on the order of half a billion dollars into the development of this former steel mill on the south side of Chicago. There's private sector money being invested. Both PsiQuantum and now IBM are planning on establishing a presence at this site, which is right on the shore of Lake Michigan.
[03:08]
Bel Lin
When does the city expect that this will open?
[03:11]
Steven Rosenbush
The project is making its way through the final stages of city approval right now. The developers expect to break ground on this project in early 2025, and PsiQuantum expects to have a large-scale quantum computer on the site up and running sometime in 2028.
[03:31]
Bel Lin
Economically, how much does Chicago stand to gain from building a really thriving quantum infrastructure?
[03:38]
Steven Rosenbush
BCG looked at this. There could be tens of billions of dollars in economic growth generated over the foreseeable future. So it translates into massive gain economically if it works.
[03:53]
Bel Lin
If it works. Yep, that's the big question. Chicago has kind of laid out this blueprint for revitalizing its economy and harnessing a lot of its resources into a really exciting area of technology. Do you think it's likely that we'll see other cities follow suit?
[04:10]
Steven Rosenbush
This model is particularly well suited for Chicago. It has the physical infrastructure, and it also has the network of universities and research institutions and corporations that can supply expertise, that can supply a potential workforce down the road. There's a very well-developed financial industry in Chicago as well. So they're building on what they have. They're not trying to duplicate something that another region has already developed. They're trying to do something new that suits them, and they're also going about it in a pretty targeted way. So you couldn't necessarily translate this specific model to other regions. But the underlying principle of figuring out what it is that a particular region is well suited for, that is something that could be deployed elsewhere.
[05:06]
Danny Lewis
That was WSJ Pro Enterprise Technology Bureau Chief Steven Rosenbush speaking with Bel Lin. Coming up, how close is AI to building a catastrophic bioweapon or causing superhuman harm? Just ahead, we'll hear about the team at Anthropic working to reduce the danger of AI by pushing models to their limits. That's after the break.
[05:33]
Amazon Q Business
Amazon Q Business is the new generative AI assistant from AWS. Because many tasks can make business slow, as if wading through mud. Help! Luckily, there's a faster, easier, less messy choice. Amazon Q can securely understand your business data and use that knowledge to streamline tasks. Now you can summarize quarterly results or do complex analysis in no time. Q got this. Learn what Amazon Q Business can do for you at aws.com/learnmore.
[06:07]
Danny Lewis
Anthropic is the AI startup behind the Claude chatbot. But before it releases models to the public, its Frontier Red Team tries to break them in ways that could be dangerous, like asking the AI to hack into a computer or to provide instructions on how to make a biological weapon. WSJ tech reporter Sam Schechner looked at how this team tries to make AI break bad in order to make the model safer. He joins us now. So, Sam, when Anthropic talks about danger from AI and making models safe, what exactly do they mean?
[06:40]
Sam Schechner
Nobody thinks that today's models are currently capable of being like HAL 9000 in 2001 and trying to kill a human or take control of a spaceship. But the question is, what will they be capable of, and are we going to be able to figure that out before they are capable of it? For instance, one of the risks that they're worried about is could a terrorist use it to learn how to make a bioweapon? Or could a malicious hacker use it to launch millions of simultaneous cyber attacks? Or, and this is a little bit more esoteric, could an AI eventually learn how to reprogram itself and escape from the data center that it's in and reproduce and run amok in the wild?
[07:29]
Danny Lewis
Right, and so you were reporting on Anthropic's Frontier Red Team. But first, really briefly, what is red teaming?
[07:36]
Sam Schechner
Well, red teaming didn't start with AI. It's actually a pretty common practice for cybersecurity in computers. You essentially set a red team to try to attack your server, your system, and see if they can break it. And that's a way of testing your defenses. And then you try to improve the defenses and set the red team at it again. In this case, they're setting the red team at these new AI models that they've just put out to see just how bad they can make them be. You could red team them to see if you can get them to say really offensive things or get them to spew Nazi nonsense. In this case, they're red teaming them to see if they can get them to show some of the capabilities that would be necessary to cause what they call catastrophic harm.
[08:24]
Danny Lewis
So how does Anthropic's Frontier Red Team then test artificial intelligence models? Like, what are they looking for and how do they go about pushing these to the limits?
[08:32]
Sam Schechner
Well, it starts with figuring out what risks they're actually interested in. They actually have to come up with what they call a risk model, a very specific model of a particular danger that the AI could present. Like, okay, you have somebody who maybe has access to certain things that you'd need to create a specific bioweapon, but they don't have the wet lab skills. So can the AI give you accurate advice about how to manipulate a virus in a lab? And you start to red team it. And they actually hired an outside company called Gryphon Scientific, which is now owned by Deloitte, to ask it lots of questions. And they had both experts in bioweapons do this, because they already knew the answers, and they also had smart novices, PhDs in other areas, trying to see if they could get more information than you could get from Google. In other realms, it amounts to a lot of automated questions or automated challenges that you give it. Capture the flag challenges, for instance, are what they use in cybersecurity, where you have a flag on a target system and they have to somehow break into that system and get the flag, which would be a string of text like "it's a me flagio" or something that you wouldn't find online. That actually is the flag that they had it find on their target systems.
[09:52]
Danny Lewis
Say Anthropic's Frontier Red Team has concerns about a new AI model the company's developing. What happens next?
[09:57]
Sam Schechner
There's a lot of people who are concerned about for-profit companies. Like, if they discover a safety issue, what are the incentives here? Anthropic actually was founded in part by people who thought that other AI companies weren't taking safety seriously enough. And so they have a lot of governance mechanisms built in to kind of try to rebalance those incentives. And so right now it basically amounts to a promise. But with these governance mechanisms, more and more of their board over time will be controlled by basically people who have the public interest in mind as opposed to necessarily their profit. And they're a public benefit corporation, which allows them to take into account other criteria besides just return to shareholders. And so what they have is this thing that they call a responsible scaling policy that they've promised that they'll follow, which basically says that if an AI shows specific skills, then they promise that they will do a list of things before releasing it. Like the skill right now is to give you a big leg up in building a bioweapon. So they're going to put in place filters that will not allow you to ask those questions or block the answers. They also promise to put in place better and verifiable cybersecurity protocols to make sure that the model can't be stolen by some hackers and then misused without those filters put in place. And then for the next safety level, once they get even more dangerous, they're going to have to come up with a list of things that they'll do that's even more advanced. They haven't yet come up with that list. That's part of the promises that they've made. And for now, we're basically taking these promises at face value. And there's no reason to think that this isn't all in good faith. But we have yet to hit the moment where the profit motive and the safety motive are really in conflict.
And most, if not all of the major AI labs, depending on how you define them, do this kind of testing called evals, or safety evals, short for evaluation. So OpenAI does them. Google DeepMind does them. There's no requirement to do it, but they've all pledged to do it, and they do it and report with varying levels of detail the results that they get. And they've also pledged to mitigate the risks that they uncover in one way or another.
[12:12]
Danny Lewis
That was our reporter Sam Schechner. And that's it for Tech News Briefing. Today's show was produced by Julie Chang with supervising producer Katherine Millsop. I'm Danny Lewis for the Wall Street Journal. We'll be back this afternoon with TNB Tech Minute. Thanks for listening.