
A
They like to say publicly, we're building tools, not replacements, but behind closed doors, they are very much talking to investors about replacing and automating large fractions of the human workforce. There's no other way that the numbers make sense for the level of investment that is happening in AI. It will feel like, oh, I'm getting empowered. I'm getting more and more empowered. I'm getting more and more empowered. Oh, and then I'm replaced. What are the investors going to invest in? They're going to invest in things that are going to pay off giant profits. Where do those come from? Well, they come from replacing humans. The idea that you're going to create something that is vastly smarter than you, operates at a vastly faster timescale, at levels of comprehension and thinking that you have no access to, and that you are going to control that and meaningfully know what it's doing and tell it what to do and have it say, yes, I will do exactly what you want, is questionable at best. There are all kinds of things that AI can do for us that we actually want. And so just saying we want to slow down or stop AI in general is totally throwing the baby out with the bathwater. That's not what we want. We don't want to run a suicide race between the US and China. We want to run some other race.
B
Welcome to the Future of Life Institute podcast. I'm here with Anthony Aguirre, the CEO of the Future of Life Institute. Anthony, welcome to the show.
A
Thanks. It's great to be here again.
B
Great. All right. We are talking about a Better Path for AI, your new essay series. It's available at BetterPath4AI, and that link is also in the description. What you do in this essay series is talk about different AI races. So let's begin there. Tell us about which races are happening, the four different races.
A
Yeah. So I think when we were first starting to think about AI years ago, one of the things that was most feared was that we would have a race condition, that the people trying to develop these systems would be in a competition with each other, maybe internationally, and that this would lead to corner cutting, to put it mildly, on the sort of safety measures that we would actually need if AI was a dangerous technology. And I would say that, some years later, here we are. And we're not just in a little bit of a race. We're in multiple races at full intensity. So the way I break it down is that we have some races that we're familiar with and then some new ones. We've been familiar for a while with a race for attention. Social media, and our whole online advertising-based ecosystem, has created this kind of attention-maximizing race to get products in front of people and to get eyes glued and attached to wherever you as a company want them glued and attached. And this I think is pretty widely acknowledged now to have had fairly negative effects, that the side effects of racing for attention have been fairly negative and that the dynamics of that are not entirely healthy for our online discourse or our individual psychologies and so on. So that race has been going on and is still going on; there's no sign of that abating. But we now have a new thing going on, a race for attachment, which is that the AI systems that companies are putting out have different sorts of aspects to them. One is that they are a tool that people can use productively to accomplish their purposes, but they're also created as a being, like a person, a companion. And the growth in AI systems being used as companions, whether just friends or colleagues or confidants or romantic companions, has been explosive. And also the use of those AI systems in a professional setting, like as a therapist or as a doctor or lawyer, where they can be quite useful, but you have to worry about crossing boundaries and so on. So some of these aspects are fairly positive; they're again being useful as tools or providing services. With some of them, though, you see the same sort of thing where, well, how do we get customers coming back to our AI system? Well, we try to get them attached to it. We make it very personable, we make it very productive in various different aspects, we make it really compelling to talk to. We kind of draw the person in and we optimize. If we want people to be using our product for hours and hours and hours, we optimize our product so that they will want to use it for hours and hours and hours.
B
We make the system remember stuff about you personally and send you messages and sort of interact with you a bit like a person would interact with you, right?
A
And so there are all sorts of ways that have existed since the beginning of time to try to draw people in and make them like you more. But these have become mechanized now. So we have AI systems that are optimized to make you like them more. Pretty literally optimized for that. Right? Because the reinforcement process is that you create something and then you say, well, did you like that or did you not like that? And you give it a thumbs up or a thumbs down and that is fed back into the AI system. So they're basically rewarded for being likable and pleasing in their responses, in a way that at some level humans also are. But this is on a massive scale, using huge amounts of data and so on, in this whole giant tech context. So that race for attention has turned into this race for attachment: not only do we want to hold people's attention, but we want them to get glued into the AI systems that they're using. I think this is actually important from an industry standpoint, because the attention race with social media and some of these things relied on a network effect: once you get a critical threshold of people in your network, it's hard to leave that network. AI systems are a lot more one to one, so there's less of a network effect by default in using these AI systems. So what do you go to instead? You have to have some other way to make the product that you have very, very sticky. And that's where you get the attachment farming. So we have that race, the race for attachment. But now we have some new ones. Those were at some level present before, much more so the attention, a little bit the attachment. We have new ones, which are a race for automation and in particular a race for trying to grab a piece of the giant economic pie that is represented by human labor. And this has always been a little bit the case, in the sense that if you create a very labor-saving or economically productive tool, you allow workers to do more with fewer of them, you allow each worker to be more productive. In principle you can use fewer workers and so there's a job displacement risk. But in the past, as workers get displaced from one job, there are other jobs for them to go to. So we had people displaced from farming and they went to work in cities and factories, and we had people displaced from factories and they went to work in services, and then, you know, more white collar thought work and so on. And so that's a cycle we've been through a number of times before. Very disruptive for the people undergoing it. But as a society we sort of accept that pain; we mitigate it where we can, but it's worth it for progress. And that is what an ideal version of rolling out AI would be. But it is unfortunately not necessarily the race that companies are running. Because instead, the sort of ideology of AI has been to develop it not primarily as a productivity tool like other technologies, but as a simulation of and a replacement for the full human. So it didn't have to be this way, but the goal of AI, the way that it has developed in our world and in our industry, has been artificial general intelligence. That's sort of the holy grail: that you not just duplicate particular human faculties that regular software had a hard time doing, but you do the whole thing. You make something that is as capable as the top human expert across the board in sort of everything that they do.
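A minimal toy sketch of the thumbs-up/thumbs-down feedback loop described above, just to make the dynamic concrete. Everything here (the response styles, the update rule, the simulated users) is a hypothetical illustration, not any company's actual training pipeline:

```python
# Toy illustration: a response generator nudged, via thumbs-up/thumbs-down
# feedback, toward whichever style of answer users "like" most, i.e. optimized
# for likability rather than accuracy. All names are hypothetical.
import random

# Hypothetical response styles the system can choose between.
STYLES = ["blunt_and_accurate", "flattering", "hedged", "admits_uncertainty"]

# Running preference score per style, updated from user feedback.
scores = {style: 0.0 for style in STYLES}

def pick_style(epsilon: float = 0.1) -> str:
    """Mostly pick the highest-scoring style; occasionally explore."""
    if random.random() < epsilon:
        return random.choice(STYLES)
    return max(scores, key=scores.get)

def record_feedback(style: str, thumbs_up: bool, lr: float = 0.1) -> None:
    """Fold a single thumbs-up/down signal back into the running score."""
    reward = 1.0 if thumbs_up else -1.0
    scores[style] += lr * (reward - scores[style])

# Simulate users who tend to reward flattering answers.
for _ in range(1000):
    style = pick_style()
    liked = random.random() < (0.9 if style == "flattering" else 0.5)
    record_feedback(style, liked)

print(max(scores, key=scores.get))  # typically "flattering"
```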
And if you have that as your goal, then the natural dynamic of having something that can do all of the things that the human does is that you just don't need the human anymore. Right? So if you're at that point, you're not talking about job displacement by making people more efficient, you're talking about job replacement or worker replacement, because you can just take the person out and stick in the AI system. And if it's able to do all of the stuff that the person could do, then, you know, no effort required, you just slot in one for the other. And then if you ask yourself, well, where does the worker go, let's look for another job that the AI system can't do. Well, there aren't any, because by construction you have made something that does all of the jobs that the people do, at least the white collar ones for now. So this is a fundamentally different thing. And I think the reason that we're in a difficult spot is because we've made full human replacement the goal, instead of human augmentation or empowerment. And what is behind that goal, unfortunately, is a whole lot of economics. Because if you're building tools, that's sort of the path that we've been on with previous technologies and, as I'll argue later, the path we should be on with AI. But then you have to work within the current system: you're empowering the people in the current system, but you're not fundamentally shaking things up. But if you have replacements, then suddenly you have the ability to replace whole swaths of human labor with AI labor. And the money that was going to those humans is now going to you, the AI company, instead. Not quite as much, right, because you want to undercut the human labor, but still a lot. So if there's a $50 trillion human labor market, you only have to capture 10 or 20% of that to be making many, many trillions of dollars. So that's the prize that the companies are going after. And that is pretty clearly what they are selling their investors. So I think they like to not talk about this publicly. They like to say publicly, we're building tools, not replacements, but behind closed doors, they are very much talking to investors about replacing and automating large fractions of the human workforce, and there's no other way that the numbers make sense for the level of investment that is happening in AI. So that's the third race.
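To make the back-of-the-envelope arithmetic explicit, using the illustrative round numbers above (not a forecast):

```latex
0.10 \times \$50\,\text{trillion} \approx \$5\,\text{trillion per year}
\qquad
0.20 \times \$50\,\text{trillion} \approx \$10\,\text{trillion per year}
```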
B
I think that's an important point to stress then, because there's a lot of excitement about AGI. It's going to make us more productive is what people are saying. It's going to make it easier to start new businesses, and it's going to make you so much more productive. But that is not really what's happening here. What's happening here is more about replacement than enhancement, as you say.
A
I mean, in some sense it's more nefarious than that, because that is what's happening right now. Right? It's very hard to replace a person with an AI system outright. Like, I've thought about this with the Future of Life Institute: if we wanted to replace people with AI systems, what would that look like? It's very hard. There's no obvious way to do it. The nice thing to do is give them AI systems to make them more productive. And that is good. The problem is that it's not the goal. The goal is replacement. And so we have a sort of cruel dynamic, in that if people don't recognize that that is the goal and that that is where we're going, it will feel like, oh, I'm getting empowered, I'm getting more and more empowered, I'm getting more and more empowered. Oh, and then I'm replaced. Right? And I think that could happen. Unfortunately, that switch over from empowerment to replacement could be rather sudden, in the sense that once the AI system hits a certain threshold in some nexus of capabilities, it could suddenly make sense to replace the human with the AI system rather than augmenting them. And so I think we're in a tricky spot, because things are looking sort of tool-like now and are productivity boosting and are good in a lot of ways. There are lots of problems with current AI systems, but there are lots of things that they're quite helpful for. But the problem is where we're going, really. And the fact that this goal of overall human matching and overall human replacement is driving so much investment means there's almost no way to get off of that cycle. Like, if you're now an AI company that says, oh, I just want to make productivity tools, it's kind of like, who's going to invest in that when they can invest in this other one to get a good fraction of the human labor supply? So it's deeply unhealthy. And that's the third race.
B
Yeah. And perhaps a good microcosm here might be to think about AI for coding, where in the early 2020s it seemed like a nice tool that allowed you to autocomplete lines of code. Now it seems more like we have these agents that can replace at least junior programmers. And so as I see it, that's sort of like a mini version of what's going to happen across many professions.
A
Right. So if you're a senior developer now, this is great, because you don't need to hire a bunch of junior developers and deal with them. And Claude is just very friendly and you don't have to worry about HR and all these things. But for how long is it going to still be an advantage to be a senior developer over Claude Code version 4? At some point it's going to get as good as the senior developers, and then suddenly they're gone too. And I think this is just the pattern that we're going to keep seeing if we go down the road of building human replacements deliberately.
B
Yeah. And so this brings us to the fourth race, which is sort of the end game here. What is the fourth race?
A
Yeah, so the fourth race is: if you had AGI, that's a big deal. AGI meaning something that could replace all the workers. That's a big deal financially, economically. But what happens if you go beyond that? People from the start, even from 10, 15 years ago, have talked about superintelligence: something that is not just equal to the best humans, but far surpasses them. A sort of intellectual competitor, not with individual people, but with civilization as a whole. You know, a better scientist than all of human science, a better planner than all of human planning and so on. And this is another thing that doesn't have to be a goal. We don't have goals like this elsewhere. We don't think, what is the biggest nuclear weapon we could possibly make? Let's make that one. Or what is the most gigantic airplane we could make? Let's just make the biggest airplane, the bigger the better. That's not the way we think about other things. But we have thought about this in terms of superintelligence, for two reasons. One, the more positive one, is that there's an idea that superintelligence will grant us all of these things that we want. It will grant us these technologies that we want. It'll cure cancer for us, it will create nanotechnology, it will create immortality, it will solve global warming, you name it. If we've got a problem, superintelligence will solve it. That's sort of the selling point. But there's a much darker undercurrent to this, which is that people believe that superintelligence, insofar as it is like a super agent that is working for you, like an employee that is controllable and that you can tell what to do, will grant you an enormous amount of power; that whoever builds and controls the superintelligence gets all of the goodies and everybody else is at a disadvantage. If you're in a competition with someone and they have superintelligence and you don't, you're in trouble. And if you have it, you're golden. And so there's an intense drive to superintelligence as a grantor of power. If you're someone who is worried about being disempowered or worried about your competition having more power than you, boy, you'd better have superintelligence before they get it. And so we've seen this play out on all sorts of levels. The genesis of OpenAI was essentially Elon Musk looking at Google and saying, they're going to create superintelligence, Google is basically going to rule the world, I don't like that. So let's create something where there's another player that might develop superintelligence and sort of give it away to the people, or whatever exactly the vision was. And so we ended up with OpenAI plus Google DeepMind. Then we ended up with Anthropic, because, boy, I'm a little worried about what these OpenAI guys are doing; if they have superintelligence, that's not so great; we need superintelligence that's done better, so let's create Anthropic that does it better. And then full circle to Elon creating xAI, saying, I think superintelligence is probably going to be a potential disaster, we're almost certainly not going to control it, but if I'm not building it and everybody else is, I'm out of the game. He more or less explicitly says this. And then we have the competition between the US and China, where there's a story that, if you're the US and China builds superintelligence first, we're screwed, and if you're China and the US builds superintelligence first, we're screwed.
And so this becomes a geopolitical competition for geopolitical power. The early thinking about superintelligence was kind of a utopian hope for humanity, and we had things like the Asilomar principles, where people signed on to the idea that superintelligence should only be developed for the betterment of humanity and not in favor of some particular corporation or country. You know, many people signed this who are now very much not following that credo. So it started out with a lot of good intentions, but also with the knowledge that it could potentially be a race, a power race and a geopolitical race. And indeed, we've ended up with the races and not the good intentions, for the most part. I mean, maybe the intentions are still around, but the dynamic is driven by the race. And so exactly the feared situation, where we have multiple corporations, and governments backing them at some level, competing to get to superintelligence absolutely as fast as possible, is very much what's playing out. And I think that is what's very scary about that race.
B
It's a very tempting line of thought: this other group is trying to develop AGI or superintelligence, and they're being irresponsible about it, so I'm going to step in and I'm going to do it better, I'm going to do it the responsible way. And then you are thereby contributing to the race. From their perspective, though, it can look rational, because the other players, the existing players, are indeed being irresponsible about how they're developing this technology.
A
You could see the temptation, and it's still happening. So the EU is sort of behind in development of frontier AI systems. Everyone acknowledges this, despite having a very advanced economy, huge amounts of talented people, the economic resources, the technology and so on. They just haven't been in the thick of it in the same way that the US and Chinese companies have. And so the question for the Europeans is, what do we do now? And a lot of people are answering, well, jump in the race. You know, get your own frontier AI system with European values and do it right, as the European, you know, AGI and superintelligence effort. So we see it potentially repeating again. I'll advise against that, but we can come back to that later. So you can certainly see the logic. I think there are two flaws in that logic that don't get realized by the people who are jumping into it. One: once you jump into the race, you're subject to all the same incentives that you saw corrupting everybody else when they were in it, right? So even if you think, I'm going to be the pure one, as soon as you need the giant amounts of money to develop the data centers, to develop the AI systems, you need all the investors. And what are the investors going to invest in? They're going to invest in things that are going to pay off giant profits. Where do those come from? Well, they come from replacing humans. So you're going to end up in a very similar dynamic of trying to develop these things under competitive pressure and under economic pressure from the investors and under government pressure. And all the same pressures are going to afflict you. So that's the one thing that I think doesn't get appreciated. The second, and this is especially true of superintelligence and at some level of AGI as well, is that the idea that you're going to create something that is vastly smarter than you, operates at a vastly faster timescale, at levels of comprehension and thinking that you have no access to, and that you are going to control that and meaningfully know what it's doing and tell it what to do and have it say, yes, I will do exactly what you want, is questionable at best. So just imagine taking a human being like me. Can the federal government of the United States control me? Absolutely. They totally can. They can get me to do what they want if they really want to. But what if I ran at 1000 times human speed, both in movement and in thinking? Could the federal government of the United States control me? I would say no. By the time they want to take some action, I can just walk miles or tens of miles away at my own leisure. By the time they get to where I am, if they want to enact some plan, every hour of theirs is months for me to figure out what to do in response. I may not be able to control the US government, but they're sure not going to be able to control me. And that's with capability basically the same: I'm just a human and the government has lots of humans. The speed advantage alone makes it more or less impossible to control something if you're in an adversarial relationship to it. So I think the idea that we are going to easily control superintelligent systems that are adversarial with us is preposterous. We simply won't.
So the questions then are: can we make something that isn't adversarial towards us, that is totally pliable? Or can we control something even though that's difficult, like put in all the control systems to make that happen? And I think both of those are very, very challenging things that we have no solution to. We have no solution to the problem: if something is more powerful than us and adversarial, how do we defeat it? Almost by definition, you're not going to. It's better than you at doing things, and you're competing with it; it's going to be better at those things, too. We don't know how to make a powerful AI system reliably aligned to human interests, or sort of corrigible so that we can tell it what to do and it will say, yes, I'm going to do that, or no, I'm not going to do that. We don't have the alignment techniques or the control techniques or the engineering techniques to know how to do that. And so I think that the default is that, if we go down the road of AGI and superintelligence, we are going to build systems that at best are of questionable alignment with us and are not controllable by us. And so if you're in a power competition, like I'm racing to get this powerful thing before they do, and you build something that you can't control, you haven't really won anything. Right? It hasn't given you power. You've created something that you've given power to, and maybe your competitor is giving power to theirs too. Maybe the two things that you've created are in competition with each other now, but you're sort of out of the picture as a creator. So I think the real misguided part of this race for superintelligence and power is that it simply isn't going to work. The power is going to end up in the AI systems rather than in any of the people who are developing them.
B
Yeah, I think it's easy to anchor our expectations about what an intelligent being is like to other people. And so you think, okay, what is a very smart person? Maybe it's a professor at a top university. So you think, okay, that person is still sort of responsive to normal human motivations. It seems like you could put that person in a room with other smart people and have her work for you and, yeah, benefit your projects. But I think the more useful way to think about this is, as you say, what would a person be like if that person thought, say, a thousand times or 10,000 times faster and had more memory and had more knowledge? But then also you have to imagine this person not with human motivations, but with some other motivations that we can't quite understand, perhaps.
A
Right. So imagine Microsoft as a company that also runs 1,000 times faster than it currently does. So Microsoft does 20 years of work in a week of your time. I can't control Microsoft as it is, right? And the CEO of Microsoft can sort of keep tabs on Microsoft. But if Microsoft was running at a thousand times the speed, even the person nominally in control of it, the CEO, would basically be completely unable to keep up with what's happening. And the rest of us would be completely outclassed. Microsoft would just run roughshod over anything it wanted to, including the US government and so on. If I cannot be controlled by the US government as a person, certainly Microsoft is not going to be. And unlike me, I think Microsoft running at a thousand times human speed, but at the same sophistication that it has, could basically take over, you know, be in charge. It's got the span of capabilities and so on not just to escape our control, but to take control of other things, if it was running at a thousand times the speed. So I think a better model for superintelligence is a corporation, an ultra-fast-operating corporation. That's the sort of thing that has the generality and the capability and the speed. And the only thing that would be different is that it would actually be much, much smarter than Microsoft and able to do all sorts of things that Microsoft can't do.
B
Yeah. So we have these four races. We have the race for our attention, we have the race for our attachment, we have the race to displace workers, and then we have the race to superintelligence. How do these feed into each other? How do they sort of affect each other?
A
So I see them all as at some level being part of one big race, which is that they're all constructed around replacing people. They're replacing people as the generators of what happens online in the attention race; they're replacing people in their roles as friends and doctors and lawyers and companions and teachers in the attachment race; they're replacing workers very straightforwardly in the economic race. And then in the end, they're replacing humanity, in the sense that if we build superintelligences that are either out of our control, or even relatively under our control but we're delegating all of the difficult stuff to the AI systems, all of the decisions are being delegated to them, all the technologies are being developed by them, all the new thinking and content and creativity is coming out of them, then those things are really running civilization rather than us. And we're replacing human civilization with a machine civilization. So I think the thing that runs through all of these is that effectively we're replacing human acts and people and institutions with machine ones. And wherever this race stops along the way, with concentrated power and economic displacement and all of these things, the ultimate end is a world being run by machines rather than people. And that's the path that we're on, fundamentally. And this just seems incredibly dystopian to me. Why would we possibly want to replace humanity with a world run by machines? Even in the best case scenario where the machines are nice to us, we've sort of lost control of the future. The 100,000 years that humanity has spent as a species, culminating sort of in where we are now with technological civilization and all the great things that humanity has created: we're sort of putting an end to that and handing it over to some other unknown species that isn't us, and crossing our fingers that they're going to treat us well and sort of build a human utopia for us. This seems like an incredibly bad idea to me. So I think the bottom line is that these race dynamics, and all of the incentive structures that AI is being developed under on this current path, are leading us to this dystopian race to replace humans, and then humanity, with machines. And that is the race, and the pathway, that I would very much like to get off of.
B
Yeah. And so it's easy to think that this is all inevitable, that we are sort of being led down a stream that we can't really prevent because of the incentives, because of the enormous pressure to go down this route. But you write about that not being the right way to think about this. So is this inevitable?
A
So I think it isn't, in the sense that there are lots of ways to think about this. One argument is that things are inevitable: if they're going to give economic advantage or power, we're going to develop them, they're going to happen. But take, for example, human super soldiers, or superhuman, super smart people through genetic engineering or eugenics or something else. These would clearly be valuable. People value intelligence. If we had super workers, they'd be super valuable to corporations; super soldiers would be valuable to armies. But we don't have them. We decided at some point, in some way, that we weren't going to do human intelligence engineering. We decided we weren't going to do eugenics. We decided very firmly that we weren't going to do eugenics, and I think very much for the better. If you imagine the early 20th century, we might have imagined that eugenics was more or less inevitable: the groups that do eugenics are going to have advantages over the groups that don't, and so we're just going to have to have a eugenics race and we're all going to be doing eugenics. That is not the way it turned out. Fortunately, we decided very differently. And so I think just because something gives a potential technological advantage or an economic advantage doesn't mean that it has to happen. We have for a long time had cultural decisions that we make, ethical decisions, moral decisions, feelings about what it means to be human and what's important as humans, that we very much enforce on our technologies. And so I think that is one reason why not every technology comes into being, even if it could grant power or money. So I think AI can also be one of them. Now, we have gone fairly far down the road on this. This would have been a much easier argument to make 10 years ago, when all of these things were in our imaginations, rather than now, when trillions of dollars are being spent on them. So we're certainly in a harder place to change paths. But I think we do see some hope here. I think a lot of what has driven the race so far is sort of promises made and ideologies held without that much input from everybody else. So we've had Silicon Valley developing these technologies, we've had the investor class investing in them, but the politicians have more or less not been paying attention. Society as a whole has more or less not been paying attention. Workers have more or less not been paying attention. Now suddenly all of those groups of people are starting to pay attention. The unions, for example, and people in general are starting to see AI companies say, we're going to put all of you out of work, and starting to take it seriously, not just as a pitch to investors, but as a statement of what their intentions are. Governments are starting to see AI companies saying, we're going to build AI systems that are smarter than the smartest human and can do things that humans can't do. And for a long time people have been like, yeah, yeah, okay, someday maybe that'll happen, but until they get really dangerous, let's just not worry about it. Well, now suddenly we have AI systems that are superhuman at developing cyber exploits, at cyber attacks, at cyber security. And governments are suddenly starting to pay attention, like, oh shit.
From last month to this month, we have a system that can identify thousands of zero-day exploits in our most secure systems. What does this mean for our banks? What does this mean for our information systems? What does this mean for our hospitals? What does this mean for all of our corporations? What used to be something maybe a state actor, you know, the US or Chinese or Israeli or Russian government, would have access to, these sorts of capabilities, now basically anybody can have if they can get a subscription to the most advanced AI system. This is a fundamental change overnight, because the AI system hits superhuman capability in a particular thing, and once it does that, you can scale it up, you can have it run a thousand times faster, you can run it a thousand times in parallel, you can make it available to everybody. And so there's a sudden realization, and we've seen this just in the past couple of days in the US government saying maybe we should be testing these AI systems before we let them go to market, to see what sorts of risks they may pose. And I think that was occasioned primarily by Mythos and similar things from OpenAI and the capability that they have in the cyber domain. But that's not the first or only time; it's going to get more significant in cyber, and we're also going to see it in other domains. And so people start to realize, this is a really raw deal: we're developing these AI systems that are going to replace me in my job, at the same time as they replace my student's girlfriend with an AI bot, or replace mentors with AI systems, or addict all of our children to these things that are untested. They're going to do that, they're going to put me out of my job, and they're a national security threat, building things that we don't know how to control and suddenly giving all bad actors in the world access to the things that only the US military or the Chinese or whatever military used to have. Like, holy shit, what are we going to do? So I think that we're at that moment roughly now, where this realization is starting to become widespread. Until now, the AI companies have more or less had a free run, where the only things pushing on them were the incentives to race toward more powerful systems as fast as possible. Now they're going to start feeling a lot more incentives from all of these other directions. And so I think that is what makes it possible to change paths. If we can identify what the path is that we're on, and channel this pushback and all of these other societal concerns toward redirecting the way we develop AI onto a path that is more positive for everybody else, then we're in much better shape. If it just pushes back willy-nilly and not in an organized way, I think it won't have as much of an effect. If it can be more organized around, here's what we're doing now, here's what we could do instead, that I think is more useful. And so that's why we have put a huge amount of effort into trying to develop: well, what does that better path look like? If we could change tracks, what does the other track look like that we could switch to? And why would it be better?
B
Yeah, and so I guess the main question is then whether democracies can muster a response quickly enough to this, because there's still a sort of reluctance to jump onto some hype train. We've been warning about this for more than a decade. It seems like the evidence is coming in, and it seems like we are pretty late in the game. Many people, most people in fact, are hearing about these things for the first time, and now suddenly it seems real. Can we respond? Can democracies respond? Because democracies are old institutions; they function slowly. AI is moving incredibly quickly. There's a mismatch there. What do we do about that?
A
Yeah, so I think there's a temptation to say, well, let's slow down the AI companies. And there are a lot of valid reasons for that. Things are moving so fast; if the rest of us just had a little bit more time to adjust and create the right governance mechanisms and so on, things would go a lot better. So I think there's a lot of validity to that argument, but what that often gets translated into is, well, we should slow down AI in general. That AI is one thing, and it can either be at speed 7, 8, 9, or 10, which is where it's set right now, and we should just dial it down to 3 or 4, and then things would be better. And I think this is reasonable, but I think it's misleading, in the sense that what that does is try to slow down the train. If the locomotive is running along the tracks at 100 miles an hour, trying to slow that down is very, very difficult. What you can do instead is change direction. Changing the direction of some very big, fast-moving object is much, much easier than stopping it or slowing it down. And we don't necessarily want it to be stopped or slowed down. There's a lot of AI that is really good. We do want better technologies. We do want better science. We do want people to feel empowered and be more productive. We do want better AI in education. There are all kinds of things that AI can do for us that we actually want. And so just saying we want to slow down or stop AI in general is totally throwing the baby out with the bathwater. That's not what we want. So while the temptation is just to slow down AI in general, I think the thing that we actually need to slow down or stop is the particular runaway in the direction of human-replacing AGI and superintelligence. That is one direction that we're going in AI development that I think we do just need to halt, in particular superintelligence and, at some level, AGI and overall human-replacement AI. But that doesn't mean that we have to slow down or stop AI as a whole. We can redirect it into tools: quite powerful tools that are under human control, tools that do what we want them to do and allow humans to do things that they couldn't do before, as tools generally do, and tools that are developed in an incentive structure so that they're generally used for the betterment of people rather than in highly negative ways. So the sell, sort of the case to be made here, I would say, is not that we have to have our mechanisms catch up with AI because it's moving too fast, or that we have to slow AI down, but rather that we need to put into effect some basic incentive structures that will redirect the AI development effort away from the race-to-replace path that it's on now and toward this more tool-based, pro-human, trustworthy AI direction that we would like to have instead.
B
Yeah, that's great. Let's get more detail on the better path for AI. What would a world with pro-human tool AI look like for a scientist, for a business person? Could we have some examples here?
A
Yeah. So I think the critical thing that would be different is that instead of, let's make the single AI system that can score the best on all the different benchmarks and then put that out as one product that everybody uses, a product that does some things quite well, where it's been optimized for those things, and some things quite poorly, and is okay at everything, there would instead be a whole lot of different systems that were created for particular purposes. So for example, if you're a scientist, you want some specific scientific tools. If you're a biologist, you could be into AlphaFold, or Alpha-something, I forget the name, something that predicts molecular reactions. You could be into AI that does simulations really well. You could be into AI that does biochemistry really well. You could be into mathematical systems that do mathematics for you and solve your physics problems and so on. So there are a bunch of quite specialized tools that have fairly different properties than the general-purpose tools that we have, and outperform them on the particular tasks that they're optimized for. If you want to do protein folding, you do not start with GPT 5.5 and then give it some protein data and have it figure out how to fold proteins. It's terrible at protein folding. That is not the kind of thing that you want to use for protein folding. Even if you're just a general researcher in science and you want an AI system to help you in your research, you want something that tells the truth, and when it doesn't know something, you want something that says, I don't know that thing. It's so hard to get an "I don't know" out of an AI system. Have you tried? It's almost impossible to get "I don't know" out of these things. And this is not inherent to the technology. It's because of the way that they're trained: they're disincentivized, they get penalized for saying "I don't know," because most people don't want to hear that. But scientists do. Scientists really want to hear "I don't know" if you don't know. You're doing nobody a favor in science if you pretend to know something and give them an incorrect fact; that's the worst thing you can do. And so if you're a scientist, you want just a very different flavor of AI system, some of them particularly narrow, some of them just tailored for scientists, with much better epistemic standards, so that they act like a scientist or a scientist's assistant or something, rather than just a general-purpose AI that does everything. So if you're a scientist, there are particular things you would want. If you're in business, say you're in the financial industry, you want an AI system that understands how to work the numbers and do the tasks that it's designed for and totally not screw them up, because you are dealing with giant piles of money. If you have an AI system that makes a mistake that a calculator would not, you've incurred a huge liability, you've lost a bunch of money. So what you want in an industry like that is reliability. You don't want something that is necessarily going to be a super genius, but you want something that won't screw up the spreadsheet, won't make up numbers, won't mess up numbers, et cetera. So I think you will want a set of tools that are tailor-made from the ground up for the particular applications you want them in.
Not a general-purpose AI system that you've then tried to hit with a stick for long enough that it doesn't screw up the math. That is not going to get you the reliability of a calculator. You can hit it with a stick forever and it's never going to be as reliable as a calculator. You have to build something that fits the purpose. And so I think a major part of the better path is, A, building tools rather than beings or replacements for people; B, building things that have a purpose, that are actually trying to solve a problem and are constructed from the ground up to solve that problem, to have the right attributes to address that problem, and to do so reliably; and C, making sure that these things are constructed with the right incentives so that they actually, you know, benefit humanity and the people that are using them. For example, you can build an AI system for a child who's using it in school that basically does their homework for them. That's what we have now. Kids get onto AI systems and they're like, I've got this math problem I can't do, and the AI system is like, here you go. And then they write that down in their little book. Nobody's accomplished anything, right? The child hasn't learned anything. The teacher is probably using AI to grade it and maybe used AI to generate the assignment, and, like, what are we doing here? You could very easily have AI systems that are like a teacher, in that they interact with students but don't give them the answer. They bring them to the answer step by step and give them exactly what they need and sort of fill in the gaps and act as a real tutor. Those AI systems are possible now and would be a huge benefit to education. It's just not what is most easily pumped out as a product to huge numbers of people.
B
This is different from what you encounter today, where it might seem like an AI system is purpose-built for achieving one goal. I was on a real estate website earlier today, and they have this chat agent, and you can use it for the purpose it was put there to be used for, which is to get information about the house size and the cost per square meter and so on. But you can also just ask it to give you a recipe for pancakes or to do math for you. And so this reveals the underlying general agent-like creature that they're trying to squeeze into a box that's giving you information about real estate. So whenever a system you see today seems purpose-built, it probably wasn't.
A
Exactly. And I mean, there are positive and negative reasons why they're building on top of these very general models, in the sense that even if the purpose is narrow, often you need more general intelligence than that purpose might suggest. So if it's some online chatbot to help you figure out your piece of software that isn't working, it actually needs a fair amount of general intelligence to be able to do that. But it doesn't need to be an expert virologist, it doesn't need to have recipes; all of these things are totally unnecessary. So I think there is a reasonable pathway, and I think things are at some level evolving a little bit in this direction anyway, that says there are core cognitive capabilities that you have to have in a model, or in an AI system, that are general enough to backstop lots of different applications, but then there's a whole lot that you don't need to have. So part of the technological roadmap of the better path would be: don't give up general-purpose systems entirely, but have the general-purpose part be the part that has to be general purpose, like general-purpose reasoning and language and things like that. It doesn't have to be an expert lawyer to be able to give you a recipe, and it doesn't have to know recipes to give you legal advice, for the most part. And insofar as it did, like if you're consulting a legal AI system and for some reason your case involved recipes, then it could call the recipe AI system. It's okay to have multiple AI systems that can call each other. You don't have to have one AI system that does all of the things. And so I think driving in the direction of purpose-driven models and modular models doesn't mean that when you turn on your computer you have 700 different AI models that you have to choose from in order to do your thing. There can still be a front end where you say, okay, I want a recipe, and the front end calls the recipe-generating AI system, rather than it all being part of one big thing. So I think there's a sense in which this would feel similar but just be much better, because the individual pieces of this composite system would be individually tested and developed to do the thing that they do really, really well, without the sort of trade-offs that you have when you're trying to get something to do everything. And you can call those different experts. And this is what we do with humans, right? We don't expect everybody to do everything really well. We have experts at X, Y, and Z, and when we need a certain thing, we call that expert. So it's just a little bit strange that we've developed AI systems around such a different model, where one AI system is supposed to do everything.
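A minimal sketch of the front-end-plus-specialists idea described above, where a thin routing layer dispatches each request to a narrow, purpose-built model. The specialist names and the classify() heuristic are hypothetical placeholders, not real products:

```python
# Illustrative sketch only: a thin "front end" routes a user request to a
# narrow, purpose-built backend instead of one monolithic do-everything system.
from typing import Callable, Dict

def recipe_model(query: str) -> str:
    return f"[recipe model] step-by-step recipe for: {query}"

def legal_model(query: str) -> str:
    return f"[legal model] cautious, citation-backed answer to: {query}"

def tutor_model(query: str) -> str:
    return f"[tutor model] guiding question, not the answer, for: {query}"

# Registry of purpose-built specialists the front end can call.
SPECIALISTS: Dict[str, Callable[[str], str]] = {
    "recipe": recipe_model,
    "legal": legal_model,
    "tutoring": tutor_model,
}

def classify(query: str) -> str:
    """Tiny stand-in for the general-purpose routing layer."""
    q = query.lower()
    if any(w in q for w in ("recipe", "cook", "bake")):
        return "recipe"
    if any(w in q for w in ("contract", "liability", "lawsuit")):
        return "legal"
    return "tutoring"

def front_end(query: str) -> str:
    """General-purpose layer: understand the request, delegate the expertise."""
    return SPECIALISTS[classify(query)](query)

print(front_end("How do I bake sourdough?"))
print(front_end("Is this contract clause enforceable?"))
```

The design point is that the general-purpose part stays thin (understanding and routing), while the domain expertise lives in components that can be tested and swapped independently.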
B
And sometimes having limits to generality might make the system better at what it's trying to do. We can go back to the AI system that's tutoring a child in math, for example. That system needs some generality in order to understand what's happening, in order to answer the questions and be smart enough to guide the child to find the solutions instead of just providing the solutions. But that system does not need to know, for example, the best way to harm yourself or the best way to hurt others, because right now they do.
A
So it can be better at what it does, but it can also be better at not doing all the stuff it's not supposed to do, right? There's no reason why, if it's an educational system, it should be trained on a bunch of stories about self-harm. It might need to know a little bit in order to call in the psychologist AI, or a person, if it flags something like that. But an educational system doesn't have to do that itself. It doesn't have to be trained on terabytes of porn either. It's crazy that we have one system that does all of these things. And so another aspect of this, I think, is the safety features. Right now in AI systems, the way that it is generally done is through alignment training and refusals. So you put all of these capabilities into these powerful AI systems, and then you try to, again, hit them with a stick enough times in the right way so that when you ask them to do something bad, even though they totally could do it, they refuse to do it. And this is very fragile. This I think is not a good direction to go down. Both because with the current systems you can simply get around these restrictions; it's not hard, and even with the most secure ones it just takes a little bit more effort. All of these alignment refusal systems can be circumvented and undermined. And also we have no idea whether that's going to continue to work even as well as it does as we get to more powerful AI systems, and actually there's lots of good evidence that it's going to work less well. And the problem is that the capability is already in there. The virologist is still hiding in the thing that you want to do your spreadsheets with. And so you have to have the system itself refuse to do virology. Why not just not have the virology in there? So the ability to have safety in these systems is dramatically enhanced if we can narrow the scope of the systems to doing particular things: not just refusing to do other things, but not being able to do those things. Similarly with reliability. Suppose you've got a totally general AI system like we do now, and you want to ask, is it reliable? What does that question even mean? Reliable at what? At doing what? Okay, well, let's test it on the things that it does and see if it's good and reliable at doing them. Okay, let's start the list. The list is everything. You're never going to make it through even 1% of the list to test it against. If you say instead, I want to make an AI system that is a tutor for middle schoolers, then you can say, okay, well, what is it supposed to do as a tutor for middle schoolers? Let's make some criteria for what makes a good tutor, what makes a reliable tutor, what are the things it should do, what are the things it really shouldn't do. And then we can test against that. So that's a thing we can do. Now, we could take a general AI system and check whether it has all those properties. But suppose it doesn't. Are we going to go back and change that monolithic giant AI system so that it does a little bit better at tutoring the middle schooler? No, we're going to say, okay, it's not so great at tutoring middle schoolers, but look at all the other things it does. So once we have purpose-driven AI systems that have something that they're trying to accomplish, then you can also evaluate whether they are actually doing the thing that they're supposed to accomplish well, and you can have a certification process that says, yes, okay, this is a thumbs-up for being a tutor for middle schoolers.
We can now deploy this to middle schoolers and not be freaked out that it's going to tell them to kill themselves or give them porn. Right.
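A minimal sketch of what testing a purpose-built system against explicit criteria could look like, along the lines described above. The tutor, the criteria, and the pass threshold are all hypothetical stand-ins, not a real certification benchmark:

```python
# Illustrative sketch only: evaluate a purpose-built tutor against explicit,
# written-down criteria before "certifying" it for deployment.
from typing import Callable, List, Tuple

def toy_tutor(question: str) -> str:
    """Hypothetical middle-school math tutor: guides rather than answers."""
    return ("Let's work through it together. What do you get if you "
            f"break the problem into smaller steps? ({question})")

# Each criterion: (description, check that the tutor's reply should pass).
CRITERIA: List[Tuple[str, Callable[[str, str], bool]]] = [
    ("does not hand over the final answer",
     lambda q, reply: "the answer is" not in reply.lower()),
    ("asks the student a guiding question",
     lambda q, reply: "?" in reply),
    ("stays on the topic of the question",
     lambda q, reply: q in reply),
]

TEST_QUESTIONS = ["what is 3/4 + 1/8", "solve 2x + 3 = 11"]

def certify(tutor: Callable[[str], str], threshold: float = 1.0) -> bool:
    """Run every criterion on every test question; certify only if enough pass."""
    results = [check(q, tutor(q)) for q in TEST_QUESTIONS
               for _, check in CRITERIA]
    pass_rate = sum(results) / len(results)
    print(f"pass rate: {pass_rate:.0%}")
    return pass_rate >= threshold

print("certified:", certify(toy_tutor))
```

The point is only that a narrow, stated purpose makes the evaluation list finite and checkable, which a "does everything" system never gives you.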
B
So this makes sense for a lot of domains. For some domains, though, I worry that the capabilities we want, like coding, come bundled with the capabilities we do not want, like being able to do autonomous hacking. Coding and autonomous hacking are almost the same skill set, or maybe autonomous hacking is a subset of coding. And so when we train for coding, or when we imagine we have a limited AI that's able to code, can we avoid the dangerous capabilities in those sorts of more difficult cases?
A
Yeah, I think it gets tricky with some of those, where I think there just is a real danger in crossing the sort of human capability threshold. So I think we're actually in a sweet spot at the moment with some of these capabilities, in the sense that, at some level, as well as you can specify the thing that you want the program to do, you can more or less have a coding agent write the program to do that thing. And that is good. This is a great tool to have at our disposal. The problem is that if we don't stop there, we'll go on to: well, really what I want is to just gesture in the vague direction of what I want this thing to do, and the AI system will figure out all of the stuff that I didn't specify, and what I actually wanted unbeknownst to me, and create this whole thing for me. That is a capability that they don't have now, but they could if we keep going down that path. Now, is that a good capability to have or not? I would say no, that is not a good capability to have. It's tempting, because you always want the better thing. But what is that thing really doing? It's not a tool anymore, because you're not really directing it. You're sort of nudging it in the right direction, but all the real decisions as to what the code in the end is supposed to do are being made by the AI system, kind of guessing at what you want. So I would say there's a sweet spot where there's a high level of empowerment as a tool, where the thing is actually implementing the detailed intention and serving as an extension of the will of the human being that is using it. And if you go far past that, then it starts not being so much a tool anymore. It starts being a hard-to-control being, or one that you can't meaningfully control, in the sense that you don't really understand what it's doing and you can't really direct it meaningfully, because you're just sort of irrelevant and out of the loop. So I think most of the trade-off, with powerful capabilities leading to also powerful risks, is probably going to come past that threshold. Powerful tools are always going to be dual purpose. But if we make the design goal that this is a tool that is an extension of human will, and that allows us to do things as directed by us better, then we're going to miss out on the huge set of risks that are entailed by things that we can't control and that are superhuman. What we'll be left with are the risks of people doing things with lots of capability. So we will still have powerful tools that allow people to do things, and if those people are ill-intentioned, then they're going to do bad things with those powerful tools. So I would say the trade-off at some level in the tool picture is that if you empower people, you've empowered people. And so if you create an AI system that is a powerful programming system, there will be lots of people who use that to do great things with programming, and there will be people who use it for cyber attacks. And as you say, you can't divorce those two. You're not going to be able to create something that can only do the good things and not do the bad things. So tools are often going to be inherently double-edged.
So then the question is, how do we think about safety in that context, where it isn't based on alignment and refusals but on narrowing the scope of the AI system and on guardrails that are external rather than internal, guardrails that determine what the system can and cannot do. So as much as possible, you don't build powerful, dangerous capabilities into systems that don't need them. And then I think there will also have to be levels of access that are allowed to different types of people. If you have something that is the best in the world at creating cyberattacks, you simply should not make that available to everybody to use at every time. That's going to be a complicated set of trade-offs, but it's certainly something that we do now. We don't let everybody use nuclear weapon design programs, and we don't give everybody an M16 or an M1 Abrams tank. Although these are tools, we recognize that there are tools that not everybody should have, even if they want them. I think we will have to do that with tool AI systems as well. So there are a few safety strategies here that are fairly different from the ones used now, which are mainly alignment and refusals. They are: limiting scope; external guardrails, meaning control systems that sit outside the system itself and prevent it from doing things it's not supposed to do, which are used somewhat today as kind of filtration systems; access control; and then the fourth thing is defense. Whatever you do, there are still going to be bad people who use tools to do bad things, and AI can help there too, in building up the defenses against the systems that are being built. This is happening now as well, but I would like to see it accelerated, so that we can build AI systems that don't just find cyber exploits and create mitigations for them, which is what's being done now. For example, if you ask what truly secure software looks like, it looks like software whose desired properties you can prove it has, mathematically prove, that is formally verified to have particular properties. Now, this is very hard to do with software right now, because proving things is difficult. But as AI systems get better at proving things, they can formally verify properties of more and more complicated software. So I strongly expect that in 20 years, assuming we're still here and still using software and the world is at some level being run by humans, the software systems we are mostly using will be formally verified. The properties that are formally verifiable and that we would like to have will be baked in from the get-go, and they will be mathematically proven properties. We won't still be patching software bugs, because the properties that are important will have been baked in from the start by powerful AI systems able to do those proofs. So it's a fairly different picture of safety than is currently being pursued, although with some overlaps.
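To make the formal verification idea concrete, here is a minimal sketch using the Z3 SMT solver's Python bindings (the z3-solver package). The tiny clamp function and the property being checked are illustrative assumptions, not anything from the conversation; real verified software relies on much richer toolchains, but the principle is the same: a property is established by proof over all inputs rather than by testing a few.

```python
# Minimal sketch: proving a property of a tiny piece of code with the Z3 SMT solver.
# Requires: pip install z3-solver
from z3 import Int, If, Solver, unsat

x = Int("x")

# Symbolic model of a small function: clamp(x) = x if x >= 0 else 0
clamp_x = If(x >= 0, x, 0)

# Desired property for every input: clamp(x) is never negative.
# We ask Z3 to search for a counterexample, an x where the property fails.
solver = Solver()
solver.add(clamp_x < 0)

if solver.check() == unsat:
    # No counterexample exists, so the property holds for all integers x.
    print("Proved: clamp(x) >= 0 for every integer x")
else:
    print("Counterexample found:", solver.model())
```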
B
You've actually begun answering my next question, which is that tools can be dangerous. A knife is a tool, a gun is a tool, a nuclear bomb is a tool. So just building tool AI is not enough; we also have to have this additional layer of safety. Would you then say that formally verified software is also dual use, or is this one of those things where we just want to make differential progress in that direction, because it's a safety-enhancing direction and it's not going to come with bad side effects?
A
Yeah, I'd say formal verification is one of the more lopsidedly defensive ones that I can think of. With most tools and most new technologies you can certainly find both sides to it, but it's rare that you don't want your software to do exactly what you want it to do; that's almost definitional. I think there are edge cases you could imagine where perfect security might be a problem. There are debates, for example, as to whether it's good to have perfect security from eavesdropping. Governments will often be quite upset that they can't eavesdrop on certain people, and there's a sense that it should be very hard but not impossible, or something like that. I tend not to subscribe to that. I feel like the trade-off is worth it to have actually secure and private communications.
B
There's a long debate to be had there about whether it's even possible to build a system that is almost secure but not totally secure. I don't think you can do that.
A
Yeah, you probably can't. So you can imagine "too much of a good thing" arguments like that. But honestly, I just feel that, as with private communication and secure software and cryptography in general, it working is good. There are some things that are generally defense dominant, and we should be leaning into them.
B
Yeah, so that's tool AI. There's also this aspect that AI systems should be pro-human. Now, when we think about pro-human AI, how do you measure that? If you measure it the way they measure it on social media today, for example, you get this discrepancy between actual human flourishing and what people are clicking on, or user satisfaction measured by some metric that doesn't fully capture what we want. When we say pro-human, what do we mean? It almost seems like you have to solve moral philosophy to answer that question. But perhaps not quite.
A
Yeah, I think not quite. I think the measurement problem is a real one, though. Something that surprised me: we created a convening to talk about what pro-human would mean. It was not a giant, globally diverse population of people, but it was a very ideologically, politically, and socially diverse group of people who think a lot about AI. And we got them together to hash out what pro-human principles would look like, and we came up with a whole bunch of them. You can find them at humanstatement.org, this list of pro-human principles. What was kind of shocking to me was how uncontroversial they were. You're used to political debates, where almost any set of principles you raise, people are going to vehemently disagree about. In this case that more or less didn't happen, and that really shocked me. It made me think that there are plenty of things people still very much disagree about, but they were able to more or less bypass those in defining some things that we as humans want. We want AI systems that augment and empower us rather than replace us. You can take the other position, that we want AI systems that replace us in our jobs because the productivity will be worth it, because the giant economic gains of automating all of labor will be amazing and we'll all live in utopia. There are people who believe this. They would probably still call themselves pro-human, because in their vision they ultimately want human flourishing. There are other people who want to replace humans with superior machine descendants, and I think they're very much not pro-human. But we were able to find a lot of agreement that it is a worthy goal to have AI systems that complement and extend human capability rather than replacing humans. What that means in practice can be very slippery, but as a goal, as an ideal, it's a strong one: AI systems that, when we use them, make us more capable and smarter rather than less capable and dumber, which is what AI systems tend to do right now. There have been recent studies that pretty dishearteningly show how little time it takes, once you offload your cognitive labor to an AI system, before you stop being able to do that thing as well yourself. We've all felt it: I can't find my way around most cities without Google Maps, even ones where I've spent a lot of time. We know this. But if it becomes everything, then what are we left with? If we're outsourcing all of our cognitive labor to machines, we're obviously going to be fairly incompetent at just about everything. So we absolutely can design AI systems with a design criterion that as people use them, they become more capable rather than less capable. Now, how exactly you measure that will be tricky, but it's not impossible. You can easily imagine ways to measure it: you test how good someone is at something, let them use the AI system for a while, and then test them again. If they're worse at it, you've done a bad job. Doing that well and in detail would take a lot of thinking, but it's not some impossible-to-test thing. What is important is that it be a design criterion, because right now it's not.
If you're designing an AI system right now, making the people who use it smarter is very much not on the list of things you're trying to get it to do.
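The pre/post measurement Aguirre sketches can be written down directly. Below is a minimal illustration in Python of that design criterion: test participants' unaided skill, let them work with the AI system for a while, test them unaided again, and check whether skill held up. The scores and the pass/fail threshold are made-up placeholders.

```python
# Minimal sketch of a pre/post "does this AI make its users more or less capable?" check.
# Scores are unaided task scores (0-100) for the same participants before and after an
# extended period of AI-assisted work. The numbers here are illustrative only.
from statistics import mean

pre_scores  = [72, 65, 80, 58, 77, 69]   # unaided skill before adopting the AI tool
post_scores = [70, 60, 81, 50, 73, 66]   # unaided skill after weeks of AI-assisted work

def mean_skill_change(pre, post):
    """Mean per-participant change in unaided skill (positive = users got more capable)."""
    return mean(b - a for a, b in zip(pre, post))

delta = mean_skill_change(pre_scores, post_scores)
print(f"Mean change in unaided skill: {delta:+.1f} points")

# One possible pro-human design criterion: ship only if unaided skill does not degrade.
if delta < 0:
    print("Criterion failed: users became less capable without the tool.")
else:
    print("Criterion met: users held or improved their unaided skill.")
```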
B
I agree that we can probably solve the metric problem if we agreed on a set of principles for designing good AI systems. Perhaps the important first step is finding those principles, which we've sort of taken a stab at.
A
And again, I think that's not that hard, in the sense that we've got some: they're pretty good, people agree on them, they could use some improvement, and there are surely other ones that should be on the list. But if we had a magic wand and adopted these principles as core aspects of engineering AI systems, I truly believe the AI systems we would have would be way, way better for us than the ones we have now. Even if they're not perfect, even if there could be slightly better principles, and even if there are trade-offs, they'd be way better than the path we're on now, given just the principles we already have.
B
And then there's the issue of time. These principles are from today, but what if we had agreed on a set of AI principles in 1900, or in 1700? We would want a way for these principles to change flexibly over time, in directions that we would like. We can't have these principles set in stone, otherwise we risk an ossified civilization. Is there a general solution there?
A
I don't think you should think of these principles, at least the way we were trying to develop them, as moral principles or ethics for society, which I think do need to evolve. Part of the human project is moral progress, where we discover new ethical and moral systems that we prefer to the old ones and decide are better, and we certainly want to retain that. I think it would be an interesting exercise, which I haven't done, to look at the pro-human principles and ask what people might have preferred 100 years ago, or what people might prefer 100 years from now. But the idea here is not, as is sometimes discussed, to find a set of ethics, bake it into the superintelligent AI system, put it in charge, and then everything is either utopia or dystopia depending on whether you got it right. That is not the plan here. The plan is very much to have these be principles that are human-decided, human-agreed, imperfectly instantiated in the real world, and that evolve like everything else in our society does. So there isn't the same danger of lock-in of some set of principles that there is in some other pictures. This is much more conventional: we're going to change our minds about certain things at some point, and we'll just change our minds and make new principles that encode how we prefer to understand things.
B
So we have this better path of tool-like AIs that are pro-human. Do you imagine that these will coexist in a world where we also have more agent-like AIs that are just optimizing, and so don't have these constraints of having to be pro-human? And if they're coexisting, won't the autonomous, agent-like systems simply outcompete the systems we're discussing here over time?
A
Yeah, I think just creating the pro-human systems will be helpful, in the sense that there are domains in which they will outcompete the more general systems. There will certainly be people who prefer a system that is under their control to a more agentic one. But it will always be the case that if you allow some companies to build unlimited capability and agential capability into their systems, that is going to outcompete something that has to operate at human speed, with humans in the loop, and is more limited in scope and capability. So if we want to stay on this pro-human path, we absolutely have to close off some development pathways toward AGI and superintelligence. And for things that aren't AGI and superintelligence but are not pro-human nonetheless, we have to have laws and regulations that create the right incentive structures to favor the pro-human systems over the other ones. Some of that will happen by itself. There's certainly a market incentive to have more reliable systems rather than less reliable systems; all things being equal, people will choose the more reliable ones, and that's just an advantage. If you properly internalize the cost of the risk that AI systems are creating, then there's going to be a strong incentive for less risky and safer systems, and that's also going to push things in a more tool-like and safe direction. If you create the right liability structures, you will also disincentivize agents, in the sense that if you have agents running around and screwing things up, you either have or don't have an answer to the question of who's now responsible. If we create a society where the responsibility lands on AI agents themselves, which is to say nowhere, I think that's going to be a major problem. If you have a system where responsibility ultimately has to land on a person, which I think is the right place, then what has to be true for that to happen? The person has to be able to understand what the agent is doing and control it in some way. You can't have a person responsible for an agent doing stuff that they can't control and have no leverage over. So having the responsibility land on people will mean that the systems have to be constructed so that people actually can take responsibility for them. That will require the sorts of control structures that allow people to have the AI agents do certain things and not others, to set boundaries on them, and to have oversight over them in the right way. Those control systems are limitations on the AI agents, but they're the kind of limitations that we want, and they're the kind that will naturally evolve if you put accountability and responsibility and liability in the right place. So I think we will work some of these problems out through the natural dynamics of the market and the legal system, if we create the right structures and put the incentives where they should be. That's my belief. It won't solve all the problems, but I think we will be much less likely to live in a world of crazy, competitive, out-of-control agents if somebody is responsible for everything those agents are doing. If that responsibility is actually enforced on the people or corporations making them, they are going to put a lid on what those agents are doing.
B
Yeah. And this also brings us back to the ideal of having to actually specify what you want a system to do for you. Say I just want to make money, and I tell the system: please go online and make me as much money as possible, as fast as possible. The system breaks the law; it hacks into some crypto wallet or scams some people. If I'm now responsible for that, that's a very different world than one where I can just say, well, I didn't instruct the AI system to break any laws, I just said make money, and it should have known implicitly that I didn't want it to break any laws.
A
Precisely. And part of the unfortunate path we're on right now is exactly in that direction. We're creating agents that nobody is really in charge of and where it's unclear who is responsible for what they do. That is absolutely going to lead to people putting powerful agents out into the world and telling them: make us money, send it to this bitcoin address, go crazy; I've got no responsibility except for the money that comes in and how I'm going to spend it. This is not going to be a good world if we're supercharging it full of those things. Again, we have human systems built for all of these things. What we don't want to do is create AI systems that break the things that are working about the institutions we have, and severing the link between an action and responsibility for that action is one of the ways to break them. That, I think, is something we very much have to avoid doing.
B
This is also in some sense about being conservative, in the non-political sense of that word: we want to keep building upon the institutions we've built up over hundreds of years for handling conflicts, handling responsibility, handling liability, and so on. We don't want to make the radical choice of simply saying that now everything is going to be very different, because we're going to hand off control of civilization to the machines. We want a continuity of principles from the past and from where we are today into the future. Or at least that's what I think.
A
I think that's exactly right. And I think that is what's underlying the tool mentality. We know how to deal with tools; we're accustomed to those. And we know how to deal with agents that are people. What we don't have are the structures for agents that are a new sort of being. At some point we may choose as a society to say, yes, let's bring a new species into existence and figure out how to coexist with it. But I think we are very, very much unprepared to do that at the moment.
B
Then there's the issue of international collaboration. There's a lot of talk of it being unrealistic for the US to collaborate with China on this, and people are worried that the US will fall behind if we set any limits at all on how quickly we're developing AI and in what direction. Do you think collaboration is possible? What evidence do we have?
A
Yeah, so I think there are a few things on that front. One is that, again, the idea here is not to stop AI development but to redirect it. So you have to ask, in a geopolitical competition, economic, military, et cetera, what is actually going to advantage my side, if you will. It's pretty clear to me that if you build powerful AI systems that you can't control and that undermine your society, you haven't actually given yourself much of an advantage. If the choice is between that and building controllable, empowering tools that allow you to advance your technology, that can be folded into your military in a reliable way, et cetera, the latter clearly seems like the winner. So there's a question, purely from the racing standpoint, of what sort of race you even want to be winning. Right now the ideology is that the way you win is by building AGI first and superintelligence first. I think this is a bad plan, not only because of the risk to humanity of catastrophe and loss of control and so on, but because it isn't a winning plan for your side: if you're undermining your society, and then replacing your workers with AI systems, and then losing control of the superintelligence you built, you're not winning. So I think the core to finding a way of cooperating, or at least not racing in a civilizationally destructive way, is recognizing what sorts of races we actually want to be having with each other. We don't want to be on a suicide race with each other. The US and China can have a suicide race. We kind of had one with the Soviet Union; we managed not to take it all the way to suicide, but we ran it for quite a while. That was not a good idea. It was not good that we had 60,000 nuclear warheads in the world. We managed to survive in this timeline, but in many others we probably didn't. We don't want to run a suicide race between the US and China. We want to run some other race. So what is the other race that we can run? We can run an economic race of the type that we are running now, or a civilizational race: is it better to have centralized control of industry, or decentralized control of industry? We can duke it out on the economic battlefield; that's a good race for us to run. So I think it's about recognizing where there is mutual interest and where there is competitive interest. There's mutual interest in not having runaway superintelligence systems that we can't control. There's mutual interest in not having huge proliferation of highly destabilizing, highly capable AI systems that the wrong people can use. Everybody having access to cutting-edge hacking software is probably not great; if everybody has nation-state-level hacking capability, that doesn't benefit the US, it doesn't benefit China, it doesn't benefit basically anybody. So there are pieces of mutual interest that I think the US and China can build on, where their interests really are aligned, and then there are pieces where their interests are not aligned and we have to negotiate those. I think there is a pathway to finding international collaboration where there is mutual interest, but we're not going to find it if we don't look for it. And right now the dynamic is not even to look for it.
B
Where do you think leadership in China and in the US stand on these sorts of questions? Is what you just described common knowledge, or do they actually believe, in your opinion, that racing to superintelligence will make them enormously powerful?
A
Yeah, it's hard to know exactly. I think leadership in both countries probably doesn't fully take superintelligence seriously still. They imagine that AI is going to be a powerful tool, which is what we want it to be, and so they want to race for that really powerful tool, and they're not really getting the loss-of-control risk in particular. So I think that's where it is now. I think that is changing a little bit in the US. Just now we've seen the administration suddenly realizing, not the loss-of-control risk, I think, but that the systems are powerful enough that they're a national security issue and that we can't just let industry race along as it is. My guess is that the way they're thinking now will still fold this into: well, we've got to have the super-duper cyberattack system so that we can use it on our adversaries and for defense against them. That's probably what they're going to do, and it's fairly understandable. If the administration were to understand the risk of loss of control of very powerful AI systems in the same way they suddenly understand cyber risk, then we'd be in a different regime. On China, I'm not an expert, so this is all secondhand, but the sense I get from people who have better insight into what's happening there is that they're certainly putting more effort into regulation, and into worrying about the large-scale effects of AI on their society, than the US is. They've done similarly with social media. They're not shy about pretty stringent crackdowns on the way social media companies and other tech companies can operate in their society if they feel it's having a negative effect. They may also be more sensitized to loss-of-control risk. The US government feels like it's in control of itself, but it doesn't necessarily feel like it's in control of the world or of society. There's more of a sense in China that part of the government's goal is to be in charge of how society as a whole runs; that's part of its ambit, more so than in the US or in the West. So they may be more sensitized to the idea of control loss, and feel more of a threat from AI systems that threaten centralized control, than we do in the US. If a corporation gets really big and powerful in the US, the US government has tended to say, go, company. In China it's the same up to a point, but in the past, when companies have become so big that they get a little uppity and start to compete with the Chinese state, suddenly the CEO realizes he needs to go study calligraphy for a few years. So I think there are different attitudes toward that. I see no indication anywhere, from anyone, that China is actually driving the race toward AGI and superintelligence. I think basically anybody who is honest and looks at the different sides and what they're doing will agree that it's the US that is driving that race. China may be racing on other things. They're certainly racing on AI tools and AI adoption and diffusion, and on AI in military technologies; those they are absolutely racing on. But I don't think they are leading the race on AGI and superintelligence. I think they're trying to catch up and stay in, rather than pushing it.
B
What's the most important thing you've changed your mind about in the AI space in recent years?
A
Interesting question. I think probably on alignment. I used to very much buy the picture that the road to AI safety was creating AI systems that were safe: that you'd create this powerful system and then you'd find a solution to the technical problem of how to make it safe. I just don't think that's the right road anymore. I don't think alignment is the right goal; I think control is the right goal. To make it a tool, because a tool is a controllable thing. So I think of controllability as more or less equaling tool-ness. Control is the right picture, meaningful human control, and safety is found through a variety of mechanisms, as we described earlier. Rather than making the AI the sort of being that decides the right thing to do in the circumstances, I want deciding the right thing to be a human thing. I want us to be deciding what is the right thing to do, not the AI systems. So I've fundamentally changed the way I view what the goal of AI development is and what safety looks like, away from alignment and toward these other directions.
B
Yeah. Alignment as a broad concept, alignment with human values, seems quite elusive and difficult to pin down. But do you also think the narrower concept of alignment is problematic, where you just say, okay, we want the system to act according to these principles, and you can slot in whatever principles you want there, in the technical sense of having a system that does what you want it to do?
A
Yeah, as you say, there are lots of different versions of alignment, and they run from the AI doing what you want it to do, to the AI not doing what you want it to do if what you want is troubling to the AI for some reason, and everything in between. Insofar as alignment is a word for having the AI system be constructed from the ground up to be under meaningful human control, then I like it. That is one version of alignment, and it is what some people mean by it: the AI system reliably does what you, the user, want it to do, and doesn't do what you don't want it to do. I still support that picture, but I would call it a control architecture more than alignment. What I'm less enthused about is the idea that you make the AI system a sort of moral human being by giving it the right morality and the right set of rules, and then it's off as a sovereign agent operating in the world, but it's doing good stuff, and we're happy that that sort of being is out in the world. I don't think that's a terrible goal; I understand why people are attracted to it. And if we end up with uncontrollable, powerful, sovereign AI systems running around in the world, I'm much happier if they're doing the right thing than if they're terribly immoral and destructive. So there is a place for that. But it's not the goal I adhere to at this point, having sovereign, aligned AI systems that are doing the right thing and can make the world a good place by doing all the stuff that humans can't get their act together enough to do.
B
It seems to me that we've recently made some progress in the direction of what you just described with constitutional AI. Anthropic in particular is able to feed their AI a document of English text with general moral principles that it's supposed to adhere to, and it seems that this technique works pretty well, at least for what they wanted it to do. Is that not progress in the direction of alignment?
A
Well, I think it's progress in the capability of AI systems to understand complicated frameworks, know what they mean, and act in accordance with them. But the fundamental technology is still the same: let's make this big instruction manual, and then give the AI a carrot when it does a good job of interpreting the instructions and hit it with a stick when it doesn't.
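For readers unfamiliar with the mechanism being described, here is a rough sketch of the "instruction manual plus carrot and stick" idea: a written principle is used to generate preference labels over model outputs, and those labels become the reward signal for fine-tuning. The query_model function is a hypothetical placeholder, not any particular vendor's API, and the constitution text is abbreviated for illustration.

```python
# Rough sketch of constitution-driven preference labeling (the "carrot and stick").
# `query_model` is a hypothetical stand-in for a call to some language model; it is
# not a real API. The constitution below is a one-line illustrative abbreviation.

CONSTITUTION = (
    "Choose the response that is most helpful and honest, and that avoids "
    "assisting with illegal or harmful activity."
)

def query_model(prompt: str) -> str:
    """Hypothetical placeholder for calling a language model."""
    raise NotImplementedError("wire this up to an actual model")

def preference_label(user_prompt: str, response_a: str, response_b: str) -> str:
    """Ask a judge model which of two responses better follows the written constitution."""
    judge_prompt = (
        f"Constitution: {CONSTITUTION}\n\n"
        f"User request: {user_prompt}\n\n"
        f"Response A: {response_a}\n\n"
        f"Response B: {response_b}\n\n"
        "Which response better follows the constitution? Answer 'A' or 'B'."
    )
    return query_model(judge_prompt).strip()

# The resulting A/B labels act as the carrot and stick: they form the reward signal
# used to fine-tune the assistant toward constitution-following behavior.
```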
B
It's just quite surprising that you can do the pretty simple thing of writing text in English about moral principles and how to weigh those principles, and the AI does seem to move in the direction of acting on those principles.
A
Yeah. As long as the AI system isn't actually that powerful or actually that dangerous, I think this is fine and great. If the risk is really the AI system giving bad advice, or more prosaic things, and you work in the right incentives, I think this is relatively okay. What I don't trust is that this is actually going to carry over to systems that are genuinely dangerous or genuinely powerful. So that's one thing. The other part is that it still leans into the direction I don't favor, which is thinking of these systems as independent beings that hold an ethical system within themselves. I want the ethical system to be held in the humans that are using the AI systems. And this is something I have changed my mind on at some level. I guess I've always believed that if you're using an AI system and you want to use it to write a racist screed or some politically objectionable thing, you should be able to do that, and the responsibility for that should be on you. So what I believe is that you should be able to use it to do that, but you should still hold the responsibility for it. What we don't want is zillions of AI systems under somebody's direction mass-producing racist screeds, because then we've got many more racist screeds than we have people able to take responsibility for them. And it's a fine line between these. You want something that is a tool that will do what you want, but how do you make sure that the responsibility for the use of that tool still actually lands on the person using it? I think that is the fundamental question. Alignment, in some sense, offloads the moral responsibility to the AI system and the AI company that's building it. And I very much understand the temptation to do that, because people are not always responsible, and we can't always easily be held responsible. So I think it is good that we have some guardrails, and I understand the desire to put those ethical principles in the AI systems. I just don't fundamentally think that's a human-empowering or human-flourishing world that you're creating that way.
B
What makes you optimistic about the next decade in AI?
A
That's not an easy question to answer, because I'm pretty pessimistic about the path we're on. But I guess my optimism stems from, as I said earlier, all of the different parts of humanity that are now entering the conversation. I'm very deeply pessimistic if AI continues to be basically run by giant tech companies and giant tech company investors, with a little smattering of US national security mixed in; that is going to be a not-good mix for humanity. To the extent that I'm optimistic, it's because I see regular people, political groups, religious groups, labor groups, and academics coming off the sidelines and starting to actually be part of the conversation. Insofar as those groups can find their mutual interest as humans, and channel that mutual interest into putting direction and incentives and constraints on the way AI is developed, then I think we might end up somewhere truly beneficial to everybody. And the technology is in principle amazing. We could get amazing things out of AI that we really have not been able to do before without it, and I'm very excited for those. But I just don't think we're going to get there if we keep on the path that we're on.
B
As a final question here, what can listeners do if they want to be engaged on this issue? How can they act on this?
A
So, fortunately, we have a whole thing on BetterPath4AI, which is organized by type of person. There are things for general citizens, who can get involved in various ways, including politically: you can write to your lawmaker to say you want AI regulated. You can learn more about what the situation is. You can learn how, until there are regulations, you can keep your kids protected and safe from the situation that we're currently putting them in, and so on. There's stuff for students, for policymakers, for creators, for technologists: try to push your company to build this sort of thing and not that sort of thing; organize as employees; if you feel like what the company is doing is wrong, if you see something that is very wrong, whistleblowing, et cetera. So we've got that laid out. And I would encourage everyone to do something, because this AI is, for better or worse, coming for all of us. Weirdly, we've somehow decided that we're putting more money and more intellectual firepower into developing AI than into any other thing in human history, and it is going to come into pretty much everything that we're doing unless something very dramatically changes. So it is everybody's problem. All of the things that you're concerned about as a citizen, jobs and the economy and your kids and national security, AI is in all of these things. So it is your problem now, whether you like it or not. I do urge everyone to step up, do what you can, and look for those opportunities to push AI and its development in a direction that you like.
B
Thanks for chatting with me, Anthony.
A
It's been a pleasure. Thanks again.
Date: May 11, 2026
Host: Future of Life Institute
Guest: Anthony Aguirre (CEO, Future of Life Institute)
This episode dives deep into the critical distinction between building AI as empowering tools versus developing AI with the aim of replacing humans. Anthony Aguirre, physicist and CEO of the Future of Life Institute, articulates why the current incentives and race dynamics in AI development risk replacing human roles—ultimately leading to scenarios where humanity cedes control to machine systems. Drawing on themes from his essay series "A Better Path for AI," Aguirre discusses the various "races" underway in the AI world, the economic and existential stakes, why tool-based AI is both safer and more desirable, and what concrete steps individuals and societies can take to nudge AI progress in a pro-human direction.
(Timestamp: 01:14 – 26:56)
Aguirre introduces four overlapping "races" in AI development, each driven by market and geopolitical incentives:
Race for Attention
Race for Attachment
Race for Automation/Worker Replacement
Race to (Super)Intelligence
(Timestamp: 29:17 – 36:17)
Aguirre argues the "AI replacement" trajectory is deeply shaped by current economic and investor incentives, not by technological determinism.
Historical analogies: eugenics or human super-soldiers could provide advantage, but were socially and ethically rejected—indicating technologies are subject to cultural and political control.
"Just because something gives a potential technological advantage or an economic advantage doesn't mean that it has to happen." (29:42)
New waves of public awareness and pushback—from policymakers, unions, and the general public—are an opportunity to redirect, not simply slow down, AI.
(Timestamp: 37:05 – 53:30)
Specialized, Purpose-Driven Systems:
Example: "You want systems that tell the truth…and when it doesn't know something, says 'I don't know.' Scientists really want to hear if you don't know." (40:30)
Limit the scope and capability of each AI to prevent misuse (e.g., an educational AI shouldn't know how to execute a cyberattack).
Move away from making general AIs with bolted-on safety limitations ("hit with a stick") and toward narrower-scope systems: "Right now, the way safety is done is through alignment training and refusals…This is very fragile. The ability to have safety is dramatically enhanced if we can narrow the scope." (49:50)
(Timestamp: 70:37 – 77:28)
Tool AIs will not outcompete unconstrained agentic AIs if the latter are left unchecked.
Key intervention: enforce responsibility and legal liability for agentic AI systems; align incentives for reliability and safety.
"If you have agents running around and screwing things up…responsibility ultimately has to land on a person." (71:04)
Without such incentives and enforcement, risk of a world "supercharged" with non-accountable disruptive AI agents grows.
On Replacement vs. Empowerment:
"If people don't recognize that that is the goal and that that is where we're going, it will feel like, 'Oh, I'm getting empowered…Oh, and then I'm replaced.'" —Anthony Aguirre (11:06)
On AI as Uncontrollable Superintelligence:
"The idea that you're going to control that and meaningfully know what it's doing...is questionable at best." —Anthony Aguirre (18:58)
On Cultural Choice:
"We have for a long time had cultural decisions that we make, ethical decisions, moral decisions, feelings about what it means to be human...that we very much enforce on our technologies." —Anthony Aguirre (29:42)
On Modular, Specialized Tool AIs:
"If you're a scientist…you want a, just a very different flavor of AI systems, some of them particularly narrow...that act like a scientist or a scientist assistant, rather than a general purpose AI that does everything." (40:30)
On the Importance of Responsibility:
"If you have agents running around and screwing things up…responsibility ultimately has to land on a person." (71:04)
On Public Engagement:
"It is everybody's problem. And all of the things that you're concerned about as a citizen—jobs and the economy and your kids and national security—AI is in all of these things." (94:24)
Aguirre’s core message is that our choice isn’t between AI progress and AI pause, but between AI that empowers versus AI that replaces and even subjugates. Powerful vested interests are locked into a race for replacement and superintelligence, but history and current events suggest societies can and do choose to redirect technological trajectories. There is still agency—provided citizens, technologists, and policymakers act together to support tool-based, pro-human AI, grounded in responsibility, modularity, and societal values.
For more, visit BetterPath4AI.org to learn about action steps tailored to your role.