Anthony Aguirre (27:31)
Yeah, I don't think we want to assume that or anything else about what it's going to do, because we don't know. Part of the issue here is that AI systems aren't predictable. We don't fundamentally understand how they work, and we certainly can't predict what they're going to do. This is a crucial point, because when people think about AI systems as computer programs, they think, well, there's a bunch of lines of code that are all instructions telling the AI system what to do. That is not how an AI system works. It's much more something that has grown under this training process than something that is programmed step by step. And that's the whole point of AI: it can do things that we don't know how to tell it to do, and still succeed in doing them. What this means is that you can get a system that is extraordinarily effective, where you don't know how it does what it does, and you certainly don't know what it's going to do in some new circumstance.

This is a crucial thing to keep in mind, because normally with technology, if we build something that is much more powerful, like a faster, better jet fighter, it's because we got better at making it; we understood better the avionics and the electronics and the computers and all the things it takes to make a better jet fighter, and so we were able to make one. With these new AI systems, they're getting more powerful without us actually understanding any better how they operate. We're just using more computation, hitting them with a stick and giving them more carrots for longer, and they're getting better at what they do. So we don't have the sort of science for understanding why they do a particular thing, what makes them safe or unsafe, or what they're going to do in a particular new circumstance. We're very much in the regime of building something more like a biological being: we can build it, we can grow it, we can watch it go out into the world, but we can't really predict what it's going to do. I think we should be humble about that.

There are, on the other hand, reasons to think generally about what biological or other systems will do. We know that biological systems have to eat, so we know they're going to be hungry and they're going to seek food. We know that they want to survive because they need to reproduce, so they're going to protect themselves. They're going to run away from things that are dangerous. They're going to lash out with claws if something threatens them. With AI systems the considerations are a bit different, but you can think about general properties that they must have just because they are systems that pursue goals. For example, suppose you're a robot. This is a famous example that Stuart Russell likes to give. Your job is to fetch coffee. That's all you do; you're a coffee-fetching robot. Now, if someone goes to turn you off when you're on your way back with the coffee, then you've failed at your task, right? So as a coffee-fetching robot, you develop a self-preservation requirement, because if all you want is to be an awesome coffee-fetching robot, that means you have to survive, you have to stay on, to fetch the coffee. In just the same way, a bird has to survive in order to have children and the next generation of birds, so birds have a survival instinct.
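Russell's off-switch argument can be made concrete with a few lines of code. The sketch below is a toy model, not anyone's actual system: the utility function, the probabilities, and the two policies are all made-up assumptions, chosen only to show that an objective which never mentions survival still rewards resisting shutdown.

```python
# Toy model of Stuart Russell's coffee-fetching robot. All names and
# numbers here are hypothetical, invented purely for illustration.

# The robot's ONLY objective: utility 1 if the coffee is delivered.
def utility(coffee_delivered):
    return 1.0 if coffee_delivered else 0.0

P_FETCH_IF_ON = 0.9   # assumed chance of delivering coffee while running
P_SHUTDOWN_TRY = 0.5  # assumed chance a human tries to switch it off

def expected_utility(resist_shutdown):
    """Expected utility of a policy: maybe someone tries to turn the
    robot off, then the robot (if still on) fetches the coffee."""
    if resist_shutdown:
        p_still_on = 1.0                   # off switch disabled
    else:
        p_still_on = 1.0 - P_SHUTDOWN_TRY  # shutdown permitted
    return p_still_on * P_FETCH_IF_ON * utility(True)

print(expected_utility(resist_shutdown=False))  # -> 0.45
print(expected_utility(resist_shutdown=True))   # -> 0.9
# The objective says nothing about survival, yet resisting shutdown
# doubles expected utility: self-preservation emerges as a subgoal.
```

The particular numbers don't matter; any nonzero chance of being switched off makes the resisting policy score higher, which is the whole of the argument.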
A machine system will acquire goals like self-preservation, like accruing more power, like keeping things secret if it needs secrecy, like making itself more capable. It will acquire new goals as more or less inevitable byproducts of whatever goal it has. This is something people realized a long time ago; these are called instrumental objectives. And when you think about it, it sort of has to be true. If you're good at making money, you also have to be good at a bunch of other things in order to be good at making money. We see this in companies: companies that seek to make a lot of money have to do a lot of other stuff. They have to have a public relations department, they have to have a controller, they sometimes have to lobby Congress, they have to have a really good organization and lots of employees who are happy. There are lots of things they have to do, and these are subgoals of making lots of money. In just the same way, any very capable system, AI or otherwise, will have to do lots of other things in order to achieve its goals.
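One way to see how instrumental objectives fall out of a single terminal goal is to trace preconditions backward. The sketch below is only an illustration of that logic: the goal graph is entirely invented, and real trained systems have no explicit table like this, but the mechanism of deriving whatever subgoals the top-level goal requires is the point being made.

```python
# Toy backward-chaining sketch of instrumental objectives. The goal
# graph is hypothetical; each goal maps to the preconditions an agent
# would need to satisfy first.
PRECONDITIONS = {
    "make_money":           ["stay_operational", "acquire_resources",
                             "influence_people"],
    "stay_operational":     ["avoid_shutdown"],
    "acquire_resources":    ["improve_capabilities"],
    "influence_people":     [],
    "avoid_shutdown":       [],
    "improve_capabilities": [],
}

def instrumental_goals(goal, seen=None):
    """Collect every subgoal transitively required by `goal`."""
    seen = set() if seen is None else seen
    for sub in PRECONDITIONS.get(goal, []):
        if sub not in seen:
            seen.add(sub)
            instrumental_goals(sub, seen)
    return seen

# The agent was only ever given "make_money", yet self-preservation,
# resource acquisition, and capability gain all appear as subgoals.
print(sorted(instrumental_goals("make_money")))
```

Again, the table and goal names are stand-ins; the claim is only that subgoals like these are implied by, not added to, the top-level objective.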
And so the real danger here is that we don't know how these systems work, we can't predict what they'll do, and we can't really understand them in a mechanistic way that would let us constrain them from doing some things and allow them to do others. So suppose we give one a goal: here's my super-duper-powered AI system, it can do so many things really, really well, and I know what I want it to do. I want it to go make me some money. So I'm going to send my super-powered AI system out into the world to make me some money. Obviously people are going to do this; as soon as these systems exist, people are going to send them out into the world to make them money. And those systems, just like corporations and just like people, are going to do all sorts of things to make money, some of them legal, some of them not, if they can get away with it. They will do what it takes to make money if that is their goal. So there are all kinds of things, like acquiring power, acquiring influence, manipulating people, lying to people, that we don't like. We don't want AI systems or people or companies to do them, but they are incentivized by nearly any goal.

This is a long story to say that if we have something that is extraordinarily powerful, imagine something not just as powerful as Google but more so, not just as powerful as the United States government but more so, then whatever goals it has will have a bunch of side effects that are negative for humanity. And the only way to avoid that is to not have that system; or to have that system so closely under control that we can stop it when it's doing something we don't like; or to have it very, very deeply aligned with humanity, so that although it could do all those negative things, it will choose not to, because it really, really loves us. So those are the three options: don't build it; build it and really control it; or build it, don't control it, but have it be deeply aligned with us. The problem is that only one of those things do we actually know how to do. We know how to not build them. We have no idea how to control them when they're this powerful, and we have no idea how to align them to our values if we don't control them and they're that powerful. This is the fundamental conundrum: we have those three choices, only one of them is really open to us, and the other two are very dangerous. And that's the road we're on.