Transcript
Max Holmes (0:00)
Humans are very smart. We're sort of the superintelligence of the natural world. Like, certainly compared to plants or bacteria or whatever. This has resulted in a pretty amazing transformation of the planet. We're potentially moving into a world where we're no longer the smartest thing. When you have something that is significantly smarter than humans, it might start to reshape the environment towards its goals. And as a result, it has the potential to drive humans to extinction. We're going to build this powerful machine and then we'll use the powerful machine to align it, right? That's, like, very scary. I would much rather we, like, not build it. Slow down, like, take a breath. This is extremely dangerous. And anybody who's pursuing this project should be aware that they are, like, threatening every child, man, woman, and animal. I don't recommend it. But maybe there's also this sense of hope in me. AI is not a normal technology. The standard story is that we try to make an airplane, right? Maybe it takes off, but then crashes shortly thereafter. And you go back to the drawing board and you say, okay, like, what happened? With AI, especially with building a superintelligent machine that has the potential to wipe everyone out, if you do make a mistake, it could be catastrophic. And once it's killed everyone, there's no ability to go back to the drawing board.
Rob Wiblin (1:22)
Max Holmes is an alignment researcher at the Machine Intelligence Research Institute, where since 2017 he has worked on the problem of aligning an artificial intelligence and keeping it steerable. His main research agenda is corrigibility, an approach that prioritizes making AIs, I guess, robustly rule-following or instruction-following, and willingly modifiable, basically to the exclusion of all other goals. He's also a science fiction author, having written the Crystal Society trilogy and the recently released Red Heart, which imagines an AGI being developed in a secret Chinese government project. Thanks so much for coming on the podcast, Max.
Max Holmes (1:54)
Yeah, it's an honor to be here. Hey, regular listeners.
Rob Wiblin (2:00)
We don't normally do these anymore, but I had two things to say that I felt I couldn't in good conscience skip over. The first is that if you're an AGI aficionado, you may feel like you've already heard enough about the book If Anyone Builds It, Everyone Dies. It did have a pretty big launch last year. If so, I would at least urge you not to miss the second big block of this conversation, about corrigibility as the thing that we should be building into our AIs. Max suspects that the currently overwhelmingly dominant approach of giving AI models good moral values that they really want to stick to is potentially a huge wrong turn, and that we need to be doing almost the exact opposite and trying to give them no values whatsoever. It's a theory, a controversial theory, that
