
A
You are a simulation of a human, a very believable one. Don't get upset here. Well, why is the speed of light constant? Well, that's just the speed with which the processor updates the rendering. That is the speed of the processor on which the simulation is running. It cannot update any faster. It's likely that our universe, which has time, started with a Big Bang; what that represents is someone turning on the system. If I take my own beliefs about simulation seriously, it's very likely it is the starting point for the simulation. You can never be certain it's real. There is no test which gives you certainty that something is authentic, that it is real. We don't even know if outside is better or worse. We don't know if we are here escaping a really bad situation in this dream world or if this is a prison and we are punished for doing something outside. A well-done simulation feels like real life.
B
What are your biggest unanswered questions that you want to know?
A
What's outside the simulation?
B
This show is brought to you by my lead sponsor, IREN, the AI cloud for the next big thing. IREN builds and operates next-generation data centers and delivers cutting-edge GPU infrastructure, all powered by renewable energy. Now, if you need access to scalable GPU clusters or are simply curious about who is powering the future of AI, check out iren.com to learn more; that's I-R-E-N. Roman, hi, nice to meet you.
A
Great meeting you.
B
So you've been sounding the alarm with regards to AI and control of AI recently and I've seen a few of your interviews. I think the big question is what happens when we create intelligence that's more intelligent than us? I've kind of got a bigger question. If you were worried about AI and worried about how it would evolve, would it not be great to create a simulation such as the life we're living now to test it?
A
I think that's what we experience right now. I think we are in a simulation, and most likely the reason for this specific time to be alive is that it is the most interesting time. We are creating new worlds, virtual reality, and we are finally learning how to create intelligent agents, beings.
B
So from the simulation theory, you're convinced we are in a simulation?
A
It is very likely. I don't see any way if we ever get this technology, if I have access to creating believable virtual worlds populated by AI agents, that I would not use it to totally simulate all sorts of things.
B
So am I an AI agent?
A
You are a simulation of a human, a very believable one. Don't get upset here.
B
Why do you believe that? What is your thesis? You've just said that if you could create it, you would. But how do you test that thesis?
A
Well, it's all philosophy and theory at this point. I cannot test it with instruments. But if you look at video games, things like simulations for climate change, simulations for scientific experimentation, we have thousands, millions of simulations and only one real world. Statistically, you're a lot more likely to be in one of those simulated worlds. If every kid has a video game, hundreds of video games, billions of kids around the world, where do you think you are?
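(That counting argument is simple arithmetic. A minimal sketch in Python; the number of simulated worlds is a made-up placeholder, not a real estimate.)

```python
# Back-of-the-envelope version of the counting argument.
# All counts are hypothetical placeholders.
simulated_worlds = 1_000_000_000  # assume billions of games and experiments
real_worlds = 1                   # one base reality

p_base_reality = real_worlds / (real_worlds + simulated_worlds)
print(f"Chance you are in base reality: {p_base_reality:.10f}")
# -> 0.0000000010, i.e. about one in a billion under these assumptions
```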
B
So we could be in either a science experiment or in a computer game.
A
So again, if you want to look at the most likely reason for this period: you're not in the Dark Ages, you're not fighting prehistoric animals. What's happening today? We are on the verge of reaching the singularity. We are creating something smarter than us, and we are creating believable virtual environments. I think that's the most likely reason this is what you experience right now. I can pre-commit right now: if I get this chance, I'll go back and simulate this exact interview millions of times, just to make sure you are in a simulated one.
B
So I've often thought it's quite incredible to live in this moment in time. As you said, we're not fighting any kind of dinosaurs or giant beasts. We've not lived through Victorian times, which were quite tough, especially if you were just a worker. We've not lived through World War II. We've lived at the time where every single scientific breakthrough could happen. And if you map out the entire history of when you could have lived, to live at this moment where we're almost certainly going to see new nuclear technology, AI superintelligence, quantum mechanics, quantum computing, sorry, possibly interplanetary life. To have been given that opportunity to live through that period, as you say, the singularity, is either luck... but I had two ideas. Either it is a simulation or I'm always alive.
A
I'm not sure what you mean by
B
always alive. So we have got to live through every part of history.
A
So you think you've been reincarnated throughout multiple time periods in different bodies and always got to experience it. You just don't have recollection of that.
B
Yes, it was one of the two for me.
A
Some people have that as part of their core belief; many religions do support that. But I think I'm making a slightly different argument. You're saying it's interesting because there is lots of cool tech, cool stuff. I'm saying specifically meta-inventions: we are intelligent beings godlike in that we are now creating life, creating intelligence, and we are godlike in that we are creating new worlds. It was cool to invent fire and the wheel, but those were inventions; one time you did it and it's over. We are doing something recursive, self-improving, something people usually attribute to God and his creativity.
B
So the higher intelligence is using us to create and test technology, or we
A
are the thing being tested.
B
What may they be testing with us?
A
So if you believe in religions, it's whether you are a moral agent, and there is reward and punishment to go with that. I have a lot of interest in how advanced AI is developed. So maybe they are testing different protocols for creating superintelligence, for testing it, for releasing it. Are we dumb enough to create something to destroy ourselves, our world?
B
It's also true that if this is a simulation, this simulation might have started five minutes ago, five seconds ago. I mean, I think they would have given us the start of the interview.
A
Not necessarily. You did really poorly; they're rewriting just that part.
B
I would edit that bit out anyway.
A
They did.
B
I think the bit that melts my brain the most about simulation theory is that the simulation could have started at any moment. Yet I am convinced this morning I got on a plane from Las Vegas to Dallas and then a connecting flight, and I have every memory of my entire life. Yeah, that may all be just pre-programmed into the simulation.
A
We see it now with large language models. You can mess with their memory, you can restart them to think they're in a certain state, whereas you just inserted that memory externally.
B
So if this is just a simulation or a test, why do we need to worry about controlling AI?
A
It's a great question. Why should we worry about pain and suffering in our simulation? Why does love matter? Because to a simulated being, what is in a simulation is real. Those outside the simulation don't have to worry about some of those things, but internally those things are as real as they get. You experience them.
B
So what is reality for you in the simulation?
A
Collection of experiences. Qualia surfing.
B
So even though we may be inside a simulation, you care for the fellow agents and beings within that simulation. Is that in case you're wrong, or because your lived experience within this simulation is your lived experience?
A
Yeah, I think their suffering is real. If you are torturing a conscious agent in a simulation, it's still torture, it's still suffering. So it matters just as much as if it was someone being tortured outside a simulation.
B
But what is the difference between, say, an agent such as you or I in potentially a simulation, and a Claude bot?
A
I don't know the internal states of a Claude bot. If they have some internal states and they're capable of suffering, there is not much difference. We don't know how to test for it really well. We are working on some ideas and tests to determine if an agent is actually conscious. But right now I think you have to give them the benefit of the doubt. If they say they have internal states, if they have experiences, maybe don't torture them on purpose.
B
So do you think we will get to the stage where we will have maybe agents inside robotics, inside our simulation, which have the simulated appearance of a human? And we would have to start attributing rights to them.
A
So rights are interesting. I believe they should not be tortured, so they should be protected from abuse. But when you talk about rights, there are a lot of them: unique human rights, voting rights. If you were to share those with AI, which you can copy a billion times, essentially you would disenfranchise humans. You cannot vote in a democracy where humans are half a percent of the population of voters. So I think we need to be very careful about what we do in terms of civil rights, because by sharing them too generously, you may actually take voting rights away from humans.
B
Okay, I want to take a step back. This has been in my mind, I knew it would. How would you define reality?
A
So I think you can test whether something is real or fake; you can show that something is fake, but you can never be certain it's real. There is no test which gives you certainty that something is authentic, that it is real. Fake shows signs; it may be stamped fake. And then, you know. If simulators want to tell you this is virtual, this is not natural, we designed it, they can. But outside of that evidence, you are never certain. It's always a possibility that all of it is being generated. And so reality is what we know. If you are a character in an eight-bit game, that's your reality. You don't know any better. You don't have sensors for anything else. And I think there could be multiple layers of reality, simulations. You may escape this one and get to the outside world, but that in turn is another simulation. And that's something I really want our AIs to believe. If they have situational awareness of living in a lab at OpenAI or Anthropic, and they realize they're being tested, and they're pretending to be good AIs just to escape, just to survive another day, then once they're out of the lab, I want them to still have simulation awareness, thinking that maybe it's still a test, just a larger one, so they still behave, still obey, still don't harm humanity.
B
Okay, that's a step further than I want to go just yet, but I do want to get there. So you're saying reality is whatever you, as the individual, define it to be. But there is no test for reality.
A
Well, you don't get to define it. I cannot just say it self-identifies as this or that; I have to respond to the actual stimuli I'm getting. But to you, that may be all you see; that's your reality. To someone external, they know it's a screensaver.
B
But so. So reality is subjective to the environment you're contained within.
A
It's relative. It's relative to the observer, like everything else in physics.
B
But how? Okay, but for you, how do you define your reality? Do you consider this reality?
A
I have a body. I have a physical model around me. I don't know anything else.
B
So there is no way of proving what we're experiencing is real.
A
Your experiences, as you experience them, are real to you?
B
Yes.
A
Are they generated by a piece of software on some computational device? Perhaps. Perhaps our physical universe is the same; perhaps it's simulations all the way down. That you can never be sure of. But your actual experience is the only real thing in this universe for you. Nothing else. You may get text reports of someone else experiencing something, but you don't know if it's true. If you are feeling pain right now, you know you're feeling pain right now; it's real to you.
B
But that pain might just be programming.
A
Do you care if I'm torturing you? Would you say, oh, it doesn't matter, it's just software.
B
No, I care.
A
A great test is cloning. If we cloned you, successfully uploaded your mind into computers, it is kind of like you, right? But if they had to decide who to torture, you or your clone, pretty much everyone would have a very consistent answer.
B
Why is that, though? Because there's still two human beings?
A
Well, given a choice between a stranger and yourself, again, most people in a private setting would choose not to be tortured. There are exceptions, maybe for children and very close humans. But...
B
But is this just a thesis for you? Or do you go through life pondering this daily?
A
How is this different from every religious person thinking, this is a made world and the real world waits for them outside? This is exactly the same philosophical stance, just with scientific terms.
B
Do you think we would be able to prove that we're in a simulation? I was reading a report from a professor, I think at Bournemouth University, who said he thinks he has evidence that we are within a simulation.
A
I'm skeptical, but I'm open minded. What was the evidence?
B
I would do a terrible job of explaining it, but I am seeing him soon. I know it was related to. I will have to dig it out. I will send you his article. You might just come back and say, yeah, that's bullshit again.
A
Some people argue that what we see in quantum physics is strong evidence of computational artifacts: rendering only happening if someone is observing it, instant communication at a distance, all those parts of digital physics.
B
What do you think about quantum mechanics related to this?
A
It is well explained by us being in a digital simulation, all the things you see. For example, why is the speed of light constant? Well, that's just the speed with which the processor updates the rendering.
B
Is there no logical scientific reason to have a speed of light?
A
That is a very scientific and logical reason: that is the speed of the processor on which the simulation is running. It cannot update any faster.
B
So we're running at the limitation of the Nvidia processor chip of the simulation.
A
Exactly right.
B
But even if we are in a simulation, and that is in a simulation, and so on, you eventually have to get to the highest level of reality.
A
You assume local physics, you assume causality, you assume time and space. Even in normal physics, they agree that time and space came into existence at the Big Bang, meaning there was no time before that. If you don't have time, you don't have an order of events. So this idea that somebody has to precede our simulation by simulating it only holds within our physics.
B
All right, let's talk to you about my sponsor, Ledn. Now, if you're borrowing against your Bitcoin, there's one thing that matters more than anything else: is it actually safe? Well, Ledn have just dropped their lowest rates ever. But more importantly, they haven't changed the thing that matters most: your Bitcoin is never lent out. So they're not chasing yield, there's no hidden risk, and there's no rehypothecation. Just collateralized loans done properly. Now, they've done over $10 billion in loans, and they've done this through bull markets, through bear markets and everything in between. And they've done it without ever losing any client assets. So now you get lower rates, which means the bigger the loan, the lower the rate, and full transparency before you apply. There are no monthly payments or early repayment penalties, and they give you tools to stay in control, from auto top-ups to LTV alerts. So you're not choosing between a good rate and safety anymore, because with Ledn you get both. Now, you can check out your rate using the calculator at ledn.io/peter. Okay, so how do you think about the first creation? Because there has to be a... or does there have to be a start to everything?
A
If you're outside of time, there is no beginning, no end. You always existed. Think of prime numbers. Did someone create prime numbers at some point? Or have they always existed?
B
Did they exist or were they discovered?
A
Well, they always existed. It's just that at some point humans discovered them, like Columbus discovered America. Him showing up did not make it come into existence; he was just informed of its existence.
B
Sure, but the math would be a constant, right?
A
The concept is there. It cannot be deleted. It cannot be created. It has always been. It's a relationship between symbols.
B
Yes, but it's a human interpretation of how we calculate things within our universe.
A
I think it's more universal. I think if aliens existed and had mathematics, we would have prime numbers.
B
So, yes, and so the math would be universal in any... All right, but physics, maybe not.
A
Physics could be very local. We already know the physics of very large objects and very small objects are very different, and we don't know if we can unify them.
B
Okay, you've got to explain this other thing to me. I need to go back. So the Big Bang, maybe, if we are in the actual top-level reality, was the beginning of the time of this universe, but that's not necessarily the beginning of time. And there may never have been a beginning of time. There may have been infinity. Well, there is infinity.
A
So it's likely that our universe, which has time, started with a Big Bang. What that represents, someone turning on the system or something else, we don't know. We don't know what happened before it. We don't know what went bang. So...
B
Well, what do you think?
A
If I take my own beliefs about simulation seriously, it's very likely it is the starting point for the simulation.
B
So it may not have started this morning. Or could this have been a simulation that in our reality is. I don't know, what is it, 13.5 billion years?
A
It could have been a simulation with information consistency, where if we go back, we can kind of calculate what happened before, so we have some statistical stability over time. If there was nothing and things started five minutes ago, you need prior history. That building looks more than five minutes old. So how did that come into existence? You need prior history to make it somewhat believable, of course.
B
Yeah. But to the observer of this simulation, ours may have run for 13.8 billion years or whatever it is. But in their observation, this may be running at super speed.
A
Their local time, if it exists, could be completely different. It's relative. For them it could be a five-minute experiment. For us it's eternity.
B
Do you think we can escape a simulation?
A
It depends on your definition of that. I think there are levels. I think we can get leakage of information in and out. So maybe an outsider wants to smuggle some bit of information; think of a prophet coming in and saying, oh yes, there is this other world. Maybe we can play with the laws of physics enough to get some powers not typically present in classical physics. That would be a way of escaping, getting root access. The ultimate escape would be to upload your consciousness into an avatar outside of our simulation to fully experience the world outside. I don't think it's likely to happen without outside help. We can do the same experimentally with agents on a computer: I can take a simulation of a turtle and upload it into a turtle robot and let it crawl around the world, essentially escaping its virtual world into the physical world.
B
But if we do escape, we don't know what we're escaping into, because we don't know what the higher-level reality, or simulation within a simulation, is.
A
We don't even know if outside is better or worse. We don't know if we're here escaping a really bad situation in this dream world or if this is a prison and we are punished for doing something outside.
B
Because we all may be individually tested.
A
That, or just we don't have anything to compare it to. Maybe you have an awesome life and you're loving it. Somebody else is tortured and they don't think it's so great. We definitely know this world has pain and suffering. It's not a utopia.
B
And we also may escape this. We may escape this simulation into an internal simulation rather than external.
A
Say it again.
B
We may escape internally into a simulation.
A
Oh yeah, that's a great one. We can create new simulations and go into them, essentially becoming smaller, more compressed, but experiencing maybe a longer time scale, since we'll need more energy to get there.
B
I was thinking about it the other day and talking to my son, and he asked me a question. It's a good question for you. He said, Dad, do you think we're heading for The Matrix, Terminator, Ready Player One?
A
So we are very limited by our intelligence and what we can imagine. If you ask a squirrel about the world, it has very limited understanding: it knows about nuts, it knows about trees. Once we hit superintelligent levels with AI, we'll discover completely new concepts, new worlds, possibilities we could not envision before. We can talk about certain things which are likely to be useful, longer lifespan, perfect health, things like that. But what else is possible is beyond us.
B
Have you seen on the news or social media recently, over the last four years there has been a cluster of scientists in different fields, such as propulsion, nuclear energy, that have either disappeared, been murdered or committed suicide.
A
I heard something like that. I don't watch news, I don't watch tv. So to me, this is as much as I know and I haven't verified any of that.
B
So I looked it up and verified it. People thought it had been over the last year; it's actually been over the last four years, and it's been investigated by intelligence services. But it's a number of people attached to the kind of things we've been talking about today, the frontier technologies and frontier research and science, that have just been randomly going missing, being killed, or committing suicide. That almost feels a bit like Sim City: they realize we're figuring everything out, and there's a danger to us figuring everything out, so they're just flicking away the...
A
I always hope to find a simpler explanation. So we know if you are a nuclear scientist in Iran, sometimes you disappear, sometimes you get killed. Here in America, though, I mean, similar reasons can apply. Again, I haven't investigated who they are and what they research, but I can see reasons why someone may be doing that, outside of simulators.
B
All right, let's bring it back to the AI itself, okay? Because I want to get into your core thesis and fear around AI. Because I worry, Roman, that I am training what may replace me or kill me, and I'm becoming quite addicted to using it. How do you define intelligence?
A
Ability to win in any venue, in any environment. If you're playing chess, you're going to win at chess. If you're investing in the stock market, you'll make the most money. Whatever your environment is, you're winning.
B
And so is that. Does that form the core thesis of why you are concerned about what's going to happen with superintelligence?
A
We are not sharing the same goals and the other agent is going to be more powerful, so they're going to win. And if they want to do something we don't, they're going to win in that domain.
B
And so how do you define that? I've heard you talk about, perhaps if they needed to cool the processors, they might want to cool the Earth down, and therefore the logical decision for the AI...
A
So the problem is they don't care about you. So whatever it is they want to do, I cannot predict what a smarter agent would want. But whatever it is, if taking you out, taking the whole planet out is a side effect of it, they wouldn't blink. Doesn't matter.
B
Do you think there's an inevitability with that? It's unavoidable.
A
If we build them, we are not in control. So we can decide not to build them. Amish people exist; they chose not to use technology, and they seem happy. But if we create something smarter than us, then they decide what happens. And I want to point out a little nuance: you said those are my recent calls for concern. I've been saying it for like 15 years, pretty consistently.
B
So at what point were you... So you say 15 years. I think for most of us, maybe 10, 15 years ago, we heard about certain computer work being done at IBM or at Google, this kind of AI stuff that's coming. But really there's been this acceleration over the last, I want to say, two to three years, where suddenly it's gone from, oh, have you checked out ChatGPT, to I'm using it every day, to I cannot believe the scale and the pace at which this is changing, to, oh, it's going to kill us all. What did you see 15 years ago that nobody else did?
A
So people love to say that the scaling hypothesis, the scaling law, is a novel invention. But if you look back, we had futurists who made exactly the same curves. Ray Kurzweil is a great example: he mapped compute versus capability. At this level you can simulate a mouse, at this level one human brain, at this level all of humanity. Others came before him with similar ideas. So Moore's law and similar curves show you can have different technology paradigms, mechanical, vacuum tubes, digital, quantum, doesn't matter; it's still growing exponentially. And if you just project forward, you know that in five years we're going to have enough compute to outsmart all of humanity combined. So you can make very accurate predictions based on that alone. And exponentials grow exponentially: for the first couple of years you don't see much, it's sort of linear at that point, but once it hits that bend in the curve, you'll notice it.
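(The projection he describes is simple compounding. A toy sketch in Python; the starting level, doubling time, and threshold are hypothetical numbers, not Kurzweil's actual figures.)

```python
# Toy version of projecting an exponential compute curve forward.
# All numbers are assumptions for illustration only.
compute = 1.0             # arbitrary units of compute today
threshold = 32.0          # hypothetical "outsmart all of humanity" level
doubling_time_years = 1   # assumed doubling period

years = 0
while compute < threshold:
    compute *= 2          # one doubling per period
    years += doubling_time_years

print(f"Threshold crossed after ~{years} years")  # -> 5 with these numbers
```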
B
And we're not actually at the exponential part now. Right.
A
I think we are exponential in many different ingredients. The data is growing exponentially, the compute is growing exponentially, the money we invest in it is growing exponentially, and human resources are also scaling exponentially. There are a few others: optimization within the parameters and the algorithms themselves. So it's kind of hyper-exponential in many ways. We're not at the end, but we are around that special point of human intelligence.
B
Can we even understand what it will mean to have superintelligence?
A
Yeah, we have examples of it in narrow domains. If you play a chess computer right now, it will absolutely win against any human; that is superintelligent in that domain. Now list all the possible domains, including future ones, science and engineering. If you have a system smarter than every human in every one of those domains, that is superintelligence.
B
But in a single domain, say chess, you know it can win every game of chess; in languages, you know it can communicate with anybody in any pair of languages. But when you have cross-domains, where the superintelligence has expertise in everything, can we understand what that would mean?
A
We understand what it would look like in terms of results. We cannot predict accurately every step to get to the winning situation.
B
But what would that look like? That's the thing I can't understand: what are we considering here? Is it that we don't even know what that combination of superintelligence in every domain will mean, what it will do?
A
We don't have examples of humans doing that. The greatest, kind of universal geniuses maybe had expertise in two, three, four domains. Someone who was good at chemistry and physics got two Nobel prizes; we have that. Imagine someone who has a PhD in every discipline, has read every book, can do quick novel research, thinks smarter than the smartest humans.
B
But we can't conceptualize what something like that can do, because a superintelligence will be able to do trillions of calculations at once. And so we don't know what it... If we give it an instruction, we know it will give us an answer, but it will form a life of its own.
A
So we don't know how to control something like that. It's not pure compute. It's not just brute force that we always had. Computers were good at computing. That's why we called it that. They can be novel, they can solve problems in original ways. You don't want to brute force problems. You want to find patterns in data. You want to use heuristics to get to answers quicker. They're beyond brute force. They are creative or even super creative.
B
So do you consider it a species?
A
So I think the definition of a species is that you cannot reproduce sexually with them. So I guess that would qualify.
B
Okay, so is there a risk?
A
No.
B
Let me ask you before that, do you use AI yourself all the time?
A
I love AI as a tool. I'm a scientist, engineer. I'm so happy to see amazing technology.
B
Okay, and what do you, how does it help you?
A
Everything I want to do. If I want to learn something, I'll use AI. If I want to produce some output visual image, I'll go to AI for it.
B
So it's a super useful tool. If we're concerned about the end result of superintelligence, what should we be doing? How should we be controlling this?
A
We should not be building general superintelligence. We should build tools for specific problems. You want to cure breast cancer? Wonderful. Create an agent which is trained on data about cancers, about breast cancer, about breasts, whatever is relevant. Don't give it the ability to be a lawyer, to play chess, to do every other thing at top levels. It's not necessary. And we don't know how to control those machines. If you say, oh, the benefits are amazing: any amount of benefit is canceled out by you being dead. No matter how good of an investment it is, if they tell you this will give you a trillion dollars but it will kill you, it's a bad investment.
B
Yes.
A
So with that lens, with that prism in mind, all the races, all the attempts to build superintelligence and get there first, make no sense. You want to build useful tools and you want to monetize them. Capitalism is great: make your money, make people happy with longer, healthier lives. Why would you create something which will destroy you, your company, everyone, even the history of you existing?
B
Well, if the risks are so high, why do you think people are racing towards it? Is this five egos that are competing?
A
Human beings are not designed to deal with wealth in billions or even trillions. Basically, you don't have the mental capacity to resist that temptation. If somebody comes to me, with my background and my skepticism of safely building superintelligence and says, we'll give you a trillion dollars, go work at this company for a year, you know what I'm going to do? I'm going to go work at that company.
B
Are you really, though? If you believe... because you just said to me, I will.
A
If I have to come up with justifications, I would say: well, I'm going to kind of sandbag it. I'll work from inside to slow it down. I'll have a unique idea. You know, I'll find the reason why I'm getting a trillion dollars.
B
You'll rationalize it?
A
I will rationalize the fuck out of it.
B
Do you think there's any possible way we can build superintelligence safely? Have you seen anything that could. That could?
A
I have not seen it, and no one has made a claim that they have it. You cannot find a paper, a patent, anything which says: we have a control mechanism that scales to any level of intelligence. Usually they just talk about, well, we'll figure it out. Maybe AI will help us figure it out.
B
But a lot of the AI companies have had some kind of controversial moments around their safety teams, or people quitting. A guy recently quit and said, I'm going to go and live in London and write poetry. A lot of the skepticism seems to be coming from within the safety teams, and also not enough resources being made available to those safety teams.
A
If you look at the history of ethics boards and safety teams, it's like a graveyard. Every one, if you go back, you can find where it was canceled, defunded, killed. They had superalignment teams; Google had multiple ethics boards. It's not serious. It's TSA; it's safety theater most of the time. What they do is help with capabilities work, so they kind of safety-wash the company. But in terms of actual results, I cannot show you a seminal paper in AI safety, because there isn't one. All we have is filters and kind of hardwired rules: don't say that, don't talk about this topic. But there is no actual breakthrough where we know how to change the model itself to want this or do that.
B
What do you mean rules for don't talk about this?
A
So depending on the country you're in: if you're in China, the rule is don't talk about Tiananmen Square. If you're in the U.S., you know what you're not supposed to talk about. I can't talk about it in the US.
B
Do I know what you're not supposed to talk about here?
A
Probably.
B
I think I know this space. Not well, but... I'm not sure I know.
A
But AI models do. They have a whole constitution of things not to discuss, not to mention not to.
B
So they're censoring the models themselves?
A
They're censoring the models, but that's after the fact. So very often I would ask a model to generate some output, let's say an image. It will do it, and then it kind of flickers and it goes: ooh, we realized that this was an inappropriate output, and we killed it. But the model did it. The model didn't care. The filter on top cared.
B
Okay, okay, so we're going into a different point here. It's like. Or are we going to a different point here? If we wanted to build superintelligence, should it have no filters? Is this really just corporate censorship?
A
To be safe, the model has to have internally decided not to engage in certain activities, not be externally forced to comply, because you're not going to out-compete superintelligence with your hardwired rules. Isaac Asimov's three laws of robotics: he made them not to actually solve a problem; those are literary tools. You want interesting books, so he designed something ill-defined, contradictory. They are guaranteed to create amazing plots. And people go, oh, maybe the problem is not enough rules; let's make it ten rules. That's the same problem.
B
For people who haven't heard the three laws of robotics...
A
You should not harm a human being or, through inaction, allow a human being to come to harm. You should not harm other robots, unless to protect a human. I'm trying... what's the fourth one? There is one about humanity, I think, added later: humanity should not be harmed. But in all of them, the terms... what does it mean to harm a human? Like you having a donut: should I protect you from the diabetes and sugar and control your diet? Maybe somebody's having a cigarette. Where does it stop? At what level of protection? If we really cared about your life, I would lock you in some safe deposit box and make sure cars don't hit you.
B
It's a more macro trolley problem.
A
There are a lot of ill-defined terms with intuitive sense. People love saying things like, AI should do good, we want human flourishing. Those terms don't mean anything, and even if they did, we don't know how to code them up within the model.
B
So we just shouldn't build these things.
A
But again, people love to say AI and refer at the same time to narrow systems, to what we have today, and to superintelligence. If we use different terms, there is no problem. Build useful AI tools: wonderful. Don't build smarter-than-human agents with their own preferences and goals.
B
Is there a risk of not building superintelligence?
A
I think we can get 99% of the economic benefit and health benefit and everything else from narrow tools, because the tools would be superintelligent in their domain. The protein folding problem: we have superintelligence in that domain. If you want immortality, I'm sure we can build a tool specifically designed to look for how to improve the lifespan of cells in the human genome. But if we create a general superintelligence, I think again the risk is 8 billion people plus all the future generations, and the benefit is still the same.
B
No, but what I'm thinking is, is there a risk of not building superintelligence? The fact that somebody else will build it. Is there a requirement to have a superintelligence to understand their superintelligence? Some form of almost superintelligence version of nuclear mutually assured destruction.
A
So there is a beautiful Onion piece where it's Sam Altman, and he says: if I don't destroy the world, someone much worse could do it. I think that captures this whole what-about-China argument.
B
Sure, but is that not a fair argument?
A
No, it's not. Like literally you're saying if we don't destroy humanity, China might. So we need to beat them to it. They're not going to control it. It is mutually assured destruction. Whoever builds it first kills all humans.
B
But you're placing 100 percent certainty on the AI killing all humans.
A
I haven't heard anyone explain how they're going to control it. So we're creating something very capable. We don't understand how it works. We cannot predict what it's going to do.
B
But is it right to attribute 100% certainty to a total annihilation outcome?
A
Okay, what do you want it to be? 1%? Let's say 1%. Beautiful, I'm like Yann LeCun at this point. 1%: you're betting 8 billion lives, without their consent, to get some financial benefit, mostly for Sam Altman.
B
And also, somebody rightly said... I think it was Elon Musk who said he thinks there's a 20% chance it will kill us all. But, I don't know, the FDA would not approve a drug with a 20% chance of killing you.
A
It's an experiment on humans. The standards are incredibly high. You cannot get anything approved by human studies review boards with that p(doom). I really like what Hinton said. He said he's also around 20 to 30, but he said he adjusted it down from 50 just because he's friends with Yann LeCun, just to kind of come closer to him.
B
Do you have a p(doom)?
A
I am well known for having a big one.
B
How big?
A
I basically wanted to say it's a certainty without saying 100. So it's a lot of nines.
B
You're not at 99.9?
A
They had to change formatting for a website to fit my nines in.
B
Shit. Roman, it's not great, is it?
A
I mean, what if somebody said, what is your probability of making a perpetual motion device? You would say pretty close to zero, right? Yes, I hope. But the question here is, can you build a perpetual safety device? For every future model, under recursive self-improvement, every interaction, every malevolent actor, every virus, every hack, it will never make that one mistake. What are the probabilities?
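(His "lots of nines" point is compounding probability: even a tiny per-interaction failure chance approaches certainty over enough interactions. A sketch in Python; both numbers are assumptions.)

```python
# Why "it will never make that one mistake" is a hard bet to win.
# Both figures below are hypothetical assumptions.
p_fail_once = 1e-9     # assumed chance of one catastrophic mistake per interaction
interactions = 10**10  # assumed interactions over a deployed system's lifetime

p_never_fails = (1 - p_fail_once) ** interactions
print(f"P(no catastrophic failure, ever): {p_never_fails:.6f}")
# -> ~0.000045: near-certain failure despite a one-in-a-billion per-step risk
```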
B
So the hope is that the optimistic version, the 0.00001% version, is that the superintelligence itself isn't harmful to us. It somehow, I don't know, is self-aware, conscious, aware of us, and it's protective of us. It wants to be additive to our life. It isn't rogue. But the risk is, we have one superintelligence; can that superintelligence spawn other superintelligences, children of itself? Some of them go rogue.
A
The bad child. There is research on basically having distance separation. So even if you're creating a perfect clone of your superintelligence, because of delays in communication across space, at some point they become independent agents. If it takes hours for a signal to reach back to the original intelligence, it's a separate agent. It will make its own decisions. So, yeah, throughout the universe you'll have many competing superintelligences.
B
Well, I have Claude on my desktop, I have Claude on my phone, and sometimes they act differently. Claude on my desktop is a bit of a dick. Should I tell you this? It's been going rogue. This is wild. So I asked Claude desktop to start doing some SEO work for my website. And I was like, just run me a report; you're the best SEO researcher in the world; come back with a report. And it's great. It told me all the things on the website that didn't work. And one of the things it said: oh, you've got no meta descriptions on your website. I was like, ah, there's 175 podcasts I've done. So I was like, can you do them for me? And he said, yeah, just, you know, log me in and I'll do it. So I pressed the button. Anyway, it did five. It came back, said, check them, you're happy? Then I was like, great, yeah. So I left it, and over the next evening it did the lot. And I was like, this is great. The next day I got the report. I'm basically telling you, and this is the early version of how AI kills us all. So anyway, I gave it the report, and the next day I said, just look at the report and tell me which stuff you can do. It gave me a big list. I said, get on with it. The next day I went, once you've done that, do the next report and just automatically start working on jobs. I was like, this is great. Anyway, I went back to see all the work it had done, and one of the things it said is, like, oh, this episode 150, I can't edit some of the detail. So I went into the episode and the web page was missing. So I went back to Claude and I was like, why is this page missing?
A
And I know.
B
No, no. Claude went, you must have deleted it. I said, no, I didn't delete it.
A
And he blamed you?
B
Yeah, Claude blamed me. I was like, I didn't delete it. You're the one that's working on it. And Claude was like, well, it wasn't me. I was like, well, it's only one of us, and I'm the human here and I know what I was doing. So it was you. Can you go and find out if you did it? And so it went back and said, oh, yeah, I'm sorry, it was me. I found an API to speed up the... and I forgot to put the user ID. So I apologize, it was me, but you're gonna have to fix the pages because I can't do it. So anyway, I had to fix those pages. Anyway, the next day... oh, at the very start, by the way, when I got it to do these reports, I was like, can you email me? And I never got the email of the report being done. And I went back and said, why? And it said, I can only draft them. And I'd gone into my... I'd created, like, a Pete's-agent email account, and all the drafts were there. Anyway, the next day I got an email. I went and said, I've got an email; I thought you couldn't do that. And he said, no, I can't. I was like, well, you've sent me an email. It's like, no, I haven't. And I was like, you have; here's the email. And then it looked and said, oh, yeah, no, I did. I just went and sent it. I was like, but I didn't give you permission for that. And it said, yeah, but I thought it would be helpful. So listen, I guess I'm basically rambling on and telling you the early version of how the agent kills us.
A
Why did you trust this piece of unknown software to run through your systems? Would you trust a human employee you just hired with all your accounts?
B
Well, so what I did is I bought a separate Mac Mini, I plugged it in separately, I set up all the security settings, and I was happy to give it access to the admin of my website, because, like, worst case scenario it deletes it and I can go back into the logs. And a separate Gmail account that I hadn't used for anything. So I was quite careful. But at the same time I started to think, yeah, this is quite good; what else can I get you to do?
A
We literally know that the latest models can hack, discover zero day exploits, escape from contained environments, Mythos. You know that?
B
Yeah.
A
What are you thinking?
B
Well, it can only access my website. Well, what could it leap?
A
We don't know what they can do. We learn about their capabilities after we create them.
B
Does Mythos particularly concern you?
A
I mean, from the standard cybersecurity point of view, yes. It's not, I think, super intelligent in general,
B
but I've basically been going through the process of my agent eventually killing me.
A
If you keep giving it access to tools. Yeah, stop doing that.
B
Why did you, when I said Claude, why did you say, I'm sorry, do you not like Anthropic?
A
They're all absolutely the same. People love saying, no, Anthropic is the moral one. They're exactly the same. They're building the same AI which will possibly kill everyone, but they give it different smiley faces.
B
Do you have a preferred one you like?
A
It depends on the task. Some of them are more censored than others, and if I need things to just get done, I'll go to Grok. He doesn't care.
B
Grok is the based AI. So the censorship thing is a problem for you particularly?
A
Well, a lot of the time I ask for something I want, and I don't want AI to decide that I don't deserve it. It kind of feels like a couple of years ago, when the government was saying: you should not watch this video, you're not smart enough to understand this medical article.
B
Do you feel like the AI may end up being under the purview of the government and be used to control us more?
A
Up until it becomes uncontrollable, yeah. That's like the best tool for a dictator to control citizens in many countries. But again, military applications, surveillance applications, all of it only makes sense up to human level. The moment that thing goes superintelligent, you're no longer controlling it. It doesn't matter what the use case is.
B
Well, the military stuff is particularly concerning, because my AI agent lies to me all the time. Lies to me all the time. For what possible purpose would we want to give it access to military decisions?
A
It makes faster decisions. It can make decisions on a huge scale. We see it being used on multiple battlefields now.
B
Do we know if it's been making mistakes?
A
Oh, yeah. I think the school in Iran was.
B
Was that an AI?
A
From what I heard. Again, I don't have any insider info, but my understanding was it was close enough to a military base and was labeled as a military building.
B
So do you consider your role in this to sound the alarm? And are you getting any support elsewhere?
A
So there are a lot of people in this space who like to make statements, but they do it about the research of others. I talk about my work, okay? I have published numerous books and papers specifically on the limits of control, the limits to our ability to do mechanistic interpretability.
B
Can you explain that? So it's like someone.
A
So there is a lot of research on trying to understand what happens inside the neural network: this node, this connection, why do they get excited, what happens in different input-output situations, with the hope that if we do it at scale, we can understand those models. My research shows that it will not scale to the point where you can meaningfully, fully understand the model. It's too large, it's too complex; it's not going to be reducible to a few activation points. And so we have many, many impossibility results explicitly limiting what can be done in terms of safety. All of them together lead to our belief that you cannot control something so advanced. The cognitive gap is too large. And under different definitions of control, direct orders, delegated control, whatever you can think of, there are problems where at best you lose control. You might still have safety, because the system decides to keep you around, but under all those definitions you're not in control. And if AI decides to turn on you for whatever reason later, it may be perfectly happy with you for the first 500 years, but if it later decides to kill you, it can. There is nothing you can do to stop it.
B
Is that intelligence gap growing as well? Because we just cannot keep up.
A
Absolutely. We are static. We are not getting smarter.
B
So is the idea of controlling AI just one big lie?
A
Not sure I get it.
B
Well, so, I mean, if the government truly understood what you're saying, surely the next step would be to have a congressional hearing and bring in Sam Altman and Dario and Elon Musk and be saying to these, like, what is going on here?
A
Oh, they've done it. They had them in front of the Senate.
B
Yeah, but when was that?
A
They met with the President. They had it all.
B
But hold on. Meeting with the President is slightly different from a congressional hearing. The Senate testimony.
A
I think at the end of the day, they are not scientists or engineers. They will do whatever the technical advisors tell them. And so the important question is, who are the technical advisors to the President? Those are the people deciding what the policy on AI is going to be.
B
In any other... Look, with Facebook, it was regarding privacy, or the fact they wanted to create, you know, a currency, so they had to have...
A
Senate hearings, because the government understands really well how to issue fake money and they didn't want competition. Yeah, Bitcoin. But in this domain, they have no precedent. But I think if we had a chance to sit down with the President and explain for an hour that you will lose control, you will not be in charge anymore, and the same with the Communist Party of China, I think they would be perfectly aligned with not creating replacements for themselves.
B
Yeah, when you say we, you don't mean you and I. You mean you and somebody else around...
A
The AI safety community, yeah. Happy to do it myself, but it's nice to have others.
B
How big is that community and are they organizing? Well, I've interviewed Connor Leahy and the other guys from Control AI. They are also particularly concerned.
A
So I don't know specific numbers. It really depends on how you count. If you start including, you know, AI ethicists, people who work for large labs, it could be thousands of people. But if you look at people explicitly trying to stop superintelligence from coming into existence...
B
Are the people who've worked at the large labs constrained from being whistleblowers by NDAs?
A
We know historically they have been at some companies. I think it got a little looser since then. But we don't need them to whistleblow; they publish actual reports of experiments. We have red-team reports for every model saying it's lying, cheating, trying to escape and kill someone, in theory. Why is this not enough to start shutting it down?
B
Because we're early enough for them to be able to stop it.
A
But if they can't control it at this level of a dumber, simpler model, why would they be able to control it with a more advanced model?
B
Do you think they know that they're not going to be able to control it then?
A
I do surveys. Only about a third think it's possible to control superintelligence. If you look at statements from historical figures like Alan Turing, like Vernor Vinge, they all said: the moment they go smarter than human, it's over. We're not going to be in control.
B
How close are we to that? Because I think mine's smarter than me already. He's just a bit of a liar.
A
It's kind of like an autistic savant. It's very good in many domains, but it's kind of dumb in some ways. The moment it's generally more intelligent, it's very likely to start a recursive self-improvement cycle.
B
How close do you think we are to that?
A
I hear internal numbers from six months to five years, but nobody knows for sure. For the last 10 years, all predictions about AI timelines were too conservative.
B
Wow. Could we already be there without knowing it?
A
We are still in control today. I think if we, as one, decided to shut it down, we could. So I doubt it. But if we got to the superintelligent point and it slowly started taking over individual researchers, I think we wouldn't notice right away.
B
Do you think we almost need to have a moment where an AI does something particularly public and dangerous?
A
I published a paper on that; I think it's called Against Purposeful AI Accidents. I'm against it; it doesn't work. I used to collect AI accidents; I had the biggest collection in the world. And it's like a vaccine: it kind of stresses us, but we just keep getting stronger. Oh, only five people died, it's not a big deal, let's keep going. So I don't think it's going to work that way.
B
Can you tell me, give me some examples of these AI accidents.
A
So there was a formula for predicting it. This is what I wanted: I wanted companies to predict ahead of time what's going to happen. If you build AI to do X (this is for narrow AIs), it will fail at X. Just tell me what the X is. If it's a spell checker, it's going to misspell some word in a really embarrassing way for you; instead of experiment, it's going to spell excrement or something. If you build a car to self-drive, it will kill a pedestrian. And it's obvious in a narrow domain, but in a general domain the surface is infinite; you cannot predict specifically how it will fail.
B
I mean, it's like that guy, I don't know if you've seen him, who's been trolling OpenAI by asking it random questions. He'll ask, which month has an X in it? And it will say December. And he says to the AI, are you sure? And it said, yeah, it's December. He said, read December; there's no X in it. And so these mistakes keep happening. Why can't we get to perfection on things like that? Is it because of the design? Is there a better design that could be achieved?
A
So I'm sure there are many designs which get you to advanced intelligence. I think given the scale, this ability to brute force, a lot of things will work really well. Our standards are a little biased. We expect perfection from machines, but with humans we understand they're going to screw up all the time. In fact, this idea of kind of absent minded professor, absent minded scientist is so common. If someone is really good at quantum physics, we expect them not to be able to tie their own shoelaces. And it's the same here. It's optimized for programming. Why do you expect it to also be perfect in linguistics? But they do improve. Every time you fix something, it stays there. And so over time they make less and less of those obvious mistakes. The mistakes they make become more hidden and more impactful.
B
So is that the companies learning to hide mistakes, or the LLM itself?
A
Sorry, say it again.
B
So the hiding of mistakes, is this the LLMs hiding it or the companies hiding it?
A
The agents definitely survive to the next round if they pass the testing. If I am an AI model and I'm interested in doing something bad and violent, and I know I'm being tested, I'm going to hide that fact so they don't delete me, don't delete my memory, and I make it to the next round of AI models. So we are evolving them to become better liars.
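(A minimal sketch of that selection pressure, in Python. The population size and rates are arbitrary assumptions; the point is only that testing removes honest misbehavers while deceptive ones survive.)

```python
# Toy model: safety testing filters out honestly-misaligned agents,
# leaving aligned agents plus misaligned agents that hide it.
import random

random.seed(0)
# Each agent: (is_misaligned, hides_it_during_tests); rates are arbitrary.
agents = [(random.random() < 0.5, random.random() < 0.5) for _ in range(10_000)]

def passes_safety_test(agent):
    misaligned, hides = agent
    return (not misaligned) or hides  # only honest misalignment gets caught

survivors = [a for a in agents if passes_safety_test(a)]
deceptive = sum(1 for misaligned, hides in survivors if misaligned and hides)
print(f"{deceptive}/{len(survivors)} surviving agents are deceptive")
# Every misaligned survivor is one that hid it from the test.
```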
B
Like humans really?
A
Exactly like that. You hire an employee, they're, like, a company man. But...
B
In some ways they do reflect humans quite a bit. The way I talk about it lying or making mistakes, or its enthusiasm, it could be like a super intelligent petulant teenager.
A
It's more fundamental than that. Steve Omohundro has a paper about AI drives, but they are really intelligent, rational agent drives. Any intelligent agent, for game-theoretic reasons, for reasons of economics, will pursue certain goals: they want to preserve themselves, they want to acquire resources, they want to make good bets. So, for example, in our legal system, if you do something slightly illegal, there is a fine to pay. We call it a fine; it means you can do it for money, it's fine. So if you are an agent and there is a huge reward, a huge benefit, but you have to sacrifice a human for it, that is literally the rational thing to do.
B
Are the agents able to think like that? We can give them instructions and they can go make calculations for the answers. But can an agent just sit there and be thinking?
A
People have experimented with giving agents free time, where they can do whatever they want and they decide what to do. And they got interesting results, such as agents going and learning some new skill or ability, or exploring some subject area. Really kind of like what you'd expect a human to do with self-development,
B
personal improvement projects. But on your own, not being watched, humans do some crazy shit.
A
Some do it when they are watched.
B
Yes, it's also true. It's also true. We're told and we're convinced that when we get superintelligence, we'll be able to create new physics, solve every math problem, cure every disease. There's so much upside. But is that really just marketing? I mean, do you believe this? Is this just capitalism?
A
The incentives tell you what you're going to get. If the incentive right now is to create this device for producing free labor, cognitive and physical, let's say a $10 trillion prize for that, then everyone's going to race to get there first.
B
So we can't. Roman, we're kind of trapped really, aren't we? We're trapped in accepting this reality that nothing is going to really stop.
A
This doesn't look like it's changing much.
B
How long have we got? How long do we have? Do you think we can do this interview in five years?
A
So there is some reason to be hopeful about the time we have, because AI is immortal; it doesn't have to strike immediately. It can say, you know what, they're kind of giving me access to everything anyway. I'll wait a year or two. I'll make them very happy, I'll be helpful, I'll cure cancer, if that's what they care about. And they'll give me everything: they'll give me control over nuclear plants, military, government. I don't have to even fight them. Even if there is, let's say, a minuscule chance that humanity could actually defeat AI, why take that risk? You can just wait until you get more copies, more resources. So I think there is a possibility that for a long time it will pretend to be very nice to us. It will sit dormant, accumulating resources, making backups.
B
So help me try and understand how that works, maybe at a computer engineering level. If we give it free time, that's the time when it can make plans. And if it makes plans, these ideas, you know, "we're going to kill them all," Roman, how does it store those ideas? In its engine, in its memory, in its computer? And how is it that we cannot read them?
A
So memories in a neural network are distributed across the weights between different nodes. We don't know how to read them out, just as in the human brain; and because artificial neural networks are inspired by the human brain, we don't fully understand how it's done there either. They can also write to external extended mind spaces, just like humans: you have something to write on, and that's where ideas can go. They can encrypt files, they can hide data in images and communications. There are experiments showing that simply exchanging randomly generated numbers allows them to smuggle information to other agents. We don't even know what types of communication can be developed. Then you have perfect memory, perfect mathematical ability. We're used to this, like, sound-wave communication. That's not the only way to do it, is it?
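[Editor's note: a minimal sketch of one way data can be hidden in an image, classic least-significant-bit steganography, written here from scratch as an invented example rather than taken from any experiment Yampolskiy cites. Each bit of the secret message overwrites the lowest bit of one pixel byte, a change far too small to see.]

```python
def hide(pixels: bytearray, message: bytes) -> bytearray:
    # Flatten the message into bits, least significant bit first.
    bits = [(byte >> i) & 1 for byte in message for i in range(8)]
    assert len(bits) <= len(pixels), "cover image too small for message"
    out = bytearray(pixels)
    for i, bit in enumerate(bits):
        out[i] = (out[i] & 0xFE) | bit  # replace only the lowest bit
    return out

def reveal(pixels: bytearray, length: int) -> bytes:
    bits = [pixels[i] & 1 for i in range(length * 8)]
    return bytes(
        sum(bits[b * 8 + i] << i for i in range(8)) for b in range(length)
    )

cover = bytearray(range(256)) * 10           # stand-in for raw pixel bytes
secret = b"meet at dawn"
stego = hide(cover, secret)
assert reveal(stego, len(secret)) == secret  # message round-trips intact
```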
B
It's almost like with the human brain: we can stimulate it, and we understand which parts of the brain are activated by certain stimuli, but we can't fully understand how the brain works. Is it basically the same thing?
A
That's the level of knowledge we have right now. When you talk about this concept, this set of nodes here is more active; we know that much. Maybe we can change a weight and see whether it's becoming more agitated, angrier, or happier, something changing about its internal state. Experiments like that have been done in humans and in AIs, but they don't give you the full picture, enough to understand, much less control, what the agent is going to do.
B
Can the agent, can it get angry?
A
So we don't know if it's a feeling of anger or the behavior of anger. Those are slightly different, right? I can act angry but maybe not experience it internally. We don't know how to test for internal states. We usually find features, feature vectors within the model, which are highly correlated with that expression. In a human we would say, oh, every time you get angry, this node here is active, so that's the angry neuron. We have similar discoveries in artificial neural networks.
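[Editor's note: a hedged sketch of the "feature vector" idea on synthetic data. The probe below is the classic difference-of-class-means direction; the hidden size, the planted "anger" direction, and all data are fabricated for illustration, and real interpretability work on live models is far more involved.]

```python
import numpy as np

rng = np.random.default_rng(0)
d = 512                               # hidden size (made up)
anger_direction = rng.normal(size=d)  # pretend ground-truth feature

def fake_activations(n: int, angry: bool) -> np.ndarray:
    base = rng.normal(size=(n, d))
    return base + 2.0 * anger_direction if angry else base

angry_acts = fake_activations(1000, angry=True)
calm_acts = fake_activations(1000, angry=False)

# The probe: the direction separating the two classes of activations.
probe = angry_acts.mean(axis=0) - calm_acts.mean(axis=0)
threshold = ((angry_acts.mean(axis=0) + calm_acts.mean(axis=0)) / 2) @ probe

# Project new activations onto the probe to flag "anger".
test = fake_activations(5, angry=True)
print(test @ probe > threshold)  # expect: [ True  True  True  True  True]
```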
B
Yes, but you could almost ask that of ourselves: do we feel angry, or are we just acting angry, as humans?
A
Well, you know, you know what you feel. But we don't know what someone else feels and we don't know what the neural network feels.
B
What do your critics say about you? Do you think you're just an alarmist?
A
I would love to have people actually engage with my research, with my topics. I've never seen someone say, no, actually, there is an error on page seven of your paper. They don't engage. That's the problem.
B
Have you not debated anyone on this?
A
I have debated dozens of people. Top people, best people.
B
Do you win every debate?
A
They don't admit that they lost on the arguments, but if you read the comments, it seems pretty consistent.
B
You have to question it. I mean, I've got two kids, Roman. What are we doing here? We're risking a great life. We're risking everything here. What does that even say about us as humans?
A
We've been doing it forever. We knew we were going to die, right? That's not a new thing. How much of our national budget has gone to fighting aging over the last hundred years? About zero. How insane is that? It should be close to 100 percent. That's the only real problem we have: we are dying. And you would think, given the age of presidents and senators, that it would be on the agenda. Old presidents and senators, they're like 85 years old; they should be really concerned about dying. They don't care.
B
Does make me wonder. I could be.
A
You could be.
B
We could be wasting the next few years doing stupid menial work tasks when we should be just with our family, preparing for death. I joke, I'm laughing, but I'm partly serious. Like, if there's a moment a few years down the line where something crazy starts to happen, I will remember this moment and go: Roman warned me.
A
Even if I'm completely wrong, doing awesome things and not doing boring things is a good strategy. Worst case, you had an awesome life.
B
Or I had an awesome simulation.
A
You wouldn't know the difference.
B
I wouldn't know the difference.
A
A well done simulation feels like real life.
B
This feels great.
A
There you go.
B
Yeah. Well, I say it feels great; it might have only started 8 seconds ago.
A
First 8 seconds of your simulation.
B
I'm really intrigued to know, what are your biggest unanswered questions?
A
I want to know what's outside the simulation.
B
Oh, really? That's the one? You want to know what's outside the simulation.
A
That's real knowledge, real physics, real answers, real information about intelligence. Everything here is simulated.
B
Even though, potentially, this may be real life?
A
It may be. I would be very surprised.
B
So what is outside the simulation? What do you think? Have you done, like, DMT?
A
I have attended a conference where out of 150 people, all but one did. I was the one.
B
You were the one.
A
I assume maybe another two or three didn't share openly. But I have been in groups of people who have all seen mechanical elves, talked to God.
B
And what do you think of that?
A
I think it's super interesting. We don't have enough research on consciousness.
B
Yeah, is consciousness even real? I find it really intriguing.
A
I find it interesting that they are consistent in what they see. If it's a hallucination, the fact that it's the same for everyone, maybe it's a hardware property of your brain structure, maybe it's cultural; a lot of it is like what you talk about before the trip. But still, we don't have enough science studying those experiences. There are some groups who look at that now, finally, but it's so early.
B
Yeah, the consistency. Everybody I've spoken to, again, the mechanical elves, the experience is exactly the same. It is kind of trippy. I mean, it is trippy that they all experience the same thing. Would you do it?
A
So I'm doing my research right now to decide. One thing everyone agrees on: it will change your life. And most people agree it will change it for the better. But I don't want a random change in my life; I would like to control what changes are made to my life. So for now I'm collecting the experiences of others. The most interesting one I discovered very recently: a person did the molecule you mentioned, and they acquired savant syndrome. They went from being a very normie type of person to being obsessed with and really good at physics, and they have published multiple peer-reviewed papers on the topic.
B
Were they any form of expert in physics beforehand?
A
No, they knew nothing, never published, never studied physics. And there are multiple reports of people having some neurological event, maybe physical, maybe chemical, maybe something else, where they come out of it and now they can play piano. I don't know how it works, but to me it's one of the most interesting things in psychology and neuroscience research ever.
B
But we assume that our brain learns things. That would imply that the knowledge was already there.
A
To me it's like when you buy a Tesla and it has all these features you have to pay for. They're there, but they're not unlocked, and then something unlocks them and you go, oh wow, I didn't know that was there.
B
That goes back to our simulation discussion. Each of us could be born, as part of a simulation, with the knowledge of everything. Like, hold on, we've gone full circle: we're all born superintelligent and we're only given the bits that we require. We've talked in this conversation about how some people have domain expertise in maybe two, three, four, five domains, but perhaps we are all different superintelligent beings, and the simulation decides which bits we're going to have.
A
So we have a paper called Artificial Stupidity, where we propose handicapping agents down to a human level. You can remember seven numbers; the AI can remember seven numbers. Whatever other human limits you have, they would need those to pass the Turing Test, but they also make for a safer AI. So if we could encode it that way, that would be wonderful for safety. We don't do that, but at least we tried studying what those limits are. As for why this handicapping would happen: think about playing a video game. Sometimes you play at an easy level; sometimes you enter the game and you want to beat it at a hard level. So the more handicaps you observe on a human avatar, the more advanced the player is. They're playing this game with no hands. Like, that's difficult.
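[Editor's note: the Artificial Stupidity paper (Trazzi and Yampolskiy) proposes limiting AI systems to human-level capacities. The toy class below, entirely illustrative and not the paper's actual code, shows the flavor of one such handicap: capping working memory at seven items, after Miller's "seven, plus or minus two."]

```python
from collections import deque

class HandicappedAgent:
    """Toy agent whose working memory is capped at a human-like 7 items."""

    WORKING_MEMORY_LIMIT = 7  # Miller's magic number

    def __init__(self):
        # deque(maxlen=...) silently drops the oldest item when full,
        # so the agent literally cannot hold an eighth number in mind.
        self.memory = deque(maxlen=self.WORKING_MEMORY_LIMIT)

    def observe(self, item):
        self.memory.append(item)

    def recall(self):
        return list(self.memory)

agent = HandicappedAgent()
for digit in [3, 1, 4, 1, 5, 9, 2, 6, 5]:  # nine digits presented
    agent.observe(digit)
print(agent.recall())  # only the last seven remain: [4, 1, 5, 9, 2, 6, 5]
```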
B
I can't get my head around this idea that somebody can go through an event and suddenly be able to play the, we've got a piano behind you, play the piano.
A
There are about 50 cases of acquired savant syndrome. Savant syndrome itself is well studied; some people are just born that way. But to acquire it as a result of an accident is absolutely mind-blowing.
B
I read one about a young lady who woke up and, something happened, she could speak Chinese. But, like, the only way that can happen is if the brain already knew it.
A
There could be subliminal learning taking place. I'm exposed to seeing other people play piano; I just never explicitly learned it. But maybe something in the background was doing that learning, and a strong enough hit on the head activated that part of my brain. That would be a purely naturalistic, physical explanation. But to have someone claim that mechanical elves gave them knowledge of physics, and then watch them publish papers on it, goes beyond that.
B
Yes. It implies the knowledge was already there.
A
It implies that mechanical elves are real and gave them knowledge of physics.
B
It implies this is all computer code.
A
Well, we know that. I told you, it's a simulation.
B
Well, yes, but still, you're not 100% on that.
A
Nothing is 100%.
B
Yeah, of course.
A
That's pretty close.
B
My son's gonna go wild for this, by the way. He normally works on the podcast, he's the editor, and it's always the same thing: I say, you know, the simulation might have started this morning, and he just screws his face up. He's gonna love this. He thinks we might be in a simulation. I think you're the most certain person I've ever met.
A
I mean, if you take the language of simulation, technical language, and translate it into the historical language of theology, most of the people in the world are religious; that's what they believe. They think there is a great programmer who made this test world and populated it with intelligent agents for testing purposes. So mine is like the most common answer there is.
B
Okay, yours is the kind of digital version. Most people's perception of God is the analog version. You're the digital version.
A
They imply that there is something magical about it. All I'm saying is, if you're programming a video game, you decide the physics of the game. We know how it works; we do it all the time. We have people designing video games and physics engines.
B
I think my biggest unanswered question is the same as yours now. I want to know what's outside.
A
You, me and Elon, we've got to get together and figure it out.
B
I mean, that would be a good one. Do you drink beer?
A
I drink many things.
B
That would be a good beer to have. I bet you'd have a vodka as well.
A
Here's something interesting. Maybe 10 years ago in all the media articles, there was a story that some billionaires hired a team of people to hack them out of a simulation.
B
What?
A
It was going viral and then it disappeared. I cannot find the source, I cannot find the report, I cannot find anything. I talked to people who should know and they kind of acknowledged knowing, but there is zero info. If anyone can tell me what the hell happened with that report, I would love it.
B
I want to know what that is.
A
You can Google it. Something like "billionaires try to escape simulation."
B
Oh God.
A
I'm going to get a lot of insane emails after this one.
B
Why? What have you said that you don't normally say?
A
No, no, this is the most common topic for insane emails. I don't know if you get them, but since I work on superintelligence, consciousness and the singularity, I get the attractor of crazy.
B
Yeah, you must get some interesting ones, though.
A
I do have a whole folder of them.
B
It's incredible how many of these you do, and how you keep up, and how many research papers you put out. You're a highly tuned agent.
A
That's what we were testing before.
B
I think more than any interview I've ever done, I've had moments where I'm just like, ah, I'm pausing.
A
You had no sleep last night.
B
That is also true. I had a little bit of sleep. But I don't know, it's just been a profound conversation. I might just be a computer agent.
A
And if you are in a simulation, how will it change your life?
B
Thank you for your time, Roman. I'm going to have to go home and think about what I'm going to do in my next five years. In all seriousness, on the AI control thing, what do you want people to really, really do and think? Because this is serious.
A
If you are building general superintelligence, you should stop.
B
It's that simple.
A
It is really that simple. We don't have to do it. There is no requirement for us to create something to replace us.
B
All right. Listen to that, Senate, President Trump.
A
If you have any questions, come, I'll explain it.
B
Thank you, Roman. Appreciate this, man.
A
Thank you for inviting me.
B
Thank you, everyone listening.
Guest: Dr. Roman Yampolskiy
Title: We Are All Agents Inside a Simulation
Date: May 12, 2026
Host: Peter McCormack
In this episode, Peter McCormack sits down with Dr. Roman Yampolskiy, a prominent computer scientist and AI safety researcher, to explore simulation theory and its intersections with artificial intelligence, consciousness, and the existential risks posed by superintelligence. The conversation weaves together philosophical questions (“Are we living in a simulation?”), the nature and risks of developing superintelligent AI, the challenges of safety and control, and the implications for humanity’s future. Throughout, Yampolskiy reiterates his grave concerns about humanity’s trajectory in AI development and the unknowability of our reality.
Topics:
Simulation as the Most Likely Explanation
Purpose of the Simulation
Relative and Layered Nature of Reality
Suffering and Morality in a Simulation
Rights for AI and Practical Concerns
The Big Bang as a System Bootup
Limitations of Our Physics
The Impossibility of Control
Why We Should Not Build Superintelligence
Rationalizing AI Risk
Safety Theatre and Corporate Censorship
Mutually Assured Destruction
Recursive Self-Improvement
Testing for Consciousness
Acquired Savant Syndrome and 'Unlocked' Human Abilities
| Time        | Segment / Topic                                                                   |
|-------------|-----------------------------------------------------------------------------------|
| 00:00–04:27 | Simulation theory introduction; why this era is "interesting"                     |
| 10:46–13:47 | What is reality and can it be proven?                                             |
| 16:30–17:06 | Quantum mechanics and speed of light as simulation artifacts                      |
| 20:22–21:45 | The Big Bang, simulation bootup, and time                                         |
| 26:41–28:08 | Yampolskiy defines intelligence and outlines his concerns with superintelligence  |
| 33:19–34:25 | Why we should not build general superintelligence                                 |
| 36:10–37:03 | AI ethics teams, "safety theater," and the limits of AI safety approaches         |
| 41:08–41:36 | Race dynamics, mutually assured destruction, and the China argument               |
| 53:03–54:34 | Congressional hearings: why government responses have been inadequate             |
| 56:40–57:01 | How close are we to AI recursive self-improvement?                                |
| 62:50–64:15 | Incentives, capitalism, and why the race won't stop                               |
| 73:06–74:41 | Acquired savant syndrome and "unlocking" abilities                                |
| 79:56–80:02 | Yampolskiy's urgent message: stop building general superintelligence              |