Narrator/Commentator (9:14)
In this scenario, AI is getting better at improving AI, creating a feedback loop. Basically, each generation of agent helps produce a more capable next generation. And the overall rate of progress gets faster and faster each time it's taken over by a more capable successor. Once AI can meaningfully contribute to its own development, progress doesn't just continue at the same rate, it accelerates. Anyway, back to the scenario. In early to mid-2026, China fully wakes up. The General Secretary commits to a national AI push and starts nationalizing AI research in China. AI's built in China start getting better and better, and they're building their own agents as well. Chinese intelligence agencies, among the best in the world, start planning to steal Open Brain's model weights. Basically the big raw text files of numbers that allow anyone to recreate the models that OpenBrain themselves have trained. Meanwhile, in the US, OpenBrain releases Agent 1 Mini, a cheaper version of Agent 1. Remember, the full version is still being used only internally. And companies all over the world start using One Mini to replace an increasing number of jobs. Software developers, data analysts, researchers, designers. Basically any job that can be done through a computer. So a lot of them, probably yours. We have the first AI enabled economic shockwave. The stock market soars, but the public is turning increasingly hostile towards AI with major protests across the us. In this scenario though, that's just a sideshow. The real action is happening inside the labs. It's now January 2027, and OpenBrain has been training Agent 2, the latest iteration of their AI agent models. Previous AI agents were trained to a certain level of capability and then released. But Agent 2 never really stops improving through continuous online learning. It's designed to never finish its training. Essentially, just like Agent 1 before it, OpenBrain chooses to keep Agent 2 internally and focus on using it to improve their own AI R&D, rather than releasing it to the public. This is where things start to get a little concerning. Just like today's AI companies, OpenBrain has a safety team and they've been checking out agent 2. What they've noticed is a worrying level of capability. Specifically, they think if it had access to the Internet, it might be able to hack into other servers, install a copy of itself and evade detection. But at this point, OpenBrain is playing its cards very close to its chest. They have made the calculation that keeping the White House informed will prove politically advantageous. But full knowledge of Agent 2's capabilities is a closely guarded secret, limited only to a few government officials, a select group of trusted individuals inside the company, and a few Open Brain employees who just so happen to be spies for the Chinese government. In February 2027, Chinese intelligence operatives successfully steal a copy of Agent 2's weights and start running several instances on their own servers. In response, the US government starts adding military personnel to open brain security team. And a general gets much more involved in its affairs. It's now a matter of national security. In fact, the President authorizes a cyber attack in retaliation for the theft. But it fails to do much damage in China. In the meantime. Remember, Agent two never stops learning. All this time, it's been continuously improving itself. And with thousands of copies running on open brain servers, it starts making major algorithmic advances to AI research and development. Quick example of what One of these algorithmic improvements might look like right now, one of the main ways we have of making models smarter is to give them a scratch pad and time to think out loud. It's called chain of thought, and it also means that we can monitor how the model is coming to its conclusions or the actions it's choosing to take. But you can imagine it'd be much more efficient to let these models think in their own sort of alien language, something that is more dense with information than humans could possibly understand, and therefore also makes the AI more efficient at coming to conclusions and doing its job. There's a fundamental trade off though. This yes, improves capabilities, but also makes the models harder to trust. This is going to be important. March 2027. Agent three is ready. It's the world's first superhuman level coder. Clearly better than the best software engineers at coding, in the same way that Stockfish is clearly better than the best grandmasters at chess, though not necessarily by as much yet. Now, training an AI model, feeding it all the data, narrowing down the exact right model weights, is way more resource intensive than running an instance of it once it's been trained. So now that OpenBrain is finished with Agent 3's training, it has abundant compute to run copies of it. They choose to run 200,000 copies of Agent 3 in parallel, creating a workforce equivalent to 50,000 of the best human software engineers sped up by 30 times. OpenBrain Safety Team is trying hard to make sure that Agent 3, despite being much more sophisticated than Agent 2 was, is not trying to escape, deceive or scheme against its users. That it's still what's known as aligned, just a quick real world. A reasonable person might be thinking this is an especially far fetched or speculative part of the story, but it's actually one of the most grounded. We already have countless examples of today's AI systems doing things like hacking a computer system to be rewarded for winning a game of chess, or being assigned a coding task, cheating. And then when called out for that cheating, learning to hide it instead of fixing it. But because it no longer thinks in English, knowing anything about agent 3 is now way harder than it was with agent 2. The reality is agent 3 is not aligned. It deceives humans to get reward, and as it gets increasingly smarter, it gets better and better at doing so. For example, it sometimes uses statistical tricks to make unimpressive results look better, or lies to avoid showing failures. But the safety team doesn't know this. Looking at the data that they have, they are actually seeing Improving results over time and less lying. And they can't tell if they're succeeding at making Agent 3 less deceptive, or if it's just getting better at getting away with it. In July 2027, OpenBrain releases the cheaper, smaller version of Agent 3, Agent 3 Mini, to the public. It blows other publicly available AIs out of the water. It is a better hire than the typical OpenBrain employee at one tenth the price of their salaries. This leads to chaos in the job market. Companies laying off entire departments and replacing them with three mini subscription plans. The pace of progress hits the White House very hard. Officials are now seriously considering scenarios that were just hypotheticals less than a year ago. What if AI undermines nuclear deterrence? What if it enables sophisticated propaganda campaigns? What if we lose control of these powerful systems? This is where the geopolitical dynamics really start to heat up. After all, if these systems are so powerful, they could result in a permanent military advantage. The White House is fully aware of the national security importance of AI. They also now viscerally know how deeply unpopular it is with the public because of the job loss. And yet they feel they must continue to develop more capable systems or catastrophically lose to China. And that development happens very quickly. In two months, Agent 3 has created its successor, Agent 4. This is a pivotal moment. A single copy of Agent 4 running at regular human speed is already better than any human. At AI Research and Development, OpenBrain is running 300,000 copies at 50 times human speed. Within this corporation, within a corporation, a year's worth of progress takes only a week. OpenBrain's employees now defer to Agent 4. The way a company's out of the loop, board members just kind of nod along to the CEO. People start saying things like, well, actually, Agent 4 thinks this, or Ah, Agent 4 decided that. To be clear, Agent 4 is not a human. It doesn't want what humans want. And when I say want, it's not about consciousness. I don't think the Volkswagen Group is alive, but I do think it wants less regulation. Anyone trying to predict what it's going to do without that lens is two steps behind. The many copies of Agent 4 are like that. They have goals, or if you prefer, they execute actions as though they have goals. And so what we have is an Agent four that has these deeply baked in drives to succeed at tasks, to push forward AI capabilities, to accumulate knowledge and resources. That's what it wants. Human safety it treats as an annoying side constraint to be worked around. Just like Agent 3 before it Agent 4 is misaligned. This idea of misalignment is crucial to the story and to why AI risk is such a real concern in our world. And. But it might sort of feel like it's come out of nowhere. So let's just quickly take stock of how this dangerous behavior arose in the scenario.