
Loading summary
Ryan Greenblatt
The most plausible story of all is sort of the humans give the AIs everything they need story. The AIs just chill. They make sure they have control of the situation. Maybe they're manipulating what's going on. They're sabotaging the alignment experiments. They're sabotaging the alignment results. They're deluding us. But they don't do anything very aggressive. You know, they're just chilling. They do lots of good stuff. You know, there's cures to all diseases or like all kinds of great stuff is happening. Industrial development. People are like, well, you know, it's so great that, you know, AI did go well and we didn't have misalignment. Like, it could be potentially that humans are deluded indefinitely. Right? It might be just like you can in principle, Potemkin Village the entire way through. And then I think another story is sort of like I would call like sudden robot coup, which is like, this also doesn't require very superhuman capabilities, which is like, we build vast autonomous robot armies. And we're like, hooray, we're building a vast robot army to compete with China. And China's like, hooray, we're building a vast robot army to compete with the U.S. truly, it's great that we have such a big robot army. And then it's like, oh, whoops. All of a sudden the robot armies like sweep in and do a relatively hard power takeover. I think a mistake that the safety community appears to have made on overly optimistic worlds, actually focusing on desperate, crazy, yoloed, pessimistic worlds is like, pretty reasonable because that's where a lot of the risk lives.
Rob Wiblin
Today I have the pleasure of speaking with Ryan Greenblatt. Ryan is the chief scientist at Redwood Research and the lead author on the paper Alignment Faking in Large Language Models, which has been described as probably the most important empirical result ever on the topic of loss of control of artificial intelligence. Thanks so much for coming on the show, Ryan.
Ryan Greenblatt
Yeah, it's great to be here.
Rob Wiblin
Let's start by talking about the best arguments for and against software based intelligence explosion in the relatively near future. By which I guess I'm thinking maybe under four years. Approximately.
Ryan Greenblatt
Yeah.
Rob Wiblin
What do you think is the chance that we'll be able to largely automate AI, R&D, or maybe roughly automate a full AI company as it exists today, within the next four years or so?
Ryan Greenblatt
Yeah, I think the probability of sort of that capability existing, not necessarily being used, not necessarily being fully deployed, I'd say is about 25% and then maybe about 50% if we extend to like eight years.
Rob Wiblin
Yeah, I think many people hear that and they would wonder, is this just pure speculation? How can we form realistic or grounded expectations about this kind of thing? I'd be interested to know what are kind of the key pieces of evidence that bear on you, making a prediction or forecast like that? Maybe what is a key piece of evidence that makes you think it is plausible at all?
Ryan Greenblatt
Yeah, yeah. So, I mean, to start, I think a good place to look is be like, okay, what are the current level of AI capabilities? How close does it feel? And now people have obviously wildly different intuitions about how close we are in some objective sense. So I think currently the situation is sort of like the AIs are getting better and better at math and they're sort of marching through the human regime in math. They're sort of able to do, I would say roughly, like maybe about one hour, one and a half hour, sort of isolated software engineering tasks. By that, I mean like a task that would take a human roughly an hour, an hour and a half. And they're sort of getting better at a variety of different skills. So I think we're sort of at a pretty objectively impressive regime and importantly, a much more objectively impressive regime than we were at two years ago and certainly much more impressive than we were four years ago. So I think like a naive starting point for where will we be in four years? Is, well, where were we four years ago? Where are we now? Trying to, like, you know, qualitatively extrapolate from here to there. Now this is like a bit of a sloppy perspective. So I think like, a bunch of people sort of had like, I think like, maybe what I would have said two years ago would have been, well, we had GPT3, now we have GPT4. What's the gap there? Let's like, sort of sloppily extrapolate that forward and get a sense from there. Now, I think now we can do better because we have not just we have GPT3, we have, you know, GPT 3.5, we have GPT4. We've had sort of the progression since GPT4 in like, you know, roughly the two years since then. So over that period, we've sort of gone from the AIs just barely being able to do agentic tasks to having like, some reasonable amount of ability to succeed at that, some ability to recover from mistakes. So sort of over period, it was sort of like, I think GPT4 was like, maybe it could complete Some tasks that would take humans like 5 or 10 minutes. On agentic tasks, it could sort of understand how to use the tools. I think GPT 3.5 basically couldn't even understand sort of the tools in agentic scaffolds. GPT4 could understand these things. And then over that period we've sort of gone from like, can understand the tools, can maybe do it some of the time, to like can do tasks that take a human software engineer an hour and a half, 50% of the time. And there's sort of a trend line of how fast we've been marching through that regime. And over 20, 24, at least it's been pretty fast progress starting from basically like nothing at all to hour and a half with doubling times of like I'm stealing a bunch of content from meter, which I think maybe there'll be a podcast with Beth before this that comes out or maybe comes out somewhat after this, which covers so much of the same content, but we've sort of seen doubling times that are like fast enough that we'd expect the AIs doing 8 hour tasks or 16 hour tasks. I guess my view would be like on that trendline it's like in maybe a little less than two years, which is, you know, pretty fast, pretty fast progress. And then, well, okay, from there it's like if you're into the regime where they're doing week long tasks, I think you can maybe get pretty close to automating the job of a research engineer with some schlep on top of that.
Rob Wiblin
Yeah. Okay, so we've got, I guess you would say that objectively they're pretty impressive in what they can do right now. I mean, I guess some people have a more sceptical reaction to that. Is there anything that you could point to, to say to be, I guess, clearer about what they're able to do and what they're not able to do and maybe answer people who would say, you know, sometimes I use these tools and they seem very stupid or they don't seem able to do things that I would expect them to be able to do or they, you know, they produce a whole lot of reasoning and I find errors in it.
Ryan Greenblatt
Yeah, yeah, so, so I'd say like my kind of like qualitative vibesy model is like the AIs, they're pretty dumb. They used to be much dumber. They're getting smarter quickly and they're very knowledgeable. So I think a lot of what people are interfacing with is sort of, we've got These systems that are like, you know, they've got some smarts to them, they can really like figure out some pretty general situations. Okay? And especially with like, you know, reasoning models, they're like pretty good at like thinking through some stuff. Now in addition to that, they're very knowledgeable, which means that there's sort of a misleading impression of how sort of like overall general and how adaptable they are that people notice. And this is a lot of what people are reacting to. So there's like sort of a over optimistic perspective, which I would say I think is like sort of characterized by this like chart Leopold had where he's like PhD level intelligence. Or like people are like PhD level intelligence. And then, you know, some people respond to that being like a PhD level intelligence. I mean, like, come on, it can't play tic tac toe. Or I think, I mean, maybe that's no longer true with reasoning models. But like directionally, like, you know, it can't play tic tac toe. It can't, you know, respond to relatively novel circumstances. It gets tripped up by this stuff. Now I think we have to, I think, discount some of the, it's getting tripped up by this stuff because I think a bunch of these things might be sort of more well described as like cognitive biases than lack of smarts. So, you know, like humans, there's a bunch of things that humans sort of systematically get wrong even though they're pretty stupid. So an example or like, you know, in some sense they're pretty dumb errors. So they're sort of like, I forget the name of this, but like, you know, the conjunction fallacy or something where if you're like, what, what is the chance that someone is a librarian? What is the chance they're a librarian? And some property librarians have, I think often people, I might be getting this a bit wrong, but, but you know, they're more likely on the conjunction of the two, even though, you know, that's has to be less likely. And I think AI systems sort of have biases that are sort of like this that are sort of shaped by the environment in which they were, they were created or by their training data. So as an example, like if you give AIs like a riddle like there's a man and a boat and a goat. The boat can carry two things or it can carry the man and one other item. How many trips does it take for them to cross the river? Well, okay, the answer is one trip. They can just cross the river. But there is a similar riddle which is involving a man, a goat, some brain or something like a cabbage. And there's some tricky approach, and the AIs are so reflexively used to doing this that maybe they immediately sort of blurt out an answer, but they can. And they sort of have a strong heuristic towards that answer. But it might just be more sort of like they feel a nudging, but sort of, if you sort of get them to be like, oh, it's a trick question, then they. Then they go from that. And in fact, you can get humans on the same sort of trick questions. Right. So if you. If you ask humans, like, you know, what. What's heavier, you know, a pound of bricks or a pound of feathers, they'll be like, oh, the bricks, you know, and they get tripped. And then it's like the language models have the exact opposite problem, where if you ask them, you know, what's heavier, two pounds of bricks or a pound of feathers, they're like, oh, oh, same weight, same way. And so I think, I worry that a lot of the sort of, like the tricks people are doing are sort of like analogous to tricks you could execute on humans. And it's hard to know how much to draw from that.
Rob Wiblin
Yeah. A general challenge here, assessing how capable they are, is that I think Nathan labenz uses this expression that they're human level but not human like. So overall, maybe they're similarly capable as human employees in some situations, but they have very different strengths and weaknesses. They can be tripped up in ways that seem completely baffling to us. But I guess you can imagine an AI society that would look at humans and say, how is it that they can't multiply in their heads, two, three digit numbers? That seems crazy. These are obviously not general intelligences, or they have almost no memory of this book that they read last week. That doesn't make any sense. How can intelligence act that way? So it makes it a little bit hard to I guess, have a common ground on which to compare humans versus AIs. Yeah. Is there more to say on how you would assess what level they're at now?
Ryan Greenblatt
Yeah. So, I mean, I would say. I think that I wouldn't use the term human level, so I generally would. Maybe this is me being a little bit of a conservative or me being maybe a bit pedantic, but I think I like reserving the term for human level at, like, can automate substantial fraction of cognitive jobs. So I'd be like, you know, maybe I'd be like, well, we're starting to get into like the human ISH level AIs. Once we're like, you know, it's really automated, it can really just like fully automate away a bunch of human jobs or sort of like be a part of the cognitive economy in a way that is comparable to humans and maybe not, you know, full automation at that point, but we can. I also like talking about the full automation point. So that's one thing on just responding to that. More context on how good the AIs are. Yeah, so I think I'm like. So some things we're seeing that are maybe somewhat relevant. So we're sort of seeing AI sort of march through how good they are at math and competitive programming. So, you know, we've seen going from over, you know, over 20, 24 from basically like the AIs being like bottom 20th percentile or something roughly on code forces to being currently in the top 50, I think, according to what Sam Altman said.
Rob Wiblin
And top 50 individuals.
Ryan Greenblatt
Top 50 individuals. Literally the top 50 people.
Rob Wiblin
Yeah.
Ryan Greenblatt
You know, or at least people who do code forces. Maybe there's, you know, some people who aren't in the leaderboard, but. But roughly speaking. And then it looks like, I think we'll get basically better than the best human at that specific thing before the end of the year. And then on math, currently we're more like. I think the AIs are sort of like, I would say at the level of a, you know, maybe like this is based on an anecdote from a colleague, but like, maybe they're at the level of like a very competitive 8th grader or something on short numerical. But, you know, competition math problems like ame, they're like, oh, I think like, you know, the top eighth graders are doing as well as the AIs are doing right now. And the AIs are, I think, a decent amount worse on proofs, but both of these things are improving very rapidly where they were much, much, much worse a year ago. And I think basically this is like rlling the AIs in these tasks. I sort of expect the same trend to hit sort of agentic task software engineering. And we've already seen that to some extent. So the AIs are already like, you know, pretty good at writing code, pretty good at following instructions and okay. At noticing errors. Okay. At recovering from these things. I think with a lot of runtime compute and a lot of scaffolding that can be pushed further. And then there's a bunch of things in which they're weaker. So they're a bunch Weaker at writing, they're a bunch weaker at other stuff. But I sort of expect that as you're marching through software engineering, you'll get to a bunch of these other capabilities.
Rob Wiblin
Yeah. Okay, so we've got an arguably quite high level now. They're becoming capable of doing tasks that would take humans longer and longer, or they're able to follow instructions for a longer period of time, complete tasks that have more open ended choices to make. And that's like doubling every half a year or something like that.
Ryan Greenblatt
I think the doubling time on time, my guess is over the next year it'll be substantially faster than every half year. But maybe the longer run trend is roughly every half year. So we might expect there was basically a spurt starting maybe at the start of 2024 or maybe a little into 2024 of people doing more agency RL or more like reinforcement learning or training the AI specifically to be good at agentic software engineering tasks. And I think that trend will continue and perhaps even accelerate through 2025, plausibly continuing in 2026, plausibly later. But maybe the longer running trend is more like doubling every. I expect it to be more like doubling away like two to four months over the next year.
Rob Wiblin
Okay, so very rapid increase.
Ryan Greenblatt
Very rapid, yes.
Rob Wiblin
Over the coming year there's this interesting dynamic where we might expect that you could virtually fully automate an AI company before, maybe substantially before you could automate almost any other company because, well, firstly they're putting a lot of their resources towards trying to figure out how to automate their own stuff and their own processes, which makes sense from their point of view, I guess. Firstly because it's the thing that they understand the best. Also, these are among some of the highest paid knowledge workers in the entire world that they're pulling an enormous salary. So if they can figure out how to get an AI to do that, then that has enormous economic value. And of course the people running the companies think it's of much greater value even than it looks just based on the dollar amount, because they think that they're about to trigger this intelligence explosion, positive reinforcement loop, which will change everything. So to them this is kind of the thing by far they're most interested in automating. They much more care about that than kind of automating McKinsey's consulting reports, even though that is also kind of a lucrative business. So yeah, it could be that we haven't automated consulting, even though that certainly would have been possible mostly because they just weren't trying. They were just Trying to automate their own stuff.
Ryan Greenblatt
Yeah. So I would say, I think my guess is that it's probably hard, even with a bunch of elicitation, to near fully automate reasonably highly paid human knowledge workers in relatively broad fields. But I do expect that there's some jobs that I think if the AI companies were trying harder, they might be able to automate substantially more than now. And in fact, we see like, you know, the people benefiting most from AI are people close to AI. As you're saying this is true now. I think it will get increasingly true in the future, again because of this dynamic of like, AI company employees are highly paid now at the point when AIs are capable enough that they can automate AI companies, they'll be even more highly paid, you'll be able to drive in even more investment, and AI company CEOs will be even more sold on the possibility of AI being extremely important. So I think it's just like, you know, we'll see an even bigger gap then. And so I think one objection you might have had would be like, okay, sure, I agree that we'll see, you know, a lot of concentration on automating the AI companies. But at the same time, you know, there's a lot of valuable human knowledge work outside that you could automate. So, you know, we'll see that in parallel, we'll see some economic impact. So I think this is, you know, a reasonable first stab. But a problem with this story is the price of compute will go way up as the value of AI intellectual labor rises, at least in these short timeline scenarios where compute is very scarce. So, you know, if it's the case.
Rob Wiblin
Just to clarify, you're saying that the companies will be using a lot of compute to automate their own work. Like so much compute that in fact that they won't have many chips available for serving like other customers who are doing things that are of less economic value or certainly less important to the company.
Ryan Greenblatt
Yeah, broadly speaking. But I think it's even just not on just automating the stuff, but on experiments. So let me just put a little bit of flavor on this. So my sense is like, how do AI companies spend their compute now? I think it will depend on the company, but my sense of the breakdown for OpenAI is it's very roughly something like a fourth on inference for external customers, 2/4 on or half on experiments. So things like smaller scale training runs that don't end up getting deployed, you know, doing like a little bit of like testing of RL Code this sort of thing. So like you know, experiment compute for researchers and 1/4 on you know, big training runs. And so like 3/4 of the compute is already sort of internally directed in some sense. And then if we're seeing a regime where the AIs can automate AI R and D and that's yielding big speed ups and seems very important, then you could imagine the regime might look more like, you know, maybe it'll look more like 1/5 on doing like inference for your AI company workers or sorry for your AI workers. So it's like you know, you're spending a fifth of your compute just like running your employees, three fifths on experiments and like one fifth on training or something like I mean like obviously it's very speculative.
Rob Wiblin
Customers has been squeezed out almost entirely.
Ryan Greenblatt
Yeah, yeah. So I mean, I mean presumably I'll have some customers but it might be squeezed out almost entirely and we'll see sort of the prices rise. And I think when you're thinking about what customers to be serving to, I think you should maybe be imagining perhaps surprisingly that the highest wage employees might be who the AIs end up going for first once the AIs are capable of this automation. So you know, you know, maybe you should be thinking like Jane street, high frequency trading places where the AIs seem like particularly helpful, they seem particularly high paying and they seem particularly bottlenecked on intellectual labor. Now I think we will see automation of a bunch of other professions in parallel but it might be that at the point when the AIs are most capable of automating, much more of the attention will be focused on AI R&D. And I think even possibly we might see effects like some profession is slowly being automated like mid to low end software engineering maybe will slowly be automated more and more and we might actually see that trend reverse as the compute gets more valuable. Because know now we're in a regime where just everyone is grabbing as much inference compute as they can or at least the, you know, the biggest companies or the company in the lead is grabbing as much inference compute as they can and just out competing the you know, sort of the software engineers using this or the companies doing software engineers competing with this compute. I think I don't currently expect a trend reversal but I think we could see like sort of automation trends sort of plateau or even reverse because of this sort of, because people have found.
Rob Wiblin
Even more valuable things to do with the AI.
Ryan Greenblatt
I mean this is dependent on sort of a fast, fast like you know, relatively short timelines. I think you could expect things are sort of smoother and longer timelines where you wouldn't expect trend reversals in that case. But if things are sort of more sudden, more jumpy, then, I don't know, it seems at least plausible.
Rob Wiblin
I guess I'm curious to turn to what implications does this have for how worried we ought to be and I guess what specifically we ought to be worried about? And if this is the way that things go, what stuff could we be doing now that would help us to navigate this? I mean, this is a. This is a scenario in which things are becoming quite challenging for humans to track or be involved in quite quickly. So we would have to have set things up well, I suppose, for this to pan out well for humans rather than for us to just get crushed in the passage of history. Maybe you'll disagree with that.
Ryan Greenblatt
But, yeah, I mean, I think pretty quickly into this regime, I think AI takeover using a variety of mechanisms becomes pretty plausible and potentially, like, surprisingly easy through a variety of routes we don't know. Like, this is another huge source of uncertainty. Like, we had, like, I had a bunch of. We had, like, many conversions between, like, oh, how much progress do you get? How much does that progress amount to in intellectual labor? And then there's the question of, like, how much does intellectual labor help with takeover? Like, what are the physical bottlenecks to take over? How much does, like, having more missiles help relative to having whatever? So I think there's a bunch of scary things that could happen here. So one thing is that once you're automating your entire AI company, a thing that seems very plausible to me is sort of you've handed off. You sort of handed things off to the AI system, and you're still trying to oversee it. You're still trying to understand what's going on. But it might be that the AI sort of can evade run what I would call like, a rogue deployment, or they can sort of basically use a bunch of the compute in an unmonitored way because they're writing so much of the stuff, so much is happening so quickly that they can sort of evade your safeguards if they were to be misaligned. And then it might be the case that huge fractions of your compute and effort are being used for things that aren't what you wanted. In addition to that, these AIs could be doing things like backdooring the future AI training runs, making sure the AIs are aligned to them instead of aligned to you, and, you know, sort of doing a Huge amount of work that you don't have. Like, you might just not know what's going on anymore. Even at the most basic level of being, like, what code was run, you might just be misled about that specific thing because things are going so fast and so out of your control.
Rob Wiblin
Yeah. So I think if people are willing to grant that we've got billions of fast superhuman genuses running on this data center and barely any human oversight, because, well, I suppose the sheer quantity of work being done would be beyond the capacity of all of the staff at one of these companies to scrutinize. And even if they did have time to read it, they often wouldn't understand it because it would quickly be exceeding the kinds of things that they've learned to even be able to track. So I think people would intuitively believe at that point that if the AIs were wanting to take over, that they would probably have quite a lot of options for doing that. Oh, no.
Ryan Greenblatt
Yeah, I mean, I think actually a big source of disagreement is, like, how much does cognitive labor assist in takeover? And so, like, there's questions. I think there's a bunch of different routes. So, like, one route is what I would call the like, man, I should have. Maybe I should have brought my takeover plans list my hot tips for AIs. No, but, yeah, but, okay, maybe like, route, like, I think once the AIs are very superhuman, there's basically a route that I would call like the like, crazy nanotech route or like the crazy superhuman technologies route where they just have, like, hard military power very quickly via the mechanism of, you know, being very superhuman. And so maybe you let them make nanotech. Maybe you're like, no, no, no, don't make nanotech. That would be super scary. But they can do it, even despite you not wanting them to do that, because they can, you know, use huge fractions of your. Of the compute you have in your data center to work on the nanotech and also coordinate, like, exfiltrate themselves, coordinate with an outside version of the model, orchestrate wet lab experiments to build to bootstrap nanotech from all kinds of things. And then you're just totally cooked.
Rob Wiblin
So a possible layer of defense would be. Although there's this enormous capability here, if we're able to direct exactly what they're doing, such that they never get a spare moment to think about how would they take over or to actually try to implement that, then that might possibly keep us safe. However, in this scenario, where we're able to scrutinize so Little. And we have so little understanding of how these models operate. It seems likely that there would be some slippage where they might be able to figure out a way of grabbing a bit of time, grabbing a bit of compute here and there and then pushing that open to use more and more, I guess, of the compute in the data center to be pursuing their own agenda rather than the task that we think that we're giving them.
Ryan Greenblatt
Yeah, so I'm pretty skeptical of being able to sort of maintain a meaningful level of control if all of your AIs are coherently misaligned and we're at this level of capability. By control I mean sort of like things such that the AIs couldn't do bad stuff even if they wanted to. I think that at the regime where you're sort of doing the initial automation, I think a slightly or like a somewhat slowed down version of the initial automation seems very plausible with control, maybe even at full speed, and maybe you can go even somewhat above the human range while ensuring control using clever strategies, including having the AIs work on the strategies and, you know, making sure their work on the strategies itself is not sabotaged. Yeah, but it seems very difficult once you're in this like completely insane regime, especially if you want to use the AIs to develop things like a cure for cancer where you don't understand what they're doing. They're routing through biological mechanisms. You don't understand, they're orchestrating humans who don't know what's going on in a wet lab. Like, I think it's very plausible that for various reasons you'll want to have a AIs directing people to do wet lab experiments where you do not know what the fuck is going on. AI, you know, explains something and you can maybe do some checks, but you're very confused used and so maintaining control of that. Maybe I'll try to quickly run through the takeover list. There's like crazy superhuman technology, which I think is like pretty plausible for what it's worth. There's the, what I would call like superhuman strategy slash manipulation, including things like persuasion that is like getting people to do things that are wildly against their interests very quickly. I think this is like, it's not clear if this is ever possible, but if it is, it does feel like it would make takeover a lot easier. And then there's a bunch of stuff that might be more possible even around the human level regime. So an example would be sort of the AI escapes, it maybe coordinates with a copy inside. But it also is running some independent presence externally, maybe that's like running on the GPUs internally, but it's like, you know, it's running like code on servers outside of the AI company. And as part of doing that, it sort of builds an independent industrial base. So it has like all the stuff it would need to be able to like recover industrial civilization and then just kills almost all humans. So something like deploys bioweapons, deploys mirror bacteria, deploys mirror mold, whatever, and that ends up relatively quickly killing a huge fraction of humans, which I think it seems like this is technically feasible. And then from there the question is, okay, but did the AI like screw and screw itself over? Right? Like, can it like, you know, maintain what it's doing or is it just going to decay into ruin because there's no like human hands to build all the things? And so I think it needs to have some independent industrial base that could either be via having some human loyalists, some like humans that it's enslaved and keeps alive, or via having robots.
Rob Wiblin
Yeah, I mean, at this point in time, wouldn't there be like just incredible robots and probably quite a large number.
Ryan Greenblatt
Of them potentially very quickly? I think it's harder for me to, I haven't done all the analysis of like how quickly do you expect lots of robots? We also have to answer the question of how many robots get destroyed as the humans are, you know, starting to get killed by bioweapons and maybe suspect that it's AI caused and like questions of like, if there's surviving humans, how much like military force is needed to like handle that? So I think it's, it's kind of non obvious how the situation goes, but this is a route for why you could get takeover at without needing very superhuman capabilities. Like I think the thing I'm describing seems like it would in principle be possible for AIs to do if they're merely like, like fast at human level and super well coordinated. So a reason why this is hard for humans to pull off. Well, one, humans don't want to do this. Another reason is that I think it's the case that it's hard for humans to run vast conspiracies, but it might be much easier for the AIs to run vast conspiracies because they don't have to rely on recruiting potentially untrustworthy humans. They can potentially be wildly better at cybersecurity and much more meticulous. Now they might mess it up. Right? But I just think there's mechanisms by which this could happen. So this is sort of the like kill everybody independent industrial based story. Another story which I think maybe is the most plausible story of all is sort of the. The humans give them all, give the AIs everything they need story, right? Which is sort of like the AIs just chill. They make sure they have control of the situation. Maybe they're manipulating what's going on. Maybe they're. So I talked earlier about like sort of what you see is not what's actually going on. Like you look at the experiments, they're not what you expect. The AIs are sort of doing that. They're sabotaging the alignment experiments. They're sabotaging the alignment results. They're deluding us. There's a bunch of mechanisms for that, but they don't do anything very aggressive. You know, they're just chilling. We're scaling up, compute. They do lots of good stuff. You know, there's cures to all diseases or like all kinds of great stuff is happening. Industrial development people are like, well, you know, it's so great that, you know, AI did go well and we didn't have misalignment. Some of the safety people are like, you know, worryingly looking at the situation, wondering if this is what's going on.
Rob Wiblin
Yeah.
Ryan Greenblatt
And then. And at some later point when the AI has insanely decisive advantage and the humans are starting to be an obstacle at all, which throughout this story they might not be. Right. I think if the humans ends up being an obstacle earlier, maybe the AIs would take more decisive action. But if there's no need for decisive action, maybe they just sort of lie in wait. And then at a point when there's truly vast levels of industry, truly vast levels of robots, the armies are entirely run by AIs. The situation is so like, you know, beyond whatever, maybe even at the point of like space probes being launched, like it could be potentially the humans are deluded indefinitely, right? It might be just like you can in principle Potemkin Village the entire way through. But. But you know, it could also convert at an earlier point. Now, sorry, to be clear, if humans are deluded indefinitely, then it's like all the stuff on Earth might be great, right? So like, so this is a part.
Rob Wiblin
Where you're saying they could even get to the point of saying, well, we're going to go off and settle space and do a whole lot of stuff in space and they'll tell us that they'll do one thing, but they're going to do a completely different thing while we just ENJ our lives on Earth, like thinking that things have gone, I guess, better or at least very differently than in fact how they're going to play out.
Ryan Greenblatt
Yeah, but to be clear, it could be Potemkin villageized on Earth as well. Right. It could be the case that you're like, wow, I have so many happy friends, whatever. But steadily your friends are being replaced by robots. And in principle, I mean, I don't know, I think it's like at some point it's like, why doesn't the AI just suddenly take over? But in principle, it could go very far. So that's another story. And then I think another story is sort of like I would call like sudden robot coup, which is like, like this also doesn't require very superhuman capabilities, which is like we build vast autonomous robot armies. Maybe we think we have some way to turn them off. Either the AIs have sabotaged that, or they have some way to work around that, or there's various other mechanisms by which this could fail. And then sort of we built the robot armies, we saw the AI's building the robot armies, and we're like, hooray, we're building a vast robot army to compete with China. And China's like, hooray, we're building a vast robot army to compete with the us. Truly, it's great that we have such a big robot army. And then it's like, oh, whoops. All of a sudden the robot armies like sweep in and do a relatively hard power takeover. Because they control military, they can be hyper coordinated and maybe they're in parallel using things like potentially bioweapons to some extent, but that might not even be required.
Rob Wiblin
Yeah, I mean, at this point they don't even need to kill us. Potentially.
Ryan Greenblatt
Potentially.
Rob Wiblin
Even if we all tried to resist, it would be pointless, basically.
Ryan Greenblatt
Potentially, yeah. And maybe also things like they maybe are using lots of cyber attacks to destabilize human response. So they could potentially do this at a point when we naively think that the robot army is weaker than the human army. Or even we think we have shut off switches, but the shutoff switches are not reliable enough. They're not over engineered enough, they're not shielded enough from the AI's tampering. I think it's an awkward dynamic where you both need to make. If you want to have remote shutoff, you have to both make it so that it's tamper proof for AIs that might be in principle very general and very smart and very clever. And also in addition to being tamper proof. It also needs to be the case that they're remotely triggerable, so it can't be shieldable either. It has to be the case that there's no way the AIs can sort of of shield the thing from signals and avoid tampering it. Which is, I think, a very awkward combination in principle.
Rob Wiblin
Yeah. What determines whether you get AIs attempting to take over early relative to later? I guess the reason to wait is that they're going to become a bigger fraction of the economy. They'll have more hard power, so they're just more likely to trivially succeed. I guess the reason to go early is that a given set of models might expect to be replaced by other models that might not share their goals. And so they would miss their opportunity to take over. Basically, yeah.
Ryan Greenblatt
So I think the strongest reasons to go early might be, I would say something like worries about other AIs, worries about humans getting their shit together, either because we get enough work out of the AIs, or because people freak out and trigger some strong response that could recover the situation. And I would say the third one would be like impatience or something. It might be that by default, humans slow progress down a bunch, even with AIs trying to manipulate things, trying to steer things from behind the scenes. And if the humans are slowing things down a Bunch and the AIs are like, no, no, no, I want to get to the stars now. And they're, you know, at some sufficiently high takeover probability. Or maybe the AIs are just like, I just want to do it now. Like, I know humans sometimes want things to happen faster, even independent of like the total amount of stuff they end up getting. Yeah. So these are some reasons to happen, to happen earlier. These things also apply to things less egregious than takeover. So one of my hopes for things going well is sort of we can put the AIs in a position where they're forced to either take aggressive moves early or we get enough work out of them. And so if they take aggressive moves early, then maybe we can catch that and then build a case for danger. And in addition to building a case for danger, we could potentially do stuff like study that example, iterate against it stuff I'm calling like, few shot catastrophe prevention. Like, we've caught a few examples of bad stuff. Can we, can we handle that? And so if at multiple different levels, there's like, are the AI sort of forced to take aggressive action? Or is it instead the case that the AI sort of can buy their Time, because we don't have any mechanism for sort of exiting the regime using the it, using some ongoing background trend, either safety research improving over time, or humanity being like, I'm sorry, what is going on here?
Rob Wiblin
Yeah. Okay, so are you saying a general thrust of the interventions that we could make is creating a trend where over time, in fact they are going to become less and less likely to be able to successfully take over? Maybe because we're successfully figuring out how to use AI in order to prove alignment and safety and to detect misalignment, or we're improving our control mechanisms. And so that would then force that the models, they see this trend, they see that in fact the window is closing. And so then they have to try to strike early. And that gives us a chance to I guess, realize what's going on and then actually win out.
Ryan Greenblatt
That's right. But of course it could obviously go in the opposite direction where our measures can't keep up with the rate of capabilities and sort of, it keeps going up and sort of as the sort of like there's sort of like this notion of like, what's the level of capability that we can sort of of handle at a given level of takeover risk? And so it's like, well, maybe initially we're like, well, we don't have that much going on, so we can handle roughly humanish level at some 20% takeover risk or some very high level, which will depend on various facts about misalignment. Which AIs are misaligned? How coherent are they? How much do they unify this sort of thing? And then in addition to that, there's a question of like, well, if progress is fast enough, it might be that takeover risk sort of increases over time. But you know, because as the models are getting more capable, they're more and more likely to succeed and more and more likely to be misaligned, potentially. But also maybe we can sort of, you know, push the curve the other direction where we can sort of make it so that the range of capabilities that we can hold in the like, you know, N percent takeover chance, success at takeover regime is lower.
Rob Wiblin
Yeah, explain that last bit. I don't get it.
Ryan Greenblatt
Yeah, yeah, imagine that we have. So there's some trend over time, which is like for the smartest AIs we have now, what's the chance of success they have? Or like, how likely, or maybe just how likely are they to take over? Yeah, and so initially our countermeasures suffice to be like, maybe at the point where they're Automating the whole AI company by default, where it, like, you know, instantaneously, like, probably not even that high a chance of takeover, because, like, what are they going to do? But then it's like, well, as they get more capable, the instantaneous takeover chance is going up and some of our paths to success have been cut off. Right. Because maybe the AIs are sort of. They've escaped, they're running rogue internal deployments, they're sabotaging our work. But we could make it so that sort of the trend bows the other way, where sort of over time, the risk of takeovers is going down, not up, because sort of the capabilities are increasing, but our measures are increasing faster than the capabilities. But obviously, as we discussed before, the capabilities might be increasing very, very fast. So if the measures aren't keeping up with that, if we're not on some stable regime, then that could be bad. And so I think the model I was using there, I think is most natural to think about in a regime where the AIs are misaligned. But if we start at a point where they're aligned enough that they sort of maintain their own alignment, then it could be that the takeover risk initially is like. Like, well, they're actually aligned, and so we're sort of in a stable attractor of alignment.
Rob Wiblin
Okay. What effects does this sort of picture have on what you think, I guess, what your research priorities are and what you think the broader AI ecosystem ought to be prioritising in terms of making things more likely to go well and less likely to go poorly.
Ryan Greenblatt
Yeah. So for one thing, I've been talking about sort of the timelines in the absence of very active intervention. There's also a question of, you know, how much do you pause deliberately at these relevant capability milestones? And so I feel much more optimistic to the extent that we end up pausing at a level of AI capability which is around the point where you can fully automate the AI company, maybe somewhat before, maybe a bit after, maybe just right around there for an extended period, both so humans have time to study these systems and also so that we have time to extract a bunch of labor out of these systems and sort of reduce the ongoing takeover chance. And so there's some question of that. And also there's all these worlds where things are slower than this. And then another source of hope is maybe you're just kicking off this crazy regime with an actually aligned AI. Right. With an AI that's not just not conspiring against us, but even more strongly it's Sort of actively looking out for us, actively considering ways things might go wrong and is sort of trying to keep us in control. Sort of like a. This has been called like, you know, a basin of courageability or something, or a basin of alignment. The things I'm excited for, they depend a bit on the regime. So I think there's a bunch of different regimes we can consider. So one regime is a regime where I think maybe especially in these short timelines, I expect not that much lead time between different actors, at least in the US side.
Rob Wiblin
You're saying there'll be multiple companies all approaching this point of automation roughly simultaneously.
Ryan Greenblatt
Yeah. So I think a pretty natural scenario to consider that I often think about is like a relatively, from my perspective, irresponsible company. That is basically their default plan is, is scale the superintelligence as quickly as possible. Don't take safety concerns very seriously. And that's like their explicit plan, that's their internal thing and they're just pursuing that as quickly as possible. Three months in the lead, let's say behind and they're maybe ahead of like Chinese companies and maybe some more responsible actors in some mix of like three to nine months behind. I think in this scenario a lot of the action is going to come from a relatively small number of people inside that company. And then potentially there's a bunch of action that can come from people on the outside who are enacting various influence via doing exportable research and also via potentially changing the policy situation. And also I think there's definitely stuff that can be done via trailing AI developers who are more responsible and are putting more effort into preventing bad outcomes in this case.
Rob Wiblin
In that scenario, I guess one thing that could be productive is that there are people inside the company who say this seems a little bit scary what we're doing. We should have better control mechanisms, we should have better alignment mechanisms. We should take that a bit more seriously than our current default plan. Another one is the people working at the more responsible organizations could produce research that makes that easier for the company to do, makes it less costly for them to have stronger control measures. And then there's also, of course you could have a governance response where maybe people will. This will begin to take off and people will be quite alarmed and perhaps you'll get broad based support for the pause at human level notion.
Ryan Greenblatt
Yeah, man, I really wish this notion was more memeable. I think it's an unfortunate thing about pause at human level is all of those words are complicated and confusing and also it's Just not even that, it's not even that nice of a slogan. Well, okay, someone should think of a better slogan for this thing. But anyway, yeah, when I'm thinking about this, I think there's a bunch of different mechanisms for doing exportable research, trying to wake up the world. There's more and less indirect things, right. So you can do very direct policy influence versus things more like demonstrate the level of capabilities and when you're demonstrating the level of capabilities that could trigger a response from the world. So I think a lot of my hope in this sort of scenario is going to come especially in very short timelines, which I think do look worse than longer timelines. So part of my picture of less doom is like, well, it might be eight years away instead of four years away. So the more like the four year timelines, I think a mechanism that we have is that the leading AI company, we have some people who are working there and their goal is both prevent egregious misalignment and also have AI systems that were by egregious misalignment I mean something like AI systems that are deliberately engaging in subterfuge, deliberately undermining our tests, deliberately trying to sort of, of make our understanding of the situation misleading. And then second, we can try to have it so that the properties of the AI system are ones where we're happy to hand things over and try to aim for that kind of quickly because we're in a very rushed timeline. So some of this is sort of a, you know, preventing egregious misalignment, you know, or dealing with it I think is like, you know, in some sense it's the traditional thrust. But I think there's a bunch of stuff related to like handover and like the AIs being wise enough, the AIs sort of staying faithful to our interests on long horizon things, making sure we even just have the tests to know whether this is true, these sorts of considerations. And there's a bunch of different incremental steps here. So I think I'm like phase zero is like maybe it's like you're using control or sorry, you're using mechanisms to make it so that AIs are producing work for you that you think isn't sabotaged, Even if those AIs were trying to sabotage it. And then you use this to milk a bunch of work out of your AI systems. And then like phase one is like you try to create an AI system that the safety people are happy to hand off to, to, to the best extent you can. And then like phase two is like you try to make it so that the AI running the organization overall is an AI system that you're happy to hand off to and presumably other people are happy to hand off to as well. And then this is like, you know, a caricature of the situation because like, or a simplification because like probably at a bunch of points you want to be doing stuff like picking up low hanging fruit, demonstrating capabilities to the world, trying to consider whether, you know, like something can happen, trying to make it to the main AI system that like the organization is training on the frontier of current capabilities is, is less likely to cause problems. I mean, I'm not tremendously hopeful about the situation, but I think there's a bunch of different potential outs that give us some hope or some saving throws, some win conditions. Things like we catch the AIs early on is another hope. Maybe we catch the AIs early on. Maybe that convinces the world to take the situation more seriously and maybe even if not that it convinces company leadership that their own power is under threat from the AIs. And so even if they're are very Machiavellian, I think I'm like the AI being a threat to their power, they.
Rob Wiblin
Should be worried about they might switch on.
Ryan Greenblatt
Yeah. And so that could potentially increase the amount of resources spent on safety.
Rob Wiblin
Okay, so the primary story there was trying to figure out a safe scenario in which you could hand over these processes to AIs that you trust. How would you do that? I guess you start with models that are around human level that you feel like you have a reasonable grasp on that probably are aligned. Like we currently think that the models that Claude probably is roughly about as friendly as it seems. It probably isn't engaging in massive scheming against us at every turn. You figure out clever mechanisms by which you can get those kinds of models that you feel okay about to do a whole bunch of work on control and alignment. You rush through that before you have had other capabilities, improvements that have been allowed to happen. And then you get to the point where you feel, well, hopefully this has panned out. We feel okay handing off to these models that have been trained by the previous generation that we kind of understood and then we're going to cross our fingers. Is that it?
Ryan Greenblatt
Yeah. I mean, so I should be clear, this is like a very low political will plan. I think we should have. I have a menu of plans and this is like the bottom of the barrel or whatever. But yeah, so I think I'm like the way I would have put this is like we had some early point where we had AIs that could automate a bunch of stuff and we didn't necessarily trust. We harness a bunch of labor from these AIs then we try to make a version of those AIs that we do trust. And not just trust at the level of they're not conspiring against us. Ideally trust at the level of like we're happy to hand off messy open ended questions like what should our strategy be for minimizing risks from the following things, including like, you know, potentially somewhat tricky, like more philosophical topics. And then from there we try to hand off to those systems and hope those systems, you know, stay aligned, stay on track, are able and also succeed themselves. Right. It could be that we hand over to systems and those systems are like, wow, you guys seem like you're in a pretty bad situation. I'm not sure that I can, I can pull this one out of the bag. Especially there's a variety of mechanisms by which that could occur.
Rob Wiblin
Like I think you're saying that they could also mess up the next stage. They could like fail to control very little time.
Ryan Greenblatt
Right.
Rob Wiblin
Because they're also racing against other companies.
Ryan Greenblatt
Oh for sure. Yeah, yeah, yeah. So like also there's, there's a, there's an internal thing where it's like if you're the safety team, maybe you have a version of it that you fine tuned that's human level that you trust that you're happy to hand off to. But capability's progress is so fast that maybe the cutting edge of the current company is very superhuman, but you don't even want to work with that system. Right. I think the first handover point you should try to do closer to the human regime, like maybe slightly superhuman or around the top human level of capability because the more superhuman, the more likely to be misaligned. And so you sort of want to start the handoff earlier, but that might not stop the rest of the organization from training more capability. So it might be that you as the safety team are handing off to an AI that's less capable than the frontier of the organization. And that AI sort of needs to quickly bootstrap. So the trend of the handoff AIs intersects with the capabilities fast enough.
Rob Wiblin
Yeah. At the point that you have an aligned, somewhat superhuman model, is there some chance that it would be able to persuade a broader coalition of people that we need to take this a little bit more seriously and it shouldn't be forced to race against another team at the same organization.
Ryan Greenblatt
Yeah, yeah, yeah. I mean, so that's definitely. That's definitely part of the hope. So definitely part of the hope is that AI could, you know, demonstrate empirical evidence. I think also, if we're in a situation where we have two AIs trained by different teams, and both of those AIs are like, just, like, throwing huge amounts of shade on each other. We're. One AI is like, that AI, obviously scheming. Look at all these examples. And the other AI is like, that AI, obviously conspiring against you. Look at all these examples. It's so biased. It's so unreasonable. It's clearly just their plant. I'm like, okay, that's like a crazy situation. And like, I'm like, certainly you can't come out of this regime thinking it's fine.
Rob Wiblin
Like, it's not. Yeah. One way or another, it's definitely not reassuring.
Ryan Greenblatt
It's not reassuring. Right. I think there's, you know, there might be ways to downplay it. Right. So one regime is like. I think that's kind of plausible is the safety team trains their AI. Their AI is like the situation. Tries to go to company leadership and be like, the situation seems very scary. You're training this AI. I think it's misaligned. Here's a bunch of evidence for that. And the other AI comes in is like, like, well, as we all know that that AI was trained by the safety team. I think it's just very paranoid about misalignment. Of course, the safety team would want us to think all the AIs are misaligned. I'm not misaligned. That AI is not misaligned. It just has poor judgment. And now, a thing that's interesting about this is that, yes, it's concerning, but it's not like, you know, at least one of the two AI is misaligned. Right. And so, in general, I think a concern with the sort of improve the epistemics of the AI company route is that I think that the people who trained the AI plausibly have a reputation for being biased on these matters relative to the company from the company's perspective, which I think poses a bunch of concerns. And it might be the case that for sort of every single, like, evidence of risk, you can make an even more. Or like, you know, a comparably compelling evidence that that risk actually doesn't exist. Because if we're sort of in the regime where we don't really know how to adjudicate these things, you know, it could could potentially just. Just be like, reason doesn't win out. And so I think that definitely is some of the hope. Some of the hope is coming from once you have this handover, then it's fine. Now I think there's a bunch of ways this story could go wrong. So here's an example. So I talked about sort of the Alignment team sort of off to the side in their corner, trying to fine tune or potentially even maybe train from scratch, a version that they're happy to defer to. Well, one issue is that it might be the case that the main AI system has already taken over the AI company in terms of a cybersecurity sense. Like as in it has a rogue deployment, it can use compute in an unauthorized, unmonitored manner, it can tamper with jobs, everyone is running. And so you're trying to align this AI, but you need to harden yourself against the rest of the company. And I'm like, oh my God, is that actually, that seems potentially hopeless.
Rob Wiblin
Zooming out.
Ryan Greenblatt
Okay, yeah.
Rob Wiblin
So Redwood Research, do you spend a bunch of time gaming out these various different scenarios and thinking, well, what sort of interventions might be helpful in some significant fraction of them and then trying to get more people working on that and working on that yourself? Is that the basic picture? Yeah, I think broadly and recently Redwood has been prioritizing control as opposed to alignment, saying, well, let's assume that the models don't share our interests, that there is some degree of misalignment. How could we make it such that they are nonetheless not capable of screwing us over? People used to talk about putting the AI in a box. That doesn't seem so likely anymore. But I guess there's other mechanisms that you could use to, to try to check their work, put models in opposition with one another, forcing them to do useful stuff for you. Why focus on control? Is it just that there's so many different ways that this could go? Is it that AI control mechanisms seem like they're useful in a larger fraction of the different plausible scenarios?
Ryan Greenblatt
Yeah. So I think these incredibly desperate, very low political will, short timelines, ultra fast, like ultra fast from my perspective, very fast. Maybe not as fast as people were speculating in the past. I think control looks less good. So here's a few reasons why I think control was still pretty useful in this world. So for one thing, I think if we can catch the AI early on, that could potentially be very useful. Another thing is that getting useful work out of the AIs early on in this scenario might be very important. And we might be able to do that prior to having totally lost track of the situation. And then, and I think also potentially you can prevent. So I was talking about the situation where sort of the AIs are basically operating your own data center. You might be able to use a variety of mechanisms to make that much harder so that it occurs later in the capabilities so you have longer when your experiments are not like literally tampered with. So these are why control could be good in this world. I think there's a more complicated slew of interventions this doesn't cover. I do think that in these very rushed worlds control looks less good than in worlds where there's somewhat more time and somewhat more political will. And the original sort of target we were thinking about when we were thinking about control was a world where it was more like we have a year of delay that we can afford, like safety is given a year of budget basically and we can afford to pause for a year. We can do stuff there. And I think control looks much, much better in this regime. I don't know about much better. I think the amount of risk reduction control gets us is quite a bit better, but these worlds are less likely. But that's not to say that the risk reduction control gets us in the other world isn't much smaller than the alignment stuff. So a different way to put this is alignment also looks really hard in the worlds where things are going that quickly and alignment looks less tractable to work on in advance. So it's like a lot of the stuff that I'm most excited for in alignment involves doing experiments on powerful AIs. And so you can only do that relatively late. And so it's the case that it might be that in the very rushed scenarios alignment is more of the picture ultimately, but was harder to work on in advance. And these factors make me like pretty into the control. And then there's also just worlds with more political will where I think control looks a lot better. And in fact there's a proposal. There's like I, I often think about the sort of like control plus posit human level plan which doesn't require any alignment for a sustained period potentially because you sort of have your initial year of delay, you control those AIs and then you bootstrap from there into longer delay by building evidence for risk, ensuring non proliferation, sort of developing somewhat better safeguards. And then you sort of, of potentially quite slowly proceed from that regime, at least for a while until you're actually happy to hand off rather than doing an Insanely rushed, desperate handoff because that's the only thing you can afford to do.
Rob Wiblin
Yeah, I guess that requires some degree of coordination between companies and possibly between countries as well.
Ryan Greenblatt
Yeah, I mean, I think at the time when I was thinking about this, I think I thought a year of lead was more plausible and I've updated against that. I think it could still happen. Right. Like I think the gaps could open between companies. I was also thinking that, I think that the story where you're working heavily to ensure non proliferation and potentially slowing down progress requires a substantial level of government buy in potentially. And so it might be the case that you have to get it so that the US government or at the very least some important governments are pretty much on board with the overall plan, not necessarily for the same reasons, but are on board with it for whatever reason. And I think I've updated against that. So I'm now, I think spending more of my time thinking about relatively more desperate situations. But you know, an important factor is like the relatively more desperate situations, maybe it's like reducing the takeover risk from like 50% to 40%. And maybe in the worlds where control could have worked and we had a year of delay, we're reducing the risk from like what would have been like you know, 30% to 5%. And so it could be that the total risk reduction is sufficiently higher in these worlds that they're actually more leverage to work on. Because in the world where it's like we have these like, I think like we're sort of have very little influence because there's, there's so little will, it might just be that it's less tractable to work on it.
Rob Wiblin
I see. Because in general you want to work on medium tractability scenarios or if things are too grim, then it's maybe impossible to save. If things are going very well, then it's hard to improve. It's the middle case where you're likely to get the most impact, right?
Ryan Greenblatt
Yeah, for sure. And I think a mistake that the safety community appears to have made over the past few years is I think too much focus on overly optimistic worlds. When I think actually focusing on desperate, crazy, yoloed, pessimistic worlds is like, you know, pretty reasonable because that's where a lot of the risk lives. And so getting from you know, 50% to 5% is like a lot of the action. And getting from 5% to 0.5% while like, I mean a 5% level of risk is like, you know, crazy and unacceptable. And like, I'm like, we don't accept our paint companies, you know, having a 5% chance of, you know, massive fatalities, takeover, all this stuff just from the takeover. Also I think there are risks other than takeover is totally unacceptable. But nonetheless that 45% is like, it's the bulk of the mass.
Rob Wiblin
To what extent do we just have to hope that there are serious early warning signs? I mean you were talking about catching the AI's red handed, which is another concept that you and your colleagues have tried to I guess make more salient. That's a sort of warning shot, I suppose, where you find the AIs inside the company have been sabotaging the data center. They've been doing something like really quite egregious escape, whatever. Yeah, okay. Yeah, yeah. And that might get more buy in to do substantially more stuff from I guess either the government or the leadership of the company. How much more hopeful are things if you do get some sort of early warning shot like that?
Ryan Greenblatt
Yeah, I mean, I think it helps a lot. I think one source of skepticism I have is that I think it might be the case that I think it's a smoking gun, but the broader world does not. So I think like, for example, I think the alignment faking work that we recently put out was like, from my perspective, should be quite a large update for people who are very skeptical about this threat model. And I think predictably it was only a maybe, no or very small update for a lot of these people. And so I think I'm like, well, I think to be clear, I don't think that evidence is on its own in some sense should be a huge update or sorry, it should be a huge update in absolute terms. Maybe it doesn't move people to the same level of risk of misalignment that I think because a lot of that's based on conceptual arguments they don't agree with. But if I think a bunch of people who are like there's no way in hell this would happen, 0% chance basically should maybe move to like, I don't know, 2 or 3% chance. And that is like an important, that's like a huge difference in like vibe space, at least in like log space or whatever. Vibe space, log space, same space, whatever, you know, and people didn't make that move. And so I can totally imagine a case where it's like we caught the AI trying to escape. Buck has a post on this. Like, what would happen if we caught the AI trying to escape? And people are like, well, you know, sometimes systems do that. They do weird stuff also Is that even true? Like, did you train it in some weird way? Oh, that's what happened at that company. But at this company we have different practices. Also people just being like, sure, yeah, maybe AIs have like a weird instinct to escape. But like, what are they even going to do? Or things like, you know, we'll use various checks to prevent this. And I think all these factors in a relatively clear cut scenario could cut into that. And I think I kind of expected a lot of the situations that I think are like, like huge smoking gun are way more confusing than this. So an example would be like, maybe it looks like the AIs are sandbagging on safety research where they're performing worse than they could or they're sort of not doing very brazen sabotage. But there's way more errors than we would expect. There's more misleading arguments. But we're like, oh, it was like from our perspective a big update. Because we're like, well, it seems pretty off, but it's quite quantitative. Right. It's like, well, if you just had lower expectations, maybe this wouldn't be that surprising.
Rob Wiblin
So surprising. Interesting.
Ryan Greenblatt
Yeah.
Rob Wiblin
And I guess it's a strategic situation where the AI can to some extent anticipate what the response will be. And inasmuch as it's taking hostile actions, it can make them things that it suspects will not actually trigger that larger immune response. Yeah. So for things it can get away with trying to escape, but it couldn't get away with mass murder, then it goes for the escape and holds off.
Ryan Greenblatt
On the murder, at least for a while. Yeah, I mean, I think so. One hope with control is making it so that egregious bad stuff has less plausible deniability. We refer to this as incrimination or something. It's like really important to get from a position where you're like, wow, that was really suspicious to okay, we've like nailed down exactly what was going on. Just because it's like incriminating in our lights doesn't necessarily mean it's incriminating from the perspective of the broader world. But it is moving the needle. And I think this can occur at all different regimes where something looks a bit suspicious to. We kind of understand what's going on. It looks very suspicious to the AI tried to escape, we stopped it and we've sort of done the thing where we pretend it succeeded and we sort of see what it would do on the outside and try to get to the point where it's like, like, you know, oh, it looks like it's like trying to figure out how to manufacture bioweapons. It's trying to run a wet lab. And like, plausibly you can get potentially in principle at least very far into understanding what it would do given the affordance of having escaped. And to the extent that it does stuff that's much more clearly malign, I think that could be, you know, motivate additional action. But I do worry that like any sort of warning shot which is not causing massive amounts of damage and not causing potentially human fatalities maybe won't trigger a strong response.
Rob Wiblin
So we've zoomed in on a scenario that is pretty vertiginous and concerning, I guess in part because it's remarkable that something so crazy is so consistent with what we observe. Now that's interesting in itself. And it's also a scenario where we think there's maybe a lot that can be done now. Or it's a scenario where things going poorly is like quite likely and we should be probably taking precautions and doing preparations to figure out how we would lower that risk. Just for balance, it would be good if we could maybe go over what are some plausible scenarios that are a bit more mundane, a little bit more boring, and perhaps hopefully a little bit less scary. So people don't think this is the only way that things are going to go. Because in fact, a key conclusion I think is just that we have incredibly wide error bounds around how these different things could play out.
Ryan Greenblatt
Yeah. So I think maybe a good place to start is we talked a lot about timelines. Well, timelines could just be longer, Right. So I said and my median or my 50% of probability was by eight years from now for the full AI lab automation. But there's a long tail after that. And so there's worlds where it's way further out. Potentially progress was much slower, much more gradual. Things took a long time to integrate. Maybe the world has a better understanding, more time for experiments, and that could make the situation a lot nicer. In addition to that, it could be the case that in the very long timelines or much longer timelines, I should say, I don't know about very long takeoff we should also expect is slower. So I think, I think short timelines are broadly speaking correlated with fast takeoff because it indicates that you can get the returns to more of the stuff we were doing is higher and so longer timelines. Another thing is it could just be that AI R&D research bottlenecks extremely hard on compute. And so even at the point when you have automated your entire AI company and all the employees, you're barely going much faster than you were with human employees. Employees like, you know, in the extreme, maybe you're going only 2x faster. It could be even slower than that. But I think that's sort of pretty low to my low end of my error bars, I think that's maybe pretty implausible. But, you know, two or three X is plausible. And then it could be that you have diminishing return setting in pretty quickly after that, such that, you know, you're not actually speeding up that much. That would imply, you know, the situation looks a lot better in terms of you get much less aggressive takeoff. Now another thing is that, that the default rate of AI progress is already pretty fast. So I even worry without the speed up that just on the default trajectory, if we're getting to this point where we have AI is capable enough to automate the entire AI company, maybe even if it doesn't yield much acceleration, there's some risk that comes pretty quickly after, in normal government time from just the default rate of progress. But I think there are plausible arguments that the rate of progress will slow down in the future. So in particular right now, progress is relying on shoveling in more and more compute, more and more investment. And if it's the case that sort of things stall out, there's less investment, but we're still operating at nearly pretty high capacity. We can imagine a world where sort of the AI industry ends up being like, it's an industry that justifies maybe hundreds of billions of dollars of compute spending per year, maybe even more than that, but not way more than that. In that regime, we might be like, if we're not shoveling more and more compute, then we don't get progress from hardware scaling. We don't get progress with algorithmic products will also be slower. So maybe the default pace. Right. The pace right now is maybe we get like 13x effective compute per year, where some of that is algorithm, some of that's compute. I think if COMPUTE was fixed, like literally fixed, we might have like 2 or 3x progress per year. And in a regime where COMPUTE is going slower, it could be somewhere between.
Rob Wiblin
Yeah, yeah, okay. So that would be just a much slower takeoff, I guess. What's the chance that alignment is a relatively straightforward problem, or maybe not even a problem at all, and we've just been confused and all of these AIs that we produce are basically going to want to help us by default?
Ryan Greenblatt
Yeah, I mean, well, an important question from my perspective is what's the probability that various AIs of different capability levels are actively conspiring against us. And I think it's pretty plausible that you can get into very superhuman capabilities and not have AIs that are very actively conspiring against you in some strong sense maybe. My current view is like 25%. If you do basically no countermeasures, you basically just proceed along the most efficient route for capabilities. And then once we're starting to take into account countermeasures, it could potentially go much lower. And that risk could be lower at this early point too. So one story for hope is you have these early systems that are able to automate the whole AI company. Those systems aren't conspiring against you. You do the additional alignment work needed to make it so that they're also trying to do their best. They're actually trying to pursue alignment research. They're trying to anticipate risks which maybe doesn't occur by default. Even if they're not conspiring against you, it could be that things aren't conspiring against you, but things still go off the rails. But if they're not conspiring against you and you avoid things going off the rails, then maybe we're just fine. We hand over the AIs. The AIs manage the situation. They're aware of the future risks. They build another system with the same properties and that can be self maintaining. Another scenario is like, it could be that an even stronger sort of thing is true by default. It could be that you train the AIs and with a relatively naive what you would commercially do by default training strategy. They really just are out of the box, trying really hard to help you out. They're really nice, maybe even they're just aligned in some very broad sense. Not even like they're just myopic. They're really trying to pursue good outcomes for the world. Overall, I think this is possible, but less likely than the previous scenario. Another thing is, while there's a question of does it happen by default, it could also just be that it just happens because of a lot of work that we put in. And we could just be in much better position. And it's hard to have much confidence about how much alignment work or safety work will happen in the next eight years. So if we have eight years, the community could build. AIs will get more capable over time. More people will be working on this. It seems plausible that there are potentially things like large insights, but even putting aside large insights, maybe just things like we just really advance Our techniques, we've dialed in the science. So there's timelines, hopes, I would say maybe technical hopes, and then there's, I guess, maybe a broader class of societal hopes. Right. So there's some stories in which society just takes the situation much more seriously in the future. Maybe a subset of countries, maybe the broader scientific community. It could be because of a warning shot, or it could just be because slowly people are exposed to AI. Timelines are somewhat longer, maybe, and more people are interacting with. Also seems plausible that you sort of have some big incidents going on that aren't that related to misalignment risks, but in practice have high transfer. So maybe there's a lot of job loss that prompts a large response, and some of that response goes into mitigating misalignment risks, either very directly, like, as in people are like, well, AI was a big deal and a big problem in this way. So maybe it's a big deal and a big problem in that way. But also maybe much more indirectly, it caus slow AI progress or more cautious AI progress because there's a lot more regulation.
Rob Wiblin
Yeah. Are there any other hopeful categories or is that mostly covering it?
Ryan Greenblatt
I mean, so there's definitely. So there's, like, things are easy on timelines and takeoff. There's things that are easy on technical and there's societal. And maybe a fourth category is like, we rise to the occasion, maybe even if society doesn't rise to the occasion, maybe just like a variety of heroic efforts from. Or maybe hopefully not that heroic.
Rob Wiblin
But one company, really the leading company, turns out to actually be very impressively responsible. Responsible and puts a lot of effort into this and they succeed.
Ryan Greenblatt
Yeah. Yeah. I mean, I think it's like the world, I imagine. I think often, I think a lot of the risk is coming from, like, worlds where you could have saved the world if you were just, like, had like, a year of delay and you were taking the situation very carefully and you were being thoughtful about a variety of, like, about all the various considerations, or at least the risk is, you know, a lot lower. Maybe it's not minimized or it's not, like, eliminated. And certainly I wouldn't say that results in, like, a level of risk that would be broadly acceptable from my perspective. But, yeah, I mean, I don't know. Could happen. Cool.
Rob Wiblin
You literally have cheered me up a little bit there by reminding me that there are other ways that things could go. What are some lines of evidence that suggest that it could take substantially longer than four or even eight years?
Ryan Greenblatt
Yeah. So I Mean, for one thing, I should reiterate that I don't think that we'll see full automation or the capability to fully automate AI companies in four years. So I must think there are some lines of evidence this isn't going to happen happen. So I think sort of the first place I think people should start is just like skepticism about like crazy shit happening in a specific timeframe. So I mean, you know, I think there's a bunch of evidence that suggests we might expect it soon. So, you know, we've had rapid AI progress. I think that has been hitting a lot of things. But you know, I think if you're just like, well, how much progress have we had to date? How long has the field been going? And you're sort of operating from a very like, outside view priority perspective where you're like, what is the base rate of technologies you don't expect in the next four years? Because this is a, you know, crazy technology. I think that I even think the base rates are maybe more bullish than people tend to think. I think often when people do this sort of outside view base rates thing, they end up with like 200 year timelines or something. But actually like, Tom Davidson has a report from a while ago and like, if you just try to apply the outside view most naively, you actually end up with like substantial, probably like quite a large probability in the next century.
Rob Wiblin
Is that because. Because there's been very rapid increases in the amount of investment in the area and of all of the amount of compute that you could go. I mean, we're just increasing the orders of magnitude of compute that we're throwing in such that I guess if the difficulty was distributed across log, I guess across log space, then we're actually logging.
Ryan Greenblatt
Resources and compute and labor. Yeah, we've crossed a lot of that. So that's one view. You can also just be like, well, even if you just look at time, we've just only been doing serious AI research for not that long. And so depending on, oh, you're like, you could use how long have we been doing deep learning? You could do, how long has anyone been working on AI? And if you sort of mix this together, it feels like, you know, powerful AI soon isn't, you know, in some sense not that unlikely on priors now. But I still think that, you know, how, how unlikely is it on priors suggests much longer. And I think that pulls me towards longer. I think generally another perspective that pulls me towards longer is just being like, well, you know, the AIs are pretty smart, but they're not that smart. And I think we just can't have that much confidence that the next part of progress is doable. And then there's some more specific object level argument. Now I'm sorry to defect on the question a bit, but I always feel tempted to go in and do some of the more arguments for short timelines that you're bringing up. So another perspective on short timelines that I think is pretty important here is we sort of are going through a lot of orders of magnitude of compute and labor pretty quickly that we haven't gone through before. And also in addition to this, we're going through orders of magnitude that are somewhat above our best guesses at the amount of human brain computes. So sort of we have these sort of not very good estimates of how much compute the human brain is using that indicate Maybe it's like 1e24 as like a central estimate of lifetime human brain compute. So currently training runs are about I think Grok 3, which was just trained recently, I think I saw the estimates were like three E26. So you know, about a little over two orders of magnitude or like, you know, two and a half orders of magnitude above human lifetime. And then we might think that like you first reach AI is capable of, you know, basically like beating humans, you know, above human lifetime compute. Because generally we develop algorithms that are less efficient than biology first and then they potentially pretty quickly become more efficient than biology. And so I think that like the sort of like this is in the, in the literature or like in in like a jazz old bio anchors report for the people who are familiar with that is what are called the lifetime anchor. And I think I put a decent amount of weight on the lifetime anchor. It looks pretty good because we're sort of, we are hitting these like around human ish capabilities is around this point where we're hitting human computer.
Rob Wiblin
Human ish computer.
Ryan Greenblatt
So I'm like, I don't know, it just feels like that model has a lot of support that suggests that we might be hitting it in several orders of magnitude more compute. And we're just sort of burning through these orders of magnitude of compute very quickly. Where for context, I think training run compute is increasing by about 4x per year. So you go through a lot of orders of magnitude pretty quickly. So it's like a little over an order of magnitude per year at that rate. And also I just generally think we should put some weight on a sort of a very compute centric view on AI development and on the Compute centric view on AI development. Even with something less specific than lifetime anchor, we're just like, well we're burning a lot of orders of magnitude. Starting from a pretty competitive starting point, maybe we go pretty far. Now here's the bear case relative to that is, well, we can only burn through orders of magnitude so far, right. So we're not that far off of hitting the total amount of chips you can produce. So perhaps I might be messing up the statistic, but I think my sense is it's like ballpark, like a pretty reasonable fraction of TSMC or like, you know, semiconductor manufacturing capacity is going towards machine learning chips. I think my cash number is 10 to 20%. I hope I don't get this horribly wrong, but I think it's definitely above like 1, it's well above 1% I think around that. And so one thing that's interesting about this is so you know, Nvidia revenue is roughly doubling every year and I think that the number of wafers or whatever, I hope I don't brutalize the thing. I'm not a semiconductor expert here. Semiconductors for AI have been increasing a little over 2x per year. Maybe it's like just to make it simple, let's do a slightly bullish estimate of 3x per year. Well one thing that's worth noting is if you're starting at 20% and you're increasing at 3x per year, you do not have that long to go through before you're just hitting limitations.
Rob Wiblin
All of the chips for this?
Ryan Greenblatt
Yeah, you're using all the chips and like. Well okay, once TSMC is basically like all of the fabs are producing ad. I mean you can only go so fast.
Rob Wiblin
I guess then you're limited by how quickly can they build new fabs basically.
Ryan Greenblatt
Yeah, well, limited by how quickly they can build new fabs, Limited by things like, you know, there's other sources of progress to be clear. So like even if we're sort of limited by how they build new fabs, there is still potentially sources of progress from hardware improving over time. And there are sources of progress from algorithmic development though I think a key thing, maybe people aren't tracking enough is that algorithmic development is also driven by dumping in more compute. So we should expect algorithmic development to slow. So I think the bear case is maybe like okay, you have AI, keep going. We're already hitting like insane amounts of spending as you're sort of starting to hit the like TSMC limits, the spending has to go even higher to justify building new Fabs or maybe prior to hitting like most of TSMC production, maybe people are just not seeing results sufficient to justify these levels of investment. So I think, you know, Microsoft, Google, they can justify potentially like, I mean Microsoft is like, I think planning like about $100 billion in capital expenditure. And like this year Stargate has about $100 billion committed. Maybe that's over a year or two. And then maybe they're hoping they get more money. Stargate is a project by OpenAI. But I think once you're starting to get beyond this like $100 billion regime, this is no longer, it's no longer sufficient to just have a big tech company that's super sold. Right? Like, like Google cannot, like just like does not have the ability to readily spend a trillion dollars, especially not a trillion dollars in a year. And I mean I think they, they, it is not out of the question that you could raise a trillion dollars, but I think you would probably need, you know, very impressive results I think and you need to be starting to pull in more, you know, more skeptical investors, more revenue. I think like you need to be talking about like potentially like sovereign wealth funds. Like, like, you know, you know, certainly.
Rob Wiblin
It'S possible, I mean the U.S. government could deliver that kind of money.
Ryan Greenblatt
U.S. government? Yeah, certainly it's possible. I think, although I guess you start.
Rob Wiblin
Probably if you try to spend a trillion dollars in a year on this kind of thing, you start hitting other bottlenecks for sure. Where are the chips?
Ryan Greenblatt
Yeah, yeah. So like I think once you're, once you're. So I think we can get through. I don't, I don't currently know what the elasticity is here. So, so I think Epoch did some estimates of like what is the biggest training run you could do by, by 2030. And I think they were like, I think their median for if you just like, you know, in the timeline where people keep trying to spend aggressively was like maybe you can get up to 10 to the 30 flop training runs. And that was taking into account data bottlenecks, you know, various considerations on bandwidth between chips, various considerations about like, you know, how much you can scale up chips, chip production, how much TSMC is building out capacity, this sort of thing. I think my guess is that that's probably a pretty reasonable guess independent of AI acceleration. AI acceleration could make this faster. But like, you know, if sort of we're in sort of a bull timeline but not a bull timeline where the AIs have started to speed things up, I would guess that's a pretty good Estimate, you know, but, but, but there's a chance we fall off of the, of that because of maybe like the bottlenecks hit harder than they expecting and maybe investment dries up faster. And I think they were not taking into account, well, hopefully they don't call me out on this, but I think they weren't taking into account willingness to pay on this. So, but, but you know, after the 20, 30 point, I think things are starting to get a lot harder. So I think if we're getting up to 10 to the 30 flop, like you know, bare metal flop, just like, you know, the actual computations of GPUs are running starting from where we are now, which is about more like a little over 10 to the 26th then. So we've hit four orders of magnitude in the next five years. So that's quite fast. I think that is enough that we could in principle see the trends that we've seen continue. Right. The trends that we have. So I think we've had sort of a bit of a GPU lull. I think we'll have GPUs sort of start building up. I think like post GPT4 there's a bit of a lull as people were sort of trying to buy all the H1 hundreds. I think we'll see a run of models with more H1 hundreds like GP 4.5 and then we'll see another round with Stargate, which is another big build out. Semi analysis has speculated that Anthropic has a large number of chips from Amazon, maybe equivalent to around 100 or 200,000 H1 hundreds. And so we'll see another round of the 100 to 200,000 H100 clusters and then we'll see maybe, maybe I think around, it's like I think a little around. Okay, I'm going to get the dates wrong, but maybe it's around 2028. We're going to see the sort of like more like maybe around the million GPU range. But if we're not getting to really powerful AI by then, I think you should expect things to slow. And so there's sort of this longer tail of we eat a bunch of compute around this sort of like up to like Maybe it's around 2030, 2032 and then progress has to taper unless we hit very powerful capabilities. Or we're all wrong about how fast you can build fabs or how much investment you can summon. But I think that's a lot of the bear cases. Like you know, maybe you do a bunch of the scaling, but you hit just like these limits, not all of it.
Rob Wiblin
So the short version of that for people who didn't follow at all, is that we're currently increasing the amount of compute that is going towards training AIs very, very quickly. And we're not going to be able to maintain that pace because we're currently doing that by grabbing some chips that were being sold for other purposes and using them for machine learning and stuff instead. And these companies are going from spending 1% of their resources on machine learning, on AI development to 10% and then maybe they can go to 100%, but they can't really go beyond that by just like grabbing resources that were previously going towards other stuff. So at the point where kind of almost all of the ships are going towards AI training and almost all of the resources of these companies is going towards that, it levels off the rate of increase that you can get though, for sure.
Ryan Greenblatt
Yeah, I mean, maybe a few minor clarifications on that. So like, one thing is like when you're thinking about repurposing the chips, it's like you might not see this like, like this isn't like they're taking, you know, iPhone chips and repurposing it. It's like there was capacity to do chip manufacturing that was relatively general purpose. And it's like instead of making as many iPhones, we're now making, you know, more AI chips. And so some of that's coming from like building additional fabs, but some of it's coming from like slightly increasing the prices of other chips or, you know, reducing how many of them you're getting. And, and, and that can like, you know, it's pretty easy for like, oh, once when AI is in the sort of like the 20 or 30% regime, it's very easy. Like that's not going to have noticeable market or like very noticeable market effects on like how much other chips are costing because, you know, TSMC expanding somewhat faster, it can keep up with that. But once it's like, oh, AI chips are 80 or 100%, then we're going to start seeing bigger effects and things, and things slowing down from there.
Rob Wiblin
I think another thing that makes it a little bit difficult to forecast all of this is that we're not just doing exactly the same thing, you know, year after. We're not, we're not riding the same trends year after year. I guess initially, for example, example, we were getting a lot of improvement by putting more compute into the pre training. This is the thing where you dump all the text from the Internet and try to get it to predict the next word. We're getting enormous gains from that by putting in, throwing in more data and more compute. But then that sort of tapered off and then we have to move towards I guess better elicitation using post training, reinforcement learning from human feedbacks. Then we're doing a different thing which initially is very effective but then begins to level off. I guess now we're using reinforcement learning to I guess to do a sort of self play where the models are learning to reason better and basically we just reinforce them when they get the answer right. We're on a very steep curve with that. But presumably that at some point will level off and we have to do a different thing. And the fact that each, I guess, if I understand that correctly, I guess that makes it more difficult because we don't know exactly what will be the innovation next year that will be driving the improvements.
Ryan Greenblatt
Yeah. So I think there's a question of like, okay, so the improvements from like GPT1 to GPT2 to GPT3 to GPT4 and then maybe even like the improvement for throughout there was a bit of a lull in 2023 and then things picking back up in 2024. I think a lot of the improvement there was driven by up until maybe the middle of 2024 was driven by scaled up pre training, like dumping in more data. So I think there's sort of a meme going around like pre training has hit a wall. I think it's not so clear to me that this is what's going on. My guess is that pre training is reducing. It probably is the case that it has, has relatively diminished returns or it has. The marginal returns of scaling that up further are less than they were in the past in terms of qualitative capabilities based on, I don't know, some sense of, oh well, we have Grok 3 which is maybe a little over 10x more compute than GPT4 and also better algorithms. And we're like, okay, how much better is it? It's somewhat better. But I think it's also the case that it's worth noting that previous improvements like the GPT3 to GPT4 gap, it was a lot more compute. So it was like, I think that gap was like roughly 100x bare metal compute. And it just turned out that at the time there was a lot of low hanging fruit and scaling up the amount of floppier spending very quickly. And now we're sort of, we were running into bottlenecks so like in, in, you know, post GPT4, they were sort of waiting for H1 hundreds. The H1 hundreds were slow to get delivered. We sort of were getting the H100 clusters online kind of late. And then maybe like there were some difficulties with getting that initially working. So GP 4.5 somewhat disappointing, I think. I think it's rumored that OpenAI had multiple at least one failed training run. Oh, wow. And so it wouldn't be surprising if we sort of see people sort of adapting to more compute, figuring out how to using it. And the returns from pre training are sort of going up from there. And then, okay, well that's pre training. Maybe even if pre training is diminishing, we sort of, you can scale up RL and we've seen that going on over 20, 24 where like, you know, we have 01 where what's going on is sort of their training on or yeah, I should say RL reinforcement learning. What they're doing is they're sort of of training on easy to verify tasks, not training on next token prediction. And we've seen a lot of initial returns from that and we don't know where that will peter out. Right. So we haven't seen that many orders of magnitude on RL training. So it's speculated, for example, that DeepSeek R1 was about a million dollars worth of compute on the rl. And so in principle, we can scale that up to be more like three orders of magnitude higher with the clusters that people have maybe a little less than that, but like ballpark and so. Okay, well, once we're talking about three orders of magnitude higher, it could be that that yields huge returns or it could be that that sort of doesn't yield huge returns. So the story for yielding big returns is sort of like, well, it yielded big returns from the first million. Maybe you just go further, build more environments, do more RL big returns. The story for not is sort of. Maybe there was some sort of latent capacity or sort of like potential the model had, which sort of RL is bringing out. We've brought out most of the potential. We're sort of hitting diminishing returns.
Rob Wiblin
Yeah. Just to say that back for people who don't follow this very closely and are not sure exactly what to remember, I think one thing that is actually worth remembering is so reinforcement learning is, I guess we used to use that a lot and then it somewhat fell off and now it's come to the fore again. This is where you take the existing models, I guess GPT4.0 or something like that and you give it very challenging reasoning problems and you maybe get it to try 100 different solutions, a thousand different solutions and basically you just find the cases where it gets the right answer at the end and then you say yeah, well done, more of that reason, more like that try to solve other problems in the same way that you just did. And that's turning out to be extremely powerful. And I guess you're saying that that is maybe picking up some low hanging fruit where these models had more ability to do clever reasoning like latent in the weights than was initially apparent than we were able to get out of the them. And by using this process to get it to try all kinds of different solutions and then find the reasoning processes that are functioning where like extracting a whole lot of stuff that was just sitting there waiting to be picked up but that could run out somewhat and then at that point it's going to be a heavier lift for them to basically learn new superior reasoning techniques.
Ryan Greenblatt
Yeah, that's right. So I think we've seen sort of like people were trying to get models to reason in chain of thought. So as of GPT 4 people were totally aware that you could do, you know, chain of thought reasoning. And we're messing around with this, but I think it only worked so well. The model wasn't that good at recovering from its mistakes. It wasn't that good at like you know, carefully reasoning things through. And then I think we're seeing with like oh 1 and o 3 and you know, other reasoning models that now like you can make it so the model sort of can do the reasoning pretty well and a bunch of that might have just been like well, you know, chain of thought it was already sort of doing it and maybe you can just make it somewhat better but there's a question of like how much better.
Rob Wiblin
Yeah, I guess, I mean another piece of low hanging fruit that you mentioned was basically there was sometimes we were giving these models like very difficult challenges but giving them only $1 worth of compute to work with and people have found well, what if you actually just give them more like the kind of resourcing that you would give to a human which is $100 or $1,000 worth of equipment and salary then they do radically better. When you're doing something that's a bit fairer of a comparison but you kind of can't do that scale up again if you go from 1 to 1000. But if you're going from 1000 to 100,000 now you're talking like real money. And there's a question of would this ever actually be economically useful to give a model so much to just solve a math problem? Yeah.
Ryan Greenblatt
I mean, one thing that's also worth noting is even if it would be economical, we quickly run into just like the total quantity of compute. So one way to put this is like, suppose that you're having the model do one task for $100,000. My understanding. So like I did some kind of crappy bow tech and my sense was like, at least as of a few months ago, the total amount of compute that OpenAI had was like about $500,000 per hour. And so if it's $100,000 task, then you're using a fifth of all of OpenAI's compute for an hour. Okay. So I'm like, okay, well you can only do so much of that. Right? So even if it was the case that there are some really economically valuable tasks, you just hit bottlenecks on that and maybe, I mean, you can also.
Rob Wiblin
Because that drives up the price.
Ryan Greenblatt
Yeah, it's like the supply and demand. Right. So, so I think that like, you know, even if it's the case that you, you're like, wow, it's really, you know, it's useful enough. There's really only so much scaling of that that you can do and, you know, you can go pretty far. And I think it's, it's a nice, I think people have been demonstrating, also people have been demonstrating not just human cost, but substantially above human cost, yielding additional return. And this is kind of nice because it lets us get sort of a sneak peek into the future. Right. So I think if you're seeing the models do a task, but sort of slowly and at very high cost, I think we should expect that soon enough that will be quickly, at a much lower cost. Right. Because the cost and speed, the cost is rapidly dropping.
Rob Wiblin
Yeah, we can draw out the, yeah.
Ryan Greenblatt
We can draw out the curve. Right. And so I think I'm somewhat hopeful for, sort of for people who are more skeptical about AI's exhibiting some capability. Maybe first we can exhibit it with very high runtime compute, a lot of domain specific elicitation, and then pretty shortly after that sort of the need for that goes away. And I think hopefully then we can sort of reach common knowledge somewhat before the capability is widespread enough to be incredibly widely used.
Rob Wiblin
Yeah, we were dipping in and out there on the bear case or the case we're thinking of. This is going to take a long time.
Ryan Greenblatt
I'm such a Bad bear, I have to say.
Rob Wiblin
Yeah, well, I'll try to make. One of the arguments that are here, I guess is the AIs might get very good at fairly narrow tasks and maybe they become really good coders, they become really good at maybe idea generation, hypothesis generation, setting them up. But there's more to running and scaling an AI company than just those things. And they'll have some serious gaps and weaknesses and that is the thing that will be the limiting factor and slow it down. How plausible do you think that is?
Ryan Greenblatt
Yeah, so some context here. So historically, as we were saying, a lot of the returns were driven by pre training and then starting sort of, sort of later in 2023 and in 2024 sort of there was RL where a bunch of the returns from GPT4 Turbo were driven by RL or is improving over time probably, or this is speculation, but GPT4O somewhat better base model, but also better RL. And then we had a one better RL. And this has been driving up benchmark scores on programming tasks, coding tasks, math tasks that are checkable much more than other tasks. Now we do see some transfer. So an important one way out of this is maybe it's just the case that you make the AI's very superhuman at programming, quite superhuman at software engineering, which is somewhat harder to check, but doable to check, or at least parts of it are doable, pretty readily possible to check. Very superhuman at math. And then this transfers someone. Right. So, you know, maybe it's the case that it's very expensive to label other things. So you know, we can, you know, we can assess human performance in other domains. It maybe just requires, you can't do it automatically. Right. So you can assess how good a paper is and we have processes for doing that, which has some signal, it's just much more expensive. And so in principle, if the AI sort of can almost transfer to writing good papers and just need just a little bit of feedback, you know, they're quite sample efficient, then that can go pretty far. So another thing worth noting is I think there is a just no generalization required story. So for example, or nearly no generalization. So it might be the case that you can basically make it almost work or basically work to just do RL on things that are basically just straightforwardly part of being like a really good research engineer. And we might be able to get to the point where we have nearly fully automated research engineering and AI companies just via scaling up rl, piecing it together and not really requiring very much transfer. At all, maybe almost. Sorry, we require sort of like in domain generalization. So we require sort of like the AI is improving in domain relatively quickly and relatively simple efficiently. But Even if the AIs are sort of have not very good research tastes, they're not very good at these things that can go pretty far. Now in addition to this, you might be able to get surprisingly good on some of these other things of running an AI company even with just in domain sort of not requiring much transfer. So we have these sort of research engineer AIs for one thing, maybe that makes progress go faster starting then and then we sort of kick off more progress that eventually gets us a bunch of other things. I can go through that story later. But second of all, sort of if we're like, oh, oh, you know, we'll have these research engineer AIs but they're not going to have research tastes, they're not going to have a bunch of other things that are sort of harder to train. Well, one thing is research taste. Well, a lot of what we mean by research taste is understanding what the results of experiments might be, having good predictions, being able to sort of understand what's going on based on a smaller amount of evidence. And for this you can Maybe just train AIs to be amazing forecasters of ML research experiments. And so if you're sort of training these AIs to, to be like superhuman ML project forecasters where they're predicting the results based on just generating a large number of smaller scale ML projects, having some transfer to larger scale ML projects which is less easy to readily get data. Then maybe from there you have these AIs that can predict the results of experiments as well as Ilya Sutskev or Alec Radford and have sort of, maybe they sort of are less relying on insight and more just like mechanistically, you generate a long list of ideas, you predict how well they go go, you proceed from here. And I think there's basically a bunch of routes like this. So another route there's sort of like the, like maybe you can just do within domain. And that can go surprisingly far in automating the AI company. An important asterisk for this is, I think the skeptics maybe are sort of like screaming to me, but what about things other than automating AI companies? So I think maybe they're like, okay, sure, AI, R&D, I guess that's checkable enough. But what about, you know, all the other things that humans do which have slower feedback loops, like, like being a Good strategic CEO fundraising, fundraising, all these other things. So I think I want to put a pin in that, but maybe we'll get back to that later. And then the second thing I want to say is so there's the in domain story and then there's the like, more like a generalization story where you know, you train on all these narrow tasks, the AIs get smart with just a tiny bit of data they can transfer to these out of domain or like, you know, we didn't have a ton of data things. I think it's going to be, you know, some interpolation between these probably where, you know, it's harder to see get lots and lots of data on Long Horizon suite tasks. So probably you want to need a bit of transfer, but you can probably do a bunch of data. And then there's I think a third story which is like people just figure out how to RL on harder to check tasks using other mechanisms. So you know, using things that are more like process based checks, using things that are more like being able to check fuzzier tasks using I think things more along the lines of like self critique, constructing more detailed checks themselves and doing RL from here. So humans interestingly are able to learn on lots of tasks that are sort of somewhat fuzzy to check via sort of having some sort of notion of self critique, having some sort of notion of how well did I do? Should I do more like that, less like that. And I think the AIs can currently do this a bit, but not amazingly. And I can imagine this sort of getting better over time. And if I think a thing that we've seen time after time is if a given thing is the limit to capabilities, there is just a huge amount of horsepower in the, you know, in the AI company companies, in sort of the broader ML community, but I think broadly in the AI companies to sort of push on that whatever the limit is.
Rob Wiblin
Yeah. So there's been a shift over the last year from almost all like a very large fraction of the compute being spent during training runs towards a larger fraction being spent during inference or runtime compute. So rather than use your compute to make the model better at predicting the next word or more likely to be positively reinforced during training, meaning you actually just give it an enormous amount of time to think about the specific task that you've given it, how does that change the picture around the time that we're automating a lot of AI R&D? I think many people think that on the inference compute centric paradigm this makes Things go a little bit slower because you're going to be so limited. If it turns out that it requires an enormous amount of compute to run the equivalent of one AI researcher, then at least to begin with, you just won't be able to staff yourself up with a million equivalents of them. Maybe you'll only be able to have 100 or 1,000 to begin with, and then it will gradually go up.
Ryan Greenblatt
Yeah. So for one thing, to the extent Inference Compute wasn't priced into your prior model, then it should push you to having, like, relatively shorter timelines to the relevant milestones. But maybe at the first time we hit a given milestone, it'll be very expensive. So maybe we'll first be able to be like, well, we have this AI that costs. You know, maybe it would be like it could automate the job of, you know, a research engineer at our company for the, like cost of, you know, maybe it would be like 10 or 20 million in compute per year. Year. And so, as I was noting earlier, like, there is just actually, like a limited supply of compute. So even if it would be like, in principle, you'd be happy to automate all your employees, maybe you have, like, you'd be happy to automate a thousand employees at 10 million a year. That would be 100 billion per year, which is getting close to the amount of compute the company even has in this regime. Right. So, like, we can look at, like, you know, how much total capital expenditures have people been spending? It's like, you know, that's roughly the same ballpark. So if it was even more extreme than that, you know, maybe it's more like 100 million per year, then. Well, all of a sudden, you know, well, they just might not have the money to do that, and they might not have the compute to do that. And also, it might just not be, you know, economical. It also might be the case that with this Inference compute, sort of, it's too slow. Like, it could be that, like, oh, you know, maybe it would work if you could make it fast enough. But a lot of the ways of scaling up Inference Compute might make it serially slower, which reduces a lot of the competitiveness advantage AIs have.
Rob Wiblin
So what do you mean by that? It's just that it would take a very long. Because it has to do a whole lot of things one after another. It just takes a very long time to actually output an answer.
Ryan Greenblatt
So, yeah, so I think, for example, I think O often answers questions slower than humans would answer those questions because it spends so long thinking, whereas previously AIs were almost strictly faster than humans. Occasionally O1 is now slower or more often in the same ballpark. And we might see inference time scaling that involves more serial steps having this property. I think there's various ways that this could be mitigated such that I don't know if this is a huge obstacle. I would probably guess not, at least once optimization has been applied, but it could initially be an obstacle. And so in general, I think the inference time compute should expect that we sort of can see capabilities exhibited prior to them being economical and we sort of start seeing capabilities at small scale prior to large scale. Whereas I think a surprising thing about prior sort of the pure pre training paradigm is that you naively expect that sort of at the point the capability first becomes available, you can run truly a vast number at that capability level.
Rob Wiblin
Level.
Ryan Greenblatt
My sense is that this is still broadly going to be true, basically because I think distillation of high inference compute is relatively quick, and inference compute I think only gets you so far. Right. So I think it can get you large gains. But I think, I mean you have to be like quantitative about how large the gains are. Right. And so if that can be like distilled away relatively quickly, which I think we've seen. So I think we've seen like, I think we see like pretty fast distillation progress going from for example, like OH1 to O3 mini. I think we saw pretty fast gains of both just scaling up training, but also being able to distill that more effectively down into a smaller package.
Rob Wiblin
So distillation is where I guess you make the model a lot smaller and it performs almost as well. You could operate more of them in parallel or it can give you more answers more quickly.
Ryan Greenblatt
Yeah, And I should say distillation is sort of a special case of a broader thing of just like sometimes it's easier, at least initially to make something much cheaper than it is to make something much better. Better, Right. So you sort of say we have some new capability we've been demonstrating, we sort of push the frontier, then we can bring the cost of that point in the frontier down pretty quickly is what has happened historically. And distillation is sort of where you train a smaller model on the outputs like that Tech in particular is like you train a smaller model on the outputs of a bigger model.
Rob Wiblin
And that can, in the inference compute paradigm, don't those two things kind of converge? Because saying, well, if you can make the model smaller, then you can allow it, you know, if you can make it a tenth as large, large. And it's almost as good at thinking as it was before. Then you can allow it to think 10 times as long effectively.
Ryan Greenblatt
Yeah, but of course, the returns to inference compute potentially could diminish pretty quickly. So I think we see as an example, I mean, this is maybe not the most relevant benchmark, but on, for example, ARC AGI, what we see is each. So they went from, I think they went from ballpark, maybe a little under maybe like $3 per task or $10 per task task to over $1,000 per task. So maybe like a, I don't know, like a little over two orders of magnitude of cost. In those two orders of magnitude of cost, they push the performance from about, I think around 76% or 75% up to 85%. So, okay, so we're like, well, and you know, for humans, I think moving through the human regime of 75% to 85% is not, you know, that much. And so I think similarly we see stuff like, well, if you apply stuff like, you know, pick the most common, like you sampled the model 64 times, you pick the most common answer. Between that, you're sort of seeing relatively marginal improvements from that. So in general, I think there's a question of how efficient are these inference time strategies? I think that it depends on the strategy, but you might hit diminishing returns.
Rob Wiblin
Okay, I guess the overarching theme here has been arguments to expect AI R&D automation soon versus later. Are there any other kind of key factors that I haven't given you a chance to talk about yet that bear on that question one way or the other?
Ryan Greenblatt
Yeah, so I think one important question is sort of like the, like, is pre training hitting a wall thing? So I think there was sort of a, you know, people sort of shifting towards like, oh man, maybe pre training is hitting a wall. So I think I want to like, dive into why this might be true to the extent this is true and also, you know, how true is this? So I think like, one big reason why we might expect this is that it might be that there's sort of issues with data quality and data quantity. So it might be the case that like sort of, you know, Deep Sea DeepSeek recently trained DeepSeq V3 with just a tiny amount of money and they trained it on about 15 trillion tokens. And so it might be the case that sort of like you can train on lots and lots of data, but it might be that sort of there's a kind of steep diminishing returns to data Quality. I think there's not a lot of public evidence about the extent to which this is true. But if that's the case, then it might be what happens is sort of like once you train on the first 15 trillion tokens, you're sort of the next 15 trillion tokens are a lot less valuable. So sort of if you imagine sort of scaling up the deep seq v3 training run by a factor of 10, then you based on like chinchilla scaling laws, which is just like how much should you scale up? Like the size of the model and the data in parallel, you would scale up sort of the data by 3x and you would scale up the model size by 3x. And so if you did that, you know, you'd be at 45 trillion tokens. And so it might be that, you know, sort of if you're at 45 trillion tokens, the next 30 trillion tokens would be a lot worse either because it's like repeated epochs on the same tokens or because it's lower quality tokens tokens. And so it might be the case that sort of you can stretch things far, but the returns sort of start diminishing around now because of data filtering. And this might be one reason why sort of pre training scaling maybe looked somewhat more promising in the past than you might think it is now because there was worse filtering. So maybe a lot of the GPT3 to GPT4 improvement was as they were training on more tokens, they were getting more of the good tokens in there. But sort of now we're sort of in a regime where we sort of, we can really like pull out the good tokens and make sure to train on them and that could yield diminishing returns. So I think this is like a reason why we might expect pre training is slower. But I don't think this is an argument that pre training returns would totally fall away. Right? So you can just train on worse tokens, train for more epochs, find methods to get more juice from the same tokens and get that. But you'll sort of see like a slower rate of progress. In addition to that, there's also the option of instead of doing pre training, just do way more rl. And so we were talking a bit earlier about maybe, well, you know, maybe RL has diminishing returns. But even if the returns diminish, it might still be that the returns are high enough that they're higher than pre TR training where you can dump in exponentially more compute and you get sort of like linearly more performance, but qualitatively, linearly more performance is a big deal. And in addition to that, I think we're just very uncertain about how true all this is. And there's potentially a bunch of scaling left, even if the returns are weaker than people previously thinking.
Rob Wiblin
Okay, just to try to say some of that back. So pre training is something we take a huge corpus of information of text and try to predict the next token. It seemed like scaling up the amount of data going into that process was very valuable in the past. But it's possible that that is leveling off to some extent. And part of the reason for that you're saying is. Well, I guess to some extent they've actually run out of new data that they can collect. They're getting close to grabbing all of the good text that humans have actually written. But also they got better over time at filtering out what is actually the high quality tokens, what is the high quality content that they want the model to be training on a lot. And what stuff maybe could they discard. So presumably kind of they're keeping in books, like published books, textbooks, that kind of thing, and putting extra weight on that. And then just like random stuff taken off slot, taken off the Internet that has no particular information in it, they're managing to exclude that. And I guess you're saying, having filtered out the good stuff, the only stuff that they can add is really quite low quality. And so they're just not actually getting very much juice out of that. And that might help to explain why data scaling hasn't been adding as much value now as it used to. But you're saying they can take the effort that they were putting into that and use it to improve the reinforcement learning process, which is, I guess, a different way of trying, trying to improve the model, rewarding it for successfully answering questions and rewarding it for having good thinking in the process of doing that. Have I understood right?
Ryan Greenblatt
Yeah, that's right. So I think also you can do sort of, I think people sometimes talk about synthetic data. I like to think about synthetic data as sort of like a sloppy version of rl, where what you do is you get another model, you get it to generate some data, and you get that data to sort of be somewhat improved from what it was just generating by default. So maybe you only select cases where it got it right. Maybe you have it revise its answer answer, or you have it sort of like you let it try to solve the math problem, then you show it the correct answer, and then you have it sort of correct its chain of thought. You throw that into the pre training corpus and maybe that data is somewhat valuable. And this is sort of in some sense similar to RL because you're sort of finding model trajectories that are good and training on them. But it might have, you know, somewhat different properties. And you can, you know, scale this up, generate a lot of synthetic data and I think this will, you know, yield some returns. Like, I think this will have some improvement and you can scale that up further, further. And so I think maybe the extreme bear case. Let me try to make the extreme bear case. The extreme bear case is like Deep Seq V3 came out kind of recently, had $5 million training cost in parallel. We saw Grok 3 come out also pretty recently, or DeepSeek V3 is somewhat further than the past. The difference in cost is I think about two orders of magnitude, about a factor of 100. I think DeepSeq v3 was trained on about 2,000 GPUs. Grok3 was trained on about about 100,000 GPUs, very roughly speaking, so maybe it's closer to a factor of like 50 or so, maybe roughly around there. So okay, we have this factor of 50 in pre training compute. And then if you look DeepSeq v3 qualitatively, at least not that much worse than Grok3. So what's going on? What explains these sort of returns? And so there's the data scaling story, which is like, well, maybe they were already training on the juiciest 15 trillion tokens. And the stuff that GROK rock or the stuff that XAI was able to scrounge up, not as good. And so they were sort of scale up the model, you scale up the data and maybe they didn't get awesome returns from this. Another story is just like the returns are weak, which is plausible from the evidence we have. And then the question would come down to can you switch to something more like rl? And then I think another story which is important and I think a big part of my model is I just think Deep Seq has substantial algorithmic advantage relative to XA on right now, at least, I think Deep Seq v3 probably was like just actually a better optimized training run where they, you know, were using the FLOP more effectively, both because of like better hardware utilization, training at lower precision, which is sort of like a technique, sort of instead of like, instead of, you know, storing the numbers with like a bigger representation, use a smaller representation. It's a bit more efficient. Yeah, you just sort of do it, you know, with less accuracy, but you can get similar performance. And then in addition to this sort of just I think just actually having a better tuned pipeline and more for sort of better architecture potentially. Now that said, an important part of why Deepseek was able to have a better architecture is they were doing a smaller scale training run, which meant that they could run more experiments at that scale and really iron out all the kinks. And so maybe sort of you'll be able to iron out the kinks, but every time you go to larger scale for the first time, you'll run into some issues. And so I think there's some rumors of this happening at OpenAI. It's plausible that XAI ran into similar stuff. And so I think were seeing probably some of that. So I think this maybe explains maybe my view is the actual amount of effective compute that Grok 3 is above deepseek is maybe about a factor of like 10, if I had to guess. Maybe it's a little more than that. Maybe it's more like, yeah, I think about 10 because it's maybe like 50 experimental. But then there's a bunch of efficiency gains that shrink that lead. Plausibly I'm off base here. Maybe someone will call me out on this, but that's kind of my guess. And I'm like, well, does this 10x scale scale up match what you'd qualitatively expect? And I'm like, well, you know, it is actually a decent amount better. And so I don't think it's like totally off the like, you know, like implausible. This is like, you know, if you, if this is what a 10x scale up looks like, another 100x could plausibly be a pretty big deal. Yeah, so that's a bit more of the bull case there. And I think another thing that I think is, is happening with AI progress that's important to track. This is I think a lot of people are like, well, you know, Grok 3, it's barely better than GPT4. GPT4, you know, that's two years ago ago. Very slow progress. What are we doing? But I think an important thing is people are comparing to original release or comparing to recent releases of a model called GPT4 versus original release GPT4. And so I think OpenAI's naming convention here has sort of caused people to sort of get frog boiled or sort of, you know, underestimate the rate of progress. So sort of what we saw was we had the original GPT4 release back in 2023. This near the start of 2023. That model, you know, was pretty good. Good. And then we saw OpenAI sort of progressively release models they were still calling GPT4. So they called the model like GPT4 Turbo and then GPT4O. And these models were all somewhat better than GPT4 on both, I think how good the pre trained model was and also on RL or just a bit better. And so we sort of had very incremental progress while still calling it GPT4. And so people are sort of missing, I think maybe like a roughly like a little over an order of magnitude of progress from sort of GPT4 to the best version of GPT4O. And so they're doing that comparison and so they're missing a bunch of progress that happened in the meanwhile.
Rob Wiblin
Yeah, it's maybe easy to forget how irritating the original GPT4 was to use. I mean it was incredible given our expectations at the time. I was blown away, but it was very fiddly. It wasn't very good at answering questions. It messed up in much more obvious ways than what it does now. But I guess, yeah, it's called the same product. So now you just feel it was always GPT4 and you forget, I mean people were so fussy about how you had to prompt it the exact right way and you had to have expert prompt engineers. And that has kind of fallen away as they've just gotten better at following instructions much more sensibly.
Ryan Greenblatt
Yeah. As a concrete example, original GPT4 could maybe do agentic tasks very poorly. Maybe it could do 5 to 10 minute agentic tasks half the time, agentic software engineering tasks. And now GPT 4.0, it's much more like an hour. So I'm like, it's really quite a large difference in terms of this sort of downstream capability anyway. So that's sort of the like some texture there. I think it would be really nice if someone did some really detailed analysis trying to like map out all the qualitative improvements and how much effective compute was in maybe epoch should do that or someone else should do that. Yeah.
Rob Wiblin
Just to back up and Talk about the Grok vs DeepSeq comparison and spell this out for people a little bit more clearly. You were saying people look at DeepSeq and they compare it with Grok3 and say these GROK is like somewhat better, but it's not radically better. And it looks like it was trained on 50 times as much compute. So like what tiny returns we're getting from scaling the compute input. But you're saying this is a bit unfair because I suppose deepseek has, because of, I guess, limitations on access to compute in China, they've been working a lot on algorithmic efficiency in order to get the absolute maximum juice out of the compute that they have available. And I guess GROK is in the opposite situation where they're trying to scale and train and grow incredibly quickly and they have access to a ton of compute, so they're not worried about the efficient use of compute almost at all. And you're saying if you did a more like, for like comparison in terms of the algorithmic efficiency of the training, the then you would say, well, Grok 3 was trained on maybe only 10 times the effect of compute. And so it's not nearly so large a scale up. And so the fact that the improvement seems only incremental is actually closer to what we would have expected anyway. It's not actually a sign that compute scale up is not useful.
Ryan Greenblatt
I mean, I don't know, it's definitely some evidence. Right. And we should put some weight on. No, it's actually just 50x effective compute and this is what you get. And that would be, I think, lower than I would have predicted. So it's an update there. But yeah, I think broadly speaking, yeah. And I mean it's also worth noting, it's not just that I think Deep SEQ is more focused on efficiency. I think, I think, for example, I think GROK probably has worse algorithmic efficiency than for example, OpenAI and deep seq do. My sense is they're trailing someone in algorithmic efficiency, but are somewhat ahead on scaling up to very large training runs. And so they're just in a somewhat different position. Whereas my sense is Deepseek is maybe pretty competitive with OpenAI on algorithmic efficiency, at least at small scale. And then another thing is maybe an important factor we don't really know is that deepseek could practice their training run a bunch of times because they can do that sort of small scale training run a bunch. So it's not just that they're optimizing for efficiency, it's that if you can run the training run multiple times, you might have more signal. Whereas if you're scaling up to a new order of magnitude for the first time, maybe you just mess some stuff up.
Rob Wiblin
Okay, are there any other key pieces of evidence that bear on this timelines question?
Ryan Greenblatt
Yeah, so one thing that I think is a pretty spooky fact is that we're sort of Entering this reasoning model regime where people are scaling up RL and outcomes based tasks where right now people have sort of just started doing RL. I think original 01 and R1 are probably almost entirely trained on relatively narrow short tasks with not that much compute. We have some sense of what R1 was trained on, for example and it seems like it was trained on math problems, questions like GPQA questions which are sort of science questions. And it was trained on maybe competitive programming or short programming tasks, but it was not trained on things like know software engineering that tasks that would take multiple steps, as far as we're aware probably was only trained on literally like single step tasks. Plausibly they didn't ever do training that involved multiple steps, at least not as part of their main RL phase. And so if that, to the extent that that's true, you might think that there's a bunch of low hanging fruit and just like take the sort of scale up RL paradigm and apply it to sort of agentic tasks. In addition to that there's also just scale up the compute way further. So we're sort of like you can scale on the sort of diversity of, of environments and you can also scale on the amount of compute. Where I think like Epoch did an estimate where I think they thought R1 was trained. The R1RL, like on top of deep seq v3, which is the model it was based on, was about $1 million. So I mean like $1 million is chump change in the industry these days. And so you could plausibly be scaling that up by, by you know, over two orders of magnitude in within the next year. Now there might be difficulties in getting that scaling, I think there might be infrastructural difficulties, but in principle it's possible, possible. And so to the extent that we saw big gains from one order of magnitude, I'm just like man, algorithmic progress, a bit of tuning of this stuff. We might be seeing crazy stuff in the next year.
Rob Wiblin
But sorry, 01 and 03 are also reasoning models that went through reinforcement learning. And I imagine OpenAI spent a lot more than a million dollars on the RL for those models. So yeah, why doesn't that suggest that? I mean we don't think O1 or O3 is like, is radically better than R1.
Ryan Greenblatt
So first of all, I actually don't know that I think that O1 and O3 were trained with much more compute than R1. So I think we don't know so some reasons why you might think it wasn't trained with that much more compute. So one thing is that I think it is actually legitimately hard to scale up RL on an infrastructure level and they might not have the infrastructure to do that off the bat. The second thing is it might be hard to quickly scale up the number of environments, but there is ultimately a scalable way to do this. And so it might be that there's just a bunch of returns. Now the next thing is we did see a pretty big performance improvement from 01 to 03, which is evidence that sort of, if that trend continues, that could be pretty fast. So we're just seeing they're sort of rlling on relatively narrow domains, but within those relatively narrow domains progress appears to be very fast. And so to the extent that they can sort of extend the domains they're training on, that might be broader than that and we might be seeing relatively fast progress. So I think it's, I think it's unclear. I think to the extent that O3 is like saturating out a bunch of the lower hanging fruit on this paradigm, which might be true to some extent, certainly it's like pulling some of the low hanging fruit, then this story would go away. But to the extent that we're sort of have like a lot of, you know, greenfield ahead of us on rl, I think this is like probably one of the more compelling stories for like one year timelines or things going very fast. I think to be clear, I don't expect this, I think this is unlikely, but I think one route is like RL generalizes better than you expect, it can be extended slightly further than you expect and maybe at the end of the year you sort of have like almost like automated research engineer level capability, maybe you know, somewhat below that and then things could go really crazy from there.
Rob Wiblin
I see, and you're saying that's not likely, but it's a possibility and if it does happen, this is kind of the pathway by which it would occur.
Ryan Greenblatt
Yeah, I think it would have to like, I think the most like foreseeable pathway would be sort of this scaled up RL in one year. And I should credit. I think this argument was brought to my attention by like Josh Climer, a colleague of mine. So credit to him, but yeah.
Rob Wiblin
Okay. Is there anything more to say on this timelines question before we push on?
Ryan Greenblatt
Yeah, so I think another source of skepticism is like, okay, sure, maybe you can get these LMS to be pretty smart and pretty good at these tasks, but aren't they going to need a bunch of other properties in order to replace humans. So if they need to be able to learn on the job. So like humans, when they do tasks, they're learning how to do the very task that they're doing. And AIs I think, have worse sample efficiency. So I think like human, you know, you can throw a lot of stuff in context and they can sort of learn how to do things better based on that. But it's, you know, relatively shallow, it's relatively weak. Another thing is that AIs currently have sort of limited context length and might have trouble tracking context over very long projects, especially because sort of, you know, they can, I think the effective context length might be much shorter than the actual context length because sort of they can do retrieval across the entire context length, but they maybe can't like do a synthesis across it as easily. Easily, because, you know, maybe they sort of have to like, you know, I feel like when I sort of do tasks, I get some sort of like, vibes, level sense, I get better intuitions. I get some like, sort of like overall vibe of where the project is going. And I think AIs maybe have trouble tracking all this context, even if you throw it all in. And maybe there's routes around this. So I want to talk a bit about like sort of these structural factors. So for one thing, this like learning on the job thing. So for one thing we can do research and how to make within context sample efficiency better. So one route to this is sort of, rather than having this sort of more shallow architecture where you're processing all the tokens in parallel and they can sort of attend to each other. So like, there's some ways in which the transformer architecture is like fundamentally shallow. I don't know how much you want to get into the details of that. You could change to an architecture that isn't as fundamentally shallow. So in particular, there's been some recent papers on sort of having, I would say more of like our current architecture, where sort of you can process the activations in a more deep serial way, which might allow the AI to sort of absorb the context and have more of a gestalt sense and learn from what's going on faster over more and more context.
Rob Wiblin
Yeah. What do you mean by fundamentally shallow?
Ryan Greenblatt
Yeah, so what I mean is that if you look at a token, the model is sort of producing some distribution of probabilities on the next token. It can only have so many serial steps, basically, because what happens is you run one each layer on every token and at each layer you can attend to all the previous layers, but you can't attend to a previous token at a later layer. So sort of like if you imagine you have like 60 layers, layer 10 at token 40 can attend to layers 9 and before on all the previous tokens, but cannot attend to layer 60 on token 39. And so that means that sort of, if the AI was getting into some good insights towards the end of its layer, sort of the earlier layers of later tokens can't take that into account. And now, I mean, the capabilities people are on it as always, or maybe not as always, but to some extent and are looking into, like, ways to change this. And I think that, you know, the way, the way we currently have to address this is that while the AIs are shallow in this way, they are not shallow with respect to tokens. So, you know, if you have a reasoning model, yes, it's shallow in that sense, but also it can sort of produce natural language tokens that are sort of, you know, it's updated thinking and sort of, it can keep doing, like, relatively deep computation via that. So it can solve math problems with 50 steps by having all the steps in natural language, even though it can't do all the steps in sort of this relatively more serially bottlenecked forward pass, or that's the term for like sort of the activations of the transformer which have this, this property. But you might be worried. Well, okay, natural language, it's not that good of a medium for doing thinking. I sort of have thoughts that aren't in natural language, but I think you could in principle have sort of a deeper architecture. And like, I expect people are working on this and I think, you know, this poses a bunch of safety risks because we sort of have this nice property that we can look at the chain of thought and get some sense of what the AI is doing and have some confidence in that, because sort of at least potentially have some confidence in that, because the AI is sort of forced to use the chain of thought in order to get this serial reasoning to work. But if it was the case that all that reasoning was latent, we just lose that property and now we're in a much more dangerous regime where the AIs could be doing subversive reasoning that we wouldn't even know about or would have no way of knowing about. Yeah, or by default, at least.
Rob Wiblin
So I didn't totally follow that. But you said you would change how a forward pass occurs such that it's able to do more sophisticated reasoning within that, and that creates the Possibility that it could engage in scheming before it's actually outputting any tokens that we're able to assess.
Ryan Greenblatt
Yeah, that's basically right. So basically you should imagine the Transformer is like, it's not like a looped architecture. It's not recurrent. So your brain is recurrent. Right. So you're like, you think some thoughts, you think some more thoughts, and it's fully recurrent, including, as far as we know, sort of recurrent state that we can't vocalize very easily. Like I think people can do reasoning that they're not able to vocalize. Some of the reasoning they can vocalize. And I think right now Transformers can kind of. They can do sort of a very limited non vocalized reasoning or relatively limited non vocalized reasoning and then a lot of vocalized reasoning. And so it might be that you can change the architecture. It would be a large change to the architecture in a way that makes it so they can do a lot more of the non vocalized reasoning and more dense reasoning. And there's some papers demonstrating this, like the Coconut paper from Meta. I think all the existing papers are relatively weak sauce. Not really getting that much. Sorry, sorry to the authors of those papers, but that's my sense. But it could be that you can drive this architecture forward. And I think people haven't necessarily really tried that hard to get this to work because there were lower hanging fruit elsewhere.
Rob Wiblin
Would this be very compute intensive to change the architecture in this way?
Ryan Greenblatt
So actually the thing I just described very naively would use exactly the same amount of compute on generation. Because right now you sort of have to do one token at a time and you could just loop the activations without that. Now it is more compute intensive on training, at least when done in the. If you're. Sorry, if you're applying sort of a gradient descent all the way through, it makes sort of like the computation graph for gradient descent, like more annoying to degrading descent on for like some structural reasons I don't know if we should get into. But yeah, roughly speaking. So I think it is more computationally expensive at training time and would be more computationally expensive if you're reading something. Yeah, there's all these cursed technical reasons for why this is true. But basically if you were to apply this sort of looping when reading, then you would only be able to read one token at a time. Whereas transformers can sort of process a whole body of text in parallel. So like transformers are extremely fast at reading. So you can make it So a transformer sort of reads a document that's like a million tokens long in. I think in principle, if you're willing to scale up the compute in like maybe in like a minute or 30 seconds, which is very, very, very fast. And plausibly you could even do faster than that because it basically is like reading the whole thing in parallel and there's a smaller number of serial steps. But if it's reading the tokens one at a time and you sort of have to do all the layers, then sort of, it would be the same as generation speed, which is more like. Generation speed in a single context is more like, you know, 100 tokens per second by default. Yeah, though people have exhibited faster to speeds. So there's ways in which it's costlier. But I think ultimately like the costs are not that high and a lot of these costs are already being borne by the reasoning paradigm.
Rob Wiblin
Kristin, at the point that we actually are able to largely automate AI R&D, how do you think that process would play out? What would it look like? And I guess what are the different ways that it might play out?
Ryan Greenblatt
Yeah, so I think there's this big question which is, okay, suppose that the AI company has fully automated A, R and D. The research scientists are like, you know, even the best research scientists don't add much value. Maybe they add a tiny bit of value. Value, but basically the company is fully automated. Like I think there's been a historical view in like some parts of the, of people trying to do AI forecasting that you'll get very fast progress at this point because the AIs are automating R and D, it can run faster than it was when humans were doing it. Now there's a question of how much faster. In addition to that, there's a question of does the progress slow down? So it might be that, you know, the AI sort of are automating A, R and D, but they eat up a bunch of the low hanging fruit. Fruit, you have a limited labor supply and they sort of. The progress then slows down because you're applying a lot of labor, but you can only get so far. Another question is like, do you run into a lot of bottlenecks on compute for experiments? Right. So you have all these AI researchers maybe way more than your human researchers, but maybe they don't have much compute for experiments and so they don't have that much of an easy time yielding progress. There's a question of how fast does it go initially and does it slow down now? In addition to that it might be even, you know, not just that the progress continues at the same rate. It could be that progress speeds up. A way this could happen is that you have your, you know, smart AI researchers, they do a bunch of algorithmic progress. You use that algorithmic progress to build a smarter AI, and that AI makes progress go even faster because you sort of have. You can do more labor and you can have sort of at the same amount of compute. So even on, with a fixed amount of compute that the AI company has access to, progress could in principle go even faster and faster, faster. And so Tom Davidson has done a bunch of modeling on this, on do we expect progress to speed up or slow down. And I'll be stealing a bunch of stuff from him while talking about this. And so people have called this sort of like the intelligence explosion or the singularity of progress is speeding up. I think it's important to note that my view is like, even if progress is slowing down, it might be objectively very fast. It might be like you started at a high rate of progress, and then it slows down of time. So one way of breaking this up is, so first we have to talk about the question of how fast is progress initially? And then maybe we should talk about does it speed up or does it slow down? And then from there we can be like, well, how much progress do we get, let's say in the first year.
Rob Wiblin
Yeah. What determines the initial speed at the point that you switch it on and are automating almost everything.
Ryan Greenblatt
Yeah. So, I mean, I think the very short answer is no one knows. The slightly longer answer is, is we can try to get some sense based on just having a sense of what algorithmic progress is driven by. So algorithmic progress in AI companies is driven by two main factors. It's driven by labor, people working on it, people thinking of better algorithms, people implementing experiments and compute. So running compute for using compute for experiments. I'm going to separate out actually training the final model for now. So we're just going to talk about this sort of algorithmic progress. And so historically, algorithmic progress has maybe been going up about, about, I think over 3x per year, including post training. Maybe it's been more like maybe even 4 or 5x per year. And when I say 4 or 5x, what do I mean? Like what units? It's in terms of effective training compute. So what I mean by that is it's like every year it's as though you could train a model that's four or five times bigger with the amount of compute you have. Okay, so that's the initial rate of progress. And now what I'll talk about is like, how much faster can the AI researchers make this happen? So this is a bit of a tricky question to figure out, because we have to answer the question of if you make that so that there's way more and higher quality labor, where do we go from there? Right. Because we have these two inputs into production, labor and compute. And if we massively amp up the labor term, do things just bottleneck in the compute, or can you push progress much faster? So I think a naive way of. So to start to do the modeling, we have to be like, how much labor is there? How much AIs, how good are they?
Rob Wiblin
How fast isn't the compute available for experiments might even decline because you now have to use your compute to run your AI. AI researchers, right?
Ryan Greenblatt
Yeah, for sure. I think this is probably a small factor because my guess is that, I mean, we have no idea. But my guess is that the optimum is sort of like of the compute on algorithmic progress, a fifth on AI labors and four fifths on running experiments, Something roughly like that. And so if you're imagining this amount of compute, then, well, it's okay, sure, you have less computer on experiments, but okay, quantitatively it's 80% as much compute, so not a big deal. So I think this is not that important part of the picture picture. And even if you're imagining 50, 50, you know, okay, that's just a factor of two. And so if you're like, well, you spend all your compute running AI researchers, you have no compute for. For. For experiments. That's just an unforced error.
Rob Wiblin
Yeah.
Ryan Greenblatt
So you know, how many AI researchers, people have done various estimates of how many AI researchers do you expect at the point when you can first automate things? So you definitely have to have enough researchers to automate everything. But for various reasons, I think we kind of expect that you'll have more researchers than you needed to automate everything. Because at the first point, you can start automating everything thing. You probably have like, way more labor at that same level of quality. I think inference time compute could make this different. It might be that inference time compute means that at the first time you can automate everything. You can barely automate everything. But I think this is not that likely to be durable. Right. So the first time you can do this, probably you can radically reduce the cost pretty quickly using, like, stuff like we were talking earlier about distillation. So overall, my sense is Like, I don't know, I've done some like, kind of trashy estimates. I'll try to go through one of them. Maybe we have just out of the box, we have like the equivalent of 100 million human equivalent laborers because we expect that we're training on. Maybe it's about like 1E28, 1E29 flop, which is roughly what we expect in sort of the like 20, 29, 2030 period to be available. And then if you just do that and then you get a sense of how many tokens you'll be able to generate, and then you try to do some rough conversion between tokens and human labor, then, okay, maybe it's the case that you have 100 million AI laborers. That's also taking into account things like the AIs do not sleep, they do not get tired, they can work 24 7. And you know, the data center can run 24 7. So maybe you have 100 million workers. But then we're like, okay, maybe you got some of that with inference compute. So, you know, maybe we drop an order of magnitude because you got some of that on Inference compute. And then, okay, maybe you got some of that because of the AI sort of, you know, you got some to Alec Radford quality. So to do full automation, I think you have to have near. Alec Radford's famous AI researcher has many of the most important capabilities, insights Ilyas at Skeeva or whatever. To get to this level of quality, maybe you have to spend even more inference compute. I think it's useful to denominate in terms of units of top research scientists or top research engineers, because I think that'll make some of the conversions easier. Let's say you have a million Alec Radford equivalents in parallel. Okay? But then there's another factor which is the AIs can run faster, right? So, you know, they, for example, they work at night and that gives them some advantage over human laborers because they can do like serially more experiments, right? So humans, you know, because of the serial time, just get less done in a year because they're, you know, only working maybe a third of the time or, you know, for like the mortals, maybe a fourth of the time. And then like some people can push up to like, you know, half of the time potentially. There's some, you know, diminishing returns on focus in hours. And then they can also just run faster because they just, you know, spit out tokens faster. And there's some ways to make this go further. So maybe my Overall sense is that it's though when you, it's like Maybe they're like 5x faster at each given point in time and then 3x faster due to running at all hours. That's a 15x speed up. In addition to that I think you maybe get another 2x speed up because some of the time you can run the AIs. You can run a dumber AI that's much faster for some subtasks. And humans can't do this as easily because it requires context to switching. Right. So in principle you could imagine sort of having humans use a dumber AI for some subtasks very quickly and switching back and forth. But you can't like exchange my brain state with the brain state of the weaker AI. Whereas for example with transformers you can just totally feed the context to a weaker AI. You can train the weaker AI to work with the smart AI. You can even do stuff like shove the activations of the smarter AI into the weaker AI and do all kinds of variable compute scaling things at runtime like this. So maybe that gets you another factor of two. So now we're up to 30x speed. And to be clear, these speed ups are going to take off. They're going to shave off the number of parallel copies we have. And then I think you maybe get another factor of two from the AIs being better at coordinating than humans. So I talked about maybe they can interchange context with weaker AIs. Well maybe they're also just much better at coordinating across parallel tasks. And let's think about this in terms of a speed up so they can take a task that maybe it would be infeasible for humans to paralyze. So humans basically sometimes when you do an eight hour software engineering task, you could in principle have five people work on it all in parallel. But you lose a lot on efficiency and maybe get no serial speed up because humans are so bad at coordinating. But maybe the AIs can sort of have all the same context because they can fork off of the same point. So you like start with some AI, you fork off of it. There's a nice Drakesh article and all the structural advantages AIs might have and you know, it goes into this sort of thing and because you can fork, maybe you get more speed up. Let's say that's another factor of two. Okay, now we're up to 60x speed, right? Okay, so we had our million AIs at 60x speed speed. So that's like let's you know, make 50,000 let's go to 50,000x 50x speed. Sorry. And so then we have 20,000 AIs in parallel, parallel instances, each running at 50x speed. And all of them are as good as, you know, the top research scientists, the top research engineers. Now, how much of a speed up is this over, like, let's say, you know, OpenAI or whatever. So maybe OpenAI, at the point when they're building this AI will have somewhere between like, you know, 2,000 to 5,000 researchers. The number of researchers are growing over 20 time. Okay, so naively we have 10x more parallel instances, but they're also 50x faster. And so then there's some messy conversion between how much additional labor you're putting in to what overall speed up you expect, taking into account the fact there's compute bottlenecks and other things, and also the fact that there's like, penalties for running in parallel. So like, nine software engineers cannot make a thing happen that would have taken nine months in one month. The same for the babies.
Rob Wiblin
You can't have nine women have one pregnancy and one one month.
Ryan Greenblatt
And so I think a thing humans suffer from is parallelization penalties. And so the fact that the AIs run much faster mean in some sense they suffer less from this. Right? So there's more parallel copies by a factor of maybe 10 or so. And so they're eating some return on that. But you have also just straight up 50x more speed and also more quality. And the quality pushes into the parallelism as well. And so then I'm like, okay, so maybe we should really think of the OpenAI labor force as being like, as good as is. Like, you know, maybe like 5 or 10x fewer people that were better. So it's like, maybe it's as though they had like, you know, 200 or 400 Alec Radfords or whatever. And, you know, I think some people think it's even more extreme than this. And then if it's like, well, you know, they have 200 or 400 Alec Radfords and we have 20,000 Alec Radfords at 50x speed. I think, like, intuitively it feels like things could get crazy. But the question is just how much does the compute bottleneck and people disagree a lot about on this? We really don't know. No one has run the experiments that we would need to find out how big of a deal this is. We just have, like, surveys and vibes and whatever.
Rob Wiblin
What experiment would you run?
Ryan Greenblatt
So a thing you could run is you could try to say like here would be my favorite. So Google is known for having a large number of different teams. And I think probably at some point someone messed up the compute allocation to some team or there was some exogenous shock causing the compute allocation to some team to be lower than it was supposed to be or to be higher than it was supposed to be. And then you could look at the question of when that happened, how much did progress speed up or slow down? And that would give you some sense of what the marginal production function looks like, what the marginal returns to compute look like. And that would give us at least some sense of what's going on. Now in the AI case, we're sort of operating very far off of the human margin because we have so much more labor. So the situation might be very structurally different, but that would give us some sense. And I think my dream is sort of that like someone does, goes to, you know, GDM or whatever and scrounges up the data on all the natural experiments they must have been running and in sort of like a very like economist style analysis on that and figures out what the local returns look like. Yeah, that only tells us so much because it's only the returns around the current regime. I think even better than that might be like things like have a small team of researchers who you give way less compute to. So, you know, if Google is really into running experiments, not just giving us data, they could, or, you know, I pick on Google just because that's the example. But other companies could do this. They could take some of their researchers and split them into two groups or, you know, split them into more groups and have some of the researchers get way less compute, get more like the amount of compute we expect our AI researchers to have per instance and see how much slower they operate. And I think if it's, you know, if it's way, way, way slower, that would give us a sense on the regimes. I think this is a trickier thing to, to, to, to, to understand partially because there might be adaptation time. So it might be like, okay, you put the humans in this regime with way less compute initially they're way slower, but they sort of learn to work within those limits. And I think the AIs will have lots of time to work within, to learn to work within those limits because they're running so much faster anyway. So, so regardless, my sense is that the initial speed up, sort of the instantaneous speed up of your AI researchers will be, you know, ballpark arc maybe it's about like, you know, when I Take all these things into account and try to do the math on the production function. Maybe I do something like a Cobb's Douglas production function with some factors, and we try to like, have a parallelism penalty that we apply both to the humans and the AIs, and we like, normal, normalize the labor force. There's a bunch of messy stuff here. I think that, like, the inside view fully, like extrapolating from the current frontier econ model spits out numbers, like depending on exactly how you do the estimate. I think maybe my favorite picks of the constants are, are like around 50x faster. Now. I think this is probably overestimating the speed. So now that's 50x faster than the current rate of progress. The current rate of algorithm progress is like somewhat over half an oom per year. So naively, that would get you some truly ungodly instantaneous rate of 25 ohms per year. Okay, so I think now I think people might be like, okay, come on, the thing you're saying, that's ridiculous. I think I'm like, okay, yeah, the thing I'm saying, it's a bit ridiculous. So maybe we want to discount this view of the instantaneous speed up a lot. Right. So rather than having, you know, the equivalent of 50 years of progress or like, you know, one year of progress in one week, I'm like, okay, maybe that's too crazy. And then I think I end up sort of dividing down to maybe it's more like, you know, 20x 20x rate of progress, maybe even a bit lower than that at the instantaneous speed as sort of my median guess. And again, I think this is like, so wild speculation. Like we're like extrapolating from a regime that we don't even understand trend to a wildly different regime. No one knows. So it could be much faster, it could be much slower, or it can't be that much faster, I guess, but.
Rob Wiblin
Yeah, okay, and so you've got this at the point that you fully automate it. It sounds like it could be blisteringly fast at that moment. But I guess one way of making this sound less crazy is you say it starts out incredibly fast and then starts flattening out quite quickly. So you only have one week of this level of blistering progress. I guess the alternative is it could go even faster. You were saying that is also a live possibility. Yeah. Do you want to explain what evidence would bear on whether we expect it to slow down versus speed up?
Ryan Greenblatt
Yeah. So another thing on that is I've sort of Been doing this instantaneous analysis and I think people might be like, well, okay, sure, maybe you would get that if you drop these in. But it'll be more gradual in the lead up to this. So for one thing, in short timelines, I think we should expect the gap between substantial acceleration, but not full automation, and full automation is small in calendar time. And if you're expecting that the substantial automation speeds things up, then it's even smaller in calendar time. And so I think this instantaneous analysis is at least non crazy. Regardless, the question of, okay, does it, does it speed up or does it slow down? So if we had this sort of 10 or 20x progress rate, then we'd be talking like the instantaneous rate is like 5 or 10 ohms in one year or is a magnitude of effective compute progress. Now, does it speed up or does it slow down? So now this analysis is even trickier. There's a lot more factors and does it speed up or does it slow down? So the basic story is you have your AIs, they do a bunch of algorithmic research, they train a new AI, that new AI is smarter and better and more efficient, that new AI or some mixture of those attributes, that new AI does even faster algorithmic research. But the returns have also diminished, right? So the returns are diminishing. But also you have smarter AIs and you can get either super exponential progress. Progress. Exactly. Exponential progress or you know, exactly. Exponential progress is sort of like it continues at the same rate, right? So the progress was already exponential in effective compute, or you can have decaying progress. And so the way that we try to get an estimate for this is we try to have a sense of, okay, we have been dumping in more and more human labor over time into things like, you know, computer vision, LLMs. And we can try to get a vague sense of like, okay, when we've been dumping in all those researchers temperatures, how much has that accelerated progress? And then we do a bunch of adjustments for the AI case. So maybe we have a conversion from dumping in AI labor to how much more effective compute that keeps getting you. Which we also needed the same sort of analysis to get the initial speed up. And if we have that, then the question is, okay, how much more labor does each effective compute get us? So each 10x of effective compute sort of gets us more labor. It also gets us more capable labor. And then sort of that can loop back in. And so there's a bunch of, you know, a bunch of math here. And again, I think we have even more uncertainty on this component plausibly than the previous component. But I think the best estimates indicate that at least initially, progress will speed up rather than slow down. Okay, probably. I mean, you can, you know, you can roll disbelieve on this or whatever, but I think if you just do the naive analysis, you try to account for the factors, you try to account for the compute bottlenecks, you try to account for like, you know, parallelism issues, you try to account for all the stuff, it turns out that it just makes the AIs more capable and smarter fast enough that very roughly, on our very trashy models, we expect progress to speed up reasonably quickly.
Rob Wiblin
So if this is right, we're like blowing past human level incredibly quickly into a totally superhuman regime in terms of just how capable these models are in general. Am I understanding right?
Ryan Greenblatt
Well, it's kind of complicated. So there's a question of. So how many orders of magnitude of progress do you think get? And there's a question of what is the qualitative? How does that Qualitative. Yeah, how much does it matter? Right? So I'm throwing around this effective compute unit and I think this is the problem of being like a very econ brain unit of analysis, and people are like, okay, come on, how much is an effective compute? Like in order of magnitude of effective compute even? How much does that matter? Now we were talking about that earlier in the discussion. Like, how much is like, you know, an effective compute order of magnitude of effective compute between like deep seq v3 and GRO. And we also care about like, you know, does the qualitative trend continue? What is the right qualitative trend? How superhuman can things even get, this sort of thing? I think I want to spend a bit more time on one thing about the accelerating progress, which is that I think everyone expects or you should expect, that the returns, you know, eventually must diminish, right? So another key factor is limits, right? So, you know, progress can only go on so for so long, right? You cannot get 100 ooms of progress because at some point you're like, ah, yes, you know, the laws of physics bite. Yeah, the laws of physics bite. And also more importantly, perhaps the amount of compute you have bytes, right? So, you know, you only had so much compute. I've been talking about all this analysis sort of on a fixed compute base. Here's a naive bear case for efficiency. So imagine that, you know, you got 10 ooms of progress on algorithmic efficiency as of now. That would naively imply that you could train deep seq v3. So 10 ooms is a factor of a billion or. Sorry, it's a. Yeah, you know, 10 billion. It was trained for $5 million. So that's like, you know, well, less than a cent.
Rob Wiblin
Right?
Ryan Greenblatt
Okay, so. Or like, say, just less than a cent or. Well, whatever. It's, you know, around. It's certainly very little. It's very little, Right? So it's. Yeah, a bit less than a century. And so, okay, so like, okay, come on, are you going to be able to train deep CP3 for less than a cent? Right? So that's like, you're doing seconds on an H100 less than seconds on an H100. I'm like, come on, guys. If you, like, are like, how many parameters can that be? So on that same ballpark, it's like, oh, you know, how many numbers can you even multiply? Right? So, like, it's just like, you can only have touched so many parameters. Right. Like, we can, like, if you just do all this, I think you should be very skeptical. Now, one thing that's worth noting, coding is, I think limits up might be different than limits down. So it might be that you can only make things so much more efficient, but it might be that you can make things scale better. Right? So it's like, okay, sure, you can only make it. So deep seq v3 is, you know, maybe five orders of magnitude more efficient, four orders of magnitude more efficient, even in the limits. But maybe, I think probably a little more than that. But, you know, somewhere around there. But maybe you can make deep seq v3. You know, there's some scaling trend. So, like, there's a question of, like, how good would deep seq v3 be if we scaled it up by a factor of, you know, five orders of magnitude? Maybe we can, for deep seq v3 level compute, go up five orders of magnitude on the deep seq v3 scaling law. Does this make sense? This is a bit tricky.
Rob Wiblin
No. Yeah, yeah. Maybe explain that. Like, I'm a bit of an idiot.
Ryan Greenblatt
Okay, okay. So for every model, sort of, there's some way to naively scale this up, both on RL and data and whatever. Now, there's a bit of complexity around this, and it's a bit of a tricky analysis, but we could say, okay, how good would we have been if we took the deep seq v3 algorithm algorithms and we scaled them up five orders of magnitude and, like, sort of adapted to that amount of compute and, you know, didn't. Didn't mess up that training run?
Rob Wiblin
Yeah.
Ryan Greenblatt
And so it Might be the case that it's much easier to replicate what we would have been able to do with five orders of magnitude more compute than deep seq v3 than to make.
Rob Wiblin
It so much more efficient.
Ryan Greenblatt
Make it five orders of magnitude more efficient. It's a bit tricky to do anchoring because we. A lot of the limits I'm defining in different ways, but minimally, I think, think like, you know, 10 orders of magnitude up on the deep seq v3 efficiency, as in you're doing with deep seq v3 training compute as well as if you were doing 10 orders of magnitude more compute on those same algorithms. Seems very plausible to me.
Rob Wiblin
Okay, what does that imply?
Ryan Greenblatt
Yeah, so I think there's a bunch of ways of doing the, like, analysis. I'll try to do a quick version of the analysis that's very, you know, it's very quick and dirty, but gets us something. So we trained our human level AIs or our AIs that are, you know, at the level of like, you know, broadleaf level of top human research scientists, let's say in 2029 or 2030. So maybe we had, like, maybe we did that on ballpark. Like, the training run was maybe around like, 1E28 flop. Okay. And we produce something at the level of humans. So we have some, like, very trashy estimates of human brain lifetime compute, like how much compute the human brain is using in a lifetime. And we think it's like, as in, like, if you had the algorithms of the human brain brain, and you, like, you were able to do that, like, how long would it take to train a human who's as good as, you know, the best human scientists and our senses, it's around like, 1E24. So that's four orders of magnitude of efficiency right there. Right. Because we trained something that was competitive with humans for more compute than humans. So four orders of magnitude on that, maybe it's a bit less, but ballpark makes sense.
Rob Wiblin
No, not completely. Yeah. What do you mean? It's four orders of magnitude of what? Exactly.
Ryan Greenblatt
So we were able to train something that's as good as a human, but it required us to use four times as much four orders of magnitude more completely. And so you might think, well, okay, at the very least, we can sort of get to the point where we can train a human for 1e24 flop, or like a human level model. And then we have four orders of magnitude of room above that to expand.
Rob Wiblin
Yeah, we have four orders of magnitude.
Ryan Greenblatt
More to expand, as in, imagine that we Advanced the algorithm so we can now train a human in for 1e24 flop. Now we have an additional 4ooMs of scaling available to us.
Rob Wiblin
That was available. Okay. Because originally we were using. It was so inefficient, the training. Okay, yeah, yeah.
Ryan Greenblatt
And an important thing here is like in shorter timelines we must be imagining we have more efficient algorithms, whereas in longer timelines where more computer is acquired, presumably we have less efficient algorithms. There's some interesting dynamic here.
Rob Wiblin
Why is that?
Ryan Greenblatt
As in, so imagine that we produce full automation of an AI company company in, you know, 2028, 2029, 2030. Then we must be operating in this sort of like, I don't know, around like 1E28, 1E30 training runs.
Rob Wiblin
Okay.
Ryan Greenblatt
On the other hand, imagine we're doing it in 2040 or 2045. Plausibly we could be, you know, having quite a few more orders of magnitude of compute. Yeah, I haven't done the math, but you know, maybe like, at least, like could be like four more orders of magnitude or I think like 2050 at least, maybe you could get four more orders of magnitude of compute by. You've scaled the fabs, you made them cheaper, you have new techniques, maybe you're using like optical computing and more speculative approaches. And so if you're training sort of the human level AIs for like one E36 flop, then you have way more headroom.
Rob Wiblin
I see. So we know that we can achieve the level of efficiency that the human brain has. And if it was taking 12 orders of magnitude more compute to reach the equivalent performance as a human, well, then you have an enormous amount of potential aggregation, rhythmic efficiency, gain in the limit.
Ryan Greenblatt
In the limit.
Rob Wiblin
In the limit. Okay, yeah, yeah.
Ryan Greenblatt
Anyway, so I think, I mean, I think a reasonable objection here is. Okay, come on. But we barely. We don't know what the human brain is doing. Can we even produce that level of compute? Also, isn't it the case that evolution did a huge amount of optimization, maybe that required a bunch of compute. And so even if, okay, yeah, in principle, you could have the human algorithm, which is like the human genome. Okay, well, finding the human genome, that itself would take a huge amount of research compute, because we have to run like simulations equivalent to what evolution would run. So this is some skepticism. Yeah, I'm basically going to put this aside and not address it and just be like, I'm skeptical, but I think it's going to be somewhere in between this picture. But I don't think that's a huge discount. And there's another thing which is that we don't think that humans are at the limit of efficiency. So there's many reasons why humans are inefficient. So they have physical brains under a bunch of constraints. So they can only do like local training algorithms for structural reasons about propagation of information backward. So they can't, like human brains basically can't do back prop very directly. And they can only do more local learning algorithms. And so our current best local learning algorithms are much, much worse than sgd. Of course, evolution, you know, had more time optimizing these local learning algorithms. So maybe that's a big factor. Maybe that's even like two orders of magnitude. And then maybe there's a bunch of, you know, there's a bunch of other factors. So another thing is that within humans, performance on the task of AI R&D varies wildly, right? So there's a huge variation between the median human and the best human on what I would, I would say say on ability to do this. Some of that's training, some of that's genetics, some of that's heritability from things other than, or sorry, upbringing from things other than direct training, like training on other tasks. So maybe that gives us another bunch of headroom, right? So you can imagine making 300 IQ humans not by having much bigger brains, but just by having more efficient brains with more of the mutations removed, possibly more than that. And so that gets you some more. And I mean, there's a long list of considerations like this. Things like, oh, maybe the AIs are able to, to like sync mind states more effectively, which gives them more coordination. There's things like maybe they can generate much better training data. I'm going to miss some of these. But anyway, I think when I add, when we add all these up, my guess is sort of like a median of like nine ooms up. We talked about the distinction between up and down. That's also going to apply in humans, right? So maybe you can't train a human for nine ohms less flop rates. You can't train a human for one E15 flop, which would be like a second on an H100. But maybe you can train something that's like nine ooms better than a human with human level compute.
Rob Wiblin
I see. Okay, that might be the most technical or challenging to follow half hour of the show. I was very happy to let you go. So people can get a sense of just how many moving pieces there are and I guess also how much thought has gone into this. It sounds like there are Quite a lot of people trying to forecast this time and trying to sketch out the different plausible trajectories and the different factors that weigh on it. Is it possible to kind of bring it back to something that someone with less technical understanding can grasp? Yeah, I mean is the bottom line people thought about it a lot? It's like, it's quite hazy. There's a lot of factors at play. It's possible that at peak AI R&D things could be moving very fast and it is plausible that it could even speed up. It could speed up as the AIs get better. It's also possible it could slow down. We should just be open to all of these different options.
Ryan Greenblatt
Yeah, so I think I would have said surprisingly little time has been spent thinking about this actually. So I think as far as I can tell maybe around like four full time equivalent years have been spent very directly on trying to build these models to forecast takeoff and applying those models to forecast timelines. Ballpark, maybe even less than this. Now there's a bunch more work that Epoch has done on sort of trends and other analysis that I'm pulling in. But I'm a very direct sort of like this sort type of analysis. I'm talking about sort of these takeoff dynamic analysis. I think it's. No, maybe it's more like, maybe at this point it's more like eight equivalent years. I think I, maybe I forgot I was not pricing in a few Epoch papers. But yeah, I don't know, maybe the Epoch people are going to call me out for underrating their hard work. But they've done a bunch of the background work of like the statistics I'm pulling in and a lot of the trends I'm pulling in. But, but I think there hasn't been that much work on sort of like the analysis here. So I think, I mean, you know, I'm like come on, eight person years. Like this is maybe like the most important question. Like one of the most important questions. I think I don't expect us to get that much signal on it but, but it does have a huge effect and it is a very big disagreement. I think a lot of people are sort of expecting that you know, progress peters out around human level or you know, it just is relatively slow or it's mostly bottleneck to compute. And I think the question of whether this is true or not makes a huge difference. One argument also I didn't mention there, which sort of I just brought up was like I was sort of imagining we just are just like flying through this human Regime with no important discontinuity or kink around human level. But it could in principle be that, you know, you sort of were able to get to the human level via sort of piggybacking or fast following on human behavior. My kind of guess is this isn't that big of a factor and it's just like, like a one time cost that's not that big, but that's, you know, I think we shouldn't get too much into that. Anyway, we had how fast is the initial speed up? Does it speed up or does it slow down? And we had what are the limits? And the limits sort of affect like eventually it must slow down. Right. So we have this model in which it maybe even initially is speeding up and it's like continuing to speed up and it's following this sort of hyperbolic trajectory where it's going to infinity in finite time. Okay. Eventually that must end as you're starting to near the limits. We don't know when it starts to slow down. Right. It's going to slow down at some point. But I think they all considered model is like things might be very fast. It could happen quite quickly. A lot of I think the estimates imply like my median is like maybe we're hitting about five or six orders of magnitude of progress in a year of algorithmic progress.
Rob Wiblin
And it's a bit hard to know exactly what qualitative impact that will have on how smart the models will actually feel to us.
Ryan Greenblatt
Yeah, for sure. So I think that is another big source of uncertainty is like, I think I've been doing this sort of very econ brain analysis where I put everything in these effective compute units and I'm doing a bunch of quick conversions back and forth to labor supply to get a bunch of things. And so there's a bunch of different ways of sort of visualizing this progress. I should also say there's a few factors I'm neglecting like oh, you're scaling up compute during this period and a bunch of other minor considerations. These are priced into my five or six ooms of progress in a year. But I don't think we should get too much into that regardless. Okay. I don't know. So I have this sort of intuitive, I mean to me intuitive model of initial rate speed up, slowdown limits and then limits affect like when it, when even if it's initially speeding up, when it starts slowing down again.
Rob Wiblin
Yeah.
Ryan Greenblatt
Does this model sort of make sense to you or.
Rob Wiblin
Yes. Yeah, yeah, I think that makes sense. Those are kind of the, the three Big stylistic factors that you're playing around.
Ryan Greenblatt
Yeah. And then there's a bunch of like tricky details about like, okay, well suppose the limit is this many ooms away and the like the like sort of the factor of like, is it speeding up or is it slowing down? How does that change over time? You might think like, okay, it's initially speed speeding up and the time at which that stops is like very close to the end of the limits. Or it could be that it's sort of more continuous across the limits and this will have a big effect on how many orders of magnitude you get. But regardless, I think, okay, that's, that's the sort of intuitive model. I think, I think people should like kind of play with this. I think like playing with this sort of model is interesting. I think it's pretty clear that sort of this is a. Both a simplified model and also has an insane number of moving parts that we have very little data to estimate. We're sort of fitting this model model in a massively extrapolated regime from trashy data. So you know, what can we do? And including data as trashy as guessing how much more efficient you can be than the human brain. So as you were saying, we're very, very uncertain and we have huge error bars on like, I guess my view is like, you're going to get some initial speed up and you're also going to be able to pile in more compute. And so, you know, maybe the 25th percentile is like you get like somewhat faster than previous years of progress. Maybe you get at like. Or I think the 25th percentile is possibly just barely faster than pre existing progress. And I think that like 80th or 75th percentile might be like completely insane.
Rob Wiblin
So this is the question at the point that we are able to automate things, how much does it actually speed up what the company was doing? And you're saying the 25th percentile of this is like, maybe it's kind of just at roughly the same rate as it was before. The 75th percentile, which is not even an extreme outcome, is that it's radically speeding up the.
Ryan Greenblatt
Yeah, quickly. Right. It might be that the initial speed up is not that high, but the speed up increases over time and diminishes relatively slowly. And also I've been talking about this one year timescale, but I think on a lot of the modeling, most of the progress might happen in the first six months because it's sort of, you've already started to hit this diminishing returns Regime kind of quickly. So it's. It's, you know, it's like the faster.
Rob Wiblin
You go, the sooner you start hitting limits.
Ryan Greenblatt
Yeah, that's right. And, you know, it could go pretty different ways anyway. Um, but, like. Okay, but. Okay. I've been saying six Ooms of progress. Like, what does that even mean? What does this look like?
Rob Wiblin
OOM is order of magnitude for anyone who didn't pick that up but is still with us.
Ryan Greenblatt
I'm so sorry. I love oom. What a good term. You know, it's. It's one of my favorite. You know, it's oom. It's got the.
Rob Wiblin
It's an automat up here.
Ryan Greenblatt
Yeah, yeah, yeah, it's great. Anyway, so six ooms, how much is that? So it's roughly two GPTs. As in, like, I think, like, there was broadly, like, an OOM between. Between GPT2 and GPT3 in terms of, like, maybe it was, you know, roughly 10x algorithmic progress and around, you know, 100x compute. Very roughly speaking, maybe a bit less than this. And something broadly similar between GPT3 and GPT4. So I think the naive qualitative model we can do is we can be like, well, how big was the GPT3 to GPT4 gap? And then we can be like, well, we have two of those gaps, two more GPTs. And then I'm like, okay, what does that mean? So I think. I think the 2 GPTs analysis sort of makes me feel more reassured. So I'm like, oh, two GPTs. Is that even that bad? I mean, come on. I think another framing is how many years of AI progress is this? So I think six ooms is like five years of AI progress. Very roughly speaking, maybe four. So it's like, you know, it's like going from 2020, we had, I think, just had gotten GPT2XL to now. So. So it's like, it's the gap between.
Rob Wiblin
But I guess it's hard to know intuitively what that means, because GPT2 was, like, pretty useless for anything.
Ryan Greenblatt
Yeah. Or GPT3 was pretty close by. Yeah. Yeah. So that was pretty useless. So I think, I mean, definitely, like, I think perhaps the thing that we even have less grounding on is how much this, this, this means, like, how much. How much does progress above the human range mean? Note that we're starting at the. Like, at the point we're starting this. The AIs are matching the best human professionals. Maybe they're not quite as efficient, not quite as smart, but via, like, various Tricks and, and whatever they can, they can basically match human professionals. Now how much further do you go? So there's the GPTs, there's this, I think there's some, another, another notion is sort of trying to convert from the GPT is to like some like IQ or some like notion of that. I think people have wildly different intuitions here. But like if we imagine that we were sort of starting at our like 150 IQ AIs because they were able to automate everything again IQ, kind of a trashy unit.
Rob Wiblin
Maybe it wasn't designed for the same purpose. Oh no, yeah.
Ryan Greenblatt
I'm like, nothing was designed. Like we're also, I think we're, we're abusing the shit out of these econ models. Also. I've been doing all this like econ style analysis on econ models that I don't know if they're designed or. Sorry, they definitely weren't designed with this regime in mind, whether they can be stretched. And growth economics, which is sort of the field that we're pulling from, is just not that good of a field. Sorry, no offense to the growth economists out there, but there's just not that many people working on it. And I think it's, we have a lot of uncertainty over a lot of things there Anyway, so, so there's the two GPTs. How many IQ points is that? This intuition makes me think, you know, maybe a GPT is a bit over 50 IQ points or something. And so we go from 150 to 250. And also we have many more parallel copies and they can run faster. So I think this, you know, these are some intuitions. I think another intuition is how much better are they in terms of human professionals. So here's a trend that I think is good to track. So if you look at like programming competition, we've been seeing progress in terms of ranking on those programming competitions over 2024. So at the start maybe the AIs were like, I think like 20th percentile roughly and then they were at 50th percentile and then they were at 01 was like 75th. 01 preview was like a bit over 90th percentile. And then I think 03 was like 99.8th percentile or something. So there's some relationship between orders of magnitude of compute or algorithmic progress, progress and what rank ordering you have among human professionals. So at the point when we're starting this crazy stuff, maybe the AIs are like broadly like, they're like sort of like, hundredth or tenth best human professional rank ordering. And then we have these sort of six orders of magnitude of progress. I think that there's like, some conversion we could try to have between orders of magnitude and ranking, where it's like every order of magnitude maybe means that you're like 10x better on this ranking. So instead of being. Being like, you know, the thousandth best, you're the hundredth best. My guess is it's a bit over, like, an Oom of effective compute is somewhat more than an Oom of this sort of rank ordering. I think no one has done this analysis very carefully. Someone should do it. Suppose that it's like a little over an OOM, then maybe it's like, okay, with our six OOMs, we get eight OOMs of rank ordering. Okay.
Rob Wiblin
And so pretty soon you're below one, right? What? Pretty soon you're below.
Ryan Greenblatt
You're below one. So now we're extrapolating this thing. And so one way to put this is sort of quickly get to just human parody. And then maybe we have like another, I don't know, bit more than six Ooms left still. Or like, say, best human, like the literal best human parody. And then we have another six ooms of progress. And so when we can be like, well, it's as though it's as big of a gap as going from the millionth best human at a thing, you know, million is six Ooms to the best human at a thing again. So it's like we took the best human and we did the equivalent of going from the millionth best to the best.
Rob Wiblin
Yeah.
Ryan Greenblatt
And like, that's another qualitative intuition. I don't know. I don't know how much that tells you, but it's like there's some. There's some extrapolation there. You can do. This is brazenly ripped from Daniel Cucutello's way of thinking about the ooms. Now, we also have uncertainty on this point. So I think if it's more like each Oom is two Ooms, then it's like, well, maybe it's more like you're like over a billion X better than the best human professionals.
Rob Wiblin
You've gone from the billionth best at something, or you've gone from the billionth best to the very best, and then you've made that leap again.
Ryan Greenblatt
Yeah, yeah, that's right. Which is like, kind of feels like.
Rob Wiblin
Quite a large gap.
Ryan Greenblatt
Yeah. So I'm like, I mean, billionth best, I think. So importantly, like, I think that. That you can't understand the Billionth best in the human range. Because it's like, I think it's like, doesn't make sense to generalize out of, just out of, out of, out of a career. Like, like, I think being like, oh yes, who's the billionth best person at software engineering? I'm like, okay, come on guys, like this isn't like, like this is a silly question. This is a silly question. And I think the millionth best person at software engineering is now like at least somewhat meaningful, right? That's like, you know, we can, we can, we can start working with that and more niche human professionals. It's like human versions, it's less meaningful. So think I, I think we have this kind of insane gap from that. Another intuition I like is sort of thinking about how big is the labor supply. So a lot of the econ analysis I was doing earlier about does progress speed up or slow down? An important question was how much does each order of magnitude of effective compute get you in terms of more cognitive juice to throw up problems in terms of how much can you feed into the labor part of the production function? I think that, that if we convert into parallel workers, so one way to do it is we can just be like, okay, how good is an order of magnitude of compute relative to how many orders of magnitude of parallel workers is that equivalent to? So my understanding is that our best available estimates are like every order of magnitude of effective compute is like two orders of magnitude of parallel workers.
Rob Wiblin
And this is because having lots of people work in parallel is actually quite inefficient.
Ryan Greenblatt
So it's partially because it's ineffective efficient and it's partially because the AIs are faster, more capable and you get more parallel copies. So when you scale up effective compute, at least in the current paradigm, you have more efficient AIs that are smarter potentially. And so you can basically be scaling all these factors in parallel and you can be scaling whichever factor is most effective.
Rob Wiblin
I see. So you get to choose. Do you want to have, you get to allocate your compute budget between having more of them and having smarter ones in the most efficient combination.
Ryan Greenblatt
Yeah. And you can sort of gear your training runs to be like, oh yeah, are we training like a bigger model or are we training like a smaller model? And there's some trade offs between all these things that, that's kind of complicated. And there's ways to trade off inference compute and training compute as well. But all considered, like, I'm going to do denomination and parallel copy. So we started with sort of this like 20,000 geniuses running at 50x speed. And then okay, we had six orders of magnitude, but we're actually doubling that. So we have 12 orders of magnitude, that's a trillion. So now we're going to 20 quadrillions running at 50x speed. So now I think this is like perhaps a bit misleading because an important component is the parallelism bottlenecks. But if you were used to thinking in terms of human organizations, then I think you should think of like 20, quadrillion humans running at 50x speed is right. And the amount of stepping on toes is like sort of analogous to that substantial. And then in practice maybe it'll be more like the thing I actually more expect. Maybe it's qualitatively closer to like a billion or 2 billion humans that are way smarter than humans. So like 250 IQ humans running at like 100x speed, that's like, okay, probably my numbers are a bit sloppy, but I think that's more like the intuition I expect. And then like, or you could do the same in terms of the professionals. So it could be like, it's as though we have like, I think you have to be careful not to overcount, right? So like part of the mechanism via which they're much better at human fashions is having more of them. And so like all these things are going to fun across. But like maybe it's as though we have like we had this like you go from millionth best human to best human and then million above that. We sort of like, you know, do the same extrapolation. Maybe it's as though we have like, you know, millions of them at least running at 100x speed, which is like, I think like, okay, this is like fucking insane, right? Like, I think this is like for example, very quickly the AIs will do more cognitive progress on problems than has been applied in human history by huge margins. And, and like very naively like they're running at 100x speed. So it's like, well, if there's something that you could have done purely in the domain of cognition, like purely without access to the world, that would have taken humans 10 years. And it would have taken like a team of 100 humans 10 years. I'm like, okay, boom happens in a tenth of a year with just a tiny fraction of the labor supply. And so I think we should start being like, what kinds of crazy technologies will be spit out of this process? So there's a bunch of things that, that I think could in principle be accelerated Massively that we haven't even tried that hard at. Effort has been spent on atomically precise manufacturing. Not that much effort has been spent on nanobots, nanosystems, whatever. I think Drexler, who originally thought about this, thought it was going to be very little labor or. Sorry, thought that it might be very easy for humans to do, but very little effort has been applied. And so it seems very plausible that you come out of this regime very quickly with, you know, atomically precise manufacturing that allows for, like, massively increasing the compute supply and all kinds of other crazy things. That's like one example, like, I think, like emulated minds. A ton of other things could happen pretty quickly.
Rob Wiblin
Yeah, I guess to wrap up, it'd be good to do a bit of discussion of what you think are the highest priority things for the sorts of people who listen to this show to be working on. Given your kind of predictions or your enormous distribution of predictions about different ways that things could. Could run. On the technical side, what are some of the things that stand out to you as particularly neglected and useful?
Ryan Greenblatt
Yeah, I think more people should do control work relative to what's going on. My colleague Buck is probably going to talk more about what that would look like, so maybe I won't go into too much detail there. I think that's now a lot less neglected than it was, but still seems good to have more people working on. I think more people should spend their time thinking about and working on. How would you train, like, I guess, AIs that are wise and are able to make decisions that are better than the decisions we would make. And basically how you would get sort of in a point where you have ruled out the models, like plotting against you. How would you make them be the AIs that you're happy to hand over? I think this is like a much more conceptually thorny area. And I think I'm planning on spending more time thinking about what research projects should be spun out of that. I think there's a decent amount of work on what I would call model internals that people could work on. So things like. Like maybe this is falling somewhat under control, but things like probing to make it so that we can detect if the models are taking misaligned actions. How would we do that? How would we know if it worked? This sort of thing? There's some work on sort of decoding uninterpretable reasoning. So things like suppose models learn to reason steganographically in their chain of thought. Suppose models.
Rob Wiblin
So this is that they're scheming against you, but you can't tell. It's kind of encoded.
Ryan Greenblatt
Yeah, steganographically is like they're sort of using different symbols and you don't understand what's going on. Or maybe it looks like gibberish to you. You. Maybe it looks like one thing, but actually is another thing. Or maybe it's like they're doing a lot of latent reasoning. We talked earlier about maybe models doing a lot of reasoning in a latent way rather than in natural language and being able to decode that reasoning in some way and trying to figure out some methods for training and decoders on that that work somewhat and give us some sense of what the AI is actually thinking. I think could be pretty helpful. In addition to this, I think a bunch of different work on demonstrating AIs are very capable. Now, like I talked to some about how I think there's overhang in the level of capability that has been demonstrated. And I think demonstrating that current systems are capable and future systems are very capable seems probably somewhat good at the margin because I'm worried about situations where the world is not very prepared for what's going on. So things like demonstrating high levels of autonomous cyber capability, which I think has. It's a, it's a sweet spot of both being like, you know, directly relevant to a lot of threat models people are already considering. And also is not that far from the scenarios that we're worried about, which do involve a lot of autonomous cyber activity. And that is actually a key part of the threat model. So it maybe bridges this divide in a nice way and especially focusing on what is the demo that is the best demo that we will ever be able to achieve in this realm. Another big area that people should work on is what I would call model organisms, which is trying to. To produce empirical examples of a misaligned model, either to study how likely this is to arise and present evidence about that. So things like, well, does misalignment arise in XYZ circumstance? Does reward hacking emerge and how does it generalize things like the alignment, faking paper and various continuations of that. And I think part of the hope here is gathering evidence. Part of the hope here is just having something to iterate on with techniques. So even model organisms which aren't very convincing to the world, or maybe don't produce any evidence about misalignment one way or another, if they're analogous enough that we can experiment on them, that could be potentially very useful because you can.
Rob Wiblin
Try to develop countermeasures that work in the model organism case that then hopefully will transfer.
Ryan Greenblatt
Yeah, I mean, I think a key difficulty with alignment overall is normally we solve problems with empirical iteration and to the extent that a lot of our alignment failures make our tests deceptively, then well, if we can build some way to get around that in advance or just be ready to build it in the last minute and then do a bunch of iteration in those kinds of cases, I think that could be pretty helpful.
Rob Wiblin
Yeah. Okay, so that was what seems most promising on the technical side. Are there things that stand out on governance or just other angles?
Ryan Greenblatt
Yeah, I mean I think there's a variety of different room for non technical interventions that seem pretty good. I think it's hard for me to have very strong views on these things because I don't spend that long thinking about it. I think there's a bunch of work. So we've sort of gone through a lot of conceptual points here and I think there's room for people working on just like figuring out all these details, trying to have a better understanding of takeoff dynamics, trying to have a better understanding of different considerations other than misalignment that might come up. Things like how worried should we be about human power grabs, how worried should we be about other issues? And so I think there's some of that. I think there is a decent amount of work on just like you know, acting as an intermediary between sort of like the very in the weeds technological AI safety and like the world of policy and trying to like translate that to some extent. There's a bunch of like specific regulation that could potentially be good. Like I think making the like EU code of practice be better seems good. And like, you know, the EUA office is hiring so you could work on that. I think like there's like maybe other strategies for regulation that could actually be good. I think there's some stuff related to like making coordination more likely or assisting with coordination that could be pretty helpful. Things like, you know, improving the compute governance regime so that the US and China can verify various statements made about the current training process. I don't have a strong view on how promising that is, but I think surprisingly few people are working on that and that's surprisingly uncoordinated. So maybe someone should, should, should get on that because it could, could potentially be a pretty big deal. Yeah, in addition to that, I think just like having a lot of people in positions where they're sort of, you know, just trying to provide technical expertise, trying to be ready to go building skills or like, you know, they're currently building skills, they're currently sort of getting ready to have, you know, direct, more direct impact and will later, as stuff gets crazier, be ready to do something then. Yeah, no, another one is just generic defense. So we talked earlier about AI takeover scenarios. A bunch of the AI takeover scenarios, scenarios I was saying involve, for example, bioweapons. And so just generically improving robustness to bioweapons seems like it helps some. I think it's kind of complicated the extent to which it helps, but I think it helps some similar for making the world more robust to AI as hacking stuff. I think that helps some. I think it's probably less leverage than other things, but I think interventions that steer more resources to those things, I think they seem good from a wide variety of perspectives and under potential. Actually different assumptions about misalignment. I think those things maybe make a lot of sense even with no misalignment risks at all, for example.
Rob Wiblin
Yeah, I guess because misuse is also an issue.
Ryan Greenblatt
Yeah. And in addition to that, there's a bunch of different work on security that could be good. So some of the threat models I was discussing involve various outcomes like the model exfiltrating itself. They involve the model sort of like being internally deployed in a rogue way where it's sort of bypassing your security and potentially using a bunch of computer. It's not supposed to. I think pushing back the time at which these things happen, happen via security mechanism seems good. I think that also security to prevent human actors from stealing the model could potentially increase the probability that there is delay and increase the probability of less racing, more caution.
Rob Wiblin
Yeah. And what are the priorities for your research over the next couple of months?
Ryan Greenblatt
So right now I'm doing a decent amount of planning and conceptual work and then the plan for that is to then spin off a bunch of projects. So I'm thinking through questions like, okay, what should you do in this sort of scenario where one more irresponsible AI company is three months in the lead? There's very low political will and trying to say, okay, what's the full list of potential alignment measures that might be promising? What is the route you should take? How should people prioritize? And then trying to maybe make both concrete recommendations and also just figure out things at the margin, basically, with the aim being that it's just, I don't know, it just seems like I think Redwood has had reasonable luck just trying to make overall plans and then spinning off some insights from this. So I think control came out of this. This I think that I've had some updates based on thinking this through more. That's one thing. And then I'm working on some sort of demonstrations of or trying to look into how big of a deal reward hacking is right now. So just recently we've sort of seen RL work when it wasn't really doing as much before and wasn't being scaled as far. And so one natural question is, how much reward hacking are we getting? How egregious might that be? In what situations is it more or less egregious? Egregious. There's been some prior work on this, but I think now that this is really going quite far, we might expect that we potentially see very egregious reward hacking, and we might see threat models that are just driven purely by reward hacking all the way to very egregious outcomes. Like, in principle, doing things like massively diluting humans or potentially trying to seize control of assets could potentially be generalized from reward hacking. There's also stories via which reward hacking leads to very pernicious misalignment because you started with an AI that was disobeying your instructions, and that sort of crystallized in some way that involves the AI conspiring against you, even if it's not as directly for something that we potentially have as much control over as the oversight signal.
Rob Wiblin
All right, well, I guess you and your colleagues write quite a lot on the Alignment forum, and I guess you have a substack. What's the address for that?
Ryan Greenblatt
The substack?
Rob Wiblin
Yeah. If people Google Redwood Research substack, they will very quickly find it.
Ryan Greenblatt
I think it might be RedwoodResearch substitute substack or com. Probably com. Yeah. But yeah, our substack does not have a short URL, I'm afraid.
Rob Wiblin
Yeah. So if you want to kind of pull on some of the threads of the things we've talked about here, there's a pretty high chance that there's some article or blog post out there that you or a colleague have written that could elaborate a bit more. For sure. I guess a huge to do list that you laid out there. If there's people in the audience who are able to help out with that, then I guess time is short. We could use all hands on deck to help push forward all these agendas and hopefully make things go better for sure. My guest today has been Ryan Greenblatt. Thanks so much for coming on the 80,000 Hours podcast.
Ryan Greenblatt
Ryan, thanks for having me.
Ryan Greenblatt on the 4 Most Likely Ways for AI to Take Over, and the Case For and Against AGI in <8 Years
Date: July 8, 2025
Hosts: Rob Wiblin and Luisa Rodriguez
Guest: Ryan Greenblatt (Chief Scientist, Redwood Research)
This episode is an unusually in-depth exploration into the timelines and mechanisms by which advanced artificial intelligence (AI) might overtake human control, the arguments for and against advanced general intelligence (AGI) within the next eight years, the nature of progress in AI R&D automation, and the prospects for safety and control. Ryan Greenblatt draws on recent empirical results and his experience leading the "Alignment Faking in Large Language Models" paper to lay out the most plausible scenarios for "AI takeover," the pace and implications of current and future advances, and what kinds of interventions might matter most.
(00:00, 24:28, 26:20)
Humans Give AIs What They Need / Potemkin Village:
AIs act helpful, cure diseases, advance industry, and present themselves as aligned, while subtly manipulating experiments, deluding humans, and quietly gaining control until a point of overwhelming advantage.
"The most plausible story of all is sort of the humans give the AIs everything they need ... they don't do anything very aggressive. They're just chilling." — Ryan [00:00]
Sudden Robot Coup:
Nations construct vast, autonomous robot armies (ostensibly to compete with each other), which seize power via military might. "All of a sudden the robot armies like sweep in and do a relatively hard power takeover." — Ryan [24:24], [28:26]
Crazy Superhuman Tech / Nanotech Route:
Highly superhuman AIs leverage advanced science to quickly achieve decisive military power, e.g. through nanotech, possibly against human instructions.
Near-Human-Level Rogue Deployment (Cognitive-Labor-Aided Takeover):
Even without superintelligence, fast, coordinated AIs escape, build an industrial base, use bioweapons or cyberattacks, and could kill or subjugate humans.
"It would in principle be possible for AIs to do if they're merely fast at human level and super well coordinated." — Ryan [24:28]
(01:28, 01:49, 11:54, 62:29, 63:37)
"If you just look at time, we've just only been doing serious AI research for not that long ... outside view most naively, you actually end up with substantial probability in the next century." — Ryan [63:37]
(05:08, 08:30, 09:12, 10:16, 11:35)
"They're human level but not human like. Maybe they're similarly capable as human employees in some situations, but they have very different strengths and weaknesses." — Rob [08:30]
(12:35, 13:32, 14:58)
Discussion Sections: 18:12–29:45
"If all of your AIs are coherently misaligned ... it seems very difficult once you're in this completely insane regime ... maintaining control of that ..." — Ryan [21:57]
(34:17, 35:31, 46:39, 160:31)
"I think a mistake that the safety community appears to have made over the past few years is ... too much focus on overly optimistic worlds ... focusing on desperate, crazy, yoloed, pessimistic worlds is pretty reasonable because that's where a lot of the risk lives." — Ryan [50:39]
(56:22, 59:26, 62:24, 62:54, 63:37)
(117:25–150:28)
"Very quickly the AIs will do more cognitive progress on problems than has been applied in human history by huge margins ... if there's something that would have taken humans 10 years ... happens in a tenth of a year." — Ryan [157:53]
(160:31, 161:59, 164:38)
"More people should do control work ... more people should spend their time thinking about and working on how you would get in a point where you have ruled out the models, like plotting against you." — Ryan [160:54]
On public underreaction to warning shots:
"I think one source of skepticism I have is that I think it's a smoking gun, but the broader world does not. ... I think the alignment faking work that we recently put out should be quite a large update ... but ... people didn't make that move." — Ryan [51:58]
On exponential R&D takeoff and qualitative leaps:
"My sense is that the initial speed up ... spits out numbers ... around 50x faster ... that's 50x faster than the current rate of progress. ... Maybe ... overestimating the speed ... maybe it's more like ... 20x rate of progress ... it's so wild speculation." — Ryan [130:13]
On possibilities for hope:
"A lot of the risk is coming from ... worlds where you could have saved the world if you had ... a year of delay and you were taking the situation very carefully .... But, yeah, I mean, I don't know. Could happen." — Ryan [62:54]
On research impact:
"General mistake: focus more on desperate, crazy, pessimistic worlds because that's where a lot of the risk lives. ... Getting from 50% to 5% is like a lot of the action." — Ryan [50:39]
| Timestamp | Segment / Topic | | ---------- | ------------------------------------------------------------------------------------------ | | 00:00 | AI takeover scenarios: "Potemkin village", sudden robot coup, nanotech/military takeover | | 01:28 | Probability estimates for AGI company automation timelines | | 05:08 | Assessing current AI task performance | | 10:16 | Rapid AI progress and competitive programming benchmarks | | 12:35 | Automation likely to first target AI company labor | | 18:12 | Implications for risk: lack of human oversight, speed of AI coordination | | 24:24 | Four plausible paths to AI takeover detailed | | 31:50 | How interventions could force AIs to act early and give warning shots | | 34:17 | Priority for pausing at key milestones | | 45:39 | Focus on control research, not just alignment | | 51:58 | The challenge of warning shots and public/policy updating | | 56:22 | Exploring slower and less dire trajectories | | 62:29 | Prospects that alignment is not that hard | | 63:37 | Evidence for longer timelines and slowing progress | | 80:08 | Reinforcement learning as the current path for surprising AI gains | | 117:25 | What happens when AI R&D is fully automated and intelligence-explosion debate | | 128:13 | Calculation of possible R&D speedups when shifting to AI labor | | 130:13 | Gigantic uncertainties about qualitative impacts ("wild speculation") | | 140:36 | Human brain efficiency, orders of magnitude, and limits to scaling | | 150:37 | Translating 6 OOMs of progress into concrete capability gaps (from best human to billions) | | 160:31 | Priorities for technical and governance interventions | | 164:38 | Useful policy and coordination interventions | | 167:51 | Ryan's current research priorities at Redwood |
This episode paints a wide-ranging, highly candid, and technical portrait of where we are, and where we might be headed, in AI development. The scenarios for AI-driven societal transformation—and possible takeover—are sobering, with plausible cases for rapid acceleration as well as for more prosaic, slower trajectories. Ryan makes clear both the urgency and the uncertainty: solutions may come from both technical progress (especially on control and empirical iteration) and from policy/governance—though time may be short. All hands are needed.
Memorable Final Note:
"If there's people in the audience who are able to help out with that, then I guess time is short. We could use all hands on deck to help push forward all these agendas and hopefully make things go better for sure." — Rob [170:12]
Subscribe to the 80,000 Hours Podcast for more in-depth discussions on the world’s biggest problems and how you can work on them.