
Ajeya Cotra
If you look at public communications from at least OpenAI, Anthropic, and Google DeepMind, all of their stated safety plans have this element: as AIs get better and better, they're going to incorporate the AIs themselves into their safety plans more and more. Figuring out how to create a setup where we use control techniques, alignment techniques, and interpretability to the point where we feel good about relying on the AIs' outputs is a crucial step. Either it bottlenecks our progress, because we're checking on everything all the time and slowing things down, or it doesn't bottleneck our progress, but we hand the AIs the power to take over.
Rob Wiblin
Today I'm speaking with Ajeya Cotra. Ajeya is a senior advisor at Open Philanthropy, where in 2024 she led their technical AI safety grantmaking. More generally, she's been doing AI-related research and strategy since 2018, and she has become very influential in AI circles for her work on timelines, capability evaluations, and threat modeling. Thanks so much for coming back on the show, Ajeya.
Ajeya Cotra
Thank you so much for having me.
Rob Wiblin
So doing this interview gave me a chance to go back and listen to the interview we recorded, I guess, two and a half years ago. And I have to say you were very on the ball. A lot of issues came up in that conversation that you were bringing to people's attention, and they seem like a much, much bigger deal now, two and a half years later. You talked about METR's work evaluating autonomous capabilities, a line of research that has gone on to become super influential and, I think, very widely read in policy circles. You talked about using probes to monitor and shut down dangerous conversations. That's now pretty standard practice, and maybe one of the most useful outputs from mechanistic interpretability. You talked about the importance of using chain of thought and scratchpads to monitor what AIs are doing and why, which is still probably the dominant technique. You talked about the growing situational awareness of AI models and the resulting possibility of deceptive alignment, which is now a completely mainstream topic. You talked about how, when you train models not to engage in bad behavior, they don't necessarily just learn to become honest; they may also just learn to hide their misbehavior better. Research has, I guess, kind of borne out that this really does happen, and it's a big concern. You talked about how you expected models to get schemier as they get smarter, especially once reinforcement learning came back into the mix, which has definitely happened. And you talked a bunch about sycophancy: how models might end up just flattering people rather than giving accurate information, because flattery is something we enjoy. I don't think you came up with all of these ideas or anything like that, but I think you were ahead of the curve, and hopefully we'll get some ahead-of-the-curve ideas in this interview as well.
Ajeya Cotra
Thank you.
Rob Wiblin
So you think that a key driver of disagreements about kind of everything to do with AI is people's different views on how likely AGI is to speed up science and technology and I guess physical infrastructure and manufacturing. Why is that?
Ajeya Cotra
Yeah, so something I've been noticing as the concept of AGI has become more and more mainstream is that it's also become more and more watered down. Last year I was on a panel about the future of AI at DealBook in New York. It was me and one or two other folks who think about things from a safety perspective, plus a number of venture capitalists and technologists. At the very beginning of the panel, the moderator asked whether we thought it was more likely than not that by 2030 we would get AGI, defined as AIs that can do everything humans can do. Seven or eight hands went up, not including mine, because my timelines are somewhat longer than that. A couple of questions later, he asked a follow-up: did we think AI would create more jobs or destroy more jobs over the following 10 years? 2030 was five years away, and seven out of 10 people thought we would have AGI by then. But it turned out that eight out of 10 people, not including me, thought AI would create more jobs than it destroyed over the next 10 years. And I was a little confused. I was like, why do you think we will have AI that can do absolutely everything the best human experts can do within five years, but that it will actually end up creating more jobs than it destroys over the following 10 years?
Rob Wiblin
What's the tension in this view?
Ajeya Cotra
When I poked some people about that seeming tension later in the panel, they really quickly backed off and said, oh, what does AGI really mean? The moderator had defined it as this very extreme thing, but they were like, we kind of already have AGI. People keep moving the goalposts. We keep making cool new products, and people don't accept that they're AGI and aspire to something higher. And I thought that was funny, because the old-school Singularitarian futurist definition of AGI is this very extreme thing. But I think VCs have an instinct to give the name AGI to something much milder, like "GPT-5 is AGI." So this creates a situation where people feel like they've gotten a lot of evidence that AGI isn't a very big deal and doesn't change much. Either we already have AGI, or we're going to have it next year, or we got it two years ago, and look around us: nothing much is changing. So I think a lot of people are starting to not really care whether or not we get AGI in the next few years. They still expect the next 25 or 50 years to play out kind of like the last 25 or 50 years. There was a lot of technological change between 2000 and 2025, but it was a moderate amount. And they expect that by 2050 there will be a similar amount of change again. Even if they think we're going to get AGI in 2030, they think AGI is just what's going to drive that continued mild improvement. Whereas I think there's a pretty good chance that by 2050 the world will look as different from today as today does from the hunter-gatherer era: 10,000 years of progress rather than 25 years of progress, driven by AI automating all intellectual activity.
Rob Wiblin
Yeah, I guess you've hinted at the fact that there is an enormously wide range of views on this, but can you give us a sense of just how large the spectrum is and what the picture looks like on either end?
Ajeya Cotra
Yeah. So start with the standard mainstream view. If you ask a normal person on the street what 2050 will look like, or if you ask a standard mainstream economist, I think they would say: well, the population is a little bit bigger, and we have somewhat better technologies. Maybe they have a few pet technologies they're most interested in, so maybe we have this one or that one. We have slightly better medicine, and people live slightly longer. It's an amount of change that's extremely manageable. At the far extreme on the other side is the view described in If Anyone Builds It, Everyone Dies. In that worldview, at some point, probably pretty unpredictably, we crack the code to extreme superintelligence. We invent a technology that rather suddenly goes from being like GPT-5 and GPT-6 and so on to being so much smarter than us that we're like cats or mice or ants next to its intelligence. And then that thing can immediately have really extreme impacts on the physical world. The classic canonical example here is inventing nanotechnology: the ability to precisely manufacture things that are really, really tiny, can replicate themselves really, really quickly, can move, and can do all sorts of things. Or it invents space probes that travel close to the speed of light, and things like that. There's a whole spectrum in between. People there think we are going to get to a world with technologies approaching their physical limits: spaceships approaching the speed of light, and self-replicating entities that replicate as quickly as bacteria while also doing useful things for us. But they think we'll have to go through intermediate stages before getting there. What unites all the people who are AI futurists and concerned about AI x-risk is that they think that in the coming decades we're likely to get this level of extreme technological progress driven by AI.
Rob Wiblin
How strong is the correlation between two things? One is how much someone expects AI or AGI to speed up science research in particular, and I guess physical industry as well. The other is how likely they think it is to go poorly, or how nervous they are about the whole prospect.
Ajeya Cotra
I think it's a very strong correlation. I've often found that reasonable people who are AI accelerationists tend to think the default course of AI development and deployment is very, very slow and gradual. They think we should cut some red tape to make it go at a somewhat more reasonable pace. People who are worried about x-risk think the default course of AI is this extremely explosive thing. It overturns society on all dimensions at once, in maybe five years, or a year, or six months, or a week. And they're saying, oh, we should slow it down so it takes maybe 10 years. Meanwhile, the accelerationists think that by default, diffusing and capturing the benefits of AI will take 50 or 100 years, and they want to speed it up to take 35 years.
Rob Wiblin
It's quite interesting that people who differ radically in their policy prescriptions might actually be aiming for the same speed. Maybe both of them want this period to take 10 or 20 years. But their baselines are so different that they're pushing in completely opposite directions. What's your modal expectation? What do you think is the most likely impact for it to have?
Ajeya Cotra
I think that probably in the early 2030s we are going to see what Ryan Greenblatt calls top-human-expert-dominating AI. That's an AI system that can do tasks that can be done remotely from a computer better than any human expert. So it's better at remote virology tasks than the best virologists, better at remote software engineering tasks than the best software engineers, and so on for all the different domains. By that time, I feel like the world has probably already accelerated and changed. Narrower and weaker AI systems have already penetrated a bunch of places, and we're looking at a pretty different world. But at that point I think things can go much, much faster. Top-human-expert-dominating AIs in the cognitive domain could probably use human physical labor to build robotic physical actuators for themselves. Maybe the AIs have already taken over and are acting on their own, or maybe humans are still in control of the AIs. Either way, I think automating the physical side as well would be one of the goals they'd have. I have pretty wide uncertainty about exactly how hard that'll be. But whenever I check in on the field of robotics, I actually feel like robotics is progressing pretty quickly. It's taking off for the same reasons cognitive AI is taking off: large models, lots of data, imitation. Scale is helping robotics a lot. So I imagine that pretty quickly, maybe within a year, maybe within a couple of years, these superhuman AIs could be controlling a bunch of physical actuators that let them close the loop of making more of themselves. That means doing all the work required to run the factories that print the chips that run the AIs, doing all the repair work on those factories, and gathering the raw materials for them.
Rob Wiblin
So you're expecting that in the 2030s, these AI models won't just be capable of automating computer-based R&D. They'll also be able to lead the project of building the fabs that produce the chips they run on. And so that's another positive feedback loop.
Ajeya Cotra
Yeah. I really recommend the post "Three Types of Intelligence Explosion" by Tom Davidson at Forethought. He makes the point that we talk a lot about the promise and the danger of AIs automating AI R&D, that is, automating the process of making better AIs. But that's only one of the feedback loops required to fully close the loop of making more AIs. That loop is about software: making the transformer architecture slightly more efficient, or gathering better data to train the AIs on. But AIs also run on chips, which are printed in chip factories, like the fabs that make Nvidia's chips. Those factories have machines that are built by other machines that are built by other machines, ultimately going all the way down to raw materials. Something we don't talk about very much, because it'll happen afterward, is how hard it would be for the AIs to automate that entire stack, the full stack and not just the software stack.
Rob Wiblin
So I guess there's a wide range of expectations among sensible, thoughtful people who've engaged with this about how much AGI will speed up economic growth at peak. At one end are people who say it will speed up economic growth by 0.3 percentage points, which is roughly a 15% increase on current rates of economic growth. And I'd be very happy if it was that good.
Ajeya Cotra
Yeah.
Rob Wiblin
At the other end are people who say that at peak, the economy will be growing at 1,000% a year, or even thousands of percent a year. So it's something like a hundred-, thousand-, or ten-thousand-fold disagreement about the likely impact. It's an almost unfathomable degree of disagreement. And it's not as if these people have thought about it independently and haven't had a chance to talk. They've spoken about it and shared their reasons, but they don't change their minds, and they still disagree about the impact by a factor of a thousand. I guess you've made it part of your mission in life over the last couple of years to have really sincere, intellectually engaged, curious conversations with people across the full spectrum. Why do you think this disagreement is able to persist?
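The small worked calculation below makes the size of this gap concrete. It is an editorial sketch and not part of the conversation. It assumes a 2% baseline annual growth rate, reads "0.3 percentage points" as 2.3% a year, and reads "1,000% a year" as output growing elevenfold each year. It then converts each rate into a doubling time.

```python
import math


def doubling_time_years(annual_growth_rate: float) -> float:
    """Years for output to double at a constant annual growth rate (0.02 means 2%)."""
    return math.log(2) / math.log(1 + annual_growth_rate)


baseline = 0.02               # assumed ~2% per year in frontier economies
low_view = baseline + 0.003   # "+0.3 percentage points" on top of the baseline
high_view = 10.0              # "1,000% a year": output multiplies by 11 each year

# +0.3 percentage points on a 2% base is a 15% relative increase in the growth rate.
relative_increase = (low_view - baseline) / baseline
# Ratio between the two views' peak growth rates.
ratio_of_views = high_view / low_view

for label, rate in [("baseline", baseline), ("low view", low_view), ("high view", high_view)]:
    print(f"{label:>9}: {rate:.1%}/yr, doubling time {doubling_time_years(rate):.2f} years")
print(f"low view raises the growth rate by {relative_increase:.0%}")
print(f"high view / low view: {ratio_of_views:.0f}x")
```

Under these assumptions, the two camps disagree by a factor of a few hundred on the peak growth rate itself, and by a few thousand on the increase over today's rate. They also disagree on whether output doubles about every three decades or about every three and a half months.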
Ajeya Cotra
Yeah, I feel like at the end of the day, the two parties tend to lean on two different, pretty simple priors, or simple outside views. The group that expects things to be a lot slower tends to lean on this: for the last 100 or 150 years, frontier economies have seen 2% growth. And think of the technological change that has occurred over that time. We went from electricity being little more than an idea to everywhere being electrified. We got the washing machine, the television, the radio. All these things happened; computers happened. None of these show up as an uptick in economic growth. There's a stylized fact that mainstream economists really like to cite: new technology is the engine that sustains 2% growth, and without that new technology, growth would have slowed. So they're like, this is how new technologies always are. People think they're going to lead to a productivity boom, but you never see them in the statistics. You didn't see the radio, you didn't see the television, you didn't see the computer, you didn't see the internet, and you're not going to see AI. AI might be really cool. It might be the next thing that keeps us on track, chugging along. That's one perspective, an outside view they keep returning to. There's also a somewhat more general idea: things are always hard and slow, way harder and slower than you think. It's like, what was it? Murphy's... not Murphy's Law.
Rob Wiblin
Murphy's Law: anything that can go wrong will go wrong.
Ajeya Cotra
Yeah, anything wrong.
Rob Wiblin
I think this is our experience in our personal lives that it's awfully hard to achieve things at work, things that to other people might seem so straightforward. And they're like, why haven't you finished this yet? And you're like, well, I could give you a very long list.
Ajeya Cotra
Or like Hofstadter's Law: it always takes longer than you expect, even when you take Hofstadter's Law into account. Or the programmer's credo, which is my favorite one: we do these things not because they are easy, but because we thought they would be easy. So there's this whole cloud of ideas that says it's naive to think things can go crazy fast. You might write down a story for how things will be super easy and fast that seems perfect and unassailable. But there are all sorts of bottlenecks and drag factors you inevitably failed to account for in that story. That's the first perspective. The alternative perspective leans a lot on much longer-term economic history. If you try to assign reasonable GDP measures to the last 10,000 years of human history, you see acceleration. The growth rate at the frontier was not always 2% per year. 2% per year is actually blisteringly fast compared to growth in, say, 3000 BC, when it was maybe 0.1% per year. So the growth rate has already multiplied many times over, maybe by an order of magnitude, maybe two. People in the slower camp tend to feel that estimating long-run historical data is just too fraught to rely on. But people in both camps agree that the Industrial Revolution happened and that it accelerated growth rates a lot: we went from growth rates well below 1% to 2% a year. And people in the faster camp tend to lean on the long run, and on models that explain accelerating long-run growth as a feedback loop. More people can try out more ideas and discover more innovations. That makes food production more efficient, which supports a larger population. Then you rinse and repeat, and you get super-exponential population growth.
And then that perspective says: suppose you can slot in AIs to replace not just the cognitive but also the physical, the entire package. Suppose you close the full loop of AIs doing everything needed to make more AIs, or AIs and robots doing everything needed to make more AIs and robots. Then there's no reason to think 2% is some sort of physical law of the universe. They can grow as fast as their physical constraints allow, and those constraints are not necessarily the same as the ones that keep human-driven growth at 2%.
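A toy model can illustrate the feedback-loop argument. This is a hedged sketch with made-up parameters and not any specific published growth model. Population is in arbitrary units, starting at 1. In one scenario the growth rate is a fixed 2% a year. In the other, the growth rate is assumed to be proportional to population (more people, more ideas, more food, more people), so the growth rate itself keeps rising.

```python
def simulate(growth_rate_of, p0: float, years: int) -> list[float]:
    """Step a population forward one year at a time.

    growth_rate_of maps the current population to that year's growth rate.
    """
    path = [p0]
    for _ in range(years):
        p = path[-1]
        path.append(p * (1 + growth_rate_of(p)))
    return path


def annual_growth_rates(path: list[float]) -> list[float]:
    """Year-over-year growth rates implied by a population path."""
    return [later / earlier - 1 for earlier, later in zip(path, path[1:])]


# Exponential: a fixed 2% per year, however large the population gets.
exponential = simulate(lambda p: 0.02, p0=1.0, years=40)

# Toy feedback loop (assumption): the growth rate is proportional to population,
# so every increase in population raises the next year's growth rate.
feedback = simulate(lambda p: 0.02 * p, p0=1.0, years=40)

exp_rates = annual_growth_rates(exponential)
fb_rates = annual_growth_rates(feedback)
for year in (0, 10, 20, 30, 39):
    print(f"year {year:2d}: exponential {exp_rates[year]:.2%}, feedback loop {fb_rates[year]:.2%}")
```

The exponential path keeps a constant growth rate, while the feedback path's growth rate climbs every year. In the continuous-time version of this toy model (dP/dt = 0.02 P^2 with P(0) = 1), population diverges in finite time, at t = 50. That is the "super-exponential" behavior described above. Realistic models add diminishing returns and physical limits.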
Rob Wiblin
So that's the justification each side gives for its perspective, in broad strokes. But even after they've explained this at great length to one another, why don't they converge on uncertainty? Why don't they say, "Interesting, it'll be something in the middle, because there are competing factors"? Instead, they stay reasonably confident in quite different narratives about how things will go.
Ajeya Cotra
Yeah, I'm honestly not sure. I'm partial to the "things will be crazier" side, so I may not be able to give a perfectly balanced account. But here's one thing I've noticed about the people who think it'll be slower. Their worldview has a built-in error theory for the people who think things will go faster. The worldview is not just "things will keep ticking along." It's also "everyone always thinks there will be some big new revolution."
Rob Wiblin
Everyone's always expected it to spiral, almost...
Ajeya Cotra
...every time, and they've always been wrong. So there's that dynamic, and from their point of view I think it's totally reasonable. There may be no knockdown argument, in your interlocutor's own terms, where you can point to a mistake they'll accept. You may even look at the story and think it's kind of plausible. But you still have this strong prior that someone could...
Rob Wiblin
...have made the same argument in the...
Ajeya Cotra
...past. Someone could have made the same argument about television, and someone could have made the same argument about computers. None of these played out. So I think that's a big factor. I also think these are complicated ideas, and there hasn't been that much dialogue. There could be more, including dialogue that tries to ground things in near-term observations. But yeah, I think that's a big part of it. They have a built-in error theory, so the object-level conversation doesn't feel very legitimate or interesting to them. That's the conversation that goes: okay, here's how the AI could make the robots, and here's how the robots could bootstrap into more robots, and so on. Or their story is that this type of thinking always biases people towards expecting things to go faster than they actually will, because it struggles to account for all the drag factors and all the bottlenecks. On the other side, people who think things will go faster feel like everyone is always blanket-assuming there are going to be bottlenecks. When someone brings up a specific bottleneck and you look into it, it might slow things down from some absolute peak of 1,000% growth. But it isn't a reason to think 2% is where the ceiling is, or even that 10% is. So they also have their own error theory of the bottleneck objection.
Rob Wiblin
So it's incredibly decision-relevant to figure out who is right here. I think almost all of the parties to this conversation would change what they're working on if they completely changed their view. Suppose the people who thought it was going to be 1,000% decided it was going to be 0.3%. They'd probably see that as a decisive consideration against much of what they were doing before. And vice versa: if people came to think there would be a 1,000% speedup, they'd probably be a whole lot more nervous and interested in different kinds of projects. So how could we get more of a heads-up about which way things are going to go? It seems like sharing theoretical arguments hasn't been persuasive. Is there any kind of empirical evidence we could collect as early as possible?
Ajeya Cotra
One thing that won't address all of this, but is a step in the right direction, is really characterizing whether, how, and why AI is speeding up software development and AI R&D. METR came out with an uplift RCT, which I think was the first of its kind, or at least the largest and highest-quality one. Software developers worked on issues that were split into two groups: for one group they were allowed to use AI, and for the other they weren't. METR then studied how quickly the developers solved those issues, that is, tasks on their to-do list. It actually turned out that in this case AI slowed down their performance, which I thought was interesting. I don't expect that to remain true. But I'm glad we're starting to collect this data now. And I'm glad we're starting to cross-check benchmark-style evaluations, where AIs are given a bunch of tasks and scored automatically, against evidence about actual in-context, real-world speedups. I really want to get a lot more evidence of all kinds about that, like big uplift RCTs. It would be great if companies ran RCTs on their own rollouts of internal products, to see whether teams that get the latest AI product earlier are more productive than teams that don't. Even self-report, which I think has a lot of limitations, is still worth gathering. So my high-level formula would be: look at the places where adoption has penetrated the most, and start to measure speedups in actual output variables. For example, it would be really cool to find a solar panel manufacturing plant that had really adopted AI. Then we could see how much more quickly they could manufacture solar panels, or how much better they could make them.
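As a rough illustration of what an uplift estimate computes, the sketch below compares task completion times with and without AI assistance. The numbers are invented and are not METR's data. Using a ratio of geometric means, which dampens the influence of a few very long tasks, is an editorial choice and may not match the study's method. A real study would also need proper randomization and confidence intervals.

```python
import math
import statistics


def geometric_mean(values: list[float]) -> float:
    """Geometric mean: less sensitive than the arithmetic mean to a few very long tasks."""
    return math.exp(statistics.fmean([math.log(v) for v in values]))


def estimated_speedup(control_hours: list[float], treatment_hours: list[float]) -> float:
    """Ratio of typical completion times, without AI / with AI.

    Above 1.0 means tasks done with AI finished faster; below 1.0 means AI slowed things down.
    """
    return geometric_mean(control_hours) / geometric_mean(treatment_hours)


# Hypothetical completion times in hours (illustrative only, not from any real study).
ai_disallowed = [1.0, 2.0, 4.0]
ai_allowed = [1.5, 3.0, 6.0]

print(f"estimated speedup: {estimated_speedup(ai_disallowed, ai_allowed):.2f}x")
```

With these made-up numbers the ratio comes out at about 0.67x. That is a slowdown, in the same direction as the result described above.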
Rob Wiblin
Yeah. Is it possible to do this at the chip manufacturing level? I guess that's maybe the most difficult manufacturing there is, more or less. So we might get an earlier heads-up from something more straightforward, like solar panels. But we'd really like to monitor how much difference any of this is making across all kinds of manufacturing.
Ajeya Cotra
The thing I ultimately care about most is the AI stack: chip design, chip manufacturing, manufacturing the equipment that manufactures chips, and of course the software piece too. The software piece comes earliest. But across the entire stack, I think we should be monitoring the degree of AI adoption, self-reported AI acceleration, RCTs, and anything else we can get our hands on. The moment when AI futurists expect things to start going much, much faster roughly coincides with the moment AI has fully automated the process of making more AI. So that's really something to watch out for. On a separate track, you also want to be looking at the earliest power users, wherever they are, because you can get insight from them that transfers to these domains.
Rob Wiblin
Is there anything else we can do?
Ajeya Cotra
I don't know. I'm really curious about this.
Rob Wiblin
Do I understand right that last year you put out a request for proposals? You were at Open Phil, looking to fund people who had ideas for resolving this question?
Ajeya Cotra
Yeah, I put out a pair of requests for proposals in late 2023. One of them was on building difficult, realistic benchmarks for AI agents. At the time, very few people were working with AI agents, and only a couple of agentic benchmarks had come out, including METR's benchmark that I discussed on the show last time. I was really excited about it. It felt like the moment to move on from giving LLMs multiple-choice tests to giving them real tasks. Things like "book me a flight," or "make this piece of software work: write tests, run the tests, iterate until the thing actually works." That was a very new idea at the time, but the time was right for it, and a lot of academic researchers were excited about moving into the space. So we got a lot of applications for that arm of our request for proposals. We funded a bunch of cool benchmarks, including Cybench, a cyber offense benchmark that's now used in a lot of standard evaluations. The other arm was basically for types of evidence other than benchmarks: surveys, RCTs, all the things we talked about. We got much less interest in that. I think it reflects that it's harder to think of good ways to measure things outside of benchmarks. That's true even though everyone agrees benchmarks have major weaknesses and consistently overestimate real-world performance, because benchmarks are clean and contained and the real world is messy and open-ended.
One thing I'm excited about that came out of the second RFP is a panel the Forecasting Research Institute is running called LEAP, the Longitudinal Expert AI Panel. They take 100 or 200 AI experts, economists, and superforecasters and have them answer a bunch of granular questions about where AI will be in the next six months, the next year, and the next five years. The questions cover benchmark scores, but also things like: will companies report that they're slowing down hiring because of AI, or will an AI be able to plan an event in the real world? Honestly, the most flexible tool we have might be this: have people make subjective predictions, explain how those predictions connect to their longer-run worldviews, and then check over time who's right. So I'm very excited to see where LEAP goes. But it is challenging to find indicators that satisfy both sides. They need to be clear early warnings, so we can actually do something if the more concerned people are right. They also need to be clearly valid, so the other side can't easily dismiss them as not realistic enough to matter.
Rob Wiblin
So as part of this, you've been thinking about one way this could really go wrong. The companies developing cutting-edge AI may begin to see internally how much it's helping them, and perhaps that it's speeding them up enormously. But they may decide not to share that information with the rest of...
Ajeya Cotra
...the world. And they may decide not to release those products, if there's one company that's well ahead of the others. AI 2027 depicted the leading company in the AI race as so far ahead of its competitors that it could afford to keep its best stuff internal and release only less good products to the rest of the world.
Rob Wiblin
It could afford it in the sense that it didn't need to make money by selling the product?
Ajeya Cotra
In the sense that its competitors were far enough behind that they couldn't undercut it or compete with it by releasing a better product. In the story, the company in the lead, OpenBrain, is basically just releasing products that are slightly better than its competitors' state of the art.
Rob Wiblin
I think they're so far ahead that they can just choose to always basically have their product be somewhat better. They can just release whatever level of their own internal machine would be the best to the external world. Yeah, okay. But I guess it would be unfortunate if there are people who do know this, but the broader world doesn't get a heads up. And so we could have known six months or a year earlier in what direction things were going, but that was kept secret. I mean, I guess maybe for the leading AI company, I'd prefer to keep it secret, but for the rest of it, I suppose would probably prefer that the government has some idea what's going on. So you've been thinking about what sort of transparency requirements could be put in place that would require people, require the companies to release information that would give the rest of us clues as to where things are going well. What sort of transparency requirements could those be?
Ajeya Kocha
Yeah. So I think there's a whole spectrum of evidence about AI capabilities where on the one hand, the easiest to test but the least informative is benchmark results. Companies do release benchmark results when they release models. Right now they say Claude Opus 4 was released and they have a model card that says it has this score on this hacking benchmark, it has this score on the software engineering benchmark, and so on, as part of a report about whether it's dangerous or GPT5 had the same thing. I think that that's great that they do that. But in my ideal world, they would release their highest internal benchmark score at some calendar time cadence. So every three months they would say we've achieved this level score on this hacking benchmark, this level score on software engineering benchmark, this score on an autonomy benchmark. And that's because, as you said, danger could manifest from purely internal deployment. Because if they have an AI agent that's sufficiently good at AI R&D, they could use that to go much faster internally and then other capabilities and therefore other risks might come online much faster than people were previously expecting. So it's not ideal to have your report card for the model come out when you release it to the public, unless there's some sort of guarantee that you're not sitting on a product that's like sufficient, that's substantially more powerful than the public product. So maybe it's fine to release your model card and system card along with the product if you also separately have a guarantee that you won't have too much of a gap between the internal and the External. So that's on the end of things that are currently discussed. It's how I would, how I would tweak information that's currently reported to be somewhat more helpful for this concern. 
But then there's a bunch of other stuff that is not currently reported that it would ideally be really great to know, like how much and how they are using AI systems internally. One thing I'm very interested in: companies will sometimes brag about the percentage of lines of code that are written by their AI systems. Various CEOs have said that internally, 90% of their lines of code are written by AIs, and things like that. I think it would be great to have systematic reporting of those kinds of metrics. But those aren't the ideal metrics I'd be interested in. One thing I'm interested in is what fraction of pull requests to your internal code base were mostly written by AI and mostly reviewed by AI, so that humans are mostly not involved on either side of the equation. I'd be very interested in watching that number climb, because I think it's an indication both of AI capabilities and of how much deference the companies are giving to AIs. And eventually, if things are going to go crazy fast, the AIs have to be doing most things, including most management, approval, and review, because if humans have to do that stuff, then things can only go so fast. So I really want to track how much higher-level decision-making authority is being given to the AIs in practice inside the companies. I think there are probably a bunch of other things we could basically send as a survey: How much do you use AIs for this type of thing? For that type of thing? How much speedup do you subjectively think you get? And if they're running any internal RCTs, I would of course love to know the results of those.
Rob Wiblin
What about just requirements that, inasmuch as they're training future generations of AI models, they have to reveal to at least some people in the government how those models are performing on normal capability evals? That way those people can see the line going up even if the companies aren't releasing the models as products for whatever reason. And if the benchmarks start curving upwards far above previous expectations, that could lead them to sound the alarm.
Ajeya Kocha
Yeah, I think that is a good thing to do. But I don't think benchmarks alone will actually lead anyone to sound the alarm. Because the thing with benchmarks is that...
Rob Wiblin
They saturate. They always have that S-curve shape.
Ajeya Kocha
They always have the S-curve shape. And the benchmarks we have right now are harder than the previous generation of benchmarks. But it's still far from the case that I feel confident that if your AI gets a 100% score on all these benchmarks, then it's a threat to the world and could take over the world. I still think the benchmarks we have right now are well below that. So what's probably going to happen is that these benchmarks are going to get saturated, then people will make a next generation of benchmarks, and then those benchmarks are going to tick up and then get saturated. So I think we need some kind of real-world measure before we can start sounding the alarm. And the ultimate real-world measure is actually just observed productivity. Right? If they are seeing internally that they're discovering insights faster than they were before, then that's a very late but also very clear signal. And that's the point at which they should definitely sound the alarm and we should know what's happening. So, yeah.
Rob Wiblin
How is this idea being received by the companies? On the one hand, it seems like transparency requirements are the regulatory instrument that the companies have objected to the least; it's the one they've been most willing to tolerate. On the other hand, the whole message of this is: we don't trust you to share information with the rest of the world, and we think you might basically screw us over by rushing ahead and deliberately concealing that. I could imagine that being a little bit offensive to them, or at least, if that is their plan, they'd probably want to find some excuse for not having this kind of oversight.
Ajeya Kocha
Yeah, I think the response just tends to differ based on the actual information being asked for. Benchmark scores they already release. Like I said, they release them at the point of releasing a product, which I think is fine for now, but I would like to move to a regime where they release benchmark scores at some sort of fixed cadence, even if they don't have a product release. Benchmark scores are not considered sensitive information. But this other stuff, which I think is a lot more informative on the margin, is much more fraught. They don't necessarily want to share with the world the rate at which they're gaining algorithmic insights, because they want to maintain some mystery about that for competitive reasons. It's risky for them if it's a little bit too fast, because then, I don't know, competitors will start paying more attention to them and trying to copy them and find out what's going on. It's also risky if it's too slow, because that's kind of embarrassing.
Rob Wiblin
Investors lose heart.
Ajeya Kocha
Yeah, investors lose heart. And another thing I didn't mention earlier is that I would really like them to be reporting their most concerning misalignment-related safety incidents. Has it ever been the case that, in real-life use within the company, the model lied about something important and covered up the logs? I really want to know that. But of course reporting that is very embarrassing to companies. One thing that might help here is that there are a number of companies now, so perhaps they could report their individual data to some sort of third-party aggregator that then reports out an anonymized industry-wide aggregate. But I don't think that solves all the issues, because there are few enough companies that people would be able to guess. So I think there are a lot of competitive challenges, IP sensitivity challenges, and just PR challenges to overcome with some of the more penetrating internal information. But I think it's important enough to the public interest that we should try to find a way to navigate that.
Rob Wiblin
Yeah. So it's not unusual for government agencies to be able to basically demand commercially sensitive information from companies for regulatory or governance purposes. I actually worked at one when I was in the Australian government: the Productivity Commission, which had extraordinary subpoena powers to demand almost any documents from any company in the country. It was a rarely used power, and it wasn't the only agency that had that capability.
Ajeya Kocha
And what kinds of things would you ask them?
Rob Wiblin
I never actually saw the power being used. I guess people were proud of the fact that we had that authority, but I think you would usually use it for competition reasons: trying to tell whether companies are potentially colluding, or whether there's an insufficient degree of market competition and a reason to intervene. And I would imagine there are almost certainly government agencies in the US that have a similar remit. So if they could actually keep that kind of information secret, then maybe the companies would be happier to share it with people who specialize in reading and comprehending this data and figuring out what to do with it.
Ajeya Kocha
Yeah, I think that could be a solution, but I'm a little skeptical. I think that releasing this information publicly is probably a lot better than releasing it just to a government body, basically because we're building the plane of AI safety research as we're flying it. It's not a box-checking exercise that a government agency, often understaffed, especially on technical staff, could do. It's more that we want this information out there in the open, and then we want people to do involved analyses of it. And our sense of what information we even want is probably going to shift over time. It'll probably go better if there's a robust external scientific conversation about what indicators we want to see, what they would mean, and when we should trigger an alarm. If that's all being routed through governments with 10 people, or even 50 people, who have to deal with it, I think it would be very hard for them to interpret the evidence quickly enough and well enough, and to be confident enough to sound the alarm and then have people actually listen to them. If I imagine sounding the alarm on something like the intelligence explosion, I picture it having to be a society-wide conversation, kind of like sounding the alarm about COVID. Or something I have in mind is when Joe Biden had that disastrous debate performance that led to weeks of conversation and ultimately to him being removed from the ticket. It would have been very hard, I think, for a small, narrow group of people entrusted with the authority to make the same thing happen.
Rob Wiblin
Because I guess you want common knowledge, and you want lots of attention focused on the issue rather than just some technocrats being aware, as well as...
Ajeya Kocha
The opportunity for a bunch of technical experts, who may not be paying that much attention now because maybe they think this stuff is all science fiction, to jump in at that moment and offer their takes. I think it would be very powerful if someone like Arvind Narayanan, who's known for being very skeptical of these stories, actually looked at the data, changed his mind, and said, "Oh yeah, this is happening now and it's dangerous." It's very hard to get those kinds of common knowledge dynamics if everything is just sent to governments. That said, of course, I think sending things to governments is better than not sending them anywhere, so I also think that's good.
Rob Wiblin
So suppose Plan A is that we want them to share this information such that anyone in the public can find out. I guess they'll probably resist any legislation imposing this to some extent, and partly for legitimate reasons: it's probably going to be frustrating for them. Inasmuch as people are trying to set priorities for which asks to make and which fights to pick, would this be very high on the list for you?
Ajeya Kocha
I think I laid out a whole spectrum of ideal information-sharing practices, and I don't think going all-or-nothing on that whole package is a top-priority fight to pick. But the algorithm of thinking really hard about what pieces of information we would want in order to know for ourselves whether the intelligence explosion was happening, and then getting the highest-value or biggest-bang-for-buck items on that list, feels very high priority to me. And I think that's the strategy that people working on AI safety-related legislation have landed on. The RAISE Act in New York and SB 53 in California are both quite transparency-oriented, and both are oriented around, for example, whistleblower protections, which are an important policy plank underlying transparency.
Rob Wiblin
Do you think that information about an emerging intelligence explosion might just leak out to the public anyway because staff at the companies would feel uncomfortable with that proceeding in secret?
Ajeya Kocha
I think that's very plausible. I still think that information that leaks in the form of rumors at San Francisco tech bro parties doesn't have the ability to impact policy and decision making all the way in D.C. or London or Brussels in the same way as information that is clearly unrefuted, very salient, and official. I mean, I think the AI safety scene in the Bay Area has benefited from having close social ties to people who work at AI companies, getting a sense of what might be coming around the corner. But that's not something you can use to really pull an alarm or advocate for very costly actions. So I think it isn't really enough. We need more.
Rob Wiblin
So let's imagine that via whatever mechanism, society does get a heads up that we are starting to see the early stages of an intelligence explosion. What would we do with that heads up?
Ajeya Kocha
Yeah. So I think one extremely important factor is: at that point in time, how good are AI systems at everything besides AI R&D? So the alarm has sounded and we've learned that AI has fully or almost fully automated R&D at the leading AI lab, perhaps at all the AI labs. This is causing those labs to go way faster than they were going with mostly human-driven progress in the previous era. So at that point, whatever AI progress you thought was going to be made by default in the next 10 or 20 or 30 years might be made in a year or two, or even six months, depending on how much AI is speeding everything up. At this stage, AIs might not be that dangerous yet, but we might be about to move very quickly from the point where they're not so dangerous to the point where they have sort of godlike abilities. And I think what we want to do as a society, if we gain confidence that we're at the starting point of this intelligence explosion, is to redirect as much of that AI labor as we can from further AI R&D to things that could help protect us from future generations of AIs, both in terms of AI takeover risk and in terms of a wide range of other problems that increasingly powerful AI might create for society. At that point, it's still not in the narrow, selfish interest of whichever company is in the lead to do that, because if it were to slow down unilaterally, someone behind it could catch up. But hopefully, if the alarm has sounded and we have a clear picture that we have some 6 or 12 or 18 months until radical superintelligence, then this might be a window of opportunity to coordinate: to use AIs for protective activities instead of for further acceleration of AI capabilities.
Rob Wiblin
So the challenge we have is that AIs are becoming much smarter very quickly, and we feel very nervous about that. And the opportunity that creates is that we have a lot more AI labor, and much smarter potential researchers than we did before. So why don't we turn that new resource towards solving this problem that, at the moment, we don't really know how to fix? I think some people who are not too worried about AI look at society as a whole, or at history, and say: technology has enabled us to do all kinds of more destructive things, but we don't particularly feel like we're in a more precarious situation now, or at much greater personal risk now, than in 1900 or 1800, because advances in destructive technology have been offset by advances in safety-increasing technology. On balance, things have probably gotten safer. So the idea is: it's going to be a vertiginous time, but perhaps we could pull off the same trick in this crunch-time period.
Ajeya Kocha
Yeah. And I think a lot of people who are more concerned about AI risk are very dismissive of this plan. It sounds like a crazy plan: really flying by the seat of your pants, expecting the thing that's creating the problem to solve the problem. But in a sense, I do think humanity has repeatedly used general-purpose technologies that created problems to solve those same problems. Take something as mundane as automobiles. Cars created the opportunity for carjackings and drive-by shootings; they empowered bad actors in various ways. But of course, if the police and law enforcement have cars as well, that is a balance. When you imagine a future with some crazy new advanced technology and all the problems it creates, it can be hard to imagine, with the same level of detail and fidelity, all the responses to those problems that are also enabled by that technology. So you could imagine someone worrying about the rise of fast vehicles and neglecting to think about how all the bad things fast vehicles enable could be kept in check by people using vehicles for law enforcement and similar. And similarly with computers: you can hack things with computers, but computers also let you do a lot of automated monitoring for that kind of hack, automated vulnerability discovery, and different kinds of law enforcement. You couldn't imagine a police force not using computers. So I do think the basic principle is sound: if you're worried about problems created by a technology, one of the first things on your mind should be how you can use that new technology to solve those problems. But I think this is an especially narrow window to get right.
And with cars, you're not imagining broad-based rapid acceleration of all sorts of new technologies, with potentially just a 12-month window, or 2-year window, or 6-year window, before everything goes totally crazy. So I do think it's important not to blow through that window, to monitor as we're approaching it, and to monitor how long we have. But yeah, I'm fundamentally fairly optimistic about trying to use early transformative AI systems, early systems that automate a lot of things, to automate the process of controlling, aligning, and managing risks from the next generation of systems, which then automate the process of managing those risks from the generation after, and so on.
Rob Wiblin
Yeah, it's interesting that you say this approach has often been dismissed, because I feel like it's very in vogue now. I hear about this proposal every couple of days: someone presents it, or I read something about it in one guise or another. I guess one reason why it might have felt unpopular in past years is that people were mostly focused on the issue of misaligned AI. They were concerned about an AI that has it in for you and would like to take over if it had the opportunity. And that's maybe the worst application of this idea out of all of them, because you're asking the AI to align itself, but you don't know whether it's assisting you or trying to undermine you. You could try to make that work; people have suggested proposals for getting useful, honest work out of an AI that doesn't want to help you. But it's a lot easier to see how you could potentially solve problems other than alignment if you assume, well, we've got a good handle on the alignment part. There's a huge list of other problems being created during the intelligence explosion, like the fact that AI, if people get access to it, could invent other kinds of destructive technologies that we don't yet have good countermeasures for. In that case, it's clear how the AI could just help you figure out what the countermeasures ought to be.
Ajeya Kocha
I don't think I agree with this. I do think misalignment, the prospect that these early transformative AIs are misaligned, is a huge obstacle to this plan that needs to be shored up, handled, and specifically addressed. But I don't think it necessarily bites harder for getting the AIs to do alignment research than for getting them to do anything else helpful. Because if they have it in for you, they don't necessarily want to help you shore up your civilization's defenses. So if you're imagining trying to get an AI to help you with biodefense: if it's misaligned and, for example, wants to keep the option of threatening you with a bioweapon in its arsenal in the future, it would have a similar incentive to do a bad job at that as it would to do a bad job at alignment research. I think there's one big concern, which is whether the AIs we're trying to use at that point in time will have motivations that give them incentives to undermine the work we're trying to get them to do. I think they certainly would have incentives to undermine alignment research if they were misaligned. But they would also have incentives to undermine efforts to make ourselves more rational and thoughtful, like AI for epistemics, because if we're more rational and thoughtful, maybe we'll realize they're probably misaligned, and that would be bad for them. They would also have an incentive to undermine our d/acc-style defensive efforts, because those would make it harder for them to take over.
Rob Wiblin
That makes sense. I think the distinction I was drawing is this: for people who thought the alignment problem was extremely hard to solve and that we were way off track to solving it, the idea of getting the AI to solve the problem is kind of self-contradictory, because, well, I wouldn't trust the AI at all. Anything it proposed, I would assume was sabotaging us. But suppose you're on the side of thinking, well, the alignment problem is actually the easier part of things; it's a relatively straightforward technical problem that we are on track to solve.
Ajeya Kocha
Solve.
Rob Wiblin
But there's this laundry list of 10 other issues. Then it seems very obvious: we'll have the brilliant AGI, so why don't we just use that to solve all the other things? And I'm also inclined to trust it and believe it.
Ajeya Kocha
Yeah. I do think that if you are not worried about alignment at this early stage, everything becomes easier, and it becomes an even more attractive strategy. But I think the canonical "using AI for AI safety" or "using AI for defense" plan does imagine that we're not sure at the beginning that the AIs are aligned. We may not be highly confident that they're extremely misaligned, fully power-seeking, and looking to take over at every opportunity. But we're not imagining that we know with confidence that we can trust them. So figuring out how to create a setup where we use control techniques, alignment techniques, interpretability, and whatever other tools are at our disposal to get to the point where we feel good about relying on their outputs is a crucial step, because otherwise it either bottlenecks our progress, since we're checking on everything all the time and slowing things down, or it doesn't bottleneck our progress, but we hand the AIs the power to take over.
Rob Wiblin
So what kinds of specific problems arising from the intelligence explosion are you envisaging getting the AGI to help us out with?
Ajeya Kocha
Yeah. One obvious one is just AI alignment. How can we ensure that the AIs we're using to help us right now, the future generations of AIs they help us create, and the generations that those AIs help us create in turn, that whole chain, are motivated to help humans, honest, and basically doing what we say and steerable? That's sort of the foundation of everything else. But then there are other things that are not really about AIs at all, that are just about broad societal defenses. So if we think the advent of extremely powerful AI will create a flood of new cyber vulnerabilities, quickly discovered in a bunch of critical systems like weapons systems and the power grid, can we preemptively use the same AIs that are good at finding those vulnerabilities to find and patch them before bad actors can use AIs to find them? Another thing is biodefense. You had my colleague Andrew on your podcast recently, who talked about his ambitious plan to rapidly scale up detection of novel pathogens, rapidly scale up medical countermeasures when they're detected, and rapidly scale up the manufacturing of PPE and clean rooms and things like that. If we have AI systems that are good at that kind of research problem, and maybe at that point we also have robots, so a lot of that manufacturing can itself be automated and go a lot faster than if humans had to do it, that would be a big boon to biodefense. And then there are somewhat more speculative things. You can think of this as a kind of defense, maybe a psychological defense: can we use AIs to make our collective decision making a lot smarter, a lot wiser, a lot better? Can we make it so that we're better at finding truth together? Can we make it so that we're better at coming to compromise policy solutions that leave lots of people happy?
Rob Wiblin
How do you ensure that advances in AI don't lead to a war between the US and China?
Ajeya Kocha
That kind of thing too, but even more mundanely, stuff like: over the last 10 or 15 years, social media has led to a degradation of political discourse. Could AI tools help you find the policy, from among the vast space of possible policies, that a large number of people actually like and can credibly put trust in, and so on?
Rob Wiblin
So I interviewed Will MacAskill and Tom Davidson from Forethought earlier in the year, and the organization has a long list of what they call grand challenges, all of which they suspect are probably amenable to this kind of AGI labor during crunch time. Other ones include ensuring that society doesn't end up locked into particular values in a way that prematurely cuts off our ability for further reflection and changing our minds. Another is the potential for AI or AGI, inasmuch as it's very steerable and follows instructions, to be used in power grabs by the people operating it. Then there's space governance: if we actually do start to be able to use resources in space, how would we share them? How would we divide them, in particular such that there isn't conflict ahead of time, because people anticipate that once you start grabbing resources in space, you're on track to become overwhelmingly dominant? There's epistemic disruption, which you mentioned; new competitive pressures, like concerns that you could end up in a sort of Malthusian situation if there's competition between many different AIs; and possibly some others I'm missing here. We don't know which of these are going to loom large at the time. Some of them might feel like they've been addressed, or perhaps like we were hallucinating issues that aren't so severe. But there are many different ways we could potentially apply this labor.
Ajeya Kocha
Yeah, I agree. I think all of those problems that Tom and Will highlighted seem like real problems to me. Maybe my approach would be, from our current vantage point, to lump a lot of that under AI for helping us think better and find solutions that we're mutually happy with. So it's AI for coordination, compromise, negotiation, truth seeking, that cluster of things. Take the question of space governance: how do we divide up the resources of space, given existing factions with an existing distribution of power? No one really wants the sort of destruction that comes from everybody racing as hard as possible to get there first. But there's a complicated space of negotiated options beyond that, and I think AIs could potentially help a lot with that sort of thing.
Rob Wiblin
So you said in your notes that you think this approach is basically what all of the frontier AI companies say their safety plan is, more or less. Is that right?
Ajeya Kocha
Yeah, I would think so. If you look at public communications from at least OpenAI, Anthropic, and Google DeepMind, this jumps out to a greater or lesser degree in each case. But in all of their stated safety plans, you see this element of: as AIs get better and better, they're going to incorporate the AIs themselves into their safety plans more and more. Some are more explicit than others about expecting a specific crunch time that occurs when AI is rapidly accelerating AI R&D. But everybody is picturing AI playing a heavy role in the safety of future AIs.
Rob Wiblin
What assumptions are necessary for this approach to make sense? Or what kinds of setups could actually just make it a bad plan?
Ajeya Kocha
Yeah, I think fundamentally you need there to exist a window of opportunity, before AIs are uncontrollably powerful or have created unacceptable levels of risk, in which they are really capable and really change the game for AI safety research. And that window has to be meaningful: you can notice as you're approaching it, and even by default, without a crazy slowdown, it lasts at least six months or a year. If you think instead that once your AI hits upon some generality threshold, it becomes crazily superintelligent within a matter of days or weeks, this plan doesn't work, because there's no time to respond. You probably wouldn't even notice before it's too late. And I think there can also be unlucky orderings of capabilities where this plan wouldn't work. You could have AIs that are specifically good at AI R&D and really not good at anything else, not even AI safety research that's very similar to AI R&D. Maybe the only thing they're good at is making future generations of AIs have better sample efficiency, so they can learn new things more efficiently. Then you could have a period of six months or a year where you know this is happening and you have these AIs, but you're still hurtling towards a highly general superintelligence without being able to use these AIs for anything else, because they're just not good at anything else.
Rob Wiblin
There's something a bit self-contradictory about that, because an AI that's extremely smart but can only improve the sample efficiency of the next model is, in a sense, not very troubling in itself. Because it doesn't have general capabilities, that kind of model isn't going to be able to take over or invent other technologies. It's only once it has the broader capabilities, the broader agency, that it's actually able to cause problems. But I guess you're saying you could have a long lead-up where that's all it can do. And then at the last stage...
Ajeya Kocha
Yeah, and then at the last stage it might go back to the first scenario I talked about, where the narrow AIs that are savants at AI R&D hit upon an algorithm in almost a blind search. If you imagine AlphaFold, it is brilliant at figuring out how proteins fold, but it isn't broadly aware. You could imagine such AIs, or an algorithmic search process, hitting upon an architecture or a training strategy that can then go foom really quickly. So in this lead-up, you're like, yep, AI is accelerating AI R&D, it's crunch time, we have six months left, we have three months left. But these AIs are not AIs you can use for anything useful.
Rob Wiblin
Yeah. I guess many of the problems we'd like it to help with are social issues, political issues, in some cases philosophical issues. And the companies, I think, are working harder to make the models good at coding and AI research than at any other particular thing, and those are more concrete, measurable problems than philosophical questions. So it seems like there's a real, live risk that the balance of capabilities will end up being pretty disadvantageous for this plan.
Ajeya Kocha
Yeah, I think the further afield you go from work that looks like ML research and software engineering, the greater the penalty will probably be. The AIs are currently much better at helping my friends who do ML research all day than at helping me, since I do weird thinking, go on podcasts like this one, write emails to people, make grant decisions, and stuff like that. They're much worse at that stuff. You can already see that they have a very specialized skill profile. Fortunately, I do think a big chunk of AI safety research looks very similar to ML research. My friends who are getting big speedups from AI are safety researchers, and they're doing the kinds of work (control, alignment, et cetera) that I think will be some of the most important things you want these AIs helping with at the very beginning. But for stuff like AI for epistemics, AI for moral philosophy, AI for negotiation, and AI for policy design, the AIs may just not be that good. They won't necessarily be good at it by default, and that's a big concern for the plan.
Rob Wiblin
I guess another worry would be that the AI models end up being able to cause trouble before they're capable enough to figure out solutions. A classic case: imagine we put a lot of effort (which would be a bit stupid) into training an AI model that's extremely good at developing new viruses or new bacteria, basically changing diseases to make them worse. There are people using AI to develop new viruses; I guess they're using it to develop medical treatments, but that sort of stuff can then be repurposed for other things. If that sort of highly specialized model arrives first, before we have a model with enough understanding of society, biology, and medicine to figure out good countermeasures, then we need a different approach than this one.
Ajeya Kocha
Yeah. In general, I think of AIs doing defensive labor as a prediction about the world that you want to be thinking about as you make your plans. It's not a guarantee. In many cases the answer will be to specialize now in the kinds of things that might be hardest for the AIs to do then. Stuff like building a bunch of physical infrastructure to stockpile PPE and vaccines is a prime candidate: it inherently takes a long lead time, and the AIs might not be that advantaged at it at the point when they're good at doing the scary things it's meant to protect against.
Rob Wiblin
Yeah, that was going to be another concern of mine: inasmuch as the AIs are very helpful, you might imagine they're very helpful at the idea-generation or strategizing stage, but still quite bad at actually running a business or figuring out how to do all of the manufacturing.
Ajeya Kocha
So.
Rob Wiblin
So they could come up with a great strategy for countering new bioweapons: here's the widget you should use, go and make 10 billion of them. And we're like, can you help us with that? And it's like, no, I'm not very good at that. Good luck.
Ajeya Kocha
Yeah, I think that in general, you should expect AIs to be much better at things with tighter feedback loops, where you can recognize success after a short period of time. That's one of the reasons they're really, really good at coding: you can train them on this very hard-to-fake signal of whether the code ran after whatever you did with it. And I think idea generation versus actually executing a one-year plan has some of this element. You can read a white paper and think, huh, yeah, that's pretty good, push the thumbs-up button, and produce an AI that's pretty good at generating white papers you think are neat and would probably work. But it's much harder to train the AI to run the team of thousands of humans and robots actually executing on the plan.
Rob Wiblin
Why is the crunch-time aspect, the intelligence explosion taking off, even relevant to when we'd want to start doing this? You might think that if AI can help us do research or work to solve any of these problems, then as soon as it's able to, we want to do it, whether or not an intelligence explosion is kicking off.
Ajeya Kocha
To some extent, that's right. I think the reason I focus so much on the intelligence explosion is twofold. One is that at that point I think we might have a pretty short clock to figure out a bunch of stuff. The default trajectory might look like 12 months to extremely powerful, uncontrollable superintelligence that could easily take over the world. So it shifts our calculus toward focusing on very short-term things rather than things with long lead times, at least at crunch time, if not before. The other is that I think crunch time can help alleviate some of the challenges we've been talking about with AIs not being good at the full spectrum of things we want them to be good at. By definition, at that point AIs are really good at further AI R&D. And one of the things we could do with AIs that are good at AI R&D, at least in most cases, is try to direct their AI R&D towards filling out the skill profile of AIs, getting them to be good at some of the things we want them to be good at that they aren't so good at right now. So at that point, you might just have much more capability at your disposal, and it might be much more worth putting in the effort to fine-tune and scaffold and do all these other things to make your AI that's good at moral philosophy, or your AI that's good at biodefense.
Rob Wiblin
So you're thinking about this strategy not just as a description of what other organizations should potentially work on, or of what the AI companies are already planning to do, but also because you think it should influence what Open Philanthropy plans to do over the coming years. Potentially Open Philanthropy's best play might be to have billions of dollars waiting at this crunch time and then disburse them incredibly quickly, buying a whole lot of compute to get AIs to solve these problems.
Ajeya Kocha
Yeah. Just like how right now 80%-plus of our grant money goes to salaries to pay humans to think about stuff, do research, do policy analysis and advocacy, and all these other things, in a few years it might be the case that AIs are better than most of our human grantees, and our money should mostly go to buying API credits or renting GPU time to get the AIs to do a similar distribution of activities.
Rob Wiblin
So an alternative approach would be that at the point where we get a heads-up that an intelligence explosion is beginning, we do everything we can to pause, to slow down, basically to arrest that process, so that rather than having to rush to get the AIs to fix all of these issues in three or six months, we buy ourselves a bunch more time. Why not adopt that as the primary approach instead?
Ajeya Kocha
Yeah, so I think the plan I described is compatible with pausing right at the brink of an intelligence explosion. In fact, I would hope that we do that, because I think by default, having 12 months to get everything in order is just not enough time. But I think of the plan as doing two things. One is making the pause less binary. Think of the default path as almost 100% of AI labor going into further rounds of making AIs better, making more AIs, making more chips, and so on, and think of a pause or a stop as 0% of the world's AI labor going towards those activities. I think there's a whole spectrum between 0 and 100%. The other thing is that it answers the question of what you do in the pause: you do all this protective stuff, and you have these AIs around to do it with. Once you have that frame of making the pause less binary and thinking really hard about what you do during a pause, I think you might often end up thinking, oh, it's worth going a little bit further with AI capabilities, because especially if we tilt the capabilities in a certain direction, we might end up with AIs that are much better than they are right now at biodefense while still not being uncontrollable, still not being that scary. And you can imagine a bunch of little pauses and little redirections during that whole period. I would hope that at some point in the period we do things like policy coordination that give us longer in this sweet spot of AIs that are powerful enough to help with a lot of stuff, but not so powerful that we've already lost the game.
Rob Wiblin
So yeah, we should probably clarify that although you think this is among our best bets, in an ideal world we would go substantially slower through all of this, because as good a plan as this might be, we'll really be white-knuckling it and won't be confident that it's going to work.
Ajeya Kocha
Yeah. So I think that if a really clear early warning sign triggers that we're about to enter this intelligence explosion, fast-takeoff space, where in the space of 12 months we go from AI R&D automation to vastly superhuman AI, then I would vote for stretching that trajectory out to be 10 times longer or more, and trying to make that transition as a society in 10 or 20 years instead of one year. This is maybe a bit of a quibble, but I still wouldn't advocate for pausing, hanging out for 10 years, and then unpausing, because I actually think slowly inching our way up is better than pausing, unpausing, and then having a jump. But going back to what we said about how your default expectations of trajectories influence what you think should happen: I think the default is going through this in about one year, and I would certainly rather it be 10 or 15 or 20 years. Still, I think the frame of using AIs to solve our problems applies regardless of whether you're white-knuckling it in one year, eking out an extra two months, or managing to get the consensus and common knowledge that allow the world to step through it in 10 years.
Rob Wiblin
Yeah, because inasmuch as we're slowing down to do something, this is a big part of what we're slowing down to do. So this is a big part of the companies' plan for technical alignment. If it doesn't work out, why do you think it's most likely to have failed?
Ajeya Kocha
I think that if it fails, it's most likely because they just didn't actually do a big redirection from using AIs for further AI capabilities to putting a lot of energy towards using them for AI safety. They say this is their plan, but they haven't made any quantitative claims about what fraction of their AI labor at that stage, or their human labor for that matter, will go towards safety versus further acceleration. And they'll be facing tremendous pressure from their competitors to stay ahead. So my guess is that unless they have much more robust commitments than they have right now, they probably won't direct that much of their AI labor to safety. If they have 100,000 really smart human equivalents, maybe only 100 of them are working on AI safety, which may still be more than they had before in human labor, but not that much compared to how quickly things are going.
Rob Wiblin
Unless they have really strong commitments. But I guess other mechanisms would be that it's legally required: the government basically insists that most of the compute go towards this, or at least that most of it not go towards recursive self-improvement. Or the companies could reach some sort of agreement where they say, we'd all like to spend more of our compute on this kind of thing, so let's sign a contract where each of us spends, say, 50% of our compute on it, and none of us loses relative position.
Ajeya Kocha
Yeah, I mean, I think that particular contract is probably going to run into...
Rob Wiblin
Big antitrust issues. Maybe a little illegal. But maybe we could carve out an antitrust exception for this one. I guess a different mechanism: inasmuch as the government is taking a massive interest, it could help coordinate this one way or another.
Ajeya Kocha
Yeah, I think that's a possibility. I do think it's a bit tough. This isn't the kind of thing that's super easy to make laws about, because it's really not a box-checking exercise. When you write legislation saying half the compute must be spent on safety rather than capabilities, what do you count as safety research? And how do you enforce it? Do you have auditors going around asking all the team leads in the companies, what are you working on, what are you working on, and checking off that it's 50% safety? I can imagine stuff like that, but I think it would require extremely technically deep regulators that we just don't really have right now.
Rob Wiblin
I thought you might say the most likely reason for this to fail is that alignment just turns out to be incredibly hard: you get egregious misalignment even at relatively low levels of intelligence, and we don't figure out how to fix it early enough to get useful work out of the models.
Ajeya Kocha
Yeah, I think that's a possibility, but on my views it's not the most likely way it fails. I think the most likely way is that they don't go super hard on it. But it's also plausible that they're genuinely trying to get the AIs to help with alignment, and the AIs are just misaligned, and the control procedures and other measures are ineffective. So the AIs deliberately help only with further AI R&D and not with alignment, safety, biodefense, and all these other things you'd want them to help with. I would hope that at that stage the transparency regime is strong enough that this fact is broadcast really widely, and that could inspire a change in policy that causes us to slow down. But that's a bad world even if we do slow down a lot, because we're just on our own. We have to do this stuff without the AIs' help because we can't get them to help us. But I'm actually reasonably bullish about control techniques getting early AIs, ones that are not galaxy-brained superintelligences, to be helpful for a range of stuff they're good at.
Rob Wiblin
I guess another way they could end up not making much of an effort is if the window is relatively brief, it takes a long time to get projects off the ground, and they haven't really planned ahead. So they end up debating it back and forth, and by the time they've figured out that they actually do want to do this... I suppose it's nominally in these various papers, but I wonder whether they're actually thinking ahead about how this would feel, and whether they'll have the decision-making capacity to redirect enormous resources towards this other effort.
Ajeya Kocha
Yeah, I do think any plan that requires a large corporation to be super discontinuous in what it's doing faces big headwinds. So I would hope they're smoothly increasing the amount of internal inference compute going towards safety as the AIs get better and better, so that the jump doesn't have to be huge at that final stage. If we could elicit honest reports without creating perverse incentives, that's something I'd want to know about: how much human labor is going to safety versus capabilities, how much internal AI inference is going to safety versus capabilities, and how much fine-tuning effort is going to safety versus capabilities. I think they have a much better shot if they're stepping it up over time on some kind of schedule.
Rob Wiblin
Okay, so that's the AI companies, who I guess we're imagining would mostly focus this strategy on technical AI alignment. But you've been thinking about it more in the context of Open Philanthropy and what niche it could fill. What would Open Philanthropy need to do if dumping billions of dollars into this plan became its mainline strategy?
Ajeya Kocha
Yeah, I think that for now, the biggest thing we need to do is very similar to the biggest thing I think society needs to do to prepare for the intelligence explosion: really trying to track where we're at right now in terms of how useful AIs are for the work we do and the work our grantees do. That means pushing ourselves to automate ourselves, pushing our grantees to automate themselves, and tracking how good AI is at the stuff Forethought does, at the stuff Redwood Research or Apollo does, at the stuff our policy grantees do. Part of that is just socializing within ourselves that, hey, it's probably a big deal when the AIs start to get really good at any given thing we're funding, and once we start to see signs of life there, we should be prepared to potentially go really big on it. And like you said earlier, crunch time isn't 100% a special thing. We absolutely shouldn't wait until crunch time to do anything at all. It's just the prediction that crunch time is when a lot of things that were hard to automate before become easier to automate. So if it turns out, for example, that AI is really good at math research, which I think is plausible, then maybe we should deliberately shift our technical grantmaking towards more mathy work, because that's an area where you can churn out a lot more. It's just so much more tractable. So I think having a function that looks out for these things, and keeps poking Open Phil and Open Phil's grantees to consider shifting their work towards more easily automatable things and to repeatedly test whether their work can be automated, is a big thing. Down the line, I could also imagine something like keeping separate accounting for grantmaking that goes towards paying for AIs for our grantees versus the rest of our grantmaking.
We already pay for ChatGPT Pro subscriptions and ChatGPT API credits for tons and tons of grantees. I think it would help just to make it more salient in our minds: What fraction of our giving is going towards that? Do we endorse its size? Is there any place where we should be going bigger? Is the percentage climbing the way we think it should be, in line with the way AI capabilities are climbing? If we think crunch time is going to start in six years, are we on track to have inference compute be a large fraction of our spending at that time?
Rob Wiblin
If I think about this psychologically, I could imagine dithering for a long time rather than committing billions of dollars, if I were leading Open Philanthropy or were one of the donors being advised, and we had these transparency requirements and started getting a sense that an intelligence explosion might be kicking off. There's only a particular amount of money, a particular size of endowment. I think I'd be very scared that we're going too early, or that this is a bad idea, or that we'll have egg on our face afterwards, because it will turn out there were some early signs of an intelligence explosion but it didn't really pan out, and we've spent $10 billion and have nothing to show for it. You'd feel really bad if you made that mistake. Does that sound like a plausible way for things to go?
Ajeya Kocha
Oh, totally. I think it's just a very natural institutional dynamic. Even beyond being scared of making a mistake on this front, organizations have particular ways of doing things, and there are processes. Right now, Open Phil's grantmaking process usually looks like this: an opportunity comes across the desk of someone fairly junior, either through one of our open calls or through some contact they have, and that junior person pulls together some materials to convince their manager it's a good fit. Then that manager convinces someone higher up that it's a good fit. You can have two or three or sometimes four layers of information cascading up our decision-making process as an org, and then it's approved. If the right thing to do is to spend a billion dollars on some particular strain of work that's super automatable, you wouldn't trust some random junior person to make that call. You might need a different process for that. I don't know what that process would look like, but I think it would be one thing to figure out.
Rob Wiblin
I guess for this sort of incredible scaling of funding and effort to take place, given that you'll be incredibly bottlenecked on people and there won't be that many more people involved, the AIs would have to be not just doing the object-level work but also deciding what problems to work on, managing the project, and overseeing other AIs, basically filling the entire org hierarchy. Is that the picture you're envisaging?
Ajeya Kocha
Yeah. So I think there are two possibilities here. One possibility is that by the time it's the right move to dump a bunch of money on crunch-time AI labor, Open Phil itself has already been largely automated. That's actually an easy world, because in that world we have a visceral sense that AIs are really helpful: maybe we've slowed down our junior hiring and all our program associates are AIs. We're totally transformed as an organization, so the conviction to pull the trigger might be easier to achieve. And then we actually have a bunch of labor. Maybe we have 1,000 people on the AI team instead of the 45 we have now, and they can figure out all this stuff much more quickly. But I think the concerning possibility is jaggedness: maybe AI is extremely good at math, and extremely good at technical AI safety and certain specific kinds of manufacturing that could be really useful for a PPE play. But we haven't automated ourselves, because it's not that good at doing our jobs; there wasn't much of that stuff in the training data. It still makes horrible mistakes, and you can't fully trust it. In software or manufacturing, you can put it in a setup where you catch those mistakes, but on the Open Phil side you need humans to do that. So we're not very automated, and we don't have a visceral sense of: it's time now, this is the moment, AIs are really, really good, we've got to go big. But it's still the right thing to do to pour a bunch of money into AI labor on the few verticals that are heavily automated.
Rob Wiblin
I think we've maybe been burying the lede a little here on the biggest challenge for an external group like Open Phil implementing this plan: will you even be given access to the very best models being trained? And at crunch time, when there's a crunch on demand for compute, will you actually have enough computer chips? Will anyone be willing to sell them to you for this kind of work? Can you go into that?
Ajeya Kocha
Yeah. So I think there are two challenges to getting access to enough labor as an external group. One is whether they will even sell to you. Like I said earlier, in AI 2027 and a lot of stories of the intelligence explosion, you get to a point where one company has pulled far enough ahead of its competitors that it keeps its best internal systems to itself and only releases systems that are considerably worse than its internal frontier, just good enough to beat its competitors' released products. There can be a growing gap between how intelligent the best internal systems are and how intelligent the best externally accessible systems are, and the AI company may deliberately choose not to sell to willing customers because it wants to keep its secrets to itself. The other possibility is that they might be willing to sell to you, but the price might be way too steep, because the opportunity cost of selling that compute to you, for whatever you want to do with it, is training further powerful AIs, and they might be willing to pay quite a lot for that. So I think both are challenges. The second one is in some sense more straightforward to address: you try to hedge against it by having some portion of your portfolio really exposed to compute prices. In the extreme case, that might look like owning GPUs yourself, which in peacetime you rent out to other people doing commercial activity, and then during crunch time you redirect to doing AI labor. Although in that case you'd further have to figure out how to get the latest AI models onto the chips you own, so you might have to cut deals to make that happen. In less extreme cases, you might just buy a bunch of Nvidia stock or other liquid public stocks that are exposed to AI, to make it more likely that you can afford AI capabilities at the time.
Rob Wiblin
So there could be a huge run-up in the price of GPUs or compute at this time, but you can partly hedge against that by having most of your investments in Nvidia or other companies that sell GPUs, so that if their price goes up, you benefit on the investment side, which helps offset the increasing price. Okay. And then on the software side, there's the question of whether you'll have access to the very best models being trained. On the one hand, you could imagine a story where the companies are very close together, the models are roughly the same, margins are very low, and they're very keen to put out models as soon as possible to remain competitive. On the other hand, you could have one leader that starts keeping everything secret. Do you have a take on which of these scenarios is more likely?
Ajeya Kocha
Yeah, I think that at least in the early part of crunch time, when the AIs are just starting to automate a lot of AI R&D, my bet is that things will still be relatively commercial and relatively open. The leading few companies will be within a month of each other at the capability frontier, or maybe it'll be hard to say who's in the lead because each company specializes: one company's model is a little spiky on pretraining, and another company's model is a little spiky on software engineering, or something like that. The reason I think that is basically that it's what a naive Econ 101 model would predict: these companies don't seem to have big moats. And it also seems like what we've seen happen over the last few years.
Rob Wiblin
It kind of describes the present day, more or less.
Ajeya Kocha
It describes the present day. And that's a change from a few years ago, when I do think OpenAI had way more of a lead and it seemed more plausible that there would be a monopoly or a duopoly. But there are reasons to push in the other direction. Basically, if you have a super-exponential feedback loop, where a bunch of actors are growing at an increasingly rapid rate, first at 2%, then 4%, then 8%, and they don't interact with one another, you do get a winner-take-all dynamic: if they're all on the same growth curve but one gets to a particular milestone first, that leader gets more and more powerful and wealthy relative to the laggards. This is in contrast to exponential growth, where if everyone is growing at 2% forever, the ratios between more and less wealthy nations or companies stay fixed. So there is a reason to think that specifically around the time of the intelligence explosion, gaps will begin to grow again. But around the start, I think it will most likely be the case that you can buy AI labor if you can afford it: you can buy API credits, you can go on chatgpt.com. And then I have a lot of uncertainty about how it evolves from there.
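A toy simulation can make the contrast between exponential and super-exponential growth concrete. This is a minimal sketch under loudly stated assumptions, not a model from the conversation: two hypothetical actors follow the identical growth curve, with the leader a few steps ahead. Under constant 2% growth the leader/laggard ratio stays fixed; under an assumed feedback loop where the growth rate rises with the level (the hypothetical `super_exponential` function, where the rate doubles each time the level doubles), the ratio widens every step. The 2% rate, 30 steps, and 5-step head start are illustrative numbers, not forecasts.

```python
def trajectory(rate_fn, steps, start=1.0):
    """Simulate a level that grows each step by the fraction rate_fn(level)."""
    level = start
    path = [level]
    for _ in range(steps):
        level *= 1 + rate_fn(level)
        path.append(level)
    return path


def leader_to_laggard_ratio(rate_fn, steps, head_start):
    """Both actors follow the same curve; the leader is `head_start` steps ahead.

    Returns the leader/laggard ratio at each step from 0 to `steps`.
    """
    path = trajectory(rate_fn, steps + head_start)
    return [path[t + head_start] / path[t] for t in range(steps + 1)]


def exponential(level):
    # Assumption: constant 2% growth per step, regardless of the actor's size.
    return 0.02


def super_exponential(level):
    # Hypothetical feedback loop: 2% growth at level 1, and the rate doubles
    # every time the level doubles (2%, 4%, 8%, ...).
    return 0.02 * level


exp_ratios = leader_to_laggard_ratio(exponential, steps=30, head_start=5)
sup_ratios = leader_to_laggard_ratio(super_exponential, steps=30, head_start=5)
print(f"exponential:       {exp_ratios[0]:.3f} -> {exp_ratios[-1]:.3f}")
print(f"super-exponential: {sup_ratios[0]:.3f} -> {sup_ratios[-1]:.3f}")
```

In the exponential case the ratio is always 1.02 to the power of the head start, so the gap never closes or widens; in the super-exponential case the leader's larger level earns it a higher growth rate, so the ratio grows each step.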
Rob Wiblin
Yeah. What do you think is the chance that the leading company will try to keep the level that they're reaching secret?
Ajeya Kocha
I think it depends a lot on the competitive landscape they face. If the other companies are really far behind, then I think there's a pretty strong incentive to keep your capabilities secret. You give up quarterly profits, but maybe you don't care about that because you're running on investment money anyway. If you can get your AI to help you make better AI, to help you make better AI, and so on, you could emerge with superintelligence. That might give you power that rivals nation-states, or the ability to decisively control how the future goes, and that might be very attractive to a power-seeking company. I do think it involves forgoing short-term profits, though, which means that if competitors are close on your heels and your investors are breathing down your neck to deliver quarterly earnings, you...
Rob Wiblin
Can't go and tell all your investors, oh, don't worry, we have a superintelligence, because then word will get out.
Ajeya Kocha
Well, and also, in this case your plan is to screw over the investors. Your plan is to create a superintelligence and take over the world, not to pay them back, and they might not like that. There's a mismatch in incentives between the investors and the CEO, and the CEO is being a bad agent to their principal. So basically, the more things look like an efficient competitive market with very little slack, the more the leading company will be forced to provide access to the rest of us.
Rob Wiblin
To what extent do you imagine the companies would be enthusiastically bought in on assisting with this plan? This strategy is their predominant approach to technical AI safety, and even the optimists agree that there are other issues society is going to have to deal with. In fact, the leaders of the companies say all the time that we're going to need a new social contract, that it's going to upend everything, that it's going to be a big deal. I imagine that inasmuch as they're nervous about the effects the technology is going to have, they'd be very happy if someone came to them with a pre-prepared plan for how to deploy all of this compute to solve all these other problems.
Ajeya Kocha
Yeah, I think it's unclear. They certainly have some incentive to be into this. But there are two alternative uses of AI labor that might be more attractive to them. One is power seeking for themselves: building up an enormous AI lead over everyone else and then bursting onto the scene with an incredible amount of power and the ability to challenge the US government or other nation states might be attractive to some people. I think that would be a very evil strategy to pursue, but it's definitely in the water. The other is more mundane: just using these AIs to make normal goods and services, the products and media content and other services that people most want to pay money for. In a short-term sense, it's very similar to how right now we don't spend a huge fraction of society's GDP on biodefense and cyber defense, or on moral philosophy. That's just not what people want to pay for. And AI is just another thing that accelerates the creation of the products and services people want to pay for, and this isn't very high on the list.
Rob Wiblin
I guess most people are not looking to become dictator of the world or to take on huge amounts of power. But the kinds of people who end up leading very risky technology projects are not typical people; they're somewhat more ambitious than typical. So I suppose we can't rule that out as a possibility.
Ajeya Kocha
Yeah.
Rob Wiblin
So a possible challenge would be that even if you have an enormous amount of compute, there might be a limit to how fast you can go, because some steps are sequential or just bottlenecked in time. People talk about experiments that actually take a certain amount of time to play out. But more generally, LLMs, for example, produce one token after another, and having twice as much compute doesn't necessarily let you complete an answer twice as fast, without limit. How much is that an issue here, inasmuch as we're trying to solve problems in a very short amount of calendar time?
Ajeya Kocha
Yeah, I think that is likely to come up, especially for physical defenses like manufacturing PPE or scaling up the ability to rapidly create medical countermeasures, and also for social and policy things. I can imagine that AIs could be very helpful in figuring out what kind of agreement between the US and China would be mutually beneficial and how we could enforce it. But the way human decision making works probably still requires humans from the US and China to come together and talk about it, have a conference or convening, and come to a decision that they ratify and feel good about. And that could be a bottleneck.
Rob Wiblin
Yeah. Are there any other examples of similar bottlenecks? I guess in terms of solving theoretical problems, you can speed things up enormously by having many, many instances of the same model try to brainstorm different solutions and then evaluate one another's. That allows you to have many different efforts going in parallel.
Ajeya Kocha
I do think that for deep theoretical problems, you can speed things up by having efforts going in parallel. But the right solution that's out there somewhere involves multiple leaps, where it's hard to think of the next insight without the foundation of the earlier one. So even if you have 100 AIs working in parallel, what will happen is that one of them comes up with the first step, and then everyone works in parallel on finding the next insight. You still need to go three or four steps in, one after another.
Rob Wiblin
So what sort of stuff do we need to be doing in advance? I guess, for example, setting up planning meetings ahead of time for diplomats between the US and China. Perhaps we need to do that at the very early stage in anticipation that eventually we might have a deal that they might want to ratify. I guess that sounds a bit crazy, but are there other examples of things that you need to do before this all kicks off?
Ajeya Kocha
Yeah, I think in general you want to be thinking about what the AIs at that time would be most comparatively disadvantaged at. They'll have all these advantages over us: they'll understand the situation much better at that point than we do now, and they'll be able to think faster, move faster, and so on. But what we can contribute now are things that inherently take a long lead time to set up. That might include physical infrastructure, like the bio infrastructure my colleague Andrew is working on building out. It might also include social consensus. It takes some amount of time for an idea to be socialized, to become an accessible concept that maybe we should try to create some sort of treaty between the US and China to let AI progress somewhat slower than it naturally might, and use a bunch of AI compute to solve all these problems. That kind of thing takes years to become part of people's toolkit, in the water, such that they actually think to have the AIs go down that path and figure out the details.
Rob Wiblin
So what should people be doing if they think that this kind of makes sense or it's something that they'd want to contribute to? Are there other organizations that should similarly be sort of planning ahead and thinking about how this might look for them? Or could individuals be thinking about how they could contribute to, I guess, adopting this approach for their own particular projects?
Ajeya Kocha
Yeah. So in terms of other organizations, I think it would be especially great for government entities to be thinking about adopting AI. I know there are a number of random little bits of red tape that make it harder for governments to adopt AI than for anyone in industry. I think we might end up in a situation where the regulatees, the industry people, have fast cars and the regulators have horses and buggies because of this differential adoption gap. More broadly, if your company is not already going maximally hard on adopting AI for your own use case, and you work on defenses, AI safety, moral philosophy, all these good things, it's probably worth having a team that's just on the lookout for how you could adopt AI as soon as it becomes actually useful for you.
Rob Wiblin
Let's talk a bit about the career journey you've been on since we last did an interview two and a half years ago. I guess back then you were doing general AI research and strategy for Open Philanthropy. This was in 2023. Then in 2024 you started leading the technical AI safety grant making. And then I guess towards the end of that year you decided to take four months off for a sabbatical. Tell us about all of that.
Ajeya Kocha
Yeah, I had been at Open Phil for more than six years before I made my first grant. I was involved in some grant making conversations earlier, but the first grant I actually led on was somewhere in mid or late 2023, and I had joined Open Phil in 2016, so it was kind of interesting. If you just took the outside view and said, "This is a philanthropy that's giving away money," my work there was very strange, because it was thinking about these heady topics and then writing long reports about them that I published on LessWrong. And I always felt a little like, oh, maybe I should dip into grant making, because that is our core product; in some sense it's what we do. But I had always been drawn away by deeper intellectual projects. So even though I always vaguely had the thought that I should do grant making, it never really happened for me until, actually, the thing that pushed me headfirst into grant making was the FTX collapse. So actually, sorry, my first grant must have been in 2022 instead of 2023. At that point there were hundreds and hundreds of people who had been promised grants by the FTX Foundation where their grant wasn't going to go through, or they were worried it was going to be clawed back, or it was partially not going through. And Open Phil put out an emergency call for proposals for people who had been affected by the crash. I had some thoughts and takes on technical research, and the organization also just needed help, surge capacity for this emergency influx of grant making. In a matter of maybe six weeks or so, I made 50 different grants after not having made any grants at all. That was a really interesting experience, and I discovered that there were elements of it I really liked.
But there was also something about the way you made grants where you just really couldn't dig into any particular thing very much, especially in the context of something like the FTX emergency. You just had to make these decisions really quickly. And I had thoughts about how grant making, at least in the technical AI safety space, could be done with more inside-view justification for the research directions we were funding than we had previously. So in early to mid 2023 I tried to go down that path.
Rob Wiblin
Sorry, so in 2022 you did this huge burst of grant making, I guess trying to help a bunch of refugees from the FTX Foundation, basically. But then I guess you would have noticed that there was probably no overarching strategy behind all of the grants you were making, and you were like, we need a bigger-picture idea of what we're actually trying to push on and why.
Ajeya Kocha
Yeah. So I was focused on grants to technical researchers. These are often academics, sometimes AI safety nonprofits, and they would often be working on interpretability or some kind of adversarial robustness. They seemed like reasonable research bets. But I felt kind of unsatisfied, and I think this is going to be a theme of my career, that the theory of change hadn't really been ground out and spelled out: how this type of interpretability research would lead to this type of technique or ability, and how that could fit into a plan to prevent AI takeover in this way, or similarly for any of the other research streams we were funding. This had actually been a big thing that deterred me from getting involved in Open Phil's technical AI safety grant making for a long time, even though I was one of the few people on staff who thought about technical AI safety outside of that team. It was because, in the end, it seemed like most grant decisions in this 2015-2022 period turned on heuristics like "this person's a cool researcher and they care about AI safety," which is totally reasonable. But I wanted to have more of a story: this line of research is addressing this critical problem, this is why we think it's plausibly likely to succeed, and this is what it would mean if it succeeded. We never really had that kind of built-out strategy, because it's very hard; it's a lot to invest in building out a strategy like that. But having been thrown headfirst into grant making with the FTX crisis, I was like, maybe I do want to try and take on the AI safety grant making portfolio, which at the time didn't have a leader, because all the people who had worked on that portfolio had left by that point, some actually to go to the FTX Foundation. So it was this portfolio that had been somewhat orphaned within the organization, and it was clearly a very important thing.
And I was like, oh, maybe we could approach it in this kind of novel way for us in this area to really try and form our own inside views about the priorities of different technical research directions and really connect how it would address the problems we most cared about.
Rob Wiblin
It sounds like you find it unpleasant or anxiety-inducing to make grants where you don't have a deep understanding of... I guess not so much of what the money is being spent on, but where you don't have a personal opinion about whether it's likely to bear fruit. Is that right?
Ajeya Kocha
Yeah. It's a bit nebulous what standard I hold myself to. But for my research projects, when I think about timelines, or how AI could lead to takeover, or how quickly the world could change if we had AGI, I can often, with months of effort, get to the point where I can anticipate, and have a reasonable response to, a very wide range of intelligent criticisms for why my conclusion might be totally wrong and off base. I feel like I know what the skeptics who are more doomy than me will say, and I know what the skeptics who are less doomy than me will say, and I could have an intelligent conversation that goes on for a long while with either side. That is the standard I aspired to reach with why we supported certain grants. I could do that with some of our grants. But I wanted the program to get to the point where, if somebody came to me and said, "Interpretability actually hasn't seen much success over the last four years; what do you make of that?", I was at reflective equilibrium on my answers to questions like that, and could say something that went a bit beyond "Yes, but on the outside view, we should support a range of things." Relying on that kind of answer is emotionally unsatisfying to me if it's a big element of my work.
Rob Wiblin
Yeah. It's maybe worth explaining why Open Phil doesn't aspire to that level of confidence with most of its grants. Why is that?
Ajeya Kocha
I think it just takes a long time. There are two things. One is that it just takes a lot of effort. The other is that even if you put in that effort, you don't want to fully back your own inside view, and I wouldn't endorse that either. So it's this one-two punch. Developing your views about exactly how interpretability or adversarial robustness or control or corrigibility fits into everything is a ton of work: you have to talk to a ton of people and write up a bunch of stuff, and in the meantime you're not getting money out the door. Then, having done all that, where are you going to end up? You're going to end up in a place where there are reasonable views on both sides, it's a complicated issue, and we probably want to hedge our bets and defer to different people with different amounts of the pot, and so on. I think people very reasonably react like, okay, we're going to end up in a place where we've thought it through, it was a lot of work, it's still very uncertain, and we still want to spread our bets, so why not?
Rob Wiblin
So it doesn't even affect the decision.
Ajeya Kocha
Why not just short-circuit all that, spread our bets, and lean on advisors? I have sympathy for that, and hopefully I represented that perspective reasonably well. But in my experience, having done the homework really qualitatively changes the details of the decisions you make, in ways that I think can be really high impact. One thing I'm able to do, having gone through the whole rigmarole of forming views, is work with researchers to find the most awesome version of their idea by the lights of my goals, pitch them on that, and co-create grant opportunities. And I feel like there are other nebulous benefits beyond that, which I maybe won't be great at defending, but I really like operating that way.
Rob Wiblin
So you actually took on responsibility for this whole portfolio in 2024, or I guess late 2023. But your personal philosophy of how to operate is somewhat in tension with how Open Phil as a whole tends to operate.
Ajeya Kocha
And in tension with making a large volume of grants in the short term. Yeah.
Rob Wiblin
So what did you end up doing in the role?
Ajeya Kocha
So I ended up pursuing a compromise. One thing that comes with the territory of this role is that there are grantees we made grants to in the past who are up for renewal, and part of the responsibility of being in charge of this program area is that you investigate those renewals and decide whether to keep funding the grantees. For those grants, I tried to follow what a canonical Open Phil decision-making process would be. I pursued a barbell strategy for a while. On one hand, for renewals, or for people who knew us and reached out to ask us to consider grants, I wouldn't hold myself to the standard of really understanding and defending the proposal on the technical merits. I would lean more on heuristics like "this person seems aligned with the goal of reducing AI takeover risk" and "this person has a broadly good research track record," and try to make those grants relatively quickly. On the other hand, I would be trying to develop a different funding program, or some grants that I really wanted to bet on, where I would try to hold myself to that standard and really write down why I thought this was a good thing to pursue. It turned out that the second thing basically became a bet, from late 2023 to mid 2024, on AI agent capability benchmarks and other ways of gaining evidence about AI's impact on the world.
Rob Wiblin
So it's the stuff we were talking about earlier, where you're trying to get an early heads-up about whether the AIs are going to be really effective agents. I guess in 2023 we were really unsure how that was going to go. Agents in general have been a bit disappointing, or haven't progressed as much as I expected, or probably as you expected. But at that point it seemed like, well, maybe by now they'd be operating computers completely as well as humans. And I guess you really wanted to know if that was the future we were heading for.
Ajeya Kocha
Yeah, yeah. So I launched this request for proposals. Open Phil had put out technical safety requests for proposals before, but this was by far the narrowest and most deeply justified technical RFP we had put out at that time. It said: we are looking for benchmarks that test agents, not just models that act as chatbots; these are the properties we think a really great benchmark would have; and these are examples of benchmarks we think are good and not so good. We had a whole application form that was, in some sense, trying to elicit the information about each benchmark that we thought would be most important for determining whether it was really informative. Mostly the message was: be way more realistic and have way harder tasks than existing benchmarks. Even if you think your tasks are hard enough, they're probably not hard enough. There was a lot of push in that direction. So it was a very opinionated, very detailed, and very narrow RFP. We ended up making $25 million of grants through it, and then another $2 to 3 million through the companion RFP, which was broader: all kinds of information about AI's impact on the world, from RCTs to surveys. I'm pretty happy with how that turned out. As you would expect, it was a lot of effort poured into one direction. If you were skeptical of this high-effort approach to grant making, you could argue that I could have put in way less effort and funded twice the volume of grants across 10 different areas, picking up the low-hanging fruit in all of them.
Rob Wiblin
So I guess halfway through 2024, you started feeling pretty burnt out or like you wanted to take a bit of a break. Why was that?
Ajeya Kocha
Yeah. So right around when I switched from doing mostly research to doing grant making, and especially when I was trying to ramp up this program area with a more inside-view, more understanding-oriented approach to AI safety research, Holden, who had been running the AI team up to that point, decided to step away and left the organization. He was my manager, and I had a working relationship with Holden that involved a lot of arguing and discussing about the substance of what I was working on. When he left, leadership was stretched thinner, because someone in leadership was gone, and I think the people who remained on the leadership team didn't have as much context and fluency with all this AI stuff as Holden did. So I wrote up this big memo saying, oh, we should do AI safety grant making in a more understanding-oriented way, we should develop inside views, and here's why I think that would be good. What I wanted was for my manager or leadership to argue with me about the object level, and for there to be some shared view within the organization about how good an idea this was, what its pros and cons were, and how much we wanted to bet on it. But I think that was unrealistic given the other priorities on their plate and their level of context in this area. So I ended up having to approach it in a more transactional way with the organization. Rather than "let's talk about whether this is a good idea," it was more like, well, I want to do it this way. And they were like, yeah, we don't know if that's the best way to do things, and we have some skepticism, but you can do that if you want.
And so I felt kind of lonely. This is something I learned about myself over the course of trying to run this program, and then going on sabbatical and reflecting on it: I really like to be plugged into the central brain of the organization I'm part of. I didn't feel like I had a path to do that. Instead, what I had a path to do was to stand up this thing, which I tried to do, but it just felt a bit tough going.
Rob Wiblin
It sounds like you were a bit on your own.
Ajeya Kocha
Yeah, I felt a bit on my own. And I don't think I'm a very entrepreneurial person. I'm ambitious in some ways, but I just have a really high need to constantly talk to other people. I tried to achieve that sense of team by hiring people under me to help with this vision. But I was not very good at hiring and management. Partly that was because the vision was pretty nebulous, and I probably needed to spend more cycles working out the kinks by myself and really solidifying what the realistic version of an understanding-oriented technical AI safety program is. So it was very hard to hire, because you had to hire someone who really resonated with it off the bat, even though it wasn't a well-defined thing. That took a lot of energy. And then with the people I was managing, I have always struggled, and in this case still struggled, with perfectionism in management. I have a long history of trying to get writers to write up my ideas, and it never works for me, because they don't do it just the way I want. I'm a pretty fast writer myself, so working with a writer as their editor and getting their output to something I'm satisfied with often ends up taking more time than doing it myself. I found the same happened to some extent with grant makers: at one point a number of people spent part of their time working on the benchmarks RFP, and I think it's possible I would have moved through the grants faster if it had just been me working on it, which is a bit tough. I think this is a weakness or challenge a lot of new managers go through.
And I was going through that at the same time as getting much less feedback and engagement from above me than before, and having to prove this new way of doing things. I thought, and still think, that there was a lot to the arguments I was making. But it was also not a wild success when I took a swing at it by myself.
Rob Wiblin
So in September last year you decided to step away and take some time off work, I guess after eight years of working very hard full time. What did you end up doing with that time?
Ajeya Kocha
It was a mix of things. I did a lot of life stuff. I found a new group house to move into, or rather started a new group house, so that was cool. I did more just trying to take care of myself: I started an exercise habit, though I'm off that exercise habit again now, so we'll see. Then I did a lot of reflecting on why this work situation ended up being so hard for me, and on my career as a whole and the patterns in when things had been hard for me. I also jumped in and helped with some random projects. The Curve Conference, which brings together AI skeptics, AI safety people, and people on all sides of the issue of AI's impact on society, was having its first iteration while I was on sabbatical. So I was able to get more involved with that and try to be more helpful than I could have been if I'd had a full-time job, which was really cool. I did some writing. Most of it hasn't been published, but it was still good for me to do. But yeah, it went by really fast, honestly. There was a lot to think about and a lot to do.
Rob Wiblin
Yeah. What sorts of reflections did you have on your career so far, and your motivation, and what had been difficult in 2023 and 2024?
Ajeya Kocha
Yeah. In terms of 2023 and 2024 specifically, I really do feel like I want to be an advisor and helper to the central organization, and I had been that in many ways over the previous six years. So the transition to being more entrepreneurial, more like I had a little startup making grants in my area, where the organization was investing money in me but not necessarily a lot of attention, and I didn't necessarily have a path to make arguments that influenced things in a cross-cutting way, was hard. It was interesting to learn that if I don't have that, I will still gravitate towards trying to meddle in everything else that's going on, and if I don't have a productive path to meddle, I'll feel sad. That was one big thing. Another big thing was just how much depth I want. I do have a drive to really get to the bottom of something. I'm always thinking about the counterargument, and the counterargument to the counterargument. Even when I was very young, I really liked math tutoring, and I really liked math in general, because you could dig and dig and get to an answer. And that's an inherently uneasy fit with grant making, or with just investing like Fox.
Rob Wiblin
Yeah, what Open Phil is engaged in is like venture capital, in a way.
Ajeya Kocha
Yeah. So that was also interesting to reflect on. And like I said, somewhat strangely, for my first six or seven years at Open Phil I actually did rather deep research. Even though we were a grant making organization, I just wasn't doing grant making.
Rob Wiblin
Is that in part because Holden really wanted this deep research? He wanted to understand the ideas more deeply, both personally and because he thought it was healthy for the organization?
Ajeya Kocha
Yeah, I think that's right. He had a lot of drive and demand for really figuring out timelines, takeoff speeds, and exactly what our threat models are for whether AI could take over the world, and building that all up. And I think he has a lot of the same instinct I have: it's just really good to do your homework, to have a response to the top 10 counterarguments and the response to those responses, and to really know your stuff. So he was the driver of a lot of the work I did. If you had rerolled the dice and Open Phil had been run by different leadership, it's probably pretty unlikely we would have gone as deep as we did into doing our own AI strategy thinking, because the thought would have been, well, we should fund a place like FHI, or now Forethought, to do that instead of us.
Rob Wiblin
In your notes you said you spent a fair bit of time in this period reflecting on what you had liked about effective altruism, as an ecosystem and as a mentality, and what you didn't like so much about it. Tell us about that.
Ajeya Kocha
Yeah, so I guess it's been a long time since you've talked about effective altruism on the show, so I'll open with what it even is. It's this movement, or idea, that you should think explicitly, seriously, and quantitatively about how you can do the most good with your career or with the money you donate, and that different career paths, and different charities you could donate to, can differ by orders of magnitude in how much good they do. So if you are working on reducing climate change, it could be orders of magnitude more helpful to research green technologies than to get people to turn off their lights or conserve electricity in their personal use. And there's this ethos that if you're really taking this seriously and you really care about helping the world, you stop and think and do the math, in the same way that if you or your spouse had cancer, you would do the research, figure out which treatments had which side effects and which success rates, and ask the doctor a lot of questions. That's what it looks like when you take something seriously. A lot of people, when they're doing good in the world, do what makes them feel instinctively good, and there is a whole other approach where you respect the intellectual depth of the problem. I was really drawn to this. I fell headfirst into the EA rabbit hole when I was 13, so for more than half my life I've been extremely involved in this community and this way of thinking. And I think there were maybe three big things I really, really liked about this approach. One is just that EAs challenge themselves to care about people and beings that are very different from them and very far away from them in time and space.
So even the most vanilla EA cause area of global poverty, the vast majority of money that goes to alleviating poverty given by individuals in rich countries goes to helping other individuals in rich countries. Even though money could go much, much further overseas, in countries where people have a much lower standard of living, the reason people donate locally is that they feel more affinity for people who are closer to them and more similar to them. EA also has a lot of strains that challenge people to extend care to animals, to extend care to future generations that may live thousands of years or millions of years in the future, to artificial intelligence also, if. If it can be something that has consciousness and can feel pain and so on. And that was really appealing to me. But then there were also just like, there's a way of going about doing things that was also very appealing to me, which is like, they were very nerdy, they were very intellectual, they were like, really like Thinking stuff through and almost like innovating methodologically on like, how can we figure out which charities are better than which other charities? And like, there are lots of interesting arguments thrown around for this and they were very transparent. There was just a culture of open debate and admitting your mistakes. GiveWell, an early pillar of the early EA movement, had a mistakes page on its website where it just discussed mistakes it had made. They were very honest and high integrity in an interesting way that doesn't obviously follow from caring about other beings more. For example, like, GiveWell refused to do donation matching because donation matching is usually a scam where like the, the big donor like would have given that much anyway even if you hadn't made your donation. So that whole package was like really attractive to me. 
I think it really hit a lot of psychological buttons for me at once and really felt like my people and the way I wanted to live my life.
Rob Wiblin
So there's being more compassionate to a wider range of beings, which I guess is still the case and probably still something you like about the effective altruist approach. But there was also going into enormous intellectual depth and really debating things out.
Ajeya Kocha
Yeah.
Rob Wiblin
And then there was also the very high integrity about honesty, like not allowing any chicanery whatsoever, or, I don't know, an extremely...
Ajeya Kocha
Fastidious and exacting level of integrity that other movements, even other pretty high-integrity movements, weren't aspiring to.
Rob Wiblin
Even beyond what people were asking for.
Ajeya Kocha
Yeah, yeah, you just proactively say, by the way, did you know donation matching is a scam? That's why we're not doing it, even though we would get more donations to help poor people. It's interesting that that was such a natural part of the early EA movement, even though you're sort of giving up on impact.
Rob Wiblin
Yeah, it's not necessarily implied. I guess it's a practical question whether it is or not. So as things evolved, I guess you found that the second one, the intellectual depth, was now lacking from your job. Were there other things that were changing that made you less enthusiastic?
Ajeya Kocha
Yeah, I think the intellectual depth was very much there in other parts of the EA ecosystem, especially AI safety and thinking through how exactly you would control early transformative AI systems and things like that. Like I said, my heart was always pulled towards those kinds of questions, even though I worked at a grantmaking organization.
Rob Wiblin
Yeah, it feels like on some level you really were a more natural grant recipient than a grantmaker. Maybe you should have gotten a grant to really go deep on some questions.
Ajeya Kocha
Yeah, I think about what would have happened if I had graduated college in 2022 instead of 2016. In 2016, I graduated college and went to GiveWell, and a big part of why I went to GiveWell at the time was that they had the most intellectual depth on the question of what the best charities are. If I had graduated college in 2022, I probably would have done MATS, which is a program to upskill in AI safety research, and then tried to join an AI safety group. So I think I'm naturally drawn to actually doing the research in some sense. In that sense, it was a mundane issue with my job, especially after Holden left and the demand for that kind of research evaporated a little bit at the leadership level: if I were to start over again, I probably wouldn't have applied to join Open Phil. I probably would have applied to join an AI safety group. But then I think the third thing, this extremely, almost comically high level of integrity that I really, really liked, was also eroding over the years. When I think about why, I think that when a lot of the focus of the EA movement was convincing really smart people to donate differently, being unusually high integrity was actually a really valuable and powerful asset. Obviously people like me, and very wealthy people who were early GiveWell donors, really liked that GiveWell had a mistakes page and really liked that whole ethos and that whole package. It helped them trust that the recommendations were actually real recommendations and that they weren't being spun or sold something, like in the rest of the charity recommendation ecosystem. But then when you move away from that being your primary method of change, when instead you've actually attracted quite a lot of funders and now you're trying to use that money and the talent you've attracted to achieve things in the world, maybe things that involve a lot of politics, then being extremely transparent can be very challenging. Especially because donors want privacy, or, if you're running a political campaign, you don't want your opponents to know exactly your strategy and the ways you think you might have made mistakes. This is just not how most of the real world works.
Rob Wiblin
Yeah. It's not the case that the world's most impactful organizations are consistently incredibly transparent or even incredibly high integrity.
Ajeya Kocha
Yeah, yeah. And so there was this tension. I felt like I should only care about the goals of EA. What EA told me, and it kind of made sense to me, was that the point here is to help others as much as possible. The point is not to conform to an aesthetic or do things in the way that feels cleanest or prettiest. But at the same time, I think I was to some extent kidding myself about how much of my own motivation and my own attraction to the concept came from just the goals, just pillar one, altruism, versus pillars two and three: that intellectual depth and intellectual creativity, and this crazy high level of openness and transparency, having absolutely nothing to hide, letting all comers come. As a fact about my psychology, the latter two things were actually really important for my motivation, and over time they became smaller and smaller features of what it was like to do EA, to try to pursue EA goals in my career.
Rob Wiblin
Yeah, I guess we should say, for people who don't know, that over this period the environment that Open Phil was operating in became a lot more challenging and a lot more hostile.
Ajeya Kocha
I guess.
Rob Wiblin
I guess for years it had been funding all kinds of AI-related stuff. But as AI became a much bigger industry, and it became apparent what sorts of concerns different people had, its work in some ways started to clash with potentially very large commercial interests, and also with alternative ideologies that had different ideas about how things ought to be regulated or how things ought to go. And so we were now in a world where there were people who would sit down and think, how can I fuck with Open Phil? What can I do to give these guys a terrible day? What have they published that we could spread that will be embarrassing for them? And in that kind of environment, where people literally want to cause trouble for you, it's a lot less attractive to be maximally forthcoming about all of your internal deliberations and why you made all of your decisions. All of us would potentially be a bit more conservative in that kind of environment.
Ajeya Kocha
Yeah, I mean, even before the latest round of AI policy heating up, which started in 2023, Open Phil compromised a lot on its initial wild ambitions for transparency. At the beginning, there was this idea that when people came to us for grants, we would publish the grants we decided not to make and explain why we decided not to make them.
Rob Wiblin
There's a reason most organizations don't do that.
Ajeya Kocha
There's a reason most organizations don't do that. For our earliest two program officer hires, we wrote a whole blog post about their strengths and weaknesses as candidates, the alternatives we considered, and how confident we were that it would work out. We stopped doing that. So there is a level of transparency that I still want in my heart, but it's absolutely insane. And then I think the adversarial pressure that you mentioned makes it so that Open Phil, as an organization that funds a lot of this ecosystem, has a lot to lose. If we go down, a large number of helpful projects have a much harder time getting funding, so we have to be a lot more risk averse than many of our grantees, even though those grantees are also facing an adversarial environment. I think the way many of them navigate it is to fight back, explain their perspective, and defend themselves in the public sphere. My instinct is to do more of that, to say more and respond. But it's harder to do that from Open Phil's position for a number of reasons. Yeah.
Rob Wiblin
So over the years, a lot of people, I guess usually critics, have said that effective altruism has some things in common with religious movements. To what extent have you found that to be the case? And to what extent have you found that not to be the case?
Ajeya Kocha
Yeah, I mean, I think EA aspires to be, and very much succeeds at being, a lot more truth-seeking than the world's religions and a lot more truth-seeking than many other communities and movements in the world. So in that sense, I think there's a disanalogy that's extremely important. But I do think it's not a bad analogy in some ways, because for people who are really deeply involved in the EA community, it provides a map of the good life. It's a vision of what it means to be good and have a good life. It's unlike a political movement in that it doesn't just have a set of policy prescriptions for the world, but like many religious movements, it intersects with politics. There are people who approach political questions, like whether you should ban gestation crates for pigs, through the lens of their commitment to EA. And it's not just a community or a social club. People get solace and friendship from their local community of EAs like people do from their local church community. But it is more than that: it is trying to say something about the sweep of the world and your place in it and what it means to live a good and meaningful life. And it intersects with politics and community and a bunch of other things while not being exactly the same as any of them.
Rob Wiblin
Yeah, I would think a key way that it's not like religion is that in many respects it feels more like a business to me, or a startup, or an organization that has quite a functional goal. Or I guess that's one aspect of it. Some people like the ideas and the blog posts but don't engage with the community whatsoever, and for them it's going to be a different experience. And there are people who like the community. Actually, there are many people who participate in the community of people who would say "I'm involved in effective altruism" but are not that interested in the projects, or necessarily even in the effort of helping people. So people sample the aspects that they like. But for many of the people who work in organizations staffed by others who would say "I'm really into effective altruism," it's much more pragmatic.
Ajeya Kocha
Yeah, I think that is how it ends up manifesting for a lot of people. But I don't think that's really what EA is. I think it's a mistake to collapse EA into a set of three or four goals in the world, like reducing the suffering of animals in factory farms, plus improving quality of life for poor people in developing countries, plus AI safety. In some ways I think people think of EA as a weird umbrella for those three things, and then those three things are basically professional communities pursuing well-defined goals. But I think EA is more a way of looking at the world and a way of thinking about the good. And you can take an EA approach to cause areas that are in some sense more parochial than the big three EA cause areas. You can absolutely take an EA approach to US policy from the perspective of the welfare of US citizens, doing rigorous cost-effectiveness analysis of which policies actually help and which don't. And a lot of people do. And then I think there is EA as a generator of new cause areas that could get added to the canon. Right now there's a lot of fertile ground around the question of whether EA can be a force that helps society prepare for radical change from advanced AI, where AI safety is one big, important piece, but there might be a range of other issues. And you might want to prioritize some of those based on your values and your sense of how things will play out.
Rob Wiblin
You wrote in your notes that at least from your personal point of view, EA wasn't enough like a religion, or it wasn't as much like a religion as you might personally have liked. Yeah, explain that.
Ajeya Kocha
I think I'm someone who really benefits from structure and from emotional motivation and reinforcement. I also tend to socially conform a little bit; I tend to try to achieve the ideal of the community I'm in. And I think the ideal of my corner of the EA community is, like you said, to have a really impactful job, do a really good job at it, and work a lot of hours at it. So that's the message you get from the community, and that's what I'm trying to do. But I personally would have liked a bit more of a spiritual angle to the community. If you read my colleague Joe Carlsmith's blog, I get some of that: existential reflection about our morality and our values, and this crazy thing that so many EAs believe, that in a matter of a decade or two we might be in an utterly transformed world that, relative to this vantage point, might be utopic or dystopic, and just grappling with that. I think if there had been an EA church where every Sunday someone who's really good and thoughtful about these issues spoke about them and led a discussion, that would have been very enriching for my life and probably ultimately made me higher impact. But that's just not how the EA community is structured. And it's sort of deliberately not structured that way because of the professional community aspect of EA. You really want to not care whether people believe the deepest teachings and philosophical orientation. You really want to just be like, "You're doing great safety research, and that's what I care about." So the incentives of a professional community pull against what I might personally want here.
Rob Wiblin
Yeah, it sounds like you think that while it might have been more appealing to you, it's not necessarily better for things to go in that direction. For me personally, I kind of like the more professional, limited aspect of it, because you want to be able to go home and not have to think about this stuff all the time.
Ajeya Kocha
Whereas I want to go home and think about it in a different way. I mean, I already go home and think about my work; I frequently have insomnia where I think about my work. And instead of thinking about the next Google Doc I need to write or the next email I need to send, I would like to be thinking about my work in more spiritual dimensions.
Rob Wiblin
Yeah. I guess people have a range of views, but it's clear why many people haven't embraced that, or have been keen for a strong division between this sort of thing, which can be very stressful, and the rest of their lives.
Ajeya Kocha
And it can be very dangerous and culty. I mean, there are a lot of reasons to worry about it, but I do think there is a large contingent of EAs who are like me in wanting some sort of spiritual grounding. Joe Carlsmith's blog is extremely popular with hardcore EAs. It's not a generically popular blog, or it's reasonably popular, but there are a number of people who are like, oh wow, this is really nourishing something in me that I didn't realize I needed.
Rob Wiblin
I wonder if there's an age thing here as well. When I was younger, I noticed that the older people were less interested in this aspect of it. And now I'm in the older class, and I'm like, well, I have my family to provide nourishment, and that's absorbing a lot of time and energy that I don't have for attending church or whatever else it might be.
Ajeya Kocha
That's kind of interesting. Do you feel like you had some sort of spiritual hole that was filled specifically by having a child? Or were you always just not that interested in this?
Rob Wiblin
Yeah, I think of myself as a deeply unspiritual person, so I don't think that was an itch I needed scratched. Earlier on I was maybe more interested in the social scene, to make good friends and meet people. Having made more friends who I think of as like-minded and share a lot of interests with, that's not as interesting to me anymore either. I've already got my friends, and now I'm just going to ride it out.
Ajeya Kocha
I actually thought of myself as an extremely unspiritual person. I had a lot of disdain for spirituality when I was 20. So for me the age thing has gone the other way: I want more and more of a religion-shaped thing in my life as I age. And when I think about why, I think it's because when I was 20 I had unrealistic aspirations for my worldly projects. By that point I'd already been an EA for six or seven years, but I was just starting to try to do EA things in the world, and I had this sense that this is obviously correct, this is obviously great, everyone who's good and reasonable will get on board with it, and we'll just solve poverty and solve factory farming. I wouldn't have said exactly this, but I had that inner vibe, and I would go around being like, have you heard the good word about EA? And as I've done things in the real world, I've found that everything is very hard and slow, and the feeling of doing my job, which involves writing these Google Docs and sending these emails, is just not automatically connected to my higher aspirations. There is a long grind and there's a lot of failure. So I think I have an increasing demand for some separate thing that is specifically trying to reorient me mentally towards the bigger picture.
Rob Wiblin
Yeah, for me the bottom line there is that working on this stuff can be like quite stressful and quite tiring. And I want to completely check out and stop thinking about it and just be with people and talk about other issues. Like I said, the different strategies for training.
Ajeya Kocha
I think I probably want some of both. I now live in a group house with a couple of little kids, which is really great and good for that. But unfortunately I find that it takes a lot to pull my mind away; even when I watch TV, I'm thinking about other stuff in the background.
Rob Wiblin
So I think during your sabbatical you considered going independent, becoming a writer or researcher and doing your own thing. But in the end you decided to come back to Open Phil, at least for a while. Why was that?
Ajeya Kocha
Yeah. Towards the end of the sabbatical, I was planning on taking some time to start a Substack and write about a bunch of stuff, including a lot of the stuff about EA that we were discussing and a lot of stuff about AI, and see where it went. At that time, I honestly didn't have a super strong impact case for this. I didn't think it was crazy that it would be the highest-impact thing to do, but I was doing it because I wanted to, not because I could really defend that it was the highest-impact thing. But at that moment, after having gone through this whole journey, I was like, yeah, maybe I have more room in my life for making a career decision on the basis of not just impact. The reason I decided to stay was that, while I was out, Open Phil was conducting a search for a new director to lead our GCR work, meaning all our AI work and our biorisk work. This was the position Holden was in when he left in 2023. Both of the top two candidates seemed really good to me, and I felt like someone new coming in could probably really use help from someone who isn't running any given program area and doesn't have a big team to worry about, who can just help that person develop context and figure out their strategy. And it could be an opportunity for me to see if I could get back the feeling of plugging in that I had been missing for a while.
Rob Wiblin
How did it go?
Ajeya Kocha
I think it went really well. Our director of GCRs is Emily Oehlsen, who's also the president of Open Philanthropy. I've spent most of this year, 2025, helping her in various ways: trying to understand what we have funded and what's come of it, what the AI worldview is, what we think is going to happen with AI, how that's informing our strategy, and what the strategies of the various sub-teams are. I work really, really well with her. I had actually been lonely at Open Phil almost the entire time I'd been there, though it got worse in 2023. Holden was really great at giving me a lot of bandwidth and talking about object-level stuff with me, which I'm really grateful for. But Holden never ran a ship where he was like, I'm doing this bigger project, can you help me with this piece of it, and here's how it fits in. Holden was always more like a research PI: I was doing my own research project, and he would talk to me about it a bunch and was interested in the results, but it was not integrated into a whole. Emily really does operate in a more integrated way, where I'm doing stuff and I know she needs to know the answer and is going to do something with it, which is very cool and very novel for me as a way to work. It's something I always thought I would want, and indeed it's really, really great. I think she's an extremely caring and thoughtful manager for me who's really good at eliciting work out of me. And I notice that I work more than I did right before I went on sabbatical, and it feels less hard. So that's a sign that things are working.
Rob Wiblin
So you're trying to decide what to do next: whether to stay at Open Phil or go into something less meta, which might allow you to go into even more depth. How are you using what you've learned about yourself over the last few years to inform that decision?
Ajeya Kocha
Yeah, so besides Open Phil, which is still a top candidate, I'm talking to two technical research orgs about potentially finding a fit there. One is Redwood Research, the other is METR. Redwood Research works on basically futurism-inspired technical AI safety research, and they're best known for pioneering the AI control agenda. And I think of METR as trying to be the world's early warning system for an intelligence explosion. They're measuring all the different things we want to be tracking to see if we're on the cusp of AIs rapidly accelerating AI R&D or acquiring other capabilities that let them take over. Both of these missions are very close to my heart. They're both narrower than Open Phil, where if I wanted I could dip my toes in absolutely everything that might help with making AI go well. But in exchange they would let me go deep in a way that would probably be more satisfying for me, all else equal. In terms of how I'm using what I've learned, and this is so cliché that a 20-year-old version of me watching this would roll her eyes: your extremely local environment matters a huge amount. The literal person you're reporting to, the two or three people you're going to be talking to most in your job, or just features like how much you're talking to people versus working on your own, can make a transformative difference. I found it interesting to reflect on this. I said all that stuff earlier about how EA has become a lot less transparent and a lot less focused on maximal integrity at all costs, and that does still bother me. And the moral foundations of EA, utilitarian thinking, can take you down a long rabbit hole where they look very suspect in many ways. We talked about this in some previous episodes.
But both of those things bother me a lot more when I'm also in a working environment that's locally hard for me.
Rob Wiblin
That's frustrating.
Ajeya Kocha
And it's not like those issues aren't real issues, but the salience of those heady, big-picture things depends a lot on extremely micro things, like what it feels like when you have a one-on-one with your manager. I think I had been underrating the mundane and the micro in how I had been thinking about my career up to now. So I'm trying to do trials. I'm actually in the middle of a work trial with METR as we're filming this episode, and that's what I'm paying attention to: How does the rhythm of the work feel? How do the people feel?
Rob Wiblin
Yeah, I guess another generalizable observation is that Open Phil's environment changed over the years. You were there for eight years, but...
Ajeya Kocha
I guess nine years now.
Rob Wiblin
Nine years, right. Yeah. But the constraints that Open Phil was laboring under in 2023 were very different from those in 2016. So unsurprisingly, it might have been a good fit for you to start with, but that doesn't necessarily mean it will be a good fit forever. Also, there was a leadership change at Open Phil, so the person you were reporting to changed. Very often when that occurs, you see some other people leave as well, because they were in their roles primarily because of their very good working relationship with that person or because they had strategic alignment with that person.
Ajeya Kocha
Absolutely.
Rob Wiblin
And I suppose potentially the CEO changing could have been a trigger for you to think, well, maybe this isn't so great anymore and I should proactively start looking for something else.
Ajeya Kocha
Yeah, I think that's possible. I mean, it was sort of true for me in both directions. Holden was a huge part of why I wanted to work at GiveWell rather than at a number of other potential places, or do earning to give like I thought I was going to do at first. And then when he left, that coincided with a difficult period for me. And now, with Emily in the position he was in before, what my work is and how it feels has again changed pretty dramatically. So it does seem like a big, transformative thing. If you're in an organization where there's a leadership change, I think it should probably be a trigger to think about, even if you don't leave, what might be different about your role, your place, and what you're doing, based on the different style, constraints, strengths, and weaknesses of the new leadership.
Rob Wiblin
It sounds like taking four months off was also a good call. You were reasonably unhappy, and I guess it could have gotten worse if you hadn't done that, and it gave you breathing room to make good decisions.
Ajeya Kocha
Yeah, I think that's right. I'm very glad that I took the sabbatical. I'm also glad that I didn't leave. A salient alternative for me at the time I decided to take four months off was to just leave and figure out what I wanted to do next. I think it was good both for my impact and for my personal growth and satisfaction that I came back and helped Emily, and now I'm doing a proper job search. At the time I left for my sabbatical, it was more about healing and reflecting, not searching for a role in a focused way.
Rob Wiblin
Yeah. Coming back to effective altruism for a bit: you said we basically don't talk about effective altruism on the show anymore. It was a much bigger feature in the earlier years. The biggest reason for that, I suppose, is that we're now more AI focused. But AI is an issue that so many people are concerned about, regardless of their broader moral values or commitments, so effective altruism just doesn't feel as relevant. You don't have to be concerned about shrimp, or about beings very far away in time, to think it would be really good to do technical AI safety research, or that it would be good to think about the governance challenges AI is going to create. And of course, EA is a controversial idea, and I think it actually is quite a controversial idea at its core. Many people, even fully understanding it, would simply not agree with its prescriptions about how resources ought to be allocated. So why bring along all of that baggage when it's not actually decision relevant for most people? Do you think we should talk about it more, or is this just a sensible evolution?
Ajeya Kocha
Yeah, I think it kind of depends on the show's goals. My take is that it's correct and good that you don't need to buy into the whole EA package, with all of its baggage, to worry about misaligned AI taking over the world and do technical AI safety research to prevent that, to worry about AI-driven misuse and do research and policy to prevent that, or to generally worry about AI disruption and think about that. I think there should be, and there is, a healthy, thriving "AI is going to be a big deal" ecosystem that does not take EA as a premise. But at the same time, I think EA thinking and EA values probably still have a lot to add in the age of AI disruption. I think it's going to be EAs, for the most part, who think seriously about whether AIs themselves are moral patients, whether they should have protections and rights, and how to navigate that thoughtfully against trade-offs with safety and other goals. It's going to be EAs, by and large, who still take most seriously the possibility that AI could be so disruptive that we end up locked into a certain set of societal values, where we gain the technological ability to shape the future for millions or billions of years, and who are thinking about how that should go. There are a lot of degrees of extremity to the AI worldview. Even among people who accept that AI is going to disrupt everything in the next 10 or 20 years, the people thinking hardest about the most intense disruptions are going to be disproportionately EAs, because EA thinking challenges you to engage in that kind of very far-seeing, rigorous speculation. Even though there are a lot of challenges with that and it's very hard to know the future, I think EAs are the ones who try hardest to peek ahead anyway.
Rob Wiblin
Yeah, I think digital sentience, or worrying about AIs themselves suffering, is a good example. I would definitely predict that effective altruism will loom large in the group of people working on that. For someone who isn't altruistic or motivated by social impact, it's a bit unclear why you would go into that area. It's not particularly lucrative, it's not, at least yet, particularly respected, it's not super easy to make progress, and it's sufficiently unconventional. I think most people, most of the time in their career, want to do something that's acceptable and that their parents will be proud of, and it's a lot less clear that digital sentience is going to provide the esteem, prestige, safety, or comfort that many people want in a career. So it's maybe natural that people who are altruistically motivated and also intellectually a bit eclectic, willing to be avant-garde, are going to be more...
Ajeya Cotra
Intellectually avant-garde, like tolerant of quite a lot of philosophical reasoning and speculation. In a sense, I think this might be what a healthy EA community is: an engine that incubates cause areas at a stage when they're not very respected, they're extremely speculative, and the methodology isn't firm yet, so you kind of just have to be extremely altruistic and extremely willing to do unconventional things. It then matures those cause areas to the point where they can stand on their own while still being something many EAs work on. And I think digital sentience, and maybe the other things on Will and Tom's list, like space governance and thinking about value lock-in, are other candidates for EA to incubate the way it incubated worrying about AI takeover. Basically, yeah.
Rob Wiblin
I feel that less strongly in the case of value lock-in, because many of the mechanisms there would just be ways that AI ends up... I guess you would get a power grab by people or a power grab by AIs, or AI somehow undermines democracy or deliberation in a way that makes it hard for society to adapt over time. And I think people worry about that regardless: both people involved in effective altruism and people who would be very skeptical of it.
Ajeya Cotra
I think there are some versions of the value lock-in concern that go through something else overtly scary and bad happening, like one person getting all of the power, and that's how that person's values get locked in. But I think there's a whole spectrum of things that are almost like social media, where, in a distributed way, this technology has made us meaner to each other and worse at thinking, and has allowed individuals to live in information bubbles of their own creation. You can imagine AIs getting way better at creating a curated information bubble for each individual person that lets them keep believing whatever they started out believing, with superintelligent help preventing them from changing their minds. You might think of this as an important social problem for the long-run future even if it doesn't happen via one person getting all the power: power is still relatively distributed, but large fractions of society are impervious to changing their minds.
Rob Wiblin
It's interesting that, in thinking about the niche EA can fill that others won't, the thing you pointed to was not primarily altruism, although I guess that is a factor in going into digital sentience. Perhaps it's actually a research methodology or research instinct: being willing to sit in that very uncomfortable space between just making stuff up and having firm conclusions you can stand by because you've taken particular measurements. For some reason, that feels like one of the most distinctive traits of people who are passionate about effective altruism: being willing to try really hard to make informed speculation about how things will go, without it being just a good story, and without being so conservative that you won't actually make hard predictions.
Ajeya Cotra
Yeah, absolutely. And I think even the tamest of EA cause areas, like global health and development, has a huge dose of this. If you look at GiveWell's cost-effectiveness analyses, they have to grapple with how the value of doubling someone's income, if they make a very low amount of money, compares to a certain risk of death, or to the value of avoiding a certain painful disease. They have to try to get their answers from surveys and weird studies people have done. It's not very rigorous in the end. They have to form their judgments and spell them out. And I think the willingness to tackle questions like this and just say, "Well, here's our answer," even though there's a lot to argue with, is very emblematic of EA organizations, including the best EA AI safety organizations, like Redwood Research.
Rob Wiblin
Yeah, I guess more standard ways to approach those questions would be to pick an answer slightly arbitrarily and then be really committed to it, or to be kind of irritated at being asked and say that there's absolutely no way of knowing, or that there's no fact of the matter here whatsoever. And trying to be somewhere... I don't know whether it's somewhere in the middle, but yeah: effective altruism.
Ajeya Cotra
And within EA there's kind of a spectrum in terms of where in the middle you want to land. Everyone's kind of looking at the people more speculative than them and thinking they're just building castles on sand, and this is not the way to do things. And they're looking at the people less speculative than them and thinking that's just the streetlight effect: they're ignoring the most important considerations and not working on the most important area.
Rob Wiblin
Yeah. So for people who do have that mindset, I suppose an important message would be to take advantage of the fact that they have this reasonably rare mentality and go into roles that other people probably won't fill, because those people feel too uncomfortable or reasonably think the work is misguided. Either way, other people aren't necessarily going to do this stuff.
Ajeya Cotra
Yeah, I think that's right. And I think it's interesting to think about. If you imagine EA as one piece of the world's response to crazy changes like AI, there's actually a case that EA should be heavily indexed on research. I think the community has gone back and forth on this. At first people were just naturally attracted to research, so there was a huge glut of people who wanted to be researchers, and then there was a big push, including from 80K and others, to consider operations roles, policy roles, and other things that aren't just research. I think that was a good move at the time. But if we think about what EA's comparative advantage is relative to the rest of the world, maybe that suggests that some of the people who are doing operations and policy, but who in their hearts just want to be a weird truth-teller thinking speculative thoughts, should consider going back to doing that.
Rob Wiblin
My guest today has been Ajeya Cotra. Thanks so much for coming on the 80,000 Hours Podcast again, Ajeya.
Ajeya Cotra
Thanks so much for having me.
80,000 Hours Podcast
Episode: Every AI Company's Safety Plan is 'Use AI to Make AI Safe'. Is That Crazy? | Ajeya Cotra
Date: February 17, 2026
Host(s): Rob Wiblin, Luisa Rodriguez
Guest: Ajeya Cotra (Senior Advisor, Open Philanthropy)
This episode features an in-depth conversation with Ajeya Cotra about the dominant AI safety plan at major AI labs: using increasingly capable AI systems themselves as integral tools for ensuring future AI systems remain safe and aligned. The discussion explores the logic and risks behind this strategy, the vast disagreements about the pace and impact of AI progress, transparency requirements for labs, and what governments, philanthropies, and individuals can do to prepare for transformative AI. The latter half is a candid reflection on Ajeya's career in effective altruism, management, burnout, and the maturity of the EA movement.
AI Labs’ Stated Plans:
“In all of their stated safety plans, you see this element of as AIs get better and better, they're going to incorporate the AIs themselves into their safety plans more and more.” (Ajeya, [00:00])
Crucial Safety Bottleneck:
“It either like bottlenecks our progress because we're checking on everything all the time … or it doesn't bottleneck our progress, but we hand the AIs the power to take over.” (Ajeya, [00:32]; repeated at [56:59])
Definition Drift:
“VCs have an instinct to call something AGI that is like GPT-5 … something just much milder.” (Ajeya, [04:11])
Spectrum of Predictions:
“It's an almost unfathomable degree of disagreement among people … a 10,000-fold disagreement” (Rob, [14:10])
Error Theories & Priors:
“Every time … they've always been wrong … you have this strong prior that … someone could have made the same argument about television … about computers. None of these played out.” (Ajeya, [21:14])
Empirical Heads-Up?
Transparency Requirements:
“I would really like them to be reporting their most concerning misalignment related safety incidents. … But then of course it's clear that reporting that is very embarrassing to companies.” (Ajeya, [38:50])
Barriers:
“It's more like we want this information out there in the open and then we want people to do ... involved analyses of it. … I think it would be very hard for [regulators alone] to interpret the evidence quickly enough.” (Ajeya, [41:03])
Redirecting AI Labor:
Window of Opportunity:
“There exists a window of opportunity before AIs are uncontrollably powerful or have created unacceptable levels of risk, where they are really capable and really change the game for AI safety research.” (Ajeya, [63:05])
Practicality and Limitations:
Ajeya's Perspective:
Barriers:
“You get to a point where one company … keeps its internal best systems to itself and only releases systems that are considerably worse than its internal frontier.” (Ajeya, [90:44])
Call to Action:
Career Trajectory:
“I felt like, kind of lonely … I just really have a high need for like constantly talking to other people … I was not very good at hiring and management.” (Ajeya, [122:58])
Burnout & Sabbatical:
Return & Future:
Reflections on EA:
“A big part of my motivation came from … intellectual depth and this like crazy high level of openness, transparency, like having absolutely nothing to hide.” (Ajeya, [140:52])
“There’s a pretty good chance that by 2050 the world will look as different from today as today does from the hunter-gatherer era … 10,000 years of progress rather than 25 years of progress.”
— Ajeya Cotra ([05:49])
“It's an almost unfathomable degree of disagreement among people … they've spoken about this, they've shared their reasons and they don't change their mind and they disagree by a thousand fold.”
— Rob Wiblin ([14:10])
“My instinct is to just do more of that and to just say more and respond. But it's harder to do that from Open Phil's position for a number of reasons.”
— Ajeya Cotra ([143:32])
“Anything that requires a large corporation to be super discontinuous in something it's doing is facing big headwinds as a plan. So I would hope that they're sort of smoothly increasing the amount of internal inference compute that is going towards safety as the AIs get better and better …”
— Ajeya Cotra ([82:11])
“I think I was, to some extent, kidding myself about how much of my own motivation … came from just the goals ... the latter two things [intellectual depth, transparency] were actually really important for my motivation and they were … over time just like smaller and smaller features of what it was like to do EA.”
— Ajeya Cotra ([140:52])
End of summary. For a deep dive into any section, refer to the timestamps provided above.