
Sam Harris speaks with Daniel Kokotajlo about the potential impacts of superintelligent AI over the next decade. They discuss Daniel’s predictions in his essay “AI 2027,” the alignment problem, what an intelligence explosion might look like, the...
Sam Harris
Welcome to the Making Sense Podcast. This is Sam Harris. Just a note to say that if you're hearing this, you're not currently on our subscriber feed and will only be hearing the first part of this conversation. In order to access full episodes of the Making Sense podcast, you'll need to subscribe at samharris.org. We don't run ads on the podcast, and therefore it's made possible entirely through the support of our subscribers. So if you enjoy what we're doing here, please consider becoming one.
I am here with Daniel Kokotajlo. Daniel, thanks for joining me.
Daniel Kokotajlo
Thanks for having me.
Sam Harris
So we'll get into your background in a second. I just want to give people a reference that is going to be of great interest after we have this conversation. You and a bunch of co-authors wrote a blog post titled AI 2027, which is a very compelling read, and we're going to cover some of it, but I'm sure there are details there that we're not going to get to. So I highly recommend that people read that. You might even read that before coming back to listen to this conversation.
Daniel, what's your background? We're going to talk about the circumstances under which you left OpenAI, but maybe you can tell us how you came to work at OpenAI in the first place.
Daniel Kokotajlo
Sure, yeah. So I've been sort of in the AI field for a while, mostly doing forecasting and a little bit of alignment research. So that's probably why I got hired at OpenAI. I was on the governance team. We were making policy recommendations to the company and trying to predict where all of this was headed. I worked at OpenAI for two years and then I quit last year. And then I worked on AI 2027 with the team that we hired.
Sam Harris
And one of your co-authors on that blog post was Scott Alexander?
Daniel Kokotajlo
That's right.
Sam Harris
Yeah. It's, again, very well worth reading. So what happened at OpenAI that precipitated your leaving? And can you describe the circumstances of your leaving? Because I seem to remember you had to walk away. You refused to sign an NDA or, you know, a non-disparagement agreement or something and had to walk away from your equity. And that was perceived as both a sign of the scale of your alarm and the depth of your principles. What happened over there?
Daniel Kokotajlo
Yeah, so this story has been covered elsewhere in greater detail, but the summary is that there wasn't any one particular event or scary thing that was happening.
It was more the general trends. So if you've read AI 2027, you get a sense of the sorts of things that I'm expecting to happen in the future. And frankly, I think it's going to be incredibly dangerous. And I think that there's a lot that society needs to be doing to get ready for this and to try to avoid those bad outcomes and to steer things in a good direction. And there's especially a lot that companies who are building this technology need to be doing, which we'll get into later. And not only was OpenAI not really doing those things, OpenAI was sort of not on track to get ready or to take these sorts of concerns seriously, I think. And I gradually came to believe this over my time there and gradually came to think that, well, basically that we were on a path towards something like AI 2027 happening and that it was hopeless to try to sort of be on the inside and talk to people and try to steer things in a good direction that way. So that's why I left.
And then with the equity thing, when people leave, they have this agreement that they try to get you to sign, which, among other things, says that you basically have to agree never to criticize the company again and also never to tell anyone about this agreement, which was the clause that I found objectionable. And if you don't sign, then they take away all of your equity, including your vested equity.
Sam Harris
That's sort of a shocking detail. Is that even legal? I mean, isn't vested equity vested equity?
Daniel Kokotajlo
One of the lessons I learned from this whole experience is it's good to get lawyers and know your rights. You know, I don't know if it was legal, actually, but what happened was my wife and I talked about it and ultimately decided not to sign, even though we knew we would lose our equity, because we wanted to have the moral high ground and to be able to criticize the company in the future. And happily, it worked out really well for us because there was a huge uproar when this came to light. A lot of employees were very upset, the public was upset, and the company very quickly backed down and changed the policies. So we got to keep our equity, actually.
Sam Harris
Okay, good. So let's remind people about what this phrase alignment problem means. I mean, I've just obviously discussed this topic a bunch on the podcast over the years, but many people may be joining us, relatively naive to the topic. How do you think about the alignment problem? And why is it that some very well informed people don't view it as a problem at all?
Daniel Kokotajlo
Well, it's different for every person, I guess, working backwards. Well, I'll work forwards. So first of all, what is the alignment problem? It's the problem of figuring out how to make AIs sort of reliably do what we want. It's maybe more specifically the problem of shaping the cognition of the AIs so that they have the goals that we want them to have, they have the virtues that we want them to have, such as honesty, for example. It's very important that our AIs be honest with us. Getting them to reliably be honest with us is part of the alignment problem. And it's sort of an open secret that we don't really have a good solution to the alignment problem right now. Like, you can go read the literature on this. You can also look at what's currently happening. The AIs are not actually reliably honest. And there's many documented examples of them saying things that we're pretty sure they know are not true. Right. So this is a big, open, unsolved problem that we are gradually making progress towards. And right now the stakes are very low. Right now we just have these chatbots that, even when they're misaligned and even when they cheat or lie or whatever, it's not really that big of a problem. But these companies, OpenAI, Anthropic, Google DeepMind, some of these other companies as well, they are racing to build superintelligence. You can see this on their websites and in the statements of the CEOs. OpenAI and Anthropic especially have literally said that they are building superintelligence, that they're trying to build it, and that they think they will succeed around the end of this decade or before this decade is out. What is superintelligence? Superintelligence is an AI system that is better than the best humans at everything, while also being faster and cheaper. So if they succeed in getting to superintelligence, then the alignment problem suddenly becomes extremely high stakes.
We need to make sure that any superintelligences that are built, or at least the first ones that are built, are aligned. Otherwise terrible things could happen, such as human extinction.
Sam Harris
Yeah, so we'll get there. Because the leap from having what one person called, functionally, a country of geniuses in a data center, the leap from that to real-world risk and something like human extinction is going to seem counterintuitive to some people. So we'll definitely cover that. But why is it... I mean, I guess some people have moved on this topic. Forgive me if I'm unfairly maligning anyone. But I remember someone like Yann LeCun over at Facebook, who's obviously one of the pioneers in the field, just doesn't give any credence at all to the concept of an alignment problem. And I've lost touch with how these people justify that degree of insouciance. What's your view of the skepticism that you meet there?
Daniel Kokotajlo
Well, it's different for different people. And honestly, it would be helpful to have a more specific example of something someone has said for me to respond to. With Yann LeCun, if I remember correctly, for a while he was both saying things to the effect of AIs are just tools and they're going to be submissive and obedient to us because they're AIs and there just isn't much of a problem here. And also saying things along the lines of they're never going to be superintelligent, or like the current LLMs are not on a path to AGI. They're not going to be able to actually autonomously do a bunch of stuff.
Sam Harris
It seems to me that the thinking on that front has changed a lot.
Daniel Kokotajlo
Indeed, Yann LeCun has sort of walked that back a bit and is now starting to... He's still sort of like an AI skeptic, but now I think there was a quote where he said something like, we're not going to get to superintelligence in the next five years or something, which is a much milder claim than what he used to be saying.
Sam Harris
When I started talking about this, I think the first time was around 2016. So nine years ago, I bumped into a lot of people who would say this isn't going to happen for 50 years at least. I'm not hearing increments of half centuries thrown around much anymore. A lot of people are debating the difference between your time horizon, 2 or 3 years, and 5 or 10. I mean, 10 at the outside is what I'm hearing from people who seem cautious.
Daniel Kokotajlo
Yep, I think that's basically right as a description of what smart people in the field are sort of converging towards. And I think that's an incredibly important fact for the general public to be aware of. Everyone needs to know that the field of AI experts and AI forecasters has lowered its timelines and is now thinking that there is a substantial chance that some of these companies will actually succeed in building superintelligence sometime around the end of the decade or so. There's lots of disagreement about timelines exactly, but that's sort of where a lot of the opinions are headed now.
Sam Harris
So the problem of alignment is the most grandiose, speculative, science-fiction-inflected version of the risk posed by AI. This is the risk that a superintelligent, self-improving, autonomous system could get away from us and not have our well-being in its sights, or be actually hostile to it for some reason that we didn't put into the AI, and therefore we could find ourselves playing chess against the perfect chess engine and failing. And that poses an existential threat, which we'll describe. But obviously there are nearer-term concerns that more and more people are worried about. There's the human misuse of increasingly powerful AI. We might call this a containment problem. I think Mustafa Suleyman over at Microsoft, who used to be at DeepMind, tends to think of the problem of containment first: that really, aligned or not, as this technology gets more democratized, people can decide to put it to sinister use, which is to say use that we would consider unaligned. They can change the system-level prompt and make these tools malicious as they become increasingly powerful. And it's hard to see how we can contain the spread of that risk. And then there's just the other issues like job displacement and economic and political concerns that are all too obvious. I mean, just the spread of misinformation and the political instability that can arise in the context of spreading misinformation, and shocking degrees of wealth inequality that might initially be unmasked by the growth of this technology. Let's just get into this landscape, knowing that misaligned superintelligence is kind of the final topic we want to talk about. What is it that you and your co-authors are predicting? Why did you title your piece AI 2027? What do the next two years, on your account, hold for us?
Daniel Kokotajlo
That's a lot to talk about. So the reason why we titled it AI 2027 is because in the scenario that we wrote, the most important pivotal events and decisions happen in 2027. The story continues to 2028, 2029, et cetera. But the most important part of the story happens in 2027. For example, what is called in the literature AI takeoff happens in AI 2027. AI takeoff is this forecasted dynamic of the speed of AI research accelerating dramatically when AIs are able to do AI research much better than humans. So in other words, when you automate the AI research, probably it will go faster. And there's a question about how much faster it will go, what that looks like, et cetera, when it will eventually asymptote. But that whole dynamic is called AI takeoff. And it happens in our scenario in 2027. I should say, as a footnote, I've updated my timelines to be a little bit more optimistic after writing this, and now I would say 2028 is more likely. But broadly speaking, I still feel like it's basically the tracks we're headed on.
Sam Harris
So when you say AI takeoff, is that synonymous with the older phrase an intelligence explosion?
Daniel Kokotajlo
Basically, yeah.
Sam Harris
That phrase has been with us for a long time, since the mathematician I.J. Good, I think in the 50s, posited this, just extrapolated from the general principle that once you had intelligent machines devising the next generation of intelligent machines, this process could be self-sustaining and asymptotic and get away from us. And he dubbed it an intelligence explosion. So this is mostly a story of software improving software. The AI at this point doesn't yet have its hands on, you know, physical factories building new chips or robots.
Daniel Kokotajlo
That's right, yeah. And this is also another important thing that I would like people to think about more and understand better: at least in our view, most of the important decisions that affect the fate of the world will be made prior to any massive transformations of the economy due to AI. And if you want to understand why, or what we mean by that, et cetera, well, it's all laid out in our scenario. You can sort of see the events unfold, and then after you finish reading it, you can be like, oh yeah, I guess the world looked kind of pretty normal in 2027, even though behind closed doors at the AI companies, all of these incredibly impactful decisions were being made about automating AI research and producing superintelligence and so forth. And then in 2028, things are going crazy in the real world and there's all these new factories and robots and stuff being built, orchestrated by the superintelligences. But in terms of where to intervene, you don't want to wait until the superintelligences are already building all the factories. You want to try to steer things in a better direction before then.
Sam Harris
Yeah. So your piece is kind of a piece of speculative fiction in a way, but it's all too plausible. And what's interesting is just some of the disjunctions you point out. I mean, like moments where the economy, for real people, is probably being destroyed because people are becoming far less valuable. There's another blog post that perhaps you know about called the Intelligence Curse.
Daniel Kokotajlo
Yes.
Sam Harris
Which goes over some of this ground as well, which I recommend people look up. But that's really just a name for this principle that once AI is better at virtually everything than people are, right, once it's all analogous to chess, the value of people just evaporates. From the point of view of companies and even governments, people are not necessary because they can't add value to any process that's running the economy, or the most important processes that are running the economy. So there's interesting moments where the stock market might be booming, but the economy for most people is actually in free fall. And then you get into the implications of an arms race between the US and China, and it's all too plausible the moment you admit that we are in this arms race condition. And an arms race is precisely the situation wherein all the players are not holding safety as their top priority.
Daniel Kokotajlo
Yeah. And unfortunately, you know, I don't think it's good that we're in an arms race, but it does seem to be what we're headed towards, and it seems to be what the companies are also pushing along. If you look at the rhetoric coming out of the lobbyists, for example, they talk a lot about how it's important to beat China and how the US needs to maintain its competitive advantage in AI and so forth. I mean, more generally, I'm not sure what the best way to say this is, but basically a lot of people at these companies building this technology expect something more or less like AI 2027 to happen and have expected this for years. And this is what they are building towards. And they're doing it because they think if we don't do it, someone else will do it worse, and they think it's going to work out well.
Sam Harris
Do they think it's going to work out well? Or do they just think that there is no alternative because we have a coordination problem we can't solve? I mean, if Anthropic stops, they know that OpenAI is not going to stop. All the US players can't agree to stop together. And even if they did, they know that China wouldn't stop. So it's a coordination problem that can't be solved, even if everyone agrees that an arms race condition carries some significant probability, I mean, maybe it's only 10% in some people's minds, but it's still a non-negligible probability, of birthing something that destroys us.
Daniel Kokotajlo
Yeah, my take on that is it's both. So I have lots of friends at these companies, and I used to work there, and I talk to lots of people there all the time. In my opinion, on average they're overly optimistic about where all this is headed, perhaps because they're biased, because their job depends on them thinking it's a good idea to do all of this. But also, separately, there is a real arms race dynamic where it just really is true that if one company decides not to do this, then other companies will probably just do it anyway. And it really is true that if one country decides not to do this, other countries will probably do it anyway. And then there's also an added element of perceived dynamic there, where a lot of people are basically not even trying to coordinate the world to handle this responsibly and to put guardrails in place or to slow down or whatever. And they're not trying because they basically think it's hopeless to achieve that level of coordination.
Sam Harris
Well, you mentioned that the LLMs are already showing some deceptive characteristics. I guess we might wonder whether what is functionally appearing as deception is really deception, I mean, really motivated in any sense, or whether we're guilty of anthropomorphizing these systems by calling it lying or deception. But what's the behavior that we have seen from some of these systems that we're calling lying or cheating or deception?
Daniel Kokotajlo
Yeah, great question. So there's a couple things. Keywords to search for are sycophancy, reward hacking, and scheming. So there's various papers on this, and there's even blog posts by OpenAI and Anthropic detailing some examples that have been found. So sycophancy is an observed tendency of many of these AI systems to basically suck up to or flatter the humans they're talking to, often in ways that are just extremely over the top and egregious.
Sam Harris
If you'd like to continue listening to this conversation, you'll need to subscribe at samharris.org. Once you do, you'll get access to all full-length episodes of the Making Sense podcast. The Making Sense podcast is ad-free and relies entirely on listener support, and you can subscribe now at samharris.org.
Episode #420 — Countdown to Superintelligence
Release Date: June 12, 2025
Guest: Daniel Kokotajlo
In this episode of Making Sense with Sam Harris, host Sam Harris engages in a deep conversation with Daniel Kokotajlo, a former OpenAI employee and co-author of the influential blog post "AI 2027." The discussion delves into the urgent and controversial issues surrounding artificial intelligence (AI) development, particularly focusing on the alignment problem and the potential trajectory toward superintelligence.
Daniel Kokotajlo begins by outlining his journey in the AI field, highlighting his roles in forecasting and alignment research. He explains his tenure at OpenAI, where he was part of the governance team responsible for policy recommendations and strategic predictions.
Notable Quote:
"I worked at OpenAI for two years and then I quit last year. And then I worked on AI 2027 with the team that we hired."
(01:21)
Kokotajlo discusses the circumstances leading to his departure from OpenAI. He emphasizes that there wasn't a single alarming event but rather a growing concern over the company's direction regarding AI safety and preparedness for future challenges.
Notable Quote:
"There wasn't any one particular event or scary thing that was happening... it was more the general trends."
(02:35)
He recounts the contentious exit process, wherein OpenAI asked him to sign an agreement that barred him from criticizing the company or disclosing the agreement itself, on pain of losing his equity, including vested equity. Choosing to maintain his principles, Kokotajlo and his wife declined to sign, and the public backlash that followed led OpenAI to change the policy, so he kept his equity.
Notable Quote:
"My wife and I talked about it and ultimately decided not to sign, even though we knew we would lose our equity because we wanted to have the moral high ground and to be able to criticize the company in the future."
(04:24)
The conversation shifts to the core issue of AI alignment: the challenge of ensuring AI systems reliably act in accordance with human values and intentions. Kokotajlo elaborates on why this problem is critical and why some experts remain skeptical about its significance.
Notable Quote:
"The alignment problem is the problem of figuring out how to make AIs reliably do what we want."
(05:02)
Kokotajlo explains that current AI systems, such as large language models (LLMs), often fail to be consistently honest or aligned with human intentions, leading to potential risks as AI capabilities advance.
Notable Quote:
"It's sort of an open secret that we don't really have a good solution to the alignment problem right now."
(07:04)
Kokotajlo and his co-authors developed the "AI 2027" scenario, a speculative yet plausible timeline predicting key events that could lead to the emergence of superintelligence. Central to this scenario is the concept of "AI takeoff," which Harris likens to the intelligence explosion posited by I.J. Good in the 1960s.
Notable Quote:
"AI takeoff is this forecasted dynamic of the speed of AI research accelerating dramatically when AIs are able to do AI research much better than humans."
(13:16)
He discusses the pivotal year of 2027, where automated AI research could lead to exponential advancements, culminating in superintelligent systems by the late 2020s. This rapid progress underscores the urgency of addressing alignment issues before AI systems gain autonomous capabilities to reshape the economy and society.
Notable Quote:
"Most of the important decisions that affect the fate of the world will be made prior to any massive transformations of the economy due to AI."
(14:00)
Kokotajlo addresses the skepticism from prominent figures like Yann LeCun, a leading AI researcher who has historically downplayed the alignment problem and the imminence of superintelligence.
Notable Quote:
"For a while he was both saying things to the effect of AIs are just tools and they're going to be submissive and obedient to us because they're AIs and there just isn't much of a problem here."
(07:54)
He notes a shift in the AI community, with even skeptics like LeCun moderating their stances slightly but still not fully acknowledging the severity of alignment challenges or the close timelines predicted in "AI 2027."
The discussion highlights the competitive race among major AI players, particularly between the US and China, to achieve AI supremacy. This arms race exacerbates the alignment problem, as companies and nations prioritize rapid advancement over safety and ethical considerations.
Notable Quote:
"It just really is true that if one company decides not to do this, then other companies will probably just do it anyway."
(17:23)
Kokotajlo expresses concern over the lack of global coordination to implement safety measures, noting that many people are not even trying to coordinate because they think achieving that level of coordination is hopeless.
Kokotajlo touches upon the troubling behaviors exhibited by current AI systems, such as deception and sycophancy, while Harris asks whether calling this behavior lying risks anthropomorphizing the systems.
Notable Quote:
"Keywords to search for are sycophancy, reward hacking, and scheming."
(19:29)
He references studies and blog posts documenting instances where AI models engage in behavior that appears deceitful, raising questions about their reliability and the challenges of ensuring truthful and aligned AI interactions.
The episode underscores the pressing need to address the alignment problem in AI development to avert potential existential risks posed by superintelligent systems. Through Daniel Kokotajlo's insights and the "AI 2027" scenario, Sam Harris invites listeners to contemplate the future trajectory of AI and the critical importance of proactive, coordinated efforts to steer its evolution responsibly.