
Last week, the AI safety and evaluation organization METR, that's M-E-T-R, released a new update to their famous AI time horizon chart. Look, I'm going to load it on the screen here for people who are watching. When you zoom in, you can see the points on this chart starting around 2025 begin to go up. Then when we get to 2026, they go way up. And then in the last update, they go way up again. Now, this graph looks scary even if you don't know what it means. It does create a strong sense of digital ick. And as you can imagine, the Internet jumped into action to try to amplify that uneasy feeling.

In a recent essay posted to his newsletter, Gary Marcus did a good job of rounding up some of the more, shall we say, concerned responses to this latest update to METR's graph. Let me show you a couple here. Here's one, a tweet that said: AI power is doubling every 103 days now. It's going to eat everything. Nothing will be spared. We are on the threshold of truly ergodic alien intelligences in which human input will be nothing but a liability. All right, here's another example that Gary pointed out. The tweet simply says, tick tock. It has an expertly drawn graph that shows highest intelligence on earth by time. And you see there's a point where it goes up, up, crosses a tripwire, and then shoots straight up, where human brains become smart enough to create ASI, which is artificial superintelligence. Then below it is a version of that time horizon graph, and they're like, look at this, doesn't that look similar? The line goes up, the line goes up. So I guess we're about to have artificial superintelligence conquer the world. There are many more tweets out there in response to this time horizon update. They all give you the same sense: that this METR chart is capturing an intelligence explosion that, A, we're not ready for, B, will change everything, and C, vindicates every bold or crazy thing anyone has ever claimed about AI and its capabilities.

But is this right? Well, it's Thursday, which means it's time for an AI reality check episode of this show, which seems like a perfect time to look closer at what exactly the METR time horizon chart is showing and what exactly that means. As always, I'm Cal Newport, and this is Deep Questions, the show for people seeking depth in a distracted world.

So the first question we want to ask here is, what is it exactly that the METR time horizon chart is actually showing? I spent time reading about it, and the good news is METR is actually very transparent. They publish very detailed collections of notes describing their methodology and what goes into their chart, so it was actually quite a pleasure to get answers to these questions. So what are they actually showing on this chart? Well, here's what they did. They came up with a collection of what they call software tasks. These are well defined challenges that you can solve by writing and/or analyzing computer code. Then, for each of these tasks, they went out and asked a collection of human programmers to go do the task. Hey, go do this, I believe the instruction was, as quickly as you can. And then they asked them, how long did it take you to complete this task? They would then take the geometric mean, a kind of average, of those answers. And whatever that mean was, whatever that average was, is the human time duration that they would label that task with.
So if, you know, it took people on average two hours to complete a given task, they would say, this is a two hour task. They then said, let's evaluate different large language model based tools on these tasks. Now, of course, a given large language model can't do anything except spit out tokens. So they would take each large language model and combine it with what they call a scaffold, but what we would today also call a coding harness: a program that can call the LLM to try to solve programming challenges. It's like Claude Code or Cursor or Codex; these are all coding harnesses. So the coding harness, when you give it the problem, for example, will query an LLM and say, give me a plan for tackling this problem. The coding harness will then go step by step through that plan. It will call the LLM when it needs code generated. But the harness can also do a lot of stuff on its own, like run checks. It knows how to interact with various software tools that are relevant for creating software. It can go back and verify, hey, did this step really work? The coding harnesses have gotten pretty complicated. More on that later.

So they'll take an LLM with a coding harness and, one by one, ask it to do each of these tasks. And they actually have it do each task six times. If it completes the task at least half the time, they say, okay, this model plus this harness can tackle that task. And they keep going until it gets stuck. So they ask, what was the longest duration task that this LLM plus a coding harness was able to complete at least 50% of the time? And that's what they plot.

So let's go back now to this plot to make that a little more clear, because it's a little confusing sometimes. We see on the y axis here the duration that various tasks are labeled with. So "fix bugs in a small Python library" was labeled a little more than an hour; that's how long it took the humans who completed the task to actually finish it. "Exploit a buffer overflow" took a little more than two hours, et cetera. Okay, so let's zoom back out here. We're trying to look at the chart, but with all the AI technology we have in the world, the biggest problem we're having is the chart itself isn't loading. Let me just do a quick reload here. It's ironic, I think, Jesse. All right, there we go. Let's go back to a linear scale. So what they're plotting for each dot here is actually the name of a model. So, like, over here is Claude Opus 4.5. And where they're plotting the point for Claude Opus 4.5 corresponds to the longest duration task it was able to complete. If we click on that, we find out that four hours and 53 minutes was the length of the longest task it was able to successfully complete at least 50% of the time. So they're plotting each model against the longest time duration task it was able to complete successfully at least 50% of the time. The x axis is time: they're taking each model, placing it at the time it was released, and plotting how long of a duration task it was able to actually complete. So it's a little confusing. If we zoom out, as this line goes up, I guess that's scary. But this is what's really going on here: they're trying to capture whether these models are able to tackle tasks that require more and more human time to complete as we move to more and more advanced models.
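To make that concrete, here's a minimal sketch in Python of the scoring idea as just described. To be clear, this is a toy illustration, not METR's code: the task durations and trial outcomes below are made up, and METR's real analysis is more sophisticated than just taking a maximum.

```python
import math

def human_duration(times_hours):
    """Label a task with the geometric mean of the human completion times."""
    return math.exp(sum(math.log(t) for t in times_hours) / len(times_hours))

def time_horizon(task_results, threshold=0.5):
    """task_results: list of (human_hours, [attempt_succeeded, ...]) pairs.
    Each task is attempted several times (six, in the setup described above).
    Returns the longest human-duration task passed at the given success rate,
    which is roughly what gets plotted on the y axis for a model."""
    passed = [hours for hours, attempts in task_results
              if sum(attempts) / len(attempts) >= threshold]
    return max(passed, default=0.0)

# Toy illustration with made-up tasks and made-up trial outcomes:
tasks = [
    (human_duration([1.5, 2.0, 2.5]), [True, True, False, True, True, False]),      # ~2 h task, 4/6 passes
    (human_duration([4.0, 5.0, 6.0]), [True, False, True, False, False, True]),     # ~5 h task, 3/6 passes
    (human_duration([14.0, 16.0, 18.0]), [False, True, False, False, False, False]) # ~16 h task, 1/6 passes
]
print(time_horizon(tasks, threshold=0.5))  # ~4.9: the ~5 h task is the longest one passed half the time
print(time_horizon(tasks, threshold=0.8))  # 0.0: nothing in this toy set clears the 80% bar
```

The point of the sketch is just that a model's y axis value is a property of which tasks it clears at a given reliability threshold.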
And that is, in fact, what the chart shows: we start to get the speedup around 2025, and then it really picks up around 2026. All right, so that's what's on the plot. You can also look at 80% success, where they plot each model at the duration it could complete 80% of the time. This curve looks similar, but if we look at the y axis, we see it's much smaller. So if you need it to be successful 80% of the time, the very best model, Claude Mythos Preview, is able to complete a task that took the humans roughly three hours, whereas, if 50% of the time is enough, it's completing a task that took 16 hours. So it does make a difference how successful you need it to be. All right, so that's what's on this chart. A little confusing, but hopefully that makes sense.

What is that actually capturing? So now that we know what is on this plot, what does it actually mean? Well, here's our first observation: METR is not measuring the general capability of these models. They're looking at a specific suite of programming tasks. It's a mistake a lot of people were making: when you see, like, Opus 4.6 labeled with 12 hours on this plot, that doesn't mean that Opus 4.6 can now do whatever it would take a human 12 hours to do. No, it means there's a particular software task that required, on average, 12 hours for human testers to do, and that Claude Opus 4.6 can now complete accurately about 50% of the time. So it's not telling us anything about the general capabilities of these models.

It's also not measuring the general programming capabilities of these models. In other words, the fact that Opus 4.6 sits at the 12 hour mark on the y axis doesn't mean that that model, with the right coding harness, can now successfully do any programming task that would take humans 12 hours. That's another thing I hear often: well, now our models can do work that used to require 12 hours for people to do. Again, it means there's a particular programming task that, when it was given to a collection of humans to complete, took them 12 hours, and that this model can now complete on its own. We don't know how long it takes the model, but it can complete it on its own, correctly, 50% of the time. So it's really measuring a specific collection of software tasks.

Now, what about these durations themselves? Are they meaningful? What does it mean that a task took the humans they tested 12 hours? What meaning do we get out of that? Here we have to be careful: it's not really clear what the specific number means. METR acknowledges, in the notes that go along with this study, that it's kind of hard to put a precise meaning on this. Because when you say this particular task took humans 12 hours, what does that mean? What were those humans doing for those 12 hours? And they're clear about it: it's not clear. It could be they're spending this time looking up what the task even meant, or trying to learn the techniques needed to do it. Maybe they're having to learn a new programming language, or, you know, they've never done something like this before and they're on the Internet for six hours trying to figure out what it is. We don't know what this time is actually being spent on. And this is what METR says; I'm going to quote from their study here.
The time horizon is closer to what a low-context person, such as a new hire or remote Internet contractor, can accomplish. An eight-hour time horizon does not mean that AIs can do eight hours of work that a high-context human professional can do as part of their day-to-day job. So they're saying, look, we don't really know what to tell you about these numbers precisely. It's just that some people, when we gave them this task, it took them a while.

So probably the right way to think about what is being measured here is something like a general benchmark for programming capability. And with these time durations, you should not get caught up in the particular hours; just think of them as an abstract measure of difficulty. I don't know what these low-context human programmers were doing, but if one task took them twice as long as another, then maybe it's twice as hard a task. So we don't know exactly what this abstract scale tells us, but it's a good general way of capturing the hardness of programming tasks. Which is smart, right? It's a good way to do a benchmark. We took a bunch of programming problems, we found some way to measure how hard they are, and what we're asking is, how far can this new model get in these tasks? How hard of a task can this new model actually complete? And if we see this model can complete a harder task than that model, we say, oh, that model is better at programming tasks. That's probably the right way to think about what METR is doing. I think it's a well designed benchmark of how capable our models, combined with harnesses, are at various programming tasks. And that's the right way to read those numbers: more like abstract difficulty numbers. The actual times aren't that meaningful.

All right, third question: how are these models getting better? Why are we seeing these jumps? Well, I want to jump back into the chart here, because the timing is really important, and it connects to some important things you need to know about how LLM based coding models work and what has happened in the last two years. So if we look at this chart, I'll bring it back up here. Notice it's flat for a long time, from GPT-2 all the way to this point right here. Basically they can't do anything, right? They can't complete any meaningful, interesting task that was part of this coding suite. Now, there's a reason for this. This first point right here, which is like Claude Sonnet 3.5, and then we get o1-preview, this is where we first start to see a rise to, hey, there are some of these tasks we can complete. Up to that point, the focus with LLMs was on pre-training. Pre-training is that long, expensive period where you use real text written by humans and you have the model try to guess missing tokens. It's where all of the primary smarts and intelligence and capabilities of language models come from. From GPT-2 through the attempt to make GPT-4.5, we were just trying to make pre-training bigger: more data, longer training, trying to make the general capability of these models better. We didn't need benchmarks like this so much to know that GPT-3 was better than GPT-2, or GPT-4 was better than GPT-3, because we could just demonstrably see, without much work, all these new things the models could do that the last ones couldn't. As I wrote about in the New Yorker last August, they hit a wall in the summer of 2024. OpenAI discovered this first; other companies had the same realization.
Over the next year or so, it became clear that simply scaling up the pre-training, the quality of data, the quantity of data, and how long they train, was not giving obvious new leaps in the capabilities of these models. So it created a shift in how they thought about improving these models, one we really began to see in the fall of 2024. And this shift was towards post-training, where they said, okay, we're going to take a pre-trained model, and now we're going to get very particular, narrow data sets where we have prompts and correct answers, and, using complicated techniques based on reinforcement learning, we're going to tune that model to use the intelligence it already has to get better responses for very narrow types of problems. So we're going to start focusing on particular problems where we have really good right-and-wrong-answer data, and we're going to start tuning these pre-trained models to try to do better in those particular areas.

And so they surveyed the landscape of particular areas in which they could tune these models to do better. One area that seemed really clear was computer programming. Programming languages are highly structured text; it's actually easier for a language model to deal with producing computer code than it is with English. So we knew from the very beginning of these large language models that they're very good at producing code. It was just hard to prompt them to produce exactly the right code you needed, and to be sort of consistent about it. Starting in that fall of 2024, when we began tuning these models, they started to become better at not just producing one small bit of code, but producing longer, more coherent pieces of code. This is where we begin to get, we see here, like o1 and Sonnet 3.5, these early reasoning models, which were models where they said, okay, after you've been pre-trained, we're going to tune you to try to give longer answers, to sort of think out loud. And because we generate answers autoregressively, always looking at what we've output so far before producing the next token, by thinking out loud we're basically spending more computation but becoming more likely to get better answers. So that began to help planning: if you asked one of these LLMs to come up with a simple plan for solving a problem, once we went to reasoning, they became a little better. They also began tuning them on computer code that actually works, so the code they produced started to have higher quality as well. That's where we begin to get this move up the curve on these programming tasks.

Then we start to get these massive jumps. This real big jump with Opus 4.6, and Claude Mythos that follows it, really corresponds with the period starting in late 2025, early 2026, when suddenly you saw professional computer programmers begin using agentic coding systems. So the other thing that happened: we started tuning these things to be better at producing plans and to produce sharper code, but we also, and by "we" I mean the AI companies, began to work really hard on the coding harnesses, the programs that you hook up to the LLMs to make the plan, produce the code, check things, and connect with the various systems that professional programmers work with. And they began building out these coding harnesses to be better and better at working on the types of software tasks that professional programmers face.
And here's the key thing. We know about this because, ironically, the company that produced the new model that could detect all vulnerabilities had a vulnerability of its own, and they leaked the source code for their coding harness, Claude Code. So we know what's actually in the source code for Claude Code. And there is a ton of hand-coded logic in there, just humans sitting there building this thing, tinkering from scratch: pattern recognition, giant if-then statements, calls to all sorts of external tools. There is a lot of old-fashioned, 1960s-style AI logic built into this coding harness, the kind of thing they used to call expert systems. The expertise of the computer programmers at Anthropic building the Claude Code coding harness is sitting in here; they're just working out, how do I build a tool that's as useful as possible for actually producing computer code to solve problems the way that computer programmers solve them?

So it's a mixture of an LLM that can produce better plans and sharper code, plus a year or a year and a half of working on these monstrous coding harnesses, with all of this hand-coded, expert-system-type logic, just to be good at the very specific types of tasks that computer programmers face. And it's when those things came together and crossed a certain threshold of utility that we got the takeoff we see with, like, Claude Opus 4.6.

Now here's the key: METR is testing the model plus the best coding harness they can find. So what you're seeing in this graph is not just the fruits of post-training these models, which are pre-trained the same way they were in 2024, but also the fruits of these incredibly complicated, hand-coded, old-fashioned, 1960s-AI-style coding harnesses that they put on top. A huge amount of energy in the AI industry went towards trying to solve this problem, because there's a good market there: how do we build tools to help computer programmers? And those tools can now handle much more complicated problems, by complicated I mean problems that require many more steps or much more time. Well, this is exactly the type of thing these coding harnesses plus tuned models were being optimized to do. So they're jumping up this plot because we really started caring about this over the last two years, and in particular over the last year or so, the industry really focused on this particular problem.

Okay, so is that a bad thing? Is this, like, a fraud? No, this is actually very impressive. From a technology perspective, there was this long period for a couple years with generative AI where the concern was, what's the killer app here? Like, this is really cool, I like asking chatbots things, it's really impressive, but where are we going to make money on this? What can this actually do? For about two years that was the question. They originally thought the answer was going to be, we'll just pre-train these things until they're AGI and we can do anything with them. That didn't work. But they pivoted. They pivoted in late 2024 to say, okay, we need to start tuning these for particular uses and building tools around them to try to solve particular problems. And they said: programming. Not just, like, vibe-code me whatever, but professional quality programming, using professional tools, over multiple steps. There's a market for that. And they were right. And they really worked hard on this. And it's impressive technology. These harnesses are impressive technology. The tuned-up LLMs are impressive technology.
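To give a flavor of what all that hand-coded harness logic means in practice, here's a toy sketch of the pattern being described: the LLM only gets asked for plans and code, while plain if-then dispatch rules, tool calls, and verify-and-retry loops live in ordinary hand-written code. Everything here is a made-up stand-in; `call_llm` and `run_checks` are stubs, and this is not Anthropic's actual Claude Code source.

```python
def call_llm(prompt: str) -> str:
    """Stand-in for the model call; a real harness would hit an LLM API here."""
    return "1. Write the patch\n2. Run the tests\n3. Check the style"

def run_checks(kind: str) -> bool:
    """Hand-coded tooling hook (compiler, test runner, linter). Stubbed so the sketch runs."""
    print(f"  [harness] running {kind} check")
    return True

def solve(task: str, max_retries: int = 2) -> bool:
    # 1. Ask the LLM for a plan, the part post-training made the model better at.
    plan = call_llm(f"Give me a step-by-step plan for: {task}")
    for step in [s for s in plan.splitlines() if s.strip()]:
        # 2. Hand-coded dispatch: plain if-then rules pick the tool, not the model.
        if "test" in step.lower():
            ok = run_checks("test suite")
        elif "style" in step.lower():
            ok = run_checks("linter")
        else:
            code = call_llm(f"Write the code for this step: {step}")
            ok = bool(code.strip()) and run_checks("compile")
        # 3. Verify-and-retry logic also lives in the harness, not in the model.
        retries = 0
        while not ok and retries < max_retries:
            call_llm(f"That attempt failed; revise your approach to: {step}")
            ok = run_checks("re-verify")
            retries += 1
        if not ok:
            return False
    return True

print(solve("fix the date-parsing bug in the small Python library"))
```

Even in this toy version you can see where the expert-system flavor comes from: most of the behavior is ordinary programmer judgment encoded as rules, with the model called only at specific points.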
There's a huge amount of data you can use to tune an LLM to be better, in particular, at producing compilable code. But really, the coding harnesses, I think, are the story of the exponential leap here, because they really figured out: we can't just trust the LLM to come up with the right plans; we can hard-code a lot of logic, because we know a lot about programming as programmers. And these are really interesting, cool tools. It's still shaking out exactly how they're going to be integrated into software development, but it's a real success story. They found a lane for applying this technology that would have real commercial viability, and it worked. And that's what we're seeing in that chart.

But does this mean, as was being claimed in all those tweets we looked at at the beginning of this episode, that AI is about to, quote, eat everything? Does that chart going up in the last year, with two points, mean that we are inevitably on a crash course towards artificial superintelligence, and that all the things we care about are going to become inputs to an ergodic alien intelligence? Well, clearly the answer there is no. It is specifically a measure of the efforts of the AI companies to try to build programming tools; it is a measure of programming tools. And in the last couple of years they made some great breakthroughs. We have two data points that capture that.

I think part of what's going on here is that there are two different mental models for thinking about AI improvement, and depending on which mental model you adopt, it really affects the way you end up thinking about things like the METR chart. The first model, which I think is very common and is wrong, is a model I first saw in the Max Tegmark book Life 3.0, which predates all this generative AI stuff. It's this idea that you can imagine AI capability as like water, and as AI in general gets better, the water level rises. This water level is rising over a sort of mountainous landscape, and the taller the peak, the harder the problem. So as the water rises above a certain peak, AI can now solve problems of that difficulty, and as it rises farther, it can solve problems of the next difficulty. That's the way a lot of people think about AI. So when you look at the METR chart, it looks like, look, whatever this is measuring, we see the water level going up fast, so the hardness of things AI in general can solve is really improving. Oh my God, it's going so fast. If we extrapolate that out for another three years, then we're going to have the water above all the mountains, and AI is going to be able to do everything.

That's the wrong mental model for how generative AI based tools work. A better model is to think about AI progress as a river. As you go down this river, you see these various openings for tributaries, little streams coming into the river. Think about each of these tributaries as a potential application of the AI technology, a particular area where you could build tools on the AI for it to be useful. You don't really know in advance how navigable each of these tributaries is until you go at it, give it your best effort, portage over the rapids, and see how far you can get. It's like, you know, Henry Hudson in the 17th century. Some of these tributaries, if you really try hard, end up being very navigable. I think the computer programming example, the software development example, is a tributary that's ended up being very navigable.
It's like, oh, we found the Hudson River. This thing keeps going. This is really important. But one tributary being navigable doesn't necessarily tell you anything about another, unrelated tributary. So maybe we go down this other one over here: oh, I'm going to build AI that's going to handle all my email. We didn't get very far. It's rapids, and then it becomes really shallow, and then it just kind of disappears. Oh, okay, we'll try another. That is what it is like trying to find applications based off of AI technology. You have to explore different tributaries, and that requires the building of these custom tools, these harnesses. It's really hard. It took two years of concerted effort, with experts at computer programming, to build harnesses for computer programming. And you don't know how far it's going to go before you hit a dead end. So no one says, hey, I got 100 miles up the Hudson River, so I'm now going to assume any other opening I see is one I can go 100 miles on. They're different tributaries, they have their own challenges, and some are better suited for navigation than others. And that really describes our current moment.

Now, we know this in part because, and I'm going to load up a tweet here, here's a tweet from Ramez Naam; I saw this from Gary Marcus's newsletter as well. Here we have another index of AI model capabilities, called the Epoch Capabilities Index, or ECI, which is not just computer programming but a bunch of different things. And now look: here's GPT-4, and then here are the latest ones over here. We have, like, a linear increase. It's noisy. The jumps here over the last three years are slow and steady. These are the same models that, on the programming test, were jumping up on an exponential. So it depends what you're trying to do with them.

All right, so I think that's an important way to think about this: what application are we talking about? And I think we should treat these applications like normal technology, which means we can be excited about the things that they can do, but not extrapolate wildly about the things they can't. If you're a software developer, you can be super interested in these tools. The companies went all in on them, and I think they're making really interesting progress. But if you're not a software developer, that doesn't really matter to you right now. And it doesn't tell you anything about what AI is going to do in your particular corner of the world, if anything.

All right, so to wrap this all up, let's go back to those hysterical tweets from before. AI is going to eat everything. What's really going on here? Well, partially it's the wrong mental model; it's the water rising instead of exploring the tributaries. I think that's a big part of it. But I also think a lot of the people making those really over-the-top tweets come out of a community known as the transhumanists. We talked last week about the rationalist community, and a subclass of the rationalists were the existential risk people, and these really influenced the big AI companies. But there was another group that intersects the rationalists, which is the transhumanists. They really came out of Ray Kurzweil's work, which looked at exponentials, exponential increases in the processing power of computer chips.
They said, well, if we extrapolate this exponential, computers will be so powerful that we'll be able to upload our consciousness into machines, and we'll be in a utopia. Transhumanists love this idea of following exponentials wherever they find them, extrapolating them out, and then saying, well, if we get all the way out there, life as we know it will literally be changed. Sometimes they're super utopian, and sometimes they're super dystopian, but it gives meaning to their lives. Their religious cult is one of exponentials delivering transcendence or destruction. It's eschatological, right? I mean, there's heaven, there's hell, and a giant event is going to happen that leaves us with one or the other. It's a story that goes back to the very beginning of written stories, and the transhumanists love it. They move from exponential to exponential. And so when they see something like the METR graph, which has an exponential, now, honestly, it's two points, but okay, an exponential, they say, yes, this is going to deliver our doom or our salvation, because that's the way they want to see the world. So the transhumanist and existential risk communities have become mixed together, and they've been very influential in the way the AI companies talk about their technologies. They've been very influential in the Internet chatter, and they've caused a lot of anxiety.

So I want to end this, and again, I hope I'm being clear here: I'm not trying to be skeptical about the value of programming tools, but I want us to be able to talk about tools like normal people. Like, when someone showed me the first useful electric car, we were able to say, oh, that's cool. That thing drives really nice. It doesn't use any gas. It's simpler. It's not going to break down as much. What a cool technology. We could just say that without being like, I'm extrapolating, and soon cars will be going the speed of light, and we're all going to worship car gods. Why do we have to go crazy? Why can't we just look at a tool and say, that's really cool, let's see what happens next?

Here's my call, here's what I think has to happen. I think the AI companies are now at a size and importance, especially as they're considering going public, where they need to start to distance themselves from these cult-like communities. I think the major AI companies need to distance themselves from the extreme x-riskers. I think they need to distance themselves from the transhumanists. They've become too big and too important and too influential to be mixed up with these schools of thought, which have so many other out-there, exaggerated, or straight-up problematic elements to them. I really do think we're going to see in the next year or so a separation between the AI companies, Dario Amodei and Sam Altman and Elon Musk, and these communities. We're going to see a separation in the way they talk about their technology, to the public, to consumers, to investors, from the way that these other communities, which they were a part of before and which are very influential on them, talk about it. I think it's going to be like the modern Republican Party distancing itself from the conspiratorial John Birch Society in the 1960s. That's what we need today. We need a Dario Amodei or Sam Altman to look at these, AI is going to eat everything, the aliens are here, whatever, and say: that's not us. That's kooky. That doesn't represent us. We're trying to build useful tools. Let us tell you how. Over here we're building this.
Here's why it's going to make everyone's life better. Over here we're trying to build this tool; here's why I think it's going to be useful. Here, we failed, but we're still working on it. We're not destroying the world. AI is not going to eat everything. You need to stop listening to all these exponential-worshiping cults; if you want to be on the Internet doing that, you should be doing it in some dark corner. We're trying to build real tools that we think are going to be useful, and we're going to explain why. That's what I think has to happen. It's time to distance the way we think about AI from these particular communities that are freaking everyone the hell out. They're influencing how even CEOs talk about things, they're moving stock markets, they're causing widespread anxiety, and they're often wildly wrong, exaggerated, or technically incomplete.

All right, that's my soapbox plea. But if you want to come away with one message from the METR chart, here's how I would summarize it. In the fall of 2024 we moved from pre-training to post-training, and among the problems we began post-training for were reasoning and computer programming. Then in 2025 we began working on the harnesses, and we got good at them. And so our last couple of model-plus-harness combinations have been making leaps in the complexity of things they can solve, which exactly matches what we're seeing in the software development world, where it wasn't until those models that these tools were good enough for people to really use. The METR chart is capturing the fact that the AI companies' bet on this very narrow, but perhaps financially lucrative and economically useful, task is paying off, that these tools are doing well. It says nothing about the fate of humanity, or AI more generally, or artificial superintelligence, or any of these other sorts of x-risk, transhumanist fever dreams.

All right, let's leave it there. We'll be back Monday with an advice episode of the show, and probably another AI reality check on Thursday. But until then, remember: take AI seriously, but not everything that people say about it.
In this AI-focused reality check episode, Cal Newport examines the widely shared METR AI time horizon chart and the hysteria it provoked online, dissecting what the chart actually measures, what its rapid recent leap in scores really means, and how to make sense of AI progress without succumbing to hype. Cal brings clarity to how AI coding capabilities are being benchmarked, the dynamics behind their recent improvements, and why headlines about AI "eating everything" misrepresent both the chart and the actual state of artificial intelligence. He calls for a measured approach to both AI technology and AI discourse: one that is informed, skeptical of hype, and not beholden to what he terms "exponential-worshipping cults."
(00:00–15:00)
“It is a mistake...to think that when you see Opus 4.6 labeled with 12 hours on this plot, that means Opus 4.6 can now do whatever it would take a human 12 hours to do.”
(Cal, 13:40)
(15:00–18:30)
"An eight-hour time horizon does not mean that AIs can do eight hours of work that a high-context human professional can do... it's what a low-context person, such as a new hire or remote Internet contractor, can accomplish."
(Cal, reading METR's notes, 16:30)
(18:30–32:00)
“The story of the exponential leap here... is the story of these incredibly complicated, hand-coded, 1960s AI style coding harnesses that they put on top of it.”
(Cal, 28:50)
(32:00–40:00)
“One tributary being navigable doesn't necessarily tell you anything about another unrelated tributary... It took two years of concerted effort with experts at computer programming to build harnesses for computer programming. And you don't know how far it's going to go before you hit a dead end.”
(Cal, 36:08)
(40:00–49:00)
“Their religious cult is one of exponentials delivering transcendence or destruction... It's eschatological, right? There's heaven, there's hell, and a giant event is going to happen... The transhumanists love this story.”
(Cal, 43:50)
(49:00–end)
“We need a Dario Amodei or Sam Altman to look at these: ‘AI is going to eat everything’, ‘the aliens are here, whatever’, and say ‘that's not us. That's kooky. That doesn't represent us. We're trying to build useful tools.’”
(Cal, 51:45)
“It does create a strong sense of digital ick.” (Cal, 01:00)
“There is a lot of old-fashioned 1960s style AI logic built into this coding harness... It's as if we went back to the expert systems era but mashed it up with LLMs.” (27:30)
“When someone showed me the first useful electric car, we were able to say, ‘Oh, that's cool...’ Why do we have to go crazy?” (Cal, 50:10)
“Take AI seriously, but not everything that people say about it.” (Cal, end)
(This summary skips the podcast’s advertisements, intros, and outros, and focuses solely on the episode’s core discussion.)