A
So, hi, I'm Adam Gordon Bell. This is CoRecursive, and today I have Don.
B
Hi, I'm Don. I'm here. Yeah. So Adam's been obsessed with AI and LLMs for way too long. He keeps sending me tweets and articles. In fact, he sent me a bunch just last night. Like 10 o'clock, too. It wasn't early. It was late. And I'm like, what is.
A
I think it was like nine.
B
It was nine. It was like, I don't know, close to 10. It was like 9:40 or something. I mean, it's not like I was busy. It's fine. All right. 9:20.
A
I win this round. Okay.
B
But you ended at 9:50.
A
So basically, OpenAI has a new release, and so they're out pumping it up.
B
Yeah.
A
And the thing I sent you that I thought was super interesting was this quote from Greg Brockman, who's the president of the company.
B
He said, I think of Spud as a new base, a new pre-train. And I'd say it's like we have maybe two years' worth of research that is coming to fruition in this model. And I have no idea what those words mean. What's Spud? What's a base? What's a pre-train? Two years of what?
A
So we'll get into that. And what was the second one I sent you?
B
The other one was older. A leaked memo from inside Google, three years old. A line you had highlighted said, we have no moat, and neither does OpenAI. And moat? Why are they making a castle analogy?
A
I feel like you know what a moat means.
B
I do. I do know what a moat means. They're creating walled gardens. Right. So they're like, well, hey, you know, we're making this thing, and we've got billions and billions of dollars in funding, but there's nothing that stops somebody else from just doing this thing. Which is, like, the whole core of what the Internet was created for, way back in the day. Right. It was just a bunch of people figuring things out. Everything was open. Then corporations moved onto the scene, and all of a sudden it's like, how can we monetize and make walled gardens and force people into our ecosystems?
A
Which I think ties to the third quote I sent you. Do you want to share that one?
B
Sure. It says, R1 is on GitHub. Llama is on Hugging Face. And what's this $850 billion for?
A
Yeah, that one is cryptic, but I feel like it gets at exactly what you are getting at. But yeah, I'm gonna call this format Stack Trace. Working name; we'll see how that goes. But it's like, you know when something blows up and you get a giant stack trace, and you have to figure out what the error is and peel back the layers one by one? Right. So I thought we could peel back through these quotes from these articles that I thought were super interesting. The Brockman one is brand new, right? It's now the 29th; I think it was a couple days ago they released their new model. So I thought we could walk backwards from the business case to the engineering: what they're building, how it works. Because none of this makes sense unless you understand what pre-training is, what a base model is, what OpenAI is even doing or trying to do with their new models. And once we add some meat onto these bones, maybe we can figure out if these companies make sense, if they'll be profitable, if the world will change, et cetera.
B
Yeah, no, that sounds like a good idea.
A
Okay, so let's start with what training is. So did you ever use the old-school Copilot, where it was like autocomplete in VS Code?
B
I used something similar in IntelliJ. So I didn't use VS Code too much, but IntelliJ had autocomplete, and it started getting smarter and smarter. It would sort of look at the context in which you were writing and try to propose something. I would say that maybe 60% of the time it was useful, but 40% it was way off. I'd be like, I don't want that. And you get into this state where it's like, press a button to autocomplete. But I don't want to. So now it's interrupted my flow, right? Because I can't just press a button, or else it'll spew out all this stuff I don't want. So I'd have to hit another button to cancel it out. I don't know if this is just a me problem, but it got in the way of me actually writing the code. Like, no, leave me alone. You're suggesting something that's not useful.
A
Yeah, I feel like people had different reactions to it. Some people are still using that form factor, but many people aren't. But that was part of the first iteration of these LLMs. It was just picking the next token. So you have all this code, and then it's like, hey, what comes next? And it tries to guess that. That's the entire training objective. So before Copilot even launched, back in 2017, Google published this paper, Attention Is All You Need, and it invented the transformer. The transformer is the T in GPT, right? And inside Google they figured out, hey, let's take this transformer thing, let's feed it all the Internet: every Wikipedia article, every book, every Reddit. Yeah, everything you've ever posted on Reddit is preserved somewhere. And you just get it to predict the thing that comes after, right? And OpenAI at this point, you know, they were kind of a research lab. They did these Dota 2 battles where they were trying to beat professional players. They had physical robots trying to solve Rubik's Cubes. They did all this stuff. It was supposed to be a researchy type of organization, and GPT was one of their bets. And it was complicated, because getting it to consume all of the Internet was this complicated training run, and it was a bit finicky. But it worked, right? It started to become good at predicting this next token. And as we know, this all became this huge industry. But the first thing that they figured out, maybe even in the very early days, was that, hey, we have something here where if we throw more compute, if we throw more GPUs at this, it just gets better. Yeah, like, that is kind of unusual, right? Most problems can't just be solved by, like, give it more CPUs.
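To make "guess what comes next" concrete: the whole pre-training objective fits in a few lines. Here's a minimal sketch in PyTorch, with the model, tokenizer, and data as hypothetical stand-ins rather than anything OpenAI actually ran.

```python
import torch
import torch.nn.functional as F

def next_token_loss(model, token_ids):
    """Sketch of the pre-training objective: predict token t+1 from tokens 0..t.

    token_ids: (batch, seq_len) integer IDs from some tokenizer (hypothetical).
    model: anything that maps token IDs to per-position vocabulary logits.
    """
    inputs = token_ids[:, :-1]    # every token except the last
    targets = token_ids[:, 1:]    # the same sequence shifted left by one
    logits = model(inputs)        # (batch, seq_len - 1, vocab_size)
    # Cross-entropy measures how surprised the model was by the real next token.
    return F.cross_entropy(logits.reshape(-1, logits.size(-1)), targets.reshape(-1))
```

Everything in the "feed it the whole Internet" run is gradient descent on variations of that one loss.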
B
I find the opposite. Throwing more hardware at it is kind of like a trope in our line of work, right? If you have some code that's not well optimized, it's using a lot of memory, it's taking a long time, you throw more hardware at it and problem solved. Until, you know, it starts slowing down again.
A
That's true. Right? Like, why optimize this code? Yeah, that is a trope. Why optimize this code? Just buy a bigger server.
B
Just buy a bigger server. Yeah, just, just upgrade it to the next node size.
A
Yeah. So they have this very clear idea: hey, if we can throw enough computers at this, then we'll have AGI or something. We'll have something very intelligent. Hey, we got something that seems like it can think a little bit. We can't chat with it yet. And if we just throw more at it, it can think even better, right? So let's just keep doing that. And they call this process training. Simple enough: all the text of the world, feed it to this thing, give it as much compute as you can, and you get something smarter and smarter. So this original hypothesis of just scaling up came from 2014, even before the transformer. There was this guy, Ilya Sutskever, and he had this paper called Sequence to Sequence, and he argued that, yeah, with a big data set and enough compute, success is guaranteed at building some sort of prediction machine.
B
What does success look like? Define success. The predictions are only useful if they're accurate most of the time.
A
Yeah. So he had come out of this deep learning group at the University of Toronto, and they'd had this great success on what had been a really hard problem at the time, which was identifying images, like picking out what the things were in images and tagging them. And people from all different places had been competing to do the best labeling of these images. And this group. What's the head guy's name? Hinton. So this is Hinton, right? He was the professor. And yeah, they beat this benchmark of identifying images. They were just so much better at it. And they did it with deep neural networks and just a lot more compute, right? They blew it out of the water and revolutionized the field of machine learning. This guy in the background, this is Ilya right here, right? He was one of his students. They entered this ImageNet thing, which was an annual computer vision competition, and their submission was called AlexNet. And they trained it on two consumer gaming cards that they had in the basement of U of T. I don't know a lot about GPUs, but you probably do. So they had two GTX 580s. Is that good?
B
Yeah, those were good cards. I'm still rocking a 1080 Ti. It's old.
A
Yeah, I don't know, I'm not up on the field of GPUs, but the point is, they were able to use neural nets and they just blew this benchmark out of the water. People had been inching up, right, getting a little bit better at identifying things. And that year, when they submitted, the runner-up got 26.2% of the questions wrong, and they got 15.3% wrong. So everybody had been slowly climbing into the 20s, and they just, like, cut it in half. They're like, we got everything except these.
B
15% is actually not bad. What kind of questions are we talking about?
A
It's identifying all these different things, but for some reason there's a lot of dogs in it. So you have to guess the dog breed and circle it, like, oh, this is a whatever.
B
And that's better than I would do
A
because of all those poodle ones. Like, who knows?
B
Yeah, yeah.
A
There's like a million poodle crosses. Yeah. So every researcher in this room grew up to become very important, because this was a revolution, when they beat this benchmark using a new approach, right? So three years later, based on this, Ilya, this guy, he co-founds OpenAI. And he co-founds it with Sam Altman, and with Brockman, from the original quote. And I'm sure you've heard of Sam Altman. Then there was this other party, Elon Musk.
B
Who's that guy?
A
Elon Musk?
B
Oh, I don't.
A
I haven't. Yeah, not familiar. Interesting. Anyways, so Ilya, he's the research brains, right? He's the researcher. Brockman is the engineer, you know, like, let's actually productize
B
Yeah.
A
All this, right? So Ilya becomes chief scientist. And then in 2020, there's a paper published. They actually write a formal paper, something more than just vibes, that says: here is how, given more compute, we can learn more things. And this is their paper; they say, here's how we do this. Which is interesting, because before this, all these people were competing on ImageNet, and it was like, you try a bunch of things, right? You have your pile of GPUs in your basement or in your research lab, and it's like, try some stuff, see if you can do better. But here they're like, dude, we have a graph.
B
And obviously the graph was based on some, like, concrete data. Because, I mean, anybody can make a graph and be like, oh, you know, I improved the performance by 20% when I increased the hardware by this much, so therefore my graph goes to the moon.
A
Yeah. Or like: we got married 10 years ago, so by the time I'm 60, I'll be married five times.
B
You need like, a solid base of some comprehensive data points at the beginning of the graph to make a prediction.
A
So I think it makes sense to be skeptical. But from their perspective, there was something great they could do with their graph, right? Which is say, like, hey, if you give us more money, we can buy more GPUs, and, like, per our graph, we'll have a smarter thing. And so it becomes like a fundraising thing.
B
Well, I mean, like, yeah. Why else would you make a graph unless you wanted to, like, you know, convince somebody to give you money?
A
Yeah, like, I have this thing here on the graph, but think what I could do, instead of those two GPUs, Don, if I had all the highest-end ones that I could fit in this room. Oh, now we're talking, right? So this is sort of what they do, right? They have this graph and this published paper, and it's published in a reputable place, so people have vetted it. Yeah. So originally it was financed by this guy who you said you didn't know, Elon Musk. And I don't know, he was just like, cool, let's build the super AGI of the future, right? In the early days, all the people involved in this believed in this idea that we could build a superhuman intelligent machine. And that belief meant that when somebody got a published graph saying, GPUs go up, smarts go up, they're like, let's do this. Right.
B
I guess where I'm getting hung up is, what was the thing that they were buying for $850 billion? Smartness? Yeah, I'll buy, you know, 500 units of smart. We can have a thousand units of smart. Cool, here's $850 billion. Are they just still buying units of smart? That's not a good business plan.
A
No. So, yeah, they built an API, right? So early on, like, GPT-3. I used it. This is before the chat; it was just token completion. I tried to use it to write tweets for my work, because I would write a blog post and didn't want to write the tweets. But you would have to give it an article and then a sample tweet, then another article and a sample tweet. And then when you give it your article, it's like, oh, I get the pattern.
B
Complete it for you.
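The article-then-tweet trick Adam is describing is what's now called few-shot prompting: the base model can't follow instructions, but it will continue a pattern. A hypothetical prompt for the old completion API might have looked like this (placeholders instead of real posts):

```python
# A hypothetical few-shot prompt for a completion-only model like GPT-3.
# The model has no concept of "instructions"; it just continues the pattern.
prompt = """\
Article: {article_one}
Tweet: {tweet_one}

Article: {article_two}
Tweet: {tweet_two}

Article: {your_new_article}
Tweet:"""
# Sent to the completion endpoint, the model's next tokens are its guess
# at the tweet that "should" come next.
```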
A
You couldn't say, like, hey, man, write me a tweet. It didn't understand that; you couldn't communicate with it. Anyway, so they put this on an API. They charge for it. People like it. It's exciting, but it's small. But they're like, we're onto something. Let's raise more money. And yeah, their business strategy, the one they were worried about, the one they said you could write down on a single grain of rice, was scale. Like, the word scale. Because they're like, we have this thing on this API and people are paying for it, and it's, you know, one unit smart. If we had 10 times the amount of GPUs, we could have 10 units smart, or whatever the graph said. So they were like, we gotta go, man. We have this thing. It's gonna change the world. But anybody can look at what we're doing and be like, oh, we could do the same thing. There's no secret sauce. From their perspective, they're like, dude, if people knew all we're doing is trying to get as big as possible as fast as possible, we'll be in trouble.
B
Yeah. So, like, the underlying algorithms are easy to replicate, and that's bad, because they want to inevitably be the people that hold the keys.
A
Yeah. They had this idea that the first super intelligence that came around would be all powerful, and so it better be us because we're great, upstanding people who control it and not China or Iran.
B
Oh, I see.
A
Or just that guy down the road.
B
But, like, the same rationale as the atom bomb.
A
Yeah. But from the corporate side. Because at this point, I don't think national security has gotten into it yet. At this point, it's just a bunch of nerds building this thing, right? It's not yet, but the US will get involved. Okay, let's keep going. So in the year 2022, DeepMind, which was a group within Google, right, they published this paper called Chinchilla. So at this point, the GPT thing, you know, it came out of Google. Google wrote the transformer paper, but they didn't really build anything on it except an internal LLM, and then OpenAI ran with it. Google sees it's something important, and they keep working on it. And so they put out this Chinchilla thing, and it shows that that scaling graph is kind of wrong. Oh, which isn't good, right?
B
That's not good.
A
And so the Chinchilla people, they built a whole bunch of models of varying sizes with varying amounts of training. And they found that, no, there's actually a very clear relationship, and it has to do not just with the compute, and not just with how big the model ends up, but with the amount of data that you trained it on. Which makes a lot of sense: you give it more information for it to get smarter.
B
Yeah. It needs to know more so that it can have a bigger library to draw from.
A
Yeah. And so you can build it bigger with less information going in, and it's just not smarter, it's just bigger. Right.
B
So key factor is the data.
A
Yeah. If we can give more resources to this thing, it'll be better, right? So Chinchilla doesn't really break it, it just adds a new important wrinkle, right? It's...
B
It sounds like it reveals an important factor making it work. It's not just compute.
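As an aside, the relationship the Chinchilla paper fit is usually written as a loss curve in parameters N and training tokens D. A sketch of the fitted form, with the paper's published constants quoted from memory:

```latex
% Chinchilla's fitted scaling law: N = parameters, D = training tokens.
L(N, D) \approx E + \frac{A}{N^{\alpha}} + \frac{B}{D^{\beta}},
\qquad E \approx 1.69,\; A \approx 406.4,\; B \approx 410.7,\;
\alpha \approx 0.34,\; \beta \approx 0.28
```

Minimizing that under a fixed compute budget (roughly C ≈ 6ND) gives the paper's rule of thumb of about 20 training tokens per parameter. Make N huge without growing D and you're just shrinking the term that was already small: bigger, not smarter.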
A
You need the data. And so at the same time, the same month as Chinchilla, OpenAI publishes a paper that they call InstructGPT. Guess what InstructGPT is.
B
You're instructing it to do something. So you're giving it data to learn on.
A
How would you instruct it?
B
You would have to feed it similar data for what you want to accomplish.
A
So, like, I mean, you're sort of close, but no. This is ChatGPT.
B
So it's a different way of asking it to do something.
A
Yeah. Because before, you would give it a bunch of text and it would predict the next token.
B
So now you're just talking to it.
A
Now you're talking to it. Instructions. So they call it InstructGPT. And the thing that happens is, that becomes the post-training step. So there was training, and now we have this post-training step where we make it more human. But then they decide to rename that first training step, where it consumes the whole Internet, pre-training.
B
Right.
A
Which means you end up in this weird world where they have a pre-training step and then a post-training step, and there's no actual training step. Like, they've accidentally...
B
They've removed the training.
A
Yeah, exactly. The training is gone, even though it still exists. Okay, so then in mid-2023, they start a new training run. Same idea, right? Let's do an even bigger pre-train. We've trained on the whole of the Internet, or a lot of it; let's add even more. As we know, I guess they trained on a lot of books that they probably didn't properly have access to. But it's like, let's feed it more data. We understand the formula. Let's make it even bigger. We'll have an even smarter model. So that's in mid-2023. They call this Orion, and this was supposed to be GPT-5. So 3.5 was the first ChatGPT, and then there was 4. And they came really close, and they're like, let's make 5, because the difference from 3.5 to 4 was really big. And I remember when they shipped it, because Sam Altman said something like, hey, we have this new model, it's pretty cool, not sure if we'll release it for a long time. He kind of downplayed it. He's like, it's all right, you know what I mean? He wasn't like, this is the most exciting thing. So this was supposed to be GPT-5, blow everybody's socks off, we're at the next level of smarts. But they released it as 4.5, and it was super expensive if you used the API. And I used it a little bit, and it felt kind of more natural. I used it to get critiques of my writing at the time, like, hey, what's wrong with this essay? And it's hard to describe, but it felt more human or something. I thought it was great. But they took it away, right? It's gone. You can't use it anymore.
B
Why did they take it away?
A
So, interesting theories, right? But one for sure is, you know, it was ten times the size of GPT-4-point-whatever. A lot more expensive. Requires ten times more servers. And it's a bit better. Some people were like, yeah, I mean, I can tell it's a bit better. But you're like, yeah, but it costs 10 times more.
B
It costs 10 times more. And, you know, the results are maybe not as obvious as...
A
You know, if I play against the chess bot that's on my phone, it will beat me, right? And if I play against, whatever, AlphaZero, the best chess player in the world, it will also beat me.
B
It'll also beat you. You don't notice.
A
So, yeah, reportedly it's $500 million that they spent on that first run. And the model was just fine, right? We gotta try again. We gotta get the smarter one. By that time, you're up to a billion dollars in compute. So have you ever had a project this big fail?
B
No. No, I haven't. I can't speak from experience. I've never had a $500 million project that
A
flopped. Because that's a huge failure. But there's also this problem, right? The business case is predicated on them making this forward progress. So it could be devastating. That's probably why they did it a second time. They're like, we're not giving up on this. And in the middle of this, Ilya, the guy we're talking about, he left. He just left OpenAI. Not a good sign.
B
There has to be some kind of mitigating factor as to why. But to this point, they've been operating on the premise that if they just give it more compute and data, it will improve according to this chart.
A
Yeah, we shall find out, right? I think a lot of people have this perception that these labs, like OpenAI, they're huge, they're making all this money, they're sitting on these big piles of cash, people are paying them for this product, it's an amazing place to work. But if you think about it, it's really high stress, right? They need to keep this promise going. It's very important for their valuation that they're always able to have the next exciting model. The whole thing is premised upon, you know, number go up.
B
That's most corporations.
A
But just being the hottest one, with this huge valuation. And it's not like Apple, where they have phones and stuff installed. They have this API that you pay for. And if it's not getting better, and if there's alternatives, it can very quickly...
B
Yeah. And I guess the thing is, when people didn't like it, and you say, well, why? It's like, well, it just didn't feel as good. It's like, are the results based on feelings? How do they quantify it? I guess the benchmarks?
A
Right. In the early days they tested against the LSAT, the lawyer test, and the GRE, the graduate test.
B
Right.
A
Again like.
B
And then when they had the $500 million model, the one 10 times more expensive, did they run those benchmarks? And was it way better? Like, 10 times better?
A
No, no, it was like a little bit better. Right? It was.
B
Oh, okay. So like that's what they're basing this result on. It's not.
A
Oh yeah. You got an 82, it got an 84. And you're like, oh, but it's 10 times better?
B
And you're like, maybe it's one of those phenomena where that last 10% is very hard, like being perfect is very hard. It's like a logarithmic scale, where you could put in 10 times as much, but you're not going to get a 10 times improvement on that score, right? You're going to have to put a hundred times in to get that 10%. The final 10% is the hardest.
A
So this goes on for two years. They build this giant model; it's not great. And SemiAnalysis, the industry research shop, writes: OpenAI's leading researchers have not yet completed a successful full-scale pre-training run that has been broadly deployed since May 2024. That was GPT-4o. So that's not good, right? The main thing they do, they haven't been able to do a new one, and time is passing. Internally, you have to imagine, they're trying all these things and they're not moving. And even inside, people didn't really agree on why this wasn't working, so
B
they looked into it and they couldn't figure it out.
A
Well, I mean, what's your guess?
B
Something to do with the core way in which it operates. It got to the point where more hardware isn't going to actually make up for improvements in the algorithm.
A
So in December 2024, Ilya, who left. He left OpenAI; there was a whole kerfuffle where he tried to get Sam Altman kicked out, and he failed at that.
B
Corporate drama.
A
Yeah, corporate drama, right. The researcher guy tried to pull a power move on the executive people.
B
I'll make a movie about that someday.
A
Yeah, right. Anyway, so he starts a new company, and then he's at NeurIPS, this big conference, where he's being presented an award for his great earlier work that led to all this. And he gives a talk: Pre-training,
B
as we know it will unquestionably end. Why will it end? Because while compute is growing, better hardware, larger clusters, the data is not growing. We have but one Internet. You could even go as far as to say that data is the fossil fuel of AI.
A
So they made these early versions, they scraped a lot of the Internet, they scraped all these books, they fed it in, and it's great. And then you're like, okay, we need 10 times more. So, okay, we used to download the source code of a GitHub repo; now let's get every revision, right? Let's get all the history. Well, that's just less good data, right? Or, we got every important Reddit post; let's go to the really obscure forums. There's just less good data out there.
B
It seems that they've reached the limit of what the core algorithm can actually solve, given its data. We operate every day without the whole Internet to figure out answers to questions, right?
A
Yeah, yeah.
B
So if you need the whole entirety of the human Internet to be a little bit better, then maybe you're not using the data you have as efficiently as you should be. You've eaten all the good parts; only the crumbs are left, and they're not going to get you where you want to be.
A
Yeah, I agree. And so they call this the pre-training wall. Pre-training was the original training, so they named it the pre-training wall. They're like, we just can't. There's nothing here. We can't get past this. As Ilya says it, right? There were these fossil fuels, which was all of the Internet and all these books, and we ate it all. We're out. We've hit peak oil. There's nothing left.
B
There's nothing.
A
So they have to find another way, right? Okay. So to understand what happened next, how these improvements happened, we have to go back to DeepMind. DeepMind was the group that released the Chinchilla paper. But the more important thing was, I don't know if you remember, like a decade ago there was this AlphaGo. Do you remember that?
B
Yeah.
A
The game Go.
B
Yeah, yeah, I remember that.
A
These guys, this DeepMind company, got bought by Google, and originally they started with it playing Atari games. Then eventually they did Go, and then chess. And the way that they trained it was this reinforcement learning. So they create something, they get it to play Go against itself, and then whichever one wins, they let that one continue, make two copies of it.
B
So, like, kind of like an evolution type thing?
A
Yeah. So it learns, but uniquely, it doesn't need the Internet. It's not reading Go books, right? It's playing Go, creating its own data. It's creating its own data by playing the game against itself. And when they originally created this, Go was considered, like, uncrackable. And then Google had this big tournament against the best Go player in Korea, Lee Sedol. Nobody thought that this thing would beat him, and of course it crushed him, because it had been playing Go against itself for the compute equivalent of a bazillion years, right? It's just learning and learning and learning. So it's creating its own data, as you said, right? Which is a great solution to this problem. But it needs a scoreboard, right?
B
Somebody has told it what the preferable outcome is.
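A toy version of that scoreboard-driven loop, for flavor. Real AlphaGo updated its networks with gradient descent rather than literal survival of copies, so treat this as a cartoon of the selection idea, with play_game and mutate as made-up stand-ins:

```python
import copy
import random

def mutate(m):
    return m  # hypothetical: nudge the model's parameters a little

def play_game(a, b):
    # Hypothetical scoreboard: in a real game, the rules decide who won.
    return a if random.random() < 0.5 else b

def self_play(model, generations=1000):
    # No Internet, no Go books: each round the system plays itself,
    # and the game result alone supplies the training signal.
    for _ in range(generations):
        challenger = mutate(copy.deepcopy(model))
        model = play_game(model, challenger)
    return model
```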
A
Like in a game, there's rules, and you know when you win, right? So you can generate data, because you can always figure out: did I win, yes or no? So they started with this training, right, that became pre-training, and they added on this chat thing, the instruct. Now they add on this new step, which is reinforcement learning, so they call it RLVR, reinforcement learning with verifiable rewards. But basically, they need an action for the LLM to take where we can verify if it got it right or not. Okay, what's an example of something that's easy to verify if you got right? Like math.
B
Yeah, math. Or anything that has like a right or correct answer, right?
A
Or I think the most impactful one of recent years is like coding, right? You can write code and it will
B
work or it won't.
A
It'll work or it won't, right? You can run the compiler, see if,
B
If it works. There's some nuance there, because you can write code that works but isn't good.
A
Yes, I know that. I used to work with you. I know.
B
Yeah. Oh, thanks. Cheap shot.
A
Cheap shot, yeah. And so the cool thing is that they just lean into this, right? This is a new way to generate data that OpenAI comes up with in their panic, and they kind of keep it to themselves. They can ask the LLM to come up with the solution to a bunch of calculus problems, and they ask it to think out all the steps, right? So, let's ask it 12 times to solve this calculus problem and think it out step by step. And most of the attempts are wrong, but maybe one is right. So they take the one where it got it right, and they can feed that back in as training data, like, update the weights. And they just start doing this in loops, right? Because once they get it to successfully do some calculus, they update all the weights, and now it's a little bit better, and they can give it more problems and get more right answers. Now they're doing this DeepMind, AlphaGo thing, right? They can take their LLM, do thousands and thousands of generations, and get better at problems, as long as the problem has somebody to say
B
if it's right or not.
A
So now they're generating their own data. And this becomes o1, this GPT model. So in a way, it's like they had this wall of training, they hit this wall, and then they found a new dimension, right? They can grow by generating their own data, in another direction.
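A minimal sketch of that sample-verify-retrain loop, the thing later published openly as RL with verifiable rewards. The three helpers are stand-ins for lab-internal machinery, stubbed out here so the shape of the loop is visible:

```python
import random

def generate(model, problem):
    return f"step-by-step attempt at {problem}"  # stand-in: sample a chain of thought

def verify(problem, attempt):
    return random.random() < 0.1  # stand-in: check the answer, run the tests

def fine_tune(model, winners):
    return model  # stand-in: update the weights on the verified solutions

def rlvr_round(model, problems, samples_per_problem=12):
    winners = []
    for problem in problems:
        attempts = [generate(model, problem) for _ in range(samples_per_problem)]
        winners += [(problem, a) for a in attempts if verify(problem, a)]
    # The verified attempts become fresh training data: the model is
    # manufacturing its own fuel, as long as something can score the answers.
    return fine_tune(model, winners)
```

Run that in a loop and each round's slightly-better model generates the next round's data, which is the new dimension Adam is describing.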
B
Yeah, I mean, going back to his analogy of data being the fossil fuel of AI, it's like they've just come up with a more efficient combustion
A
engine, or a renewable resource, right? Because here's the thing: you take the LLM, and it can play its own games, and if it succeeds, you're feeding that back in, right? So it's renewable in that it's generating its own data, if you have a way to score it. I mean, in the places where you can verify the answer, it can learn. But okay, so now we're coming close to the modern day. In February 2025, and we haven't even talked about Anthropic, Anthropic releases Claude Code. And the cool thing, I don't know where it happened with them, and they've never confirmed it, but all of a sudden these LLMs don't necessarily start doing a lot better at all different trivia, but they just start getting super good at coding. And the theory, that I think is pretty much confirmed, is that Anthropic builds Claude Code, but they can train on Claude Code as well, right? They have all these problems, and they can run Claude Code through them, and when it works, they're like, good job, Claude Code, and they reinforce it. And so it gets better and better at coding. It's not necessarily better at all kinds of other things, but this is a very clear signal: if we have a bug on this project and it can solve it, it learns to get better and better at these things, right? And that makes all this synthetic data, right? If they have Claude Code run on a problem and it solves it correctly, they end up with this record of it stepping through things, right? They're creating their own data, as we said. So this is a second big breakthrough by OpenAI, right? They have a new way to generate more data, and they keep it closely held. But at the same time, the government is getting antsy about, you know, AGI. The US is like, I hope we get AGI and not China, because then they outsmart us and destroy the world. Or maybe the other idea is the government's just worried: hey, this is going to be a huge industry, and we want this industry to be American, right? And so they start putting in place controls on Nvidia, telling Nvidia, don't sell GPUs to China. We just don't want that. And Nvidia doesn't love that, because they're like, we like to sell these things so we make a profit on them. Right. Chinese companies can buy some Nvidia GPUs, but they're handicapped ones. They're still super good at doing GPU stuff, but they're hobbled at talking to each other.
B
They did that back for bitcoin mining too. That's when it started.
A
But now, it's not a tariff, I guess, it's an export control. The Chinese just can't get the ones that you can buy here.
B
Yeah, there's proprietary processors meant specifically for AI that Nvidia makes.
A
They're not allowed to sell them to China, yeah. So in China there's this company called DeepSeek that we talked about at the beginning, and they spun out of this hedge fund, because the hedge fund wanted to do all this machine learning, I'm assuming to predict the stock market. They decide, we're going to build our own AI. Right.
B
And because they couldn't get the Nvidia chips.
A
Yeah. So they could get Nvidia chips, just not the really high-end ones. They couldn't get the H100, the frontier-lab one that costs like $30,000 per card, and you end up putting like eight of these in a server, so they're very expensive. I mean, they would have bought them, but they weren't allowed, right? So they could only get this one called the H800, which is just much less good at talking to the other cards. And the problem is, you need a whole cluster of these to make it work. And so DeepSeek is like, hey, we gotta crack this code, right? And they released a paper about what they did. So here's one of the things.
B
Low-precision training is often limited by the presence of outliers in activations, weights, and gradients.
A
So this is one of their tricks, where they were able to lower the bits. It's like they're running an N64 game on an 8-bit NES. They were able to lower the bits without losing the accuracy somehow, which...
B
Like MP3s, right?
A
Yeah. And then they're able to do more with less.
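To see why fewer bits can be nearly free, and why that quote worries about outliers, here's a toy 8-bit round-trip. It illustrates quantization in general, not DeepSeek's actual FP8 recipe:

```python
import numpy as np

def quantize_roundtrip(x, bits=8):
    """Scale a tensor onto an integer grid, then reconstruct it."""
    levels = 2 ** (bits - 1) - 1          # 127 levels on each side for 8 bits
    scale = np.abs(x).max() / levels      # one big outlier stretches this scale...
    q = np.round(x / scale)               # ...so every normal value gets fewer levels
    return q * scale

x = np.random.randn(1024).astype(np.float32)
print("typical error:", np.abs(x - quantize_roundtrip(x)).mean())

x[0] = 1000.0                             # add a single outlier
print("with outlier:", np.abs(x[1:] - quantize_roundtrip(x)[1:]).mean())
```

The error on the ordinary values jumps once the outlier stretches the scale, which is why low-precision training schemes spend their cleverness on handling outliers.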
B
Only 20 SMs are sufficient to fully utilize the bandwidths of IB and NVLink.
A
So what happened there is, these cards were handicapped in how quickly they could network with each other. And they found a way to dedicate some of the GPU's own processors, these SMs, to work like a network card, so the cards could talk to each other more quickly.
B
We employ customized PTX instructions and auto-tune the communication chunk size.
A
So basically, they went down to the instruction set of the Nvidia GPUs, and instead of using the normal SDKs, they wrote how the instructions would work in PTX, which is basically Nvidia assembly, sidestepping how Nvidia normally does things so they get a performance speedup. And they published this whole paper on it, they published this new LLM, and it blew people's minds. It's a more gritty approach, right? It's like, we're constrained, so we need to come up with a different way.
B
Yeah. I mean, that's how a lot of things were back in the early days of software development. You had to be very aware of how many bytes of data certain fields were, because you only had so much to work with.
A
So the US government tried to prevent China from getting a leg up by putting these constraints in place, but the constraints actually just taught these Chinese companies how to do more with less, right? Maybe it even advantaged them, because now they can operate on a smaller budget.
B
Yeah, I remember when this happened. They came out with their DeepSeek model, and Nvidia was freaking out, because, well, if they can do this under constraints, what's going to happen to us, right? The gravy train might be coming to an end here. Because obviously, we have all the unlimited hardware, and we can't perform as well as this. It's almost like we should have been looking at how to optimize our AI instead of just throwing hardware at it.
A
Yeah, exactly. So they published in their paper that doing this cost them about $5.6 million, which was a little bit misleading, because they were only talking about one specific stage of the training. But that number got published, and people were using the model and saying, this thing's amazing. And meanwhile, OpenAI is saying, we spent a billion and we
B
didn't get the same results.
A
Yeah. We didn't get an improvement. Yeah. Everybody panicked, right? The Nvidia stock fell. Everybody was like, what's going on here? The other thing that happened is, when they published this paper, they released this model that they called R1. So one thing has to do with the moat, right? They found ways to work with less. The other thing is, the DeepSeek people publish, in their R1 paper, this whole reinforcement learning idea. For OpenAI, this was their new secret, right? They're like, oh, we can give this thing rewards, have it think it out, provide this feedback. R1 uses the same trick. They came up with it on their own: hey, try to solve these problems, try to think it through, and we'll take all these results and feed them back. And they publish exactly how they train it. OpenAI's new trick, the one that's going to blow away the market, and this Chinese company just puts it in a PDF and puts it on GitHub.
B
Right. Eventually like enough people are going to come to the same conclusion independently. I mean that's how most inventions happened, right?
A
Maybe. But, like, not right away, right? Not right away.
B
Yeah, I guess they were hoping that they could keep that secret a little bit longer. That's our secret sauce.
A
That's our secret sauce. Yeah. So they called it RL with verifiable rewards, and they described this multi-stage pipeline. And there was this aha moment where they saw, after doing this feedback, that the LLM started to talk to itself and say, oh, that seems like the wrong answer, maybe I should try this. And in its output you're seeing, whether it's really thinking or not doesn't matter, but it's starting to be able to put out reasoning loops: following a chain down one path, backtracking, going down another. They're like, oh, we're onto something, right? So they get these reasoning loops where it's succeeding. It's thousands of generations of it generating a bunch of answers, verifying which are right, feeding them back in. It's learning, it's generating its own data. And they just put the model out, open weight, for people to use. They put out, here's how we built it, right? Because they're part of a hedge fund; this isn't how they make their money. And it's like, oh my God, this is a crater into this whole capitalistic venture of building these amazing models. Because on the one hand, AI models are amazing; the work that they can do is ridiculous, right? It's such a powerful tool. But on the other hand, you create it, and then somebody else is right behind you, and quickly the value of them is going towards zero.
B
Well, to use an older analogy: think way back in the day, when they came up with TCP/IP. If they had walled that technology off and been like, only we know how TCP/IP works, right? Somebody else would have figured it out.
A
Yeah, yeah, it's networking, because networking is very much about connecting. But yeah, I get your point.
B
Yeah, it was a technology that they could have held onto and said, this is how it works, and now we are the gatekeepers. Anybody who wants a network has to pay us a license to use this proprietary technology that makes networking work. But other people are going to figure that out eventually, right? Or they'll come up with an open standard and be like, well, everybody should use this because it's easier.
A
Yeah, exactly. So if you go back to where we started, right, we have this pre-training that's actually really training, and then this thing to make it chat, and then this thing to do the reinforcement, to generate its own data. So DeepSeek figured out how to do that last part, right, which nobody else could, and they figured out how to do it very cheaply. But doing that first step of consuming the whole Internet is still really expensive. And so you could think, oh, that's a moat, getting all this data and putting it together
B
and the barrier there is higher because you have to consume the whole Internet and that's something that's logistically hard to do.
A
Yeah. But enter Mark Zuckerberg, right? So around the same time, I'm not sure of the exact timeframe, Facebook, Meta, they start building their own base model. They're like, we don't want to be left out of this. And when they release it, they say, hey, we're not actually in the business of being an LLM-serving API or something; we sell ads about. I don't know what their ads are.
B
Yeah, I don't, I don't use Facebook.
A
Yeah. And so they say, hey, if you're a researcher, you can just download our model. Just ask for permission and you can download it. And so they do that. And then very quickly, one of those researchers downloads it and just puts it on BitTorrent. Because, literally, why not?
B
Oh yeah, why not?
A
Why not? And then Facebook, Meta, I guess, demands that they take it down, but it's too late. And so they change their stance. They say, no, I mean, you can use this for non-commercial purposes. Just grab it and use it, right? And this becomes Llama. So this is, I think, the first open weight model. If you have the GPUs that you can run it on, you can just grab it and use it for free. So Facebook spent whatever, the $500 million, to consume the whole Internet. Which is weird, right? Why would they do that?
B
I don't know.
A
Here's what Zuckerberg said.
B
In the early days of high performance computing, the major tech companies of the day each invested heavily in developing their own closed source versions of Unix. It was hard to imagine at the time that any other approach could develop such advanced software. Eventually though, open source Linux gained popularity. Today, Linux is the industry standard foundation for both cloud computing and the operating systems that run most mobile devices. And we all benefit from superior products because of it. I believe AI will develop in a similar way. That makes sense to me. Right? Yeah, it's like the premise of open source software.
A
So there's this business strategy, I heard about it from Joel Spolsky, and it's called commoditize your complement. It's like: if you sell a product, and along with this product something else is used, if you can decrease the cost of that other thing, it makes your thing more valuable, right? Like, if electricity is super cheap, electric cars are more valuable. So if you're an electric car company and there was some magic trick to make electricity cheaper, it would help the value of your car. And so Meta's like, we're not in the business of LLMs, but we're going to need them, right? We're going to need them to, like, judge if somebody's spamming scam comments or whatever. We don't want to pay these exorbitant fees to OpenAI or whatever; we'll just build our own, and then, because it's not our business, we'll just give it away. And it also allows me, you know, probably to give the finger to these other companies, right?
B
Yeah. Subvert them.
A
Subvert them, in a way, right? So that's what they did. And that erodes another thing, right? Now the base model, the very expensive thing, you can just get it. Maybe it's not as good, but it does exist, right?
B
It creates atmosphere of competition.
A
Exactly right. Except he's not doing it necessarily for charitable reasons. And it's helpful from his perspective. He's like, if we make the one that we give away for free and everybody else builds on it, we benefit from all of those things, right?
B
Yeah.
A
If you build some ORM internally, that's like a crazy Don creation, you have to maintain it and whatever. But if you build one and then release it, and the industry starts using it, they make improvements, and then you can pull those in. Right.
B
Oh, there's an interest in increasing their own business.
A
Yeah. I forget where I am, so. Yeah. What were the original texts that I sent you? Have we answered any?
B
Yeah, no, I think we have. We've covered it. Because, like, R1 is on GitHub, Llama's on Hugging Face, and what's this $880 billion for? R1 was the model that DeepSeek came up with. So it has the more efficient algorithm, because they were constrained by hardware restrictions, so they came up with a better way of doing it that wasn't locked into Nvidia's way of doing things. And Llama was the Facebook base model that had all of the Internet included in it, so you don't have to go through all the work of combing the whole Internet. There's no secret sauce. There's no special sauce. Everything's open. You can get a model. It's all out there for people to develop. There's no reason why somebody couldn't make our product.
A
So then I think the only thing we haven't answered is the very first part, right? Which is Greg Brockman being like, oh, the...
B
Yeah. I think of Spud as a new base model, like a new pre-train.
A
It sounds like he's saying, we're two years ahead of everybody else. So Orion, we talked about that. That was their $500 million and then billion-dollar run.
B
Yeah.
A
It just didn't amount to anything. It became 4.5, but then they pulled the plug on it. So Spud is one of their new models, and Brockman, in that video I shared from a couple days ago, was talking about Spud, their internal model. And then they released it. It became GPT 5.5. Six days ago, they released GPT 5.5: a new class of intelligence for real work, it says. Okay, well, every time they release a new model, they say revolutionary, of course.
B
Yeah.
A
But there's some interesting things going on here. Right. And I think his quote is the only thing we haven't unpacked.
B
It's the new base model. It's the new pre-train.
A
It's a new pre-train, right. But so what is it? Because we discussed this problem wherein they kept trying to make them bigger, but there was no good data left. They made one 10 times the size and it wasn't better.
B
There were diminishing returns.
A
Diminishing returns. And now they have another big one. So what is their secret?
B
Yeah, well wait a couple weeks, it'll get leaked or you'll figure it out
A
because there's this other problem that happens, right? There's this process called distillation. If I am building a new model. I have this 4.5, let's say, that was super huge and very expensive to run and was a little bit better. Well, I can chat with it, and, similar to this reinforcement learning process, I can take that chat log and take a smaller model, one that's not giant, and train it on that, right? And so it's learning from the bigger model. The big model is teaching a small model, one that I can run at lower cost, the great answers that the bigger model had. And it can't learn it all, because it's just not as big, but it will get a lot closer.
B
It's like a senior teaching a junior, right? It's like, I went through the last 20 years. I've seen some stuff.
A
Yeah.
B
You don't have to do all that. You don't have to make all my mistakes. Just do this. Yeah.
A
And so they call this distillation.
B
And it's like my son: he won't listen to me. He'll make the mistakes anyway, and then he'll be like, oh yeah, it turns out you were right.
A
That's the learning process. But the big labs do this themselves, right? There's, like, GPT-4o, and there's GPT-4o mini, and mini costs less, and it's still really smart, but it's much smaller and faster, because they took the big model and distilled it. They got all these important lessons from it and gave them to the small one. Well, OpenAI and Anthropic made all these allegations against DeepSeek and other open weight companies that this is what they're doing, because whenever there's a new super smart model out, six months later there's an open weight model that seems just as smart. The only thing you really need to do this distillation is the logs of chats with these smart models, which is, in fact, their product. If you want to make a smarter model, and I have one, just have lots of chats with mine and then train your model on it. So it's even more competitive pressure, right? If I come up with a really smart model, by definition you can use it to make yours smarter. So this is another moat problem, I guess. And if you look at OpenAI, they have these thinking models, but it won't actually tell you what the thinking is.
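A minimal sketch of the distillation Adam just walked through: harvest chats from the big model, train the small one to imitate them. Model and fine_tune are hypothetical stand-ins for an API client and a real training step:

```python
class Model:
    """Stand-in for a hosted model; chat() is pretend-API, not a real client."""
    def chat(self, prompt):
        return f"a great answer to {prompt!r}"

def fine_tune(student, pairs):
    return student  # stand-in: update the small model's weights on (prompt, answer)

def distill(teacher, student, prompts):
    # The training set is literally the teacher's chat logs -- which is why
    # merely selling access to a smart model leaks what it knows.
    logs = [(p, teacher.chat(p)) for p in prompts]
    return fine_tune(student, logs)

cheap_model = distill(Model(), Model(), ["prove x**2 >= 0", "write binary search"])
```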
B
Well, they don't want people to know how it's thinking because that's, that's their trade secret.
A
That's their secret sauce. And even with that black box, they're still alleging, and I don't think they're lying, that these Chinese labs and other open weight companies are just using their service and using it to train their models. Because, like, why not?
B
Then you can use their models and you can see exactly how it's reasoning.
A
Yeah. But every time they come up with a super smart model, a couple months later there'll probably be a new smart model from the open weight companies, because it's easy to extract things. It's like the problem of MP3s: it's easy to copy this information. Not as easy as an MP3, but sort of easy. The moat is leaking. So if we go back to the $880 or $850 billion, where's that value hiding? Right.
B
I think a lot of investors are
A
asking as well. But I'm not so negative on OpenAI. I think they could be a very valuable company. But where is that value?
B
Well, what's a good way to try and get some money? Not everybody's subscribing, but they do have subscriptions.
A
I mean, I'm a subscriber.
B
Yeah. But I think that, overall, the money they get from subscriptions isn't at the level they would need in order to call it a success.
A
So ads, man, that's the worst. I don't want that.
B
That's what I think they're talking about; that's how they have to monetize, right? A lot of people are using their product even though there are alternatives. Brand loyalty, brand whatever. But it just means that you're going to have to meet them where they are.
A
Like, I don't know. Businesses are using it. I think if you look at it frozen in a moment, right, the amount of spend from my company, it's Anthropic, but it could easily be OpenAI, just for coding subscriptions, is massive. The amount of money we're pouring into this company is huge. So I think at any given moment, that's real, right? They're making a profit off that. But the challenge is, yeah, that it can quickly diminish. And there's competitors, if there's an open
B
source alternative as well, right? Like, why would your business keep paying a subscription for hundreds and hundreds of dollars in credits when they could just use this open source alternative that's maybe even locally hosted?
A
Yeah. Or there's just a company that hosts it at cost, because they didn't have to build it, right? They just took the open weight model. Lots of those exist. So it's a Red Queen's race, right? Have you ever heard this term before? No? I love this term. So it's from Through the Looking-Glass, the Alice in Wonderland sequel, and the Red Queen has a race, and they're running, right? And Alice says, why aren't we moving? They're running, and it's like they're on treadmills, and they're not going anywhere. And the Red Queen says, here, you have to run as fast as you can just to stay in the same place. If you slow down, you go backwards, but as fast as you run, you stay in the same place.
B
No one can ever win.
A
Yeah. As soon as you have an advantage, keeping that advantage requires working just as hard as you did before. It's a great metaphor for this process, right? Every year or so, OpenAI is coming up with a new breakthrough that lets them push the frontier, or Anthropic is. So the open weight models right now are all, let's say, six months behind. Maybe. I don't know about this new release, but previous to Brockman saying we're two years ahead, the gist was kind of that the open weight models are six months behind. But in that six months, the models got so much better that everybody's paying for the premium service.
B
Yeah, well, it's like movie theaters, right? You can go see it in the theater. It's very expensive. You get to see it first. There's a lot of people who just wait.
A
If you want to be six months behind, you can use a cheap model, and it's fine. But right now the curve is so steep that it's, no, you gotta get on the new thing. Everybody feels that way. If that curve flattens out, it's over, right? If that curve keeps going up, though, oh my God, who knows where we're going to end up? It's the Red Queen's race. All these frontier labs, if that's their product, they're running as fast as they can just to stay in place. Right now, ChatGPT has my $20 a month. Anthropic has my $100 a month for the coding agent. But if something better comes out than those, or everybody else just catches up and has a cheaper service that's sold at cost, that money's gone. Anthropic has to go as hard as they can just to keep my money, because I'll just switch. There's this line from Stewart Brand. People always remember it as: information wants to be free. But his actual quote was: information wants to be free, but information also wants to be expensive. Some information is just so valuable, but at the same time it's free to share it. And these companies are in this place where they have something that's so amazing, this amazing breakthrough, these AIs that are so valuable, but yet it's depreciating like nothing, like a peach on a summer day, because everybody's catching up. And so Brockman comes up with this new thing, right? Two years of research is coming to fruition. That's not modesty, right? He's trying to tell people: hey, actually, I think we're more than six months ahead. But the news came out. SemiAnalysis, who we talked about before, said this is the first new scale-up in pre-training since GPT-4.5. A bigger model. So they're back on the curve, right? We followed this curve up so far, and then they could never get past it. Now they're claiming: we got past this wall. Ilya said there's nowhere else to go up here, but we found a spot, right? Ilya left the company; we're here, we found a way. There were these two founders, one the engineering guy and one the researcher. And the researcher left and said the fossil fuels have been exhausted. But they're saying: hey, man, no, actually, it's still going. We found something. So what are their fossil fuels? And one interesting thing is, usually these GPT models have been getting cheaper and cheaper over time. This 5.5 that they just released costs four times as much per token, per conversation. Okay?
B
So it's obviously going to be more of a moneymaker then.
A
Well, maybe. But 4.5 was also super expensive, and then they pulled it. And the reason is these things get bigger. Like, they just become more expensive to host. They're like, no, man, this costs more. Like, this is a big one, right? This is a chunky model, the biggest we've ever shipped.
B
But will the results be, like, compelling enough for somebody like yourself?
A
Is it worth it? Right. Four times more is a lot. So I don't know. And whatever, the podcast will go out, somebody's listening to this a year later, and we'll know. But it doesn't matter, because the next one and the next one and the next one, the race keeps going. I could skip the subscription and use an open weight model from a number of providers. Or I could just pile up some GPUs here and run it myself, or whatever. I have friends who will say, hey, this is all a trick: OpenAI gets us addicted to coding with these agents, everybody forgets how to code on their own, and then they jack up the price. But coding works.
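(To make the "pile up some GPUs and run it" option concrete, here's a minimal local-inference sketch using the Hugging Face transformers library. The model ID is illustrative, not the specific model discussed in the episode, and it assumes you have transformers, torch, and accelerate installed plus enough GPU memory for the checkpoint you pick.)

```python
# Minimal local inference with an open-weight model via Hugging Face transformers.
# Assumes: transformers, torch, and accelerate installed, enough GPU memory for
# the chosen checkpoint, and (for gated models like Llama) an approved HF token.
from transformers import pipeline

# Illustrative model ID; swap in whatever open-weight checkpoint you actually use.
generator = pipeline(
    "text-generation",
    model="meta-llama/Llama-3.1-8B-Instruct",
    device_map="auto",   # spread weights across available GPUs
    torch_dtype="auto",  # use the checkpoint's native precision
)

messages = [{"role": "user", "content": "Explain a Red Queen's race in one sentence."}]
out = generator(messages, max_new_tokens=100)
# Chat-style input returns the conversation; the last message is the reply.
print(out[0]["generated_text"][-1]["content"])
```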
B
That's what I was worried about. It's a risk for your company, and you want to limit your exposure to that risk. If your team starts relying on it, the risk is that Anthropic could jack the price up four times.
A
Yeah, but I'm saying that's hopefully why it's less of a concern: you can jump to a free model, because the free models are always just a little bit behind. These companies are actually fighting tooth and nail with each other. If both Anthropic and OpenAI collapse, we'll just lose the latest six months, because everybody's racing to keep up. These things are not going away. You can torrent that Llama version. It's not at the lead anymore, but people use that Llama as a base model. They add their own training on top, they do the distilling that Anthropic and whoever else is mad about. If all of these companies explode, we still end up with the open weight models of six months ago. And there's a bunch of companies that host these, and you can use something like OpenRouter.
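(And the OpenRouter route is about as simple as hosted inference gets, since OpenRouter exposes an OpenAI-compatible API. A sketch, assuming the openai Python package and an OPENROUTER_API_KEY environment variable; the model ID is again just an illustrative choice.)

```python
# Calling a hosted open-weight model through OpenRouter's OpenAI-compatible API.
# Assumes: the openai Python package is installed and OPENROUTER_API_KEY is set.
import os
from openai import OpenAI

client = OpenAI(
    base_url="https://openrouter.ai/api/v1",  # OpenRouter's OpenAI-compatible endpoint
    api_key=os.environ["OPENROUTER_API_KEY"],
)

# Illustrative model ID; OpenRouter lists many open-weight models from many hosts.
resp = client.chat.completions.create(
    model="meta-llama/llama-3.1-70b-instruct",
    messages=[{"role": "user", "content": "Summarize the 'no moat' memo in two sentences."}],
)
print(resp.choices[0].message.content)
```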
B
Yeah. The cat's already out of the bag, right?
A
Cat's out of the bag. This isn't going away. Okay, we've got to wrap this up, though. So let's go back through it. What were the original quotes?
B
There was the OpenAI president saying, I think of Spud as a new base, a new pre-train. And then there was the Google memo that was like, guys, we don't have any moat and nobody does.
A
So what's your feeling? True? False.
B
The memo leaked well before the announcement of that new base model, right? So it could have been true at that point, but maybe it's not anymore. Maybe the moat is now in the new base model.
A
See, I feel like it's still true, because Brockman may think they have a moat, but what he's really saying is, our moat is two years wide. Before, the lead was six months; this time they think it's two years. Either way, it's: we need to reinvent ourselves in the next two years or it's gone.
B
It seems like the underlying premise is that whatever they have over other companies is temporary. It can't be something that won't eventually be discovered.
A
Yeah. So what do you think? We have no moat, and neither does OpenAI. True or false?
B
I feel like it's false.
A
Oh, I feel like it's true, but, yeah, interesting. I mean, I guess it depends on the timeline.
B
Like, they have one, but it's a temporary one. They have one for now, right? But when that one gets bridged, they'll be stuck digging out another one. They do have one, but it's not permanent.
A
Yeah. Okay, and then the first quote I think we understand. He says, I think of Spud as a new base, a new pre-train, and it's two years' worth of research coming to fruition.
B
So the two years of research doesn't mean they have two years before somebody figures it out. You could spend a lot of time on the research and development, then release it, and somebody just copies it.
A
And then what's the third one?
B
So the quote is, if all this stuff is already built, why are you paying $850 billion? What are you buying with that?
A
I think we actually agree on what the answer is. The answer is people are betting on this horse. They're saying, like, we know this one moat is only going to last so long, but we think this company will build the next moat. Right.
B
They'll keep the treadmill going.
A
They'll keep the treadmill going.
B
Everything's, like, open source. Why are we spending $850 billion on something I can fork today? But you're not buying that. You're buying the process.
A
I think we understand it all. I think we. We got through it. What do you think?
B
Yeah, I think we figured it out.
Host: Adam Gordon Bell
Guest: Don
Date: May 9, 2026
In this episode, Adam and Don unpack the dizzying evolution of large language models (LLMs), focusing on AI research, industry business models, competition, and the struggle for lasting advantages, or "moats," in the fast-paced world of artificial intelligence. Through a series of industry quotes and deep explorations, they follow the path from early transformer models and scaling laws, through the "pre-training wall," to the ongoing treadmill of incremental innovation and rapidly disappearing barriers to competition. The conversation touches on OpenAI's latest models, the open sourcing of LLMs, DeepSeek's breakthrough in China, Meta's strategy, and the significance of reinforcement learning and training data limitations.
On Scaling Laws and Compute:
“If you give us more money, we can buy more GPUs, and, like, per our graph, we'll have a smarter thing... It becomes like a fundraising thing.” — Adam ([10:42])
On Hitting the Data Limit:
“We have but one Internet. You could even go as far as to say that data is the fossil fuel of AI.” — Ilya Sutskever (quoted by Adam) ([22:44])
On Responses to the Data Limit:
“You've eaten all the good parts, only the crumbs are left, and they're not going to get you where you want to be.” — Don ([23:44])
On Reinforcement Learning as a New Path:
“The cool thing is that they just lean into this ... It's creating its own data by playing the game against itself.” — Adam ([25:10], on AlphaGo/DeepMind)
On the "Moat" Problem:
“There’s no secret sauce … if people knew all we're doing is trying to get as big as possible as fast, like, we'll be in trouble.” — Adam ([13:13])
On Open Sourcing LLMs:
“Eventually though, open source Linux gained popularity...I believe AI will develop in a similar way.” — Mark Zuckerberg (quoted by Don) ([39:13])
On the Treadmill Metaphor:
“Here, like, you have to run as fast as you can just to stay in the same place... As soon as you have an advantage, keeping that advantage requires working just as hard as you did before.” — Adam ([48:33])
Final Takeaway:
With AI models, competitive edge decays rapidly. The cycle of “breakthrough, copy, catch up” repeats, making the AI world a true Red Queen’s race—brilliant, relentless, and exhausting. The only constant is change, and the only sure bet is that today’s secrets and advantages are tomorrow’s base standard.
For full technical detail and memorable banter, check out the episode!