Dwarkesh Podcast: Dylan Patel — Deep Dive on the 3 Big Bottlenecks to Scaling AI Compute
Date: March 13, 2026
Host: Dwarkesh Patel
Guest: Dylan Patel (CEO of SemiAnalysis)
Episode Overview
This episode is a rigorous, deeply technical, and often provocative exploration of the key constraints impacting the rapid scaling of AI compute. Dwarkesh Patel interviews Dylan Patel—one of the industry’s best-connected and most forthright semiconductor analysts—about where the bottlenecks truly lie as the AI industry, hyperscalers, and labs like OpenAI and Anthropic pour unprecedented amounts of capital into data centers, chips, and supply chains. The conversation spans capex spending, the economics and physics of semiconductors, supply chain strategy, geopolitical competition, energy, labor, and speculative futures for both Earth- and space-based computing.
Key Themes & Discussion Points
1. The Scale and Timing of AI Compute Expansion
Timestamps: [00:14], [01:41], [04:01]
- The combined 2026 CapEx of Amazon, Meta, Google, and Microsoft is forecast at ~$600B; total AI-centered capex globally is nearing $1T.
- Expansion isn’t all immediate: big tech CapEx is staggered—today’s spending on turbines or power agreements is for capacity coming online years ahead.
- OpenAI and Anthropic have already raised record sums, but much of their forecast compute, measured in gigawatts, is not available immediately.
- “Anthropic needs to get to well above 5 GW by the end of this year. And it’s going to be really tough for them to get there. But it’s possible.” — Dylan [03:36]
- Labs that move first and boldest in locking up compute (OpenAI) gain pricing and availability advantages over more conservative rivals (Anthropic).
2. How Labs Secure Compute, and Downstream Effects
Timestamps: [04:20], [06:17], [09:20]
- OpenAI has “YOLO”-ed their compute procurement, signing massive, sometimes risky, multi-year deals with hyperscalers and lesser-known neoclouds.
- “OpenAI has kind of got way more access to compute than Anthropic by the end of the year.” — Dylan [04:20]
- When demand exceeds capacity, late buyers must go to less-preferred providers and pay higher prices, sometimes 50%+ above baseline.
- Compute supply contracts vary (5-year vs. spot purchases); as AI demand surges, short-term contracts cost more, and labs find themselves fighting over the scraps.
- “I've seen deals...as high as $2.40 for two to three years for H100s, which if you think about the margin…those margins are way higher.” — Dylan [07:36]
- Financially, this implies that gross-margin advantages persist for those who locked in compute early, while latecomers are squeezed (back-of-envelope sketch below).
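The margin claim lends itself to a quick check. The sketch below is illustrative only: the $2.40/hr rate comes from Dylan's quote, while the capex, opex, utilization, and amortization inputs are assumptions chosen to show the shape of the math, not figures from the episode.

```python
# Back-of-envelope GPU rental economics, anchored to the $2.40/hr H100 figure
# quoted in the episode. Every other input (server capex per GPU, opex,
# utilization, amortization window) is an illustrative assumption.

HOURS_PER_YEAR = 8760

def gross_margin(rental_per_hr=2.40,       # quoted H100 rental rate
                 capex_per_gpu=30_000,      # assumed all-in server + networking cost per GPU
                 amortization_years=5,      # assumed depreciation window
                 opex_per_hr=0.40,          # assumed power + colo + ops per GPU-hour
                 utilization=0.90):         # assumed fraction of hours actually billed
    revenue_per_hr = rental_per_hr * utilization
    cost_per_hr = capex_per_gpu / (amortization_years * HOURS_PER_YEAR) + opex_per_hr
    return (revenue_per_hr - cost_per_hr) / revenue_per_hr

print(f"5-year amortization: {gross_margin():.0%} gross margin")
print(f"3-year amortization: {gross_margin(amortization_years=3):.0%} gross margin")
```

Under these assumptions the implied gross margin lands near 50% on a five-year amortization window and compresses to roughly 30% on a three-year one, which is why the depreciation debate in the next section matters so much.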
3. Depreciation Cycles and Value of a GPU
Timestamps: [10:03], [11:21], [15:08]
- Debate over GPU depreciation cycles: Michael Burry et al. predicted 3 years or less; Dylan argues much longer cycles are possible—even well over 5 years—because AI demand outstrips chip production and hardware is far from commoditized.
- “The other lens is... what is the utility you get out of the chip? ... Because you are so limited... the price of these chips is not... what’s the comparative thing I can buy today? It’s actually, what is the value I can derive out of this chip today, right?” — Dylan [13:29]
- As models get more powerful and efficient, the utility, and thus the market value, of “old” GPUs goes up, not down (see the sketch below).
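A minimal sketch of that “utility lens”: value the chip by what it can earn running today’s models, not by what a newer chip would cost. Every number below is a made-up illustration, not a figure from the episode.

```python
# Utility-based GPU valuation: worth = what the chip can earn today,
# not what a newer chip would cost to buy. All figures are illustrative
# assumptions, not numbers from the episode.

def value_per_hour(tokens_per_sec, price_per_m_tokens, opex_per_hr):
    revenue_per_hr = tokens_per_sec * 3600 / 1e6 * price_per_m_tokens
    return revenue_per_hr - opex_per_hr

# An H100 three years ago: weaker models, less efficient serving stacks.
then = value_per_hour(tokens_per_sec=300, price_per_m_tokens=2.0, opex_per_hr=0.50)
# The same H100 today: better models and inference software sell more tokens per hour.
now = value_per_hour(tokens_per_sec=1500, price_per_m_tokens=1.0, opex_per_hr=0.50)
print(f"then: ${then:.2f}/hr, now: ${now:.2f}/hr")  # utility value rises despite the chip's age
```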
4. CapEx Economics, Lock-in, Market Power
Timestamps: [19:37], [21:01], [24:51]
- Alchian-Allen effect: when a common fixed cost (scarce, expensive compute) is layered onto both cheap and premium AI outputs, the relative price of the higher-quality option falls, so buyers shift toward better models (worked example below).
- Who captures the value?
- Labs with locked-in cheap compute enjoy windfalls. Similarly, chip, memory, and foundry suppliers (esp. Nvidia, TSMC, SK Hynix) accumulate substantial margin.
- “Who is able to accrue all the margin dollars is...potentially the cloud, potentially the chip vendors and the memory vendors. Until TSMC or ASML break out and they're like, no, actually we're going to charge a lot more.” — Dylan [22:22]
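A toy example of the Alchian-Allen logic, with hypothetical per-million-token prices for a small and a frontier model:

```python
# Alchian-Allen in miniature: add the same compute-scarcity surcharge to a cheap
# and a premium model, and the premium's *relative* price falls, tilting demand
# toward quality. Prices are hypothetical ($ per million tokens), not from the episode.

cheap, premium = 1.0, 5.0                 # baseline prices of a small vs. frontier model
for surcharge in (0.0, 5.0, 20.0):        # common fixed cost from scarce, expensive compute
    ratio = (premium + surcharge) / (cheap + surcharge)
    print(f"surcharge ${surcharge:>4.1f}: premium costs {ratio:.2f}x the cheap option")

# The ratio falls from 5.00x to 1.67x to 1.19x: as the shared compute cost grows,
# the better model becomes a relatively better deal, so buyers trade up.
```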
5. Semiconductor and Supply Chain Bottlenecks
Timestamps: [24:51], [26:07], [34:51], [36:51], [37:03], [41:06], [46:07], [51:31]
- Chips/Logic Production: Nvidia outcompetes rivals for TSMC capacity with “AGI-pilled” long-term orders; Apple (once TSMC’s dominant customer) is rapidly losing its strategic position.
- Memory Crunch: HBM demand, compounded by AI workloads’ need for large and fast memory, has driven up prices and crowded out the consumer market.
- “A third of their capex is going to memory.” — Dwarkesh [83:33]
- Wafer Fabs and EUV Tools: Fabs (factories) and tools (esp. ASML’s EUV lithography machines) are the ultimate global bottleneck.
- “By 2028-29, the bottleneck falls to the lowest rung on the supply chain, which is ASML.” — Dylan [37:03]
- Physical and organizational complexity means even large prepayments can only accelerate fab/tool production so much.
- Margins and Market Power: the memory squeeze diverts supply toward AI and depresses the smartphone/PC market: “people are going to start hating AI even more...PCs, smartphones, getting incrementally worse.” [83:41]
6. China vs. the West: Geopolitics and the Future of Compute
Timestamps: [65:36], [66:21], [66:29], [69:29], [70:36]
- China’s massive state capacity could, by 2030+, rival or overtake the “West” in pure chip production, especially if “timelines” for AI progress prove slower than expected.
- “Fast timelines, US wins; long timelines, China wins.” — Dylan [75:49]
- But for now, China’s reliance on imported ASML tools and its lag in capex/networking means US labs have pulled ahead.
- If progress is rapid, as in the last 12 months, US and allied companies will enjoy a period of massive returns on infrastructure investments.
7. Energy, Labor, and Space-based Compute
Timestamps: [93:26], [103:19], [110:28], [111:37], [114:38]
- Power supply: No sign that energy infrastructure will bottleneck AI scaling before chips do. The US grid can be flexibly leveraged, and alternatives abound: turbines, batteries, engines, and modular/behind-the-meter solutions.
- Labor: The labor to physically build datacenters and power systems is a growing constraint, but could be alleviated by modularization and by importing skilled workers.
- Space data centers: Elon Musk’s idea of space-based GPU arrays only makes sense once terrestrial energy or permitting becomes a hard constraint; in the current paradigm, deploying the scarce chips immediately, on Earth, is far more valuable.
- “All that matters in a chip constrained world is get these chips working on producing tokens ASAP in a world...space data centers will eventually be a 10x game, potentially as Earth resources get more and more contentious. But that’s not this decade.” — Dylan [122:13]
Notable Quotes & Memorable Moments
- “There’s this meme: having the best model is a depreciating asset, but the reason it’s important [to have it first] is because you can sign these deals and lock in compute.” — Dwarkesh [10:03]
- “H100 is worth more today than it was three years ago.” — Dylan [15:09]
- “Who is AGI-pilled enough to buy compute in long timelines at levels that seem ridiculous to people who aren’t AGI-pilled?” — Dylan [29:30]
- “If smartphone volumes are halved...the percentage of the BOM that goes to memory and storage is much larger and the margins are lower. So there’s less capacity to even eat the margins.” — Dylan [84:09]
- “Chips are the bottleneck. You want them deployed, working on AI the moment they’re done being manufactured.” — Dylan [122:13]
- “Fast timelines, the US wins. Long timelines, China wins.” — Dylan [75:49]
- “If in 2019 Huawei had not been banned from using TSMC, Huawei would have already eclipsed Apple as the biggest TSMC customer... It’s very arguable that Huawei, if they had TSMC, would be better than Nvidia.” — Dylan [142:25]
Key Technical Insights
Bottleneck 1: Semiconductor Manufacturing Tools (EUV)
- ASML EUV machines: 70 built this year, 100/year by 2030; each supports only a few gigawatts of AI compute per year.
- Upstream supply chain (Zeiss, Cymer, etc.) is artisanal, slow to scale.
- “You can't just train random people for this in the snap of a finger.” [51:32]
Bottleneck 2: HBM/Memory
- AI chips need HBM, which yields 3-4x fewer bits per wafer than standard DRAM; this caps how quickly memory supply can scale (rough wafer math below).
- As memory for AI grows, consumer device markets shrink.
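A quick illustration of the displacement effect, using the 3-4x figure above; the wafer count is hypothetical and output is normalized so only the ratio matters.

```python
# Wafer math behind the memory crunch: HBM yields roughly 3-4x fewer bits per
# wafer than commodity DRAM (per the episode), so every wafer start repurposed
# for HBM removes several bits' worth of consumer DRAM supply.
# The wafer count and normalization below are hypothetical.

DRAM_BITS_PER_WAFER = 1.0            # normalized commodity-DRAM output per wafer
HBM_PENALTY = 3.5                    # midpoint of the 3-4x figure
HBM_BITS_PER_WAFER = DRAM_BITS_PER_WAFER / HBM_PENALTY

wafers_repurposed = 100              # hypothetical wafer starts shifted from DRAM to HBM
hbm_gained = wafers_repurposed * HBM_BITS_PER_WAFER
dram_lost = wafers_repurposed * DRAM_BITS_PER_WAFER
print(f"HBM bits gained: {hbm_gained:.0f} vs. commodity DRAM bits lost: {dram_lost:.0f}")
# Roughly 3.5 units of consumer DRAM disappear for every unit of HBM added,
# which is why AI demand crowds out smartphones and PCs.
```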
Bottleneck 3: Power/Data Centers/Physical Buildout
- Energy bottleneck now seen as less severe; creative solutions in turbines, batteries, permitting, and grid management.
- Labor may be limiting, but is partially addressable with modular construction and global talent.
China vs. US: Who Wins the Compute Race?
- If AI progress is rapid, the consolidation of capex, supply chain lock-in, and scale in the US/West is a major, perhaps enduring, advantage.
- If timelines slow down (to 2035+), China can leverage central planning, scale, and eventual 100% domestic supply chain to catch up or surpass.
Technology Futures: Robotics, Cloud Centralization, and More
- Humanoid Robots: Much of their intelligence will run on centralized cloud compute, with only low-latency action/interpolation at the edge, to conserve semiconductors and power.
- Space Compute: Not viable this decade—deployment lag and reliability challenges with current chip supply.
- Apple’s Downfall: As TSMC’s capacity shifts, Apple will no longer be the dominant customer; AI accelerators will take precedence.
- Potential Wildcards: Technologies such as 3D DRAM could shift bottlenecks, but timelines are uncertain and require major retooling.
- If Taiwan is lost: Airlifting engineers is not enough; global compute capacity crashes, growth slows massively.
Structured Timeline of Major Topics
- [00:14-04:01]: Scale and timeline of CapEx & compute for labs/hyperscalers
- [04:01-09:32]: How OpenAI/Anthropic are securing compute, competition over supply
- [09:32-15:09]: The value lifecycle and depreciation of GPUs, implications for margins
- [19:37-24:51]: CapEx logic, lock-in, who captures margins, Alchian-Allen effect in AI
- [24:51-46:07]: How Nvidia locked up the chip/memory supply chain, logic, and memory—TSMC, SK Hynix, Samsung, etc.
- [46:07-54:42]: Why EUV tool supply can’t easily scale, component-level bottlenecks, ASML supply chain complexity
- [55:00-65:36]: Can “old” process nodes step in as a fallback, real limits of non-leading-edge chips
- [65:36-75:49]: China’s potential to catch up; “fast timelines, US wins; slow timelines, China wins”
- [75:49-91:42]: Memory bottleneck’s implications for AI scaling and the consumer market
- [93:26-111:37]: Power infrastructure scale-up, labor constraints, modularization, global logistics
- [114:38-124:39]: Space-based compute: why it’s not imminent; edge devices vs. centralized brains
- [126:08-134:06]: Differences in compute network topology (Nvidia vs. Google vs. Amazon), scale-up domains
- [134:06-144:36]: SemiAnalysis’s data, how and why it’s used for financial advantage, Apple’s waning dominance
- [144:36-end]: Robotics, centralized intelligence, implications for supply chain risk, the Taiwan scenario
Final Takeaways
- Chips (and the tools to make them) are the ultimate bottleneck for AI scaling—far more so than power, land, or even memory, though all play a role.
- Labs and companies that moved first and are “AGI-pilled” enjoy compounding economic and competitive advantages.
- Geopolitics and industrial policy (esp. in China and the US) will determine who “wins” as the field centralizes.
- Space, robotics, supply chain modularization, and even “slow-mode” AI all have roles to play in the battle to maximize useful compute.
- The landscape is shifting so rapidly that even inside players and their suppliers can misjudge the pace and magnitude of change. As Dylan Patel observes: “Our numbers are always too high—until suddenly, they aren’t.”
Listen to the full episode for more: the discussion is packed with industry anecdotes, contrarian insights, and an on-the-ground look at what’s really driving the next wave of AI infrastructure.
