
Nvidia’s $5 billion investment in Intel is one of the biggest surprises in semiconductors in years. Two longtime rivals are now teaming up, and the ripple effects could reshape AI, cloud, and the global chip race. To make sense of it all, Erik Torenberg is joined by Dylan Patel, chief analyst at SemiAnalysis; Sarah Wang, general partner at a16z; and Guido Appenzeller, a16z partner and former CTO of Intel’s Data Center and AI business unit. Together, they dig into what the deal means for Nvidia, Intel, AMD, ARM, and Huawei; the state of US-China tech bans; Nvidia’s moat and Jensen Huang’s leadership; and the future of GPUs, mega data centers, and AI infrastructure.
Dylan Patel
How you buy GPUs is like buying cocaine. You call up a couple people, you text a couple people, you ask, you know, how much you got, what's the price?
Guido Appenzeller
If your two arch-nemeses suddenly team up, it's the worst possible news you can have. I did not see this coming. I think it's an amazing development.
Sarah Wang
It's like Warren Buffett coming into a stock. Jensen is like the Buffett effect for the semiconductor world.
Dylan Patel
It's kind of poetic that everything's gone full circle and Intel is sort of crawling to Nvidia.
Erik Torenberg
Today we're talking about one of the biggest surprises in semiconductors in years. Nvidia just put $5 billion into Intel. Two longtime rivals now teaming up on custom data center and PC products. A deal nobody saw coming. For Nvidia, it's the Buffett effect. For Intel, it's a lifeline. And for AMD, ARM, and the global chip race, the fallout could be massive. To break it all down, I'm joined by Dylan Patel, chief analyst at SemiAnalysis, Sarah Wang, general partner at a16z, and Guido Appenzeller, partner at a16z and former CTO of Intel's Data Center and AI business unit. Let's get into it. Dylan, welcome back to the podcast.
Dylan Patel
Thanks for having me. Yeah.
Guido Appenzeller
It just so happens that there's some…
Erik Torenberg
Big news just as we're having you on: Nvidia announcing a $5 billion investment in Intel, and the two teaming up to jointly develop custom data center and PC products.
Erik Torenberg
What do you think about the collaboration?
Dylan Patel
I think it's hilarious that Nvidia could invest, it gets announced, and their investment's already up 30%. A $5 billion investment, $2 billion profit already. Right. I think it's fun because they need their customers to really have big buy-in. So when their potential customers buy in and commit to certain types of products, it makes a lot of sense. And it's kind of funny in a way, because in the past there was this whole thing around how Intel was sued for being anti-competitive with their chipsets, and Nvidia actually got a settlement from Intel. Way back when, when the graphics were separate from the CPU and the graphics were really put on the chipset, which had all this other I/O, like USB and all this stuff. So it's kind of a funny turn of events that now Intel is going to make a chiplet and package it alongside a chiplet from Nvidia, and that's a PC product. So it's kind of poetic that everything's gone full circle and Intel is sort of crawling to Nvidia, but actually it might just be the best device. I don't want an ARM laptop because it can't do a lot of things. And so an x86 laptop with Nvidia graphics fully integrated would be probably the best product in the market.
Guido Appenzeller
So, are you optimistic? How do you think this will go?
Dylan Patel
I mean, sure. I hope, right? I'm a perpetual optimist on Intel because I have to be. I was thinking that the structure of the deal that a lot of the government folks and Intel were sort of trying to go for was that big customers and the biggest suppliers directly give capital to Intel. But this is sort of the other way around, where they're buying some of the stock, having some ownership, but they're not really diluting the other shareholders. And then the other shareholders will get diluted, slash, everyone will get diluted, when Intel finally does raise the capital from the capital markets. But they've announced these deals and they're pretty small, right? 5 billion Nvidia, 2 billion SoftBank, US government was 10. These are still relatively small.
Guido Appenzeller
Pretty small, yeah.
Dylan Patel
Yeah, in the scheme of things, right. I mean, last time I think I said Intel needs like $50 billion right now. Now when they go to the capital markets, it's better. And hopefully they get another couple of these announcements. There's all sorts of speculation that Trump is involved in sort of getting these companies to invest, Nvidia and now the government as well, of course. And now, is Apple going to come invest and also do something with Intel? Or who else will come in? That'll really boost investor confidence, and they can dilute, slash, go get debt.
Sarah Wang
It's like Warren Buffett coming into a stock. Jensen is like the Buffett effect for the semiconductor world. Guido, you were the CTO of the Intel Data Center and AI business unit.
Dylan Patel
Yep.
Sarah Wang
What are your thoughts?
Guido Appenzeller
I think it's really good for customers and consumers in the short term. Right. Specifically in the laptop market, having the two collaborate is amazing. I wonder what's going to happen with any of the internal graphics or AI products at Intel. They might just push a reset and give up on that for now. They currently don't have anything competitive. There was the Gaudi effort, that's more or less done. There were the internal graphics chips, which never really competed at the high end. So from that perspective it makes a lot of sense for both sides. Look, I think for Intel they needed a breath of fresh air. They were sort of desperate. So I think it's a very good thing. I think AMD is fucked. If your two arch-nemeses suddenly team up, it's the worst possible news you can have. They were already struggling. Their cards are good, their software stack is not. They were getting very limited traction. They now have a bigger problem on that side. I think ARM is a little bit screwed as well, because their biggest selling point was sort of, look, we can partner with everybody that doesn't want to partner with Intel. And Nvidia is probably the most dangerous of the future CPU competitors, right? They now suddenly have access to Intel technologies and might go in that direction. It remixes the cards. I did not see this coming. I think it's an amazing development.
Sarah Wang
Yeah, it'll be very interesting to see this play out. To Erik's point, packed news week. The other thing that we wanted to pick your brain on since we have you here, Dylan, is the other news dropping on Huawei unveiling their AI roadmap, and obviously they're hyping up the capabilities. I think you guys have been ahead of the curve in trying to gauge what the 950 SuperCluster can actually do. But would love your thoughts on everything that's going on on the China front. And this is coupled with DeepSeek saying their next models are going to be on domestically produced Chinese chips, and the Chinese government banning companies from buying the made-for-China Nvidia chips. So there's just a lot of dominoes falling right now in the semi market in China. But would love your take overall, and maybe drill into some detail.
Dylan Patel
Yeah, I think you have to zoom out. Let's walk from 2020, because I think it's really important to recognize how cracked Huawei is. Even just historically, they've always been really good. Sure, initially they stole Cisco source code and firmware and all this stuff, but then they rapidly passed them up, as well as every other telecom company. In 2020 they released an Ascend chip and submitted it to impartial public benchmarks, and they were the first to bring 7 nanometer AI chips to market. They were the first to have that. Now, you could still say Nvidia was ahead, but the gap was like nothing, right? And this is when they could access the full foreign supply chain. This was when they had just passed Apple to be TSMC's largest customer. They were clearly ahead of everyone from a manufacturing, supply chain, and design standpoint on a total basis. Now, of course, Nvidia still had higher market share, but the market was so nascent then, they could have really taken it over. Huawei got banned by the first Trump administration from accessing the foreign supply chain, and the full ban went into effect in 2020. And so they were only able to make a small volume of these chips, but they had trained significant models on the chips they made then. And then over the next couple of years, Nvidia continued to accelerate. Huawei, because they were banned from TSMC, had to go and try and figure out how to manufacture at SMIC, the domestic TSMC. In parallel they were also trying to go through shell companies to manufacture at TSMC and acquire memory from Korea, and so on and so forth. By the end of '24 this had gotten into full swing. And it was caught, right? It was caught and they finally shut it down. But they were able to acquire 2.9 million chips from TSMC through these other entities, right?
Roughly $500 million worth of orders, which ends up being the billion dollar fine that the US government gave TSMC, if I recall correctly, or at least there was a Reuters article about it, and then they actually issued it. Which is important and interesting to gauge, because the number of Ascends floating out there has not consumed this entire capacity yet. So now we get to 2025. The H20 got banned at the beginning of the year. Nvidia had to write off huge amounts of money. Our revenue estimate for Nvidia in China for just the H20 was north of 20 billion, because that's what they were booking in capacity, slash, had to write off. And then it got banned. They cut the supply chain. They just said, no, we're not doing this anymore. Then their inventory gets reapproved, they resell the inventory, but now Nvidia's question is, do we even restart production? And now you have China saying, hey, we don't need Nvidia, we have domestic alternatives, whether it be Huawei or Cambricon. These companies have capacity, but most of this capacity is still foreign produced, whether it be wafers from TSMC or memory from Korea, Samsung and SK Hynix. So the question is, how much can they do domestically? And there are two fronts there. There's the logic, i.e. replacing TSMC, and there's the memory, i.e. replacing Hynix, Samsung, Micron. On the logic side they're behind, but they're really ramping there, and I think they can get to the production capacity estimates needed. And the US is still allowing them to import all the equipment necessary. The bans are really for beyond the current generation of technology, beyond 7 nanometer. The bans are really for 5 nanometer and below. Even though the government says they're for 14 nanometer, the actual equipment that's banned is only for below 7 nanometer.
And so they'll be able to make a lot of 7 nanometer AI chips, and maybe even get to 5 using existing equipment for 5 nanometer rather than taking the new techniques. So there's the logic side, and then there's the memory side. And the aspect of Huawei's announcement that was surprising was that they're doing custom memory, right? That's the part that is sort of like, hey, this is really exciting. They announced two different types of chips for next year: one that's focused on recommendation systems and prefill, and one that's focused on decode.
Guido Appenzeller
There's a trend these days.
Dylan Patel
Yeah. Nvidia, the same thing: they just announced a prefill-specific chip recently. There are numerous AI hardware startups that are really focusing on prefill versus decode, this split of inference into two workloads. Huawei is doing the same thing for their next-year chip. And what's interesting is the decode one has custom HBM. What does that mean? What is the manufacturing supply chain? Because that's the one that's tricky. How much can they manufacture of that custom HBM? And Nvidia and others are also adopting custom HBM only starting next year. So yes, the manufacturing capacity is not there; maybe it's going to consume a bit more power, it's going to be slightly lower bandwidth. But the fact that they're able to do some of the same things that Nvidia plans to do and AMD plans to do in their memory is evidence that they're catching up. But then the main question that remains is production capacity. So as far as, hey, Nvidia is banned in China, they're saying, don't buy Nvidia chips. I think for a period of time that's fine for China, from the perspective of, hey, I'm China, that's fine because you have all this capacity that you shipped in in 2024 that hasn't been turned into AI chips. And now you're turning them into AI chips, you're running all that stockpile down. What about the transition from running that stockpile down to ramping your new stuff? That transition is the one that's really tricky. China's either shooting itself in the foot by not purchasing Nvidia chips during that time period, or China is able to ramp. I think they'll be able to ramp.
I think it'll take a little bit longer, and there will be a sort of gap in between, where China probably backtracks and says it's fine. Like ByteDance is begging for Nvidia chips, right? They use some Cambricon, they use some Huawei, but they really want to use Nvidia because it's way better. They don't care about the domestic supply chain. They want to make the best models. They want to deploy their AI as efficiently as possible. And so the government can mandate them to not do it, right? So it's not that Nvidia is not competitive, it's that the government's sort of trying to instigate it. And then I guess the last thing is, there's always the argument of, hey, if banning Nvidia chips to China is so good for China, why didn't China do it for itself? And now they're finally doing it for themselves. So again, it'll be interesting to see. Smuggling is still happening, right? Re-exportation of chips from other countries to China is still happening at some volume, low to medium volume. But the direct shipments of Nvidia chips that are legally allowed to China are not necessarily happening today, but may have to restart at some point, because China won't have the production capacity. They would just have so many fewer AI chips being deployed domestically versus the US, and at some point you kind of have to pick: am I all about the internal supply chain, or am I all about chasing super powerful AI?
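The prefill versus decode split that keeps coming up here can be sketched with a roofline-style toy model: prefill processes many prompt tokens per weight load and is compute-bound, while decode generates one token at a time per sequence and is memory-bandwidth-bound. A minimal illustrative sketch; the 300 FLOPs-per-byte hardware ratio and the function names are assumptions, not any vendor's real numbers.

```python
# Toy model of why LLM inference splits into prefill (compute-bound)
# and decode (memory-bandwidth-bound) phases. Numbers are illustrative.

def arithmetic_intensity(batch_tokens: int) -> float:
    """FLOPs per byte moved for a matmul over `batch_tokens` tokens.

    Each weight byte loaded from memory is reused once per token in the
    batch, so intensity grows linearly with tokens processed together.
    """
    flops_per_weight = 2  # one multiply + one add per weight per token
    return flops_per_weight * batch_tokens

def bound_phase(batch_tokens: int, hw_flops_per_byte: float = 300.0) -> str:
    """Compare workload intensity to the hardware's compute/bandwidth ratio."""
    if arithmetic_intensity(batch_tokens) >= hw_flops_per_byte:
        return "compute-bound (prefill-like)"
    return "bandwidth-bound (decode-like)"

# Prefill chews through a whole prompt (thousands of tokens) per weight
# load; decode emits one token at a time.
print(bound_phase(2048))  # compute-bound (prefill-like)
print(bound_phase(1))     # bandwidth-bound (decode-like)
```

This is why separate prefill and decode chips make sense: the two phases sit on opposite sides of the hardware's compute-to-bandwidth ratio, so one design can skimp on memory bandwidth while the other skimps on FLOPs.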
Sarah Wang
Yeah.
Guido Appenzeller
So there is an angle here, a negotiation angle as well? Because currently there are still discussions ongoing about what exactly the boundaries are, what can be exported to China. So these are sort of well-timed announcements if you want to make a point that the US should allow more exports. Do you think that's a factor or not?
Dylan Patel
Yes. So in the report we did a few weeks ago about the production capacity of Huawei and the supply chain, there was a bit in there we wrote about how, honestly, if you are China and you do want Nvidia chips, how do you play this? It's by hyping up your domestic supply chain: yes, we can do everything. Huawei announced the most crazy shit possible, three years of roadmaps. Do they read your report? Basically, I think they do. I mean, they were already doing it, and then they say, we're banning Nvidia, right? And then the government official is going to think, alongside sort of lobbying from domestic players, of course we want to ship them better AI chips. We're losing this market, we can't lose this market. And it's sort of like, it is 10,000 IQ, right? And we're here playing checkers while they're playing chess.
Sarah Wang
Well, negotiating chip aside, in that report you talked about HBM, or high bandwidth memory, being a bottleneck for Huawei. To your point on one of the surprising aspects of the announcement, do you think it's credible that it's no longer a bottleneck, based on what they're saying? Or is it just hype?
Dylan Patel
I think production-capacity-wise it is still absolutely a bottleneck. Certain types of equipment required for making HBM need to be imported. They're working on domestic solutions, but as far as we know, they have not imported enough equipment for this. Although, if you look at Chinese import data for different types of equipment: fabs spend, depending on the process technology, roughly different amounts of money on lithography, etch, deposition, metrology, these different steps. Historically, lithography has hovered around 17, 18%. With EUV it grew to 25%. But China, because they wanted to stockpile lithography and they were worried about the coming ban, was importing lithography at a much higher rate than that, like 30, 40% of their equipment imports were lithography, and they were just stockpiling lithography equipment. This has sort of reversed now. If you look at the monthly import-export data, both into provinces in China and also out of countries, you can see that etch specifically is skyrocketing. And the main thing about stacking HBM is that on each wafer you have to etch to create this thing called a through-silicon via, so it can connect from top to bottom, and then you stack them on top of each other, 12-high, 16-high for HBM. That's how you make super high bandwidth memory. And their imports of etch equipment are skyrocketing now. So they don't have the production capacity yet. How fast can they ramp it? It's a function of, A, how much equipment they can get, and B, the yields. Improving yields is really hard in manufacturing. Intel and Samsung are really good and TSMC is just amazing; it's not that those other companies suck, I think, is a better way to put it. And so it's those two things, I think. Yield.
They haven't even started production of high-speed HBM3. They've only done some sampling of HBM2, and HBM3 came out a few years ago. So there's still quite a ways to go up the learning curve. Obviously I expect them to catch up faster than it took the technology to be developed, because it exists in the world. We know how to do it. It's just a matter of actually doing it versus inventing it. And then the other one is the production capacity. A couple months of import-export data is not enough to set up years' worth of supply chain buildup, which is what we have today in Korea for the Korean companies. Now Hynix is also investing in the US in Illinois, and Micron, the American memory company, is primarily in Japan and Taiwan, but they're also expanding in Singapore and the US now. There's so much capital that's been invested, it would take some time for China to build up that production capacity to actually match the West. And when I say the West, I mean non-China East Asia in production capacity. So it'll take some time to get there. It's like, hey, we can design this. It's always a question of, can we manufacture? And then the thing that Jensen would say is, you're betting on China not being able to manufacture.
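The TSV-stacking recipe Dylan describes maps directly onto HBM's headline bandwidth: a very wide interface runs through the stacked dies, so per-stack bandwidth is just bus width times per-pin data rate. A back-of-envelope sketch using the public HBM3 figures (1024-bit interface per stack, up to 6.4 Gb/s per pin); treat the stack count and rates as round assumptions.

```python
# Back-of-envelope HBM bandwidth: TSV stacking enables a very wide
# (1024-bit) interface per stack, so bandwidth = bus width x pin rate.

def hbm_stack_bandwidth_gbs(bus_bits: int = 1024, gbps_per_pin: float = 6.4) -> float:
    """Peak bandwidth of one HBM stack in GB/s (bits -> bytes)."""
    return bus_bits * gbps_per_pin / 8

def package_bandwidth_tbs(stacks: int, bus_bits: int = 1024,
                          gbps_per_pin: float = 6.4) -> float:
    """Aggregate bandwidth in TB/s for a package carrying several stacks."""
    return stacks * hbm_stack_bandwidth_gbs(bus_bits, gbps_per_pin) / 1000

# One HBM3 stack: 1024 bits * 6.4 Gb/s / 8 = 819.2 GB/s.
print(hbm_stack_bandwidth_gbs())
# A hypothetical accelerator with 8 stacks: ~6.55 TB/s aggregate.
print(package_bandwidth_tbs(8))
```

The point of the sketch: each extra die in the stack adds capacity behind the same wide bus, which is why yield on the TSV etch and stacking steps, not design, is the binding constraint he keeps returning to.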
Sarah Wang
Right.
Dylan Patel
You know, it's a matter of when, not if. And that's the whole calculus that I think the US government has to be aware of when deciding what level of AI chips we sell. Do we sell everything? Probably not, because AI is far more powerful: the end market of AI is going to be way larger than the end market of semiconductors and equipment. So what level do we sell at? Well, how much can China make at each specific performance tier? Analyze that and the volume, and then figure out what is okay, which is maybe a little bit above or around the same level. Yeah.
Guido Appenzeller
So.
Sarah Wang
So, to your point on playing chess versus checkers, if you're Jensen, what would your next move be, given the situation at hand?
Dylan Patel
I think it's partially true that he's afraid of Huawei more than he is of, like, an AMD, right?
Sarah Wang
He called them formidable.
Dylan Patel
Yeah, well, like every other industry. Huawei beat Apple, right? They passed Apple up in TSMC orders. They passed Apple up in phone market share, not in the US, but in many parts of the world, before the bans came down. And even now they're growing back in market share without Western supply chains. They've done this to numerous other industries. I would say Apple's a formidable competitor, and Huawei has beaten a lot of industries, so it's reasonable that he's afraid of them. And he's not afraid of AMD. So I think the best thing is to act as if what Huawei announced is reality rather than their hoped-for target, and wave away all doubt on manufacturing capacity, which I think is not fair. I think manufacturing capacity is a real bottleneck for them. And the yield learning is a real bottleneck, maybe temporary. We'll see how long, and we'll see how fast the rest of the Nvidia technology advances past what Huawei is capable of, and how fast Huawei is able to close the gap. But I think his main pitch would be: Huawei is real, they're a formidable competitor, and they're going to take over not just the Chinese market but also foreign markets, whether it be the Middle East or Southeast Asia or South Asia or Europe or LatAm, everywhere besides America. I think Noah Smith has this analogy: the whole idea is that you should Galapagos China. Make them have their own domestic industry that is so different from the rest of the world, kind of what happened with Japan in the 70s, 80s, and 90s. Their PCs were so specific and hyper-optimized to the Japanese market, with, you know, the weird scroll wheel on these Japanese PCs. I don't know if you've seen it.
You literally go like this and it scrolls, right? And the touchpad is a circle and the scroll wheel is around it. Things like that are so weird, and the rest of the world doesn't care, but the Japanese market likes it. And his whole idea is, let's Galapagos them, that is, keep their technology within China. Then that's deadweight loss and they never expand outside, versus us serving the whole world. But the risk is that the opposite can also happen. Our technology is hyper-optimized to running language models at this scale and RL, and hardware-software co-design can take you down a path of the tree that is a dead end. And then China, because they're not allowed to access this tree, ends up in the optimal spot. We hit a local maximum; they find the global maximum. That's what Noah Smith's technological Galapagosing analogy is. I like it a lot. I don't know if it's accurate, but it's an interesting one.
Sarah Wang
Well, actually, maybe just taking a step back from current events, even though there's so much to talk about right now. Last time you appeared with us, Nvidia came up, obviously, and you talked about a couple of the potential paths forward for Nvidia.
Erik Torenberg
Give us maybe the bull case, the bear case.
Sarah Wang
Fair enough.
Dylan Patel
There's a lot embedded in their numbers now. But what's interesting is the consensus for the banks across the hyperscalers, so Microsoft, CoreWeave, Amazon, Google, Oracle, and Meta, the six companies I would consider hyperscalers. The consensus for the banks is $360 billion of spend next year across all of them. And my number is closer to like 450, 500, and that's based on all the research we do on data centers, tracking each individual data center and the supply chains.
Guido Appenzeller
So this is just Nvidia spend?
Dylan Patel
This is capex for the hyperscalers, right? That capex gets split up across different companies, but the vast, vast majority still goes to Nvidia. And Nvidia is in a position where they can't really take share; they grow with the market and defend share. So the question is, how fast is the growth rate of capex for hyperscalers and other users? And the reason I included Oracle and CoreWeave as hyperscalers, even though they're traditionally not called hyperscalers, is because they are OpenAI's hyperscaler. So when you look at the Oracle announcement, first of all, I don't understand why people don't think it's crazier. They did the most unprecedented thing in the history of stocks and public companies ever: they gave a four-year guidance, and it made Larry the richest man in the world. Anyways, the question is, how fast does revenue grow? Do you think OpenAI, which signed a $300 billion plus deal with Oracle, will actually be able to pay $300 billion across raising capital and revenue? It gets to a rate of over 80, over $90 billion a year in just a handful of years. So do you believe the market will grow that fast? It's very possible, yes. What is OpenAI's revenue going to be exiting next year? Some people think 35 billion, some think 40 billion, some think 45 billion by the end of next year. This year they hit 20, right, ARR. So if that growth rate is maintained, then all of that cost goes to compute, plus all the capital they continue to raise.
And again, the financials that they gave to investors for their last round were like, hey, we're going to burn $15 billion next year. It's probably more likely going to be like 20. And you stack this on, and they're not turning a cash flow, they're not going to be profitable until 2029. So they're going to continue to burn 15, 20, $25 billion of cash each year, plus revenue growth. That's their compute spend. And you do this for Anthropic, you do this for OpenAI, you do this for all the labs. It's very possible that the pie gets to not 360 billion next year but 500 billion next year for total capex, and the pie continues to grow for hyperscalers. Nvidia says actually it's going to be multiple trillions a year on AI infrastructure, and he's going to capture a huge portion of it. That's his bull case: AI is actually so transformative that the world just gets covered in data centers, and the majority of your interactions are with AI, whether it's business productivity and telling an agent to write some code, or you're just talking to your girlfriend Annie. It doesn't matter; all of this is running on Nvidia for the most part. The bear case is, even if it does grow a lot…
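The revenue-growth math in that answer can be made explicit with a few lines of compound-growth arithmetic. A hedged sketch using the round figures from the conversation ($20B ARR today, roughly doubling each year, a $300 billion multi-year Oracle commitment); these are the podcast's numbers run forward, not a forecast.

```python
# Rough compound-growth check on the numbers in the conversation:
# if ~$20B ARR roughly doubles each year, what does the run rate look
# like, and how does it compare to a ~$300B multi-year compute
# commitment? Illustrative arithmetic only.

def project_arr(start_billions: float, annual_growth: float, years: int) -> list:
    """ARR (in $B) at the end of each year under constant compound growth."""
    arr, out = start_billions, []
    for _ in range(years):
        arr *= 1 + annual_growth
        out.append(round(arr, 1))
    return out

# Doubling from $20B gives $40B exiting next year, within the 35-45B
# range of estimates Dylan cites.
print(project_arr(20, 1.0, 4))  # [40.0, 80.0, 160.0, 320.0]

# Cumulative revenue over those four years vs. a $300B commitment:
print(sum(project_arr(20, 1.0, 4)))  # 600.0
```

The sensitivity is the whole argument: at 100% annual growth the commitment looks payable out of revenue plus fundraising, while at materially lower growth rates the same sum leaves a large gap to cover with capital raises.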
Guido Appenzeller
Yeah, save the bear case for a second. I think fundamentally the value creation is there, personally. Trillions of dollars of value with AI, I can totally see that happening. So assume it's true. Where will Nvidia top out?
Dylan Patel
I guess how much do you believe in takeoffs? Right?
Guido Appenzeller
Yes.
Dylan Patel
So if there is a takeoff scenario, where powerful AI builds more powerful AI builds more powerful AI, each level of intelligence enables more for the economy. How many monkeys can you employ in your business versus how many humans? Or how many dogs? What is the value creation of a human versus a dog? It's sort of the same with AI. In this case, the value creation could be hundreds of trillions, if not more the year after that.
Guido Appenzeller
Do you even need this? I mean, if we take every white collar worker and make them twice as productive with AI, that's in the hundreds of trillions, isn't it?
Dylan Patel
Yeah, but what is twice? If you talk to people at the labs, twice as productive, what does that even mean? It's: replace them, and be 10 times better than that. I don't know how soon that happens.
Guido Appenzeller
It's sort of: white collar work is essentially useless without a constant stream of LLM tokens that make them productive, right? At that point you can basically tax every single knowledge worker in the world, which is most workers in the world long term.
Dylan Patel
Yeah.
Guido Appenzeller
I don't know. What's your guess? Give us a number. What's the cap?
Dylan Patel
I mean, why aren't we making a Matrioshka brain? I don't know. At some point the machine says humans don't need to live and I need even more compute.
Guido Appenzeller
One step before that.
Dylan Patel
Maybe. Are we colonizing Mars yet?
Guido Appenzeller
TBD.
Dylan Patel
I don't know, man. I find it completely impossible to predict anything beyond five years given how much stuff is changing. Five is a large number. I'll leave it to economists, honestly. Supply chain stuff is three, four years out and that's it, and then the fifth year is sort of, like, yellow, right? So I just try and ground myself with the supply chain stuff. Supply chain, and then what is the adoption of AI, what's the value creation, what's the usage? And you can see that over a short horizon. Beyond that, I don't know. Are we all going to be connected to computers, like BCIs and stuff? I don't know, dude. Humanoid robots, are they going to be, I mean, you saw Elon's thing, right? He's like, yeah, humanoid robots are why Tesla's worth more than 10 trillion. So, hey, great. What is all that being trained on? Great, Nvidia. Okay, awesome. So that's worth also 10 trillion. I don't know. It's too out there for me. I don't like the out-there discussions.
Sarah Wang
Very fair.
Dylan Patel
Read some sci fi books.
Sarah Wang
So just pulling on the thread where you talked about, I mean, this is kind of a throwaway comment, but how market share can't really grow just because it's already such a dominant market share. And we talked about, or you guys talked about, the moat of Nvidia last time, and obviously this moat is tied to maintaining that very high market share that they currently have. And I loved the sort of historic journey you took us through with Huawei just earlier. Can you kind of walk through what Nvidia did throughout history to build their moat?
Dylan Patel
It's super awesome because, you know, they failed multiple times in the beginning and they bet the whole company multiple times, right? Jensen is just crazy enough to bet the whole company, right? Whether it was ordering volume for certain chips before he knew they even worked, and it was all the money he had left, or ordering volumes for projects he had not won yet. I heard a rumor. Or not a rumor, but a story from someone who's a graybeard in the industry and I think would know. He was like, yeah, no, no, no, Nvidia ordered the volume for the Xbox before Microsoft gave them the order. They're just like, fuck it, yolo. Right? I don't know how true this is. I'm sure there's more nuance there, like, you know, a verbal indication or whatever. But the order was placed before he got the order, right? That's what he said. And there's cases like the crypto bubbles, right? There were a couple of them. But Nvidia did their damn best to convince everyone in the supply chain that it wasn't crypto, that it was durable, real demand. It was gaming and data center and professional visualization. And therefore you guys should ramp your production. And they all ramped production, spent all this capex on increasing production and building out new lines for them. And Nvidia bought the parts and sold them and made shitloads of money. And then, when it all fell apart, they just had to write down a quarter's worth of inventory, whatever. Everyone else was like, well, crap, I have all these empty production lines, right? But what did AMD do then, right? Their chips were actually better for crypto mining, right? On an amount-of-silicon-cost versus how-much-you-hash basis. But they just didn't.
AMD was like, we're going to not really raise production, right? As a reasonable, you know, thing. It wasn't a sort of strike-while-the-iron's-hot move. And the same has happened with Nvidia in recent times, right? They've ordered capacity that no one believes, multiple times. They see the end demand, obviously, but in many cases their number for, say, Microsoft was higher than Microsoft's internal planning. And then Microsoft's internal planning went up. But their number for Microsoft was way higher, and the supply chain is like, we just don't think Microsoft's going to need this much, even though Nvidia tells us this. And Nvidia's like, no, no, no, the customer's going to buy more, and orders, right? And when the orders come through the supply chain, Nvidia has to pay NCNR, right? Non-cancelable, non-returnable. I asked a question in Taiwan once. It was Colette, the CFO, and Jensen, the CEO. They were both there, and it was a room full of mostly finance bros. And they're asking stupid finance questions three days before earnings, so obviously the two of them just could not answer anything, because of, you know, SEC regulations. But then my question to them was: look, Jensen, you're so vibes-driven and very gut-feel and very visionary. And Colette, the CFO, she's amazing in her own right. But those personalities clash. How do you work together? His response: I hate spreadsheets. I don't look at them, I just know. And of course the best innovators in the world have really good gut instinct. The gut instinct to order non-cancelable when you don't know. And they've had to write down, over their history, multiple times, many billions of dollars in cumulative orders, right?
Cumulative, in total orders, whether it be, you know, the H20, which is more regulatory. But in other cases they've ordered and had to cancel.
Guido Appenzeller
Is that many billions?
Dylan Patel
It's many billions.
Guido Appenzeller
Peanuts.
Dylan Patel
Well, depends, right? The crypto write-down was multiple billions when their stock was at less than 100 billion, right? It's like, you know, it's.
Guido Appenzeller
Peanuts compared to the upside. Right.
Dylan Patel
I think, I think it's crazy. I think everything he did was right.
Guido Appenzeller
Yeah.
Dylan Patel
And I think everything AMD did was wrong, you know, in that scenario. But it is crazy to do, especially in a cyclical industry like semiconductors, where companies go bankrupt all the time, which is why we have all this consolidation: every down cycle, companies go bankrupt.
Guido Appenzeller
I mean if you look from a risk return perspective, right. These bets were totally worth taking.
Dylan Patel
Yes.
Guido Appenzeller
If you look at it from, I'm a CEO, I want to have predictable quarters for Wall Street, it's a very different story. And I think that's sort of where part of the tension is from now.
Dylan Patel
Yeah. I don't know if you've seen these Lee Kuan Yew edits where it's him giving some fiery speech, and then there's some cool music at the end, and it's showing different pictures of him. We made one of Jensen recently and put it on social media, right? Instagram, TikTok, XHS, Red Book, Twitter, of course. All the different social media. And I really liked it, because he's like, the goal of playing is to win and, or sorry, the reason you win is so you can play again, right? And he compared it to pinball, where you just play all day and you keep getting more rounds. His whole thing is, I want to win so I can play the next game. And it's only about the next generation, right? It's only about now, the next generation. It's not about 15 years from now, or five years from now, because it's a whole new playing field every time. I think you're right. The risk reward is correct.
Guido Appenzeller
But just few people take these kinds of risks.
Dylan Patel
It's the only semiconductor company that's worth, I think, even north of $10 billion that was founded as late as it was. MediaTek was in the early 90s, and then Nvidia, and everyone else, the big ones, are mostly from the 70s. Yeah.
Sarah Wang
I think you raised this great point on these bet-the-farm moments. And he's actually been wrong a couple times, to your point.
Dylan Patel
Mobile, Right. Like, what the hell happened with mobile?
Sarah Wang
Exactly. And he still takes them. And I think Marc actually had this great conversation with Erik where he talked about being founder-run, where you have this memory of the risks you took to get to where you are today, right? And so in a lot of cases, if you're a CEO brought on later, you're sort of like, okay, continue to steer the ship as is. But in this case, he remembers all the times they almost went belly up. And he's like, I've got to keep making bets like that. He's been one of the longest-running CEOs, over 30 years. He's kind of right up there with Larry Ellison now. How do you think he's changed over the last 30 years or so?
Dylan Patel
I mean, obviously, like, I'm 29. I don't know what he was like. I've watched a lot of old interviews. I won't say he wasn't.
Sarah Wang
Longer than you've been alive.
Dylan Patel
Yeah, exactly. Nvidia was founded before I was born. I'm '96, right? Like, you know.
Sarah Wang
Yeah. Anything over the last few, couple of years?
Dylan Patel
Think even like watching old interviews, right? Like I watched a lot of old interviews, a lot of old like presentations he's given. One thing is that he's just like sauced up and dripped up like way like the charisma he's gotten has only gotten stronger, Right? Yeah, which is, which is an interesting point. I don't know if it's quite relevant, but like the man like has learned to be a rock star more. Even though he was always charismatic, it was like he's a complete rock star now. And he was a rock star, you know, a decade ago too. It's just people maybe didn't recognize it. I think, I think the first live presentation that I watched, it was extreme. Was like, it was what's the, what's the CES like 2014 or 2015 or whatever. He's, he's, he's, it's, it's Consumer Electronics Show. I'm like moderating like gaming, gaming hardware subreddits, right? Like at the time I'm a teenager and like the dude is like talking only about AI. He's telling, he's telling like all these gamers about Alexnet and self driving cars, right? It's like know your audience first of all. But also like, like it's not, has nothing to do with consumer electronics at gaming. You know, at the time I was also like, I was half like, holy crap, this is amazing. But also half like, I want you to announce new gaming gpu, right? Like, you know, but I know, like on the forums, on the forums quickly everyone was like, you know, screw this, you know, I want to hear about the gaming GPUs, Nvidia's price gouging. Like, you know, of course Nvidia's always had the, like, we priced the value and like plus a little bit, right? Because we were just smart enough to know, you know, I'm guessing Jensen just has the gut feel of how to price things, right? He'll change the price. Like at least on gaming launches he'll change the price up until like Right before the presentation. So, like, it really is like a gut feel thing probably. And anyway, so, so he, he had that charisma to know what was right. 
But I think a lot of people were like, oh, no, whatever, Jensen's wrong, he doesn't know what he's talking about. But now, when he talks, people listen. So it might just be that he's been right enough. Yeah.
Sarah Wang
There's a post on X recently that said he had moved up into God mode with a select group of CEOs.
Dylan Patel
Who's God? Who are the other gods?
Sarah Wang
It was Zuck. Who were the other gods?
Guido Appenzeller
Elon.
Sarah Wang
Elon. Elon Zuck and Jensen.
Dylan Patel
Nice, nice. Okay.
Sarah Wang
Good crew to be in.
Dylan Patel
So we pray to the Silicon Valley cult now, is it?
Sarah Wang
Just one more, one last thing on people. You mentioned Colette, his CFO. And, you know, there's sort of a famously loyal crew at Nvidia, even though all of the OGs could retire at this point. Is there anyone akin to a Gwynne Shotwell at SpaceX, or previously a Tim Cook to Steve Jobs at Apple, that is at Nvidia today?
Dylan Patel
I mean, he had two co-founders, right? It's not overlooked that one of them is, you know, not involved and hasn't been for a long time, but the other one was involved up until just a few years ago, right? So it's not just Jensen running the show, right?
Sarah Wang
Totally.
Dylan Patel
Although he was running the show, there's quite a few people on the hardware side. There's someone at Nvidia that's, like, mythical to me. When you talk to the engineering teams, he leads a lot of the engineering teams. He is a private person, so I don't want to say his name, actually. But he's effectively, like, Chief Engineering Officer is his role, and people within his org will know who he is. And I think there are people like that. He's intensely loyal, and there's a number of these types of people. There's another fella, like, there's all these innovative ideas at Nvidia, and he's the guy who literally is like, we need to get this silicon out now, we're cutting features. That's what he's famously known for, and all the technologists at Nvidia hate him. This is a second guy, also intensely loyal to Nvidia, been around for a long time. But, you know, when you have such a visionary, forward-looking company, one problem is that you get lost in the sauce, right? Oh, I want to make this, it's got to be perfect, amazing. You've got to have that sort of counterweight. And these people are close to Jensen for a reason, because Jensen also believes these things, right? Have the visionary, future-looking stuff, but also, screw it, cut it, we'll put it in the next one. Ship, right? Ship now. Ship faster, in a space like silicon, which is really hard to do. And the thing about Nvidia that's always been super impressive, and it's from the beginning days, right, he's talked about this before, is their first successful chip. They were going to run out of money, and he had to go get money from other people to even finish the development.
And even then he just had enough money, because he'd already had a failed chip before. This chip came back and it had to work, otherwise the company would not survive. Because they could only pay for one mask set, right? Basically, you put these, I'll call them stencils, into the lithography tool, and it says where the patterns are. You put the stencil in, you deposit materials on the wafer, you etch material away, and the stencil tells it where to put stuff, right? The deposition and etch keep happening in those spots, you stack dozens of layers on top of each other, and that makes up a chip. These stencils are custom to each chip, and today they cost on the order of tens and tens of millions of dollars. Even back then it was still a lot of money, though not that much, of course. They could only pay for one set. But the typical thing with semiconductor manufacturing is, as good as you can simulate, as good as you can do all the verification, you'll send a design in and you have to change it. There's going to be something. It's so hard to simulate everything perfectly. And the thing about Nvidia is they tend to just get it right the first time. Even great executing companies like AMD or Broadcom or whoever often have to ship revisions. They're denoted with a letter and then a number, like A and then a number, or B and then a number, for the two different parts of the masks: the letter, the A, is basically the transistor layer, and the number is the wiring, the metal layers that connect all the transistors together. Nvidia almost always ships A0. They sometimes ship A1. And a lot of times they'll start production of the A, the transistor layer, and ramp it really high, and then just hold the wafers right before they transition to the metal layers, just in case they do need to change the metal layers. And so the moment they're ready and they've confirmed that it works, they can just blast through a lot of production, whereas everyone else is like, oh, let's get the chip back. Okay, A0 doesn't work, we've got to make this tweak and this tweak and get the chip back. It's called a stepping, right?
Guido Appenzeller
At Intel, we were very jealous of Nvidia at that time, right? They consistently delivered on the first stepping. We did not.
Dylan Patel
So in the data center CPU group, there was one product where, you know, like I said, it's A0, A1, or you go to B if you have to change the transistor layer as well. Nvidia, sorry, Intel got to, like, E2 once. E2, that's like a 15th revision. The peak of AMD's, like, when AMD went skyrocketing in market share versus Intel, was when Intel was at E2, right? Fifteen steppings. That caused quarters of delay.
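For readers unfamiliar with the convention, here's a toy sketch of the stepping naming Dylan describes. The per-letter revision count of three is an assumption for illustration only (real products vary); it just shows how A0 through E2 can add up to roughly fifteen steppings:

```python
# Toy model of silicon stepping names: the letter is the base (transistor)
# layer revision, the digit is the metal (wiring) layer revision.
# metal_revs_per_base=3 is an illustrative assumption, not a real rule.
def steppings_through(last="E2", metal_revs_per_base=3):
    names = []
    for letter in "ABCDEFGH":
        for num in range(metal_revs_per_base):
            names.append(f"{letter}{num}")
            if names[-1] == last:
                return names
    return names

print(steppings_through("A0"))       # ['A0'] — shipping on the first try
print(len(steppings_through("E2")))  # 15 — the revision count mentioned above
```

Under this assumed scheme, each extra stepping is another trip through the fab, which is why E2 translates into quarters of delay.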
Guido Appenzeller
Right. I mean, it's catastrophic for go-to-market.
Dylan Patel
Yeah. Each one is a quarter of delay or something, right? So it's absurd. So I think that's the other thing about Nvidia: screw it, let's ship it, let's get the volume ASAP. Anyway, they have some of the best simulation, verification, et cetera, which lets them go from idea to shipment as fast as possible, cutting out any unnecessary features that could delay it, making sure they don't have to do revisions, so that they can respond to the market ASAP. There's a story about how Volta, which was the first Nvidia chip with tensor cores: they saw all the AI stuff on the prior generation, P100, Pascal, and they decided, we should go all in on AI, and they added the tensor cores to Volta only a handful of months before they sent it to the fab. They said, screw it, let's change it. And that's crazy. If they hadn't done that, maybe someone else would have taken the AI chip market, right? So there's all these times where they just move. And those are major changes, but there's often minor things you have to tweak too, right? Number formats or some architectural detail. Nvidia is just so fast.
Guido Appenzeller
The other crazy thing is they have a software division that can keep up with that, right? I mean, if you come out with a chip and basically no stepping is required, it's immediately in the market. Then being ready with drivers and, you know, all the infrastructure on top of that is just super impressive.
Sarah Wang
Yeah, I love that point, because you think of Nvidia benefiting from tailwind after tailwind. But I think both of you are saying you have to move fast enough and execute well enough to take advantage of those tailwinds. And by the way, I loved your CES story. I was just envisioning him more than 10 years ago talking about self-driving cars. But if you think about nailing the video game tailwind, VR, bitcoin mining, and obviously AI now. One of the things that Jensen talks about today is robotics, AI factories. Maybe my last question on Nvidia: what do you think about the next 10 to 15 years? And I know calling beyond five is hard, but what does Nvidia's business look like?
Dylan Patel
It's really a question of, and I think every time I've talked to some executives at Nvidia, I've asked this question because I really want to know, and they won't answer it, obviously, but it's: what are you going to do with your balance sheet? You're the highest cash flow company, and you have so much cash flow now. The hyperscalers are all taking their cash flow way down, right? Because they're spending on GPUs. What are you going to do with all this cash flow? Even before this whole takeoff, he wasn't allowed to buy Arm, right? So what can he do with all this capital and all this cash? Even with this $5 billion investment in Intel, there's regulatory scrutiny, right? It's in the announcement: this is subject to review. I imagine that'll get passed, but he can't buy anything big. He's going to have hundreds of billions of dollars of cash on his balance sheet. What do you do? Is it start to build AI infrastructure and data centers? Maybe. But why would you do that if you can just get other people to do it and just take the cash?
Guido Appenzeller
Well, what he's investing now is peanuts, right?
Dylan Patel
You know, he recently gave, like, a CoreWeave backstop. Because today it's really hard to find a large number of GPUs for burst capacity, right? Like, hey, I want to train a model for three months. I have my base capacity where I run my experiments, but I want to train a big model. Three months, done.
Guido Appenzeller
We know from our portfolio.
Dylan Patel
Yeah, yeah. So Nvidia sees this issue. They think it's a real problem for startups; it's why the labs have such an advantage. Right now, most companies in the Valley spend what, 75% of their round on GPUs, right? Or at least that's what we see. What if you could spend that 75% in three months on one model run, really scale, and have some sort of competitive product? Then you have the model, then you raise more capital, right? Or start deploying. Or, what do you do with it, is it start buying a crapload of humanoid robots and deploying them? But they don't really make that amazing software for them in terms of the models, right? The layer below, they make great. Where they deploy their capital is the question.
Guido Appenzeller
He has been investing up and down the supply chain a little bit, though, right? Investing in the neoclouds, investing in some of the model training companies.
Dylan Patel
Yeah. But again, it's small fries. He could have just done the entire Anthropic round if he wanted to. Of course he didn't, right? And then really gotten them to use GPUs. Or he could have done the entire OpenAI round. He could have done the entire, like, any xAI round. Do you think these are things he should be doing, or what? I mean, like.
Sarah Wang
Yeah, good question.
Dylan Patel
I don't know, right? I think, I think, like.
Guido Appenzeller
You for the next round that we're raised. But anyways.
Dylan Patel
He could make venture a dead industry. No kidding. Take all of the best rounds.
Sarah Wang
But that's a lot of business. Yeah.
Dylan Patel
You know, you can do the seeds and then have Jensen mark you up. But no, I don't think he likes it. I think picking winners is obviously really tough for him, because he has customers all across this ecosystem. If he starts picking winners, then his customers will be even more anxious to leave and give even more effort to, whether it's AMD or some startup or their internal efforts, et cetera, et cetera. Buying TPUs, whatever it is. He can't just invest in these. He can do a little bit, right? A few hundred million in an OpenAI round is fine, or a few hundred million in an xAI round is fine. CoreWeave, right? Everyone's throwing a fuss about it, but he invested a couple hundred million early on, plus rented a cluster from them for internal development purposes instead of renting it from a hyperscaler, which is cheaper for Nvidia to do. It's better for them to do it from CoreWeave than the hyperscalers. So is he really backstopping CoreWeave that much? Or any of the other customers or neoclouds? There's some investment, but it's more like, this is a good cloud, we'll throw in like 5 or 10% of the round. He's not taking 50%-plus of the round.
Guido Appenzeller
Is he also reshaping his market? I mean, look, a couple of years ago, there were four big purchasers of these cards. You just listed six. To what extent is that?
Dylan Patel
Nebius?
Guido Appenzeller
And there's a long list there, of course.
Sarah Wang
Yeah.
Guido Appenzeller
Is that a strategy?
Dylan Patel
It is. I think it absolutely is. But he didn't have to put much capital down to do this.
Guido Appenzeller
Just ship one earlier than the other. I don't know.
Dylan Patel
Yeah. No, but if you look at the grand total of capital that he's spent investing in the neoclouds, it's a few billion. But he has.
Guido Appenzeller
A lot of other levers if he wants to.
Dylan Patel
Right, right. Allocations, as you mentioned. What's nice is, historically you gave volume discounts to hyperscalers, but because he can use the argument of antitrust, he's like, everyone gets the same price.
Guido Appenzeller
So fair.
Dylan Patel
It's very fair. It's very fair. So what should he do with the cash, or what would you guide? I mean, I think there is the argument he should invest in data centers, and only the data center layer, not what goes in the data center, so that more people build data centers. Then, if market demand continues to grow, data centers and power are not the issue, right? Invest in data centers and power. I've said that to them: they should invest in data centers and power, not in the cloud layer. Because the cloud layer is quite commoditized. "Commoditize your complement," right, is the whole phrase. And I won't say being a cloud is commoditized, but you certainly have a lot of competitors who are decent now, and you've educated the commercial real estate and other infrastructure investment firms into going into AI infra as well. So I don't think it's the cloud layer that you invest in, right? Do you invest in data centers and energy? Yeah, because that's the bottleneck for your growth, really: A, how much people want to spend and can spend, and B, the ability to actually put GPUs in data centers. And then robotics, and, like, there are areas he could invest in, but nothing requires $300 billion of capital. So what do you do with the capital? I really don't know. And I feel like Jensen has to have some idea, some visionary plan here, because that's what shapes the company, right? I mean, they could just continue to, you know, I mentioned $200 billion, $250 billion of free cash flow a year. What do they do with it? Do they just buy back stock forever? Do they go the Apple route? The reason why Apple hasn't done anything interesting in nearly a decade is they've got a non-visionary at the head. Tim Cook is the greatest supply chain guy, and they're just plowing the money into buybacks. They're not really, you know, automotive, the self-driving car thing, failed. We'll see what happens with AR/VR, we'll see what happens with wearables, but Meta and OpenAI might be even better than them there. We'll see, and others, right? So what does he invest in? I have no clue. But what requires so much capital and actually gets a return is the tough question. Because the easy thing is, at my cost of equity, I just buy back stock.
Guido Appenzeller
Doesn't it completely change the company culture? I think that's another thing. There are probably areas you could invest it in, but you suddenly end up with the company doing two completely different things, which is very difficult to keep going.
Dylan Patel
But they do 10 completely different things, right? One way to look at it is: we build AI infrastructure. And under the guise of "we build AI infrastructure," robots, humanoids around the world are AI infrastructure. Data centers and energy are AI infrastructure, right?
Guido Appenzeller
So the humanoids would totally work, right? But if you're suddenly pouring concrete and building power plants, it's a completely different cost structure, a completely different set of people. It gets much, much harder. Agreed?
Dylan Patel
There's different ways to do it. Like, invest in the various companies, or backstop the building of a power plant, right? Does no one want to build power plants because they're 30-year underwriting things? There's all these different areas where he could use capital to allow something to happen, right? Not necessarily owning it himself.
Guido Appenzeller
And look, bear in mind, one of the biggest problems we had was that our customer base sucked, right? I mean, most of the chips we were selling went to the large hyperscalers, which are way too concentrated, and they build their own chips, so they can push down your prices. So honestly, spending it on diversifying the cloud, you know.
Dylan Patel
Well, the problem was, in 2014 you guys should have just charged so much that your margins were 80%. What would the world have done? Nothing.
Guido Appenzeller
The margins were pretty good back then.
Dylan Patel
That wasn't the problem.
Guido Appenzeller
That was the primary problem.
Dylan Patel
They were 60, 65. They weren't 80, though. Yeah.
Guido Appenzeller
Oh boy.
Dylan Patel
Jensen Jetson.
Guido Appenzeller
PTSD is kicking in here.
Sarah Wang
Well, wait, I think Guido's comment is actually a really good segue into something else we wanted to talk to you about, which is the hyperscalers. And one of the reasons that I love reading SemiAnalysis is you guys make these out-of-consensus calls that you're often right about. I mean, you have a Jensen hit rate. It's very high.
Dylan Patel
But where's my billion dollar TV positive bet?
Sarah Wang
But the one that caught my eye was Amazon's AI resurgence. So I wanted to talk to you a little bit about that, just because I think we found it pretty interesting being on the ground helping our portfolio companies pick who their partners are. And so we have some micro data on this. But can you sort of walk through why they were behind?
Dylan Patel
Yeah. So in Q1 2023 I wrote an article called Amazon's Cloud Crisis, and it was about how all these neoclouds are going to commoditize Amazon. It was about how Amazon's entire infrastructure was really good for the last era of computing, right? What they do with their Elastic Fabric, ENA and EFA, right, their NICs, the whole protocol and everything behind them, what they do with custom CPUs, et cetera, is really good for the last era of scale-out computing, and not this era of scale-up AI infra. And how the neoclouds are going to commoditize them, and how their silicon teams were focused on cost optimization, whereas the name of the game today is max performance per cost. That often means you just drive up performance like crazy: even if cost doubles, you drive performance up more, triple it, because then the cost per performance still falls. That's the name of the game today with Nvidia's hardware. And it ended up being a really good call. Everyone was calling us out, like, no, you're wrong. And this was when Amazon was the best stock, and Microsoft really hadn't started taking off yet, nor had all these others, Oracle and so on and so forth. And since then, Amazon has been the worst performing hyperscaler. And the call here is that they still have structural issues, right? They still use Elastic Fabric, although that's getting better; it's still behind Nvidia's networking, still behind Broadcom, Arista-type networking and NICs. Their internal AI chip is okay. But the main thing is that they're now waking up and being able to actually capture business, right? So the main call here is that since that report, AWS revenue growth year on year has been decelerating consistently. And our big call is that it's actually going to start re-accelerating, right? And that's because of Anthropic. It's because of all the work we do on data centers, right?
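The "max performance per cost" arithmetic is worth making concrete. A minimal sketch with made-up numbers, showing why doubling cost can still win if performance triples:

```python
# Illustrative numbers only: a chip that costs 2x but performs 3x
# still lowers the cost per unit of performance.
def cost_per_perf(cost: float, perf: float) -> float:
    return cost / perf

baseline = cost_per_perf(cost=1.0, perf=1.0)   # old design, normalized
scaled_up = cost_per_perf(cost=2.0, perf=3.0)  # 2x cost, 3x performance

print(baseline)             # 1.0
print(round(scaled_up, 3))  # 0.667 — ~33% cheaper per unit of performance
```

This is the opposite of pure cost optimization: the numerator is allowed to grow as long as the denominator grows faster.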
Tracking every single data center, when it goes online and what's in there, the flow-through on cost. If you know how much the chips cost, the networking cost, the power cost, and you know generally what margins are for these things, you can start estimating revenue. So when we build all that up, it's very clear to us that AWS revenue growth troughs this quarter, right? This is the lowest AWS revenue growth will be on a year-over-year basis for at least the next year, and it's re-accelerating to north of 20% again because of all these massive data centers they have coming online with Trainium and GPUs. Depends on which one, depends on which customer. The experience is not as good as, say, a CoreWeave or whatever, but the name of the game is capacity. CoreWeave can only deploy so much. They can only get so much data center capacity, and they're really fast at building. But the company with the most data center capacity in the world, still today, is Amazon, although based on what we see they will get passed up in the next two years. Incrementally, Amazon still has the most spare data center capacity that is going to ramp into AI revenue over the next year.
Guido Appenzeller
Let me ask one question. Is that the right type of data center capacity? For the high density AI buildouts today, you need massively more cooling, you need enough water close by, you need enough power close by. Is it the right type of capacity in the right place, or is it the wrong type of thing?
Dylan Patel
So data center capacity in this sense, I mean all the way from power secured to substations built to transformers to being able to provide the power whips to the racks. Now obviously the data center capacity will differ, right? Historically, Amazon has actually had the highest density data centers in the world. They went to like 40 kilowatt racks when everyone was still at 12. And if you've ever set foot inside most data centers, they're pretty cool and dry-ish. If you step inside an Amazon data center, it feels like a swamp. It feels like where I grew up, right? It's humid and hot, because they're optimizing every percentage. And so sort of your point here is that Amazon's data centers aren't equipped for the new type of infrastructure. But when you compare it to the cost of the GPU, having a complex cooling arrangement is fine. We made a call on Astera Labs a couple months ago when the stock was at 90, and it went to 250 the month after because of the orders Amazon is placing with them. There's certain things with Amazon's infrastructure, I won't get too much into it, but their rack infrastructure requires them to use a lot more of Astera Labs' connectivity products. And the same applies to cooling, right? So on the networking and cooling side, they just have to use a lot more of this stuff. But again, this stuff is inconsequential in cost compared to the GPU.
Guido Appenzeller
My question was more like, look, I may need a major river close by for cooling at this point, right? In many areas I just can't get enough water. And it's probably power in the same region.
Dylan Patel
They have two gigawatt-scale sites where the power is all secured. Wet chillers and dry chillers, all secured. Everything's fine. It's just not as efficient, but that's fine, right? They're going to ramp the revenue, they're going to have the revenue. Not that I necessarily think Amazon's internal models are going to be great, or that their internal chip is better than Nvidia's or competitive with TPU, or that their hardware architecture is the best. I don't necessarily think that's the case, but they can build a lot of data centers and they can fill them up with stuff that will be rented out. It's a pretty simple thesis.
Sarah Wang
How important has Anthropic been to the co-design for Trainium? Cause I remember we had a portfolio company, this was summer 2023, AWS invited them in. They spent, man, I think eight hours with them over the course of a week trying to figure out Trainium, and back then it was just impossible to work through. Now obviously that portfolio company hasn't gone back and tried it since. But how different is it now, based on what you're hearing?
Dylan Patel
Oh, it's still bad.
Sarah Wang
Okay, okay.
Dylan Patel
You know, it's tough to use. So this is sort of the argument that every inference company offers, right, including the hardware startups: because I'm only running like three different models at most, I can just hand-optimize everything and write kernels for everything, and even go down to like an assembly level. Right.
Guido Appenzeller
How hard can it be?
Dylan Patel
It is, it is pretty hard. But you tend to do this for production inference anyways. You aren't using cuDNN, which is Nvidia's ease-of-use library for kernels and stuff, right? You're not using these ease-of-use libraries when you're running inference. You're either using CUTLASS or stamping out your own PTX, or in some cases people are even going down to the SASS level, right? And when you look at, say, an OpenAI or an Anthropic, when they run inference on GPUs, they're doing this, right? And the ecosystem is not that amazing once you get all the way down to that level. It's not like using Nvidia GPUs is easy then. I mean, you have an intuitive understanding of the hardware architecture because you work on it so much and everyone's worked on it and you talk to other people, but at the end of the day it's not easy, right? Whereas, you know, for Anthropic on Trainium or TPUs, actually the hardware architecture is a little bit more simple than a GPU. Larger, simpler cores rather than having all this functionality; less general. So it's a little bit easier to code on. There's tweets from Anthropic people saying that when they are working at that low level, they actually prefer working on Trainium and TPU because of the simplicity. Now, to be clear, Trainium and TPU, I mean Trainium especially, is very hard to use. Not for the faint of heart, it's very difficult. But you can do it if you're just running a couple models. If I'm Anthropic and I must only run Claude 4.1 Opus or Sonnet, and screw it, I won't even run Haiku on it, I'll just run Haiku on GPUs or whatever, right? I'm just going to run two models, and actually, screw it, I'm just going to run Opus on GPUs too. On Trainium and TPUs, Sonnet is the majority of my traffic anyways. I could spend the time.
And how often am I changing that architecture? Every four or six months. Right. Like how much?
Guido Appenzeller
It's not even changing that much, honestly. Right.
Dylan Patel
I think from 3 to 4 it definitely did change. Right, yeah.
Guido Appenzeller
Define architectural change. You know, at a high level the primitives are more or less the same across the last couple of generations.
Dylan Patel
I don't know enough about Anthropic's model architecture to be honest, but from what I've seen at other places there have been enough changes that it takes time to program this and really get it right. The main thing is, if I'm Anthropic and I have, what, 7 billion ARR now or whatever, north of 10, north of 20 by the end of next year, maybe even 30, and my margins are 50%, 70%, that's $15 billion of compute that I need, right? That can run on Sonnet. And most of that's going to be Sonnet 3.5, or sorry, 4.5, whatever it is. It's going to be one model serving most of the use cases. So I could spend the time and it'll work on this hardware.
Sarah Wang
Yeah, totally. Maybe staying on the topic of non-consensus calls you've made, I'll move to another cloud. In June you guys said that Oracle is winning the AI compute market. And in this pod we've already referenced the big jump that Oracle had. I think it was the single largest gain that a company with over 500 billion in market cap has ever had.
Dylan Patel
Was the Q1 2023 Nvidia jump not bigger? It might have been smaller. Okay.
Sarah Wang
I think it was maybe close. We'll fact check ourselves.
Guido Appenzeller
That's amazing.
Sarah Wang
But obviously this is the massive commitment that was announced. Can you walk us through why you made that call then and just sort of why Oracle is poised to do so well in such a competitive space?
Dylan Patel
Yeah, so Oracle, they're the largest balance sheet in the industry that is not dogmatic to any type of hardware. They're not dogmatic to any type of networking. They will deploy Ethernet with Arista. They'll deploy Ethernet through their own white boxes. They'll deploy Nvidia networking, InfiniBand or Spectrum-X. And they have really good network engineers. They have really great software across the board. Again, like ClusterMAX: they were ClusterMAX Gold because their software is great, and there's a couple things they needed to add that would take them higher, and they're adding those to get to Platinum, which is where CoreWeave was. And so you couple two things, right? OpenAI has got insane compute demand. Microsoft is quite pansy; they're not willing to invest. They don't believe OpenAI can actually pay the amount of money, right? I mentioned it earlier.
Sarah Wang
Right, right.
Dylan Patel
The $300 billion deal. Yeah. You don't have $300 billion. And Oracle is willing to take the bet. Now of course there's a bit more security in the bet, in that Oracle really only needs to secure the data center capacity, right? So this is sort of how we came across the bet. And we've been telling our institutional clients, in a super detailed way, whether it be the hyperscalers or AI labs or semiconductor companies or investors in our data center model, because we're tracking every single data center in the world. Oracle doesn't build their own data centers either, by the way. They get them from other companies; they co-engineer, but they don't physically build them themselves. And so they're quite nimble in terms of being able to assess new data centers and engineer them. So we saw all these different data centers Oracle was snatching up, in deep discussion on, signing, et cetera. And so we have, you know, hey, gigawatt here, gigawatt there, gigawatt there. Abilene, 2 gigawatts, right? You have all these different sites that they're signing up and in discussions on, and we're noting them. And then we had the timeline, because we're tracking the entire supply chain. We're tracking all the permits and regulatory filings through language models, using satellite photos constantly, and then the supply chain of chillers, transformer equipment, generators, et cetera. We're able to make a pretty strong estimate, quarter by quarter in our data center model, of how much power there is for each of these sites. So some of these sites that we know of aren't even ramping until 2027, but we know that Oracle signed them and we have this sort of ramp path. So then it's this question of, okay, let's say you have a megawatt, for simplicity's sake, which used to be a ton of power but now doesn't feel like much. We're in the gigawatt era.
But if you talk about a megawatt, you fill it up with GPUs. How much do the GPUs for a megawatt cost? Actually, it's even simpler to do the math on a GB200. Each individual GPU is 1,200 watts, but when you add the CPU and the whole system, it's roughly 2,000 watts all in. For simplicity's sake, $50,000 per GPU, right? The GPU alone doesn't cost them that; there's all the peripheries. So $50,000 capex for 2,000 watts, or $25,000 per 1,000 watts. And then what's the rental price for the GPU if you're on a really long-term, volume deal? $2.70, $2.60, in that range per hour. Then you end up with, oh, it costs like $12 million per year to rent a megawatt. And each chip is different, so we track each chip, what the capex is, what the networking is. So you know what each chip is, you can predict which chips they're putting in which data centers, when those data centers go online, how many megawatts by quarter, and then you end up with, oh, well, Stargate goes online in this time period, they're going to start renting at this time, it's this many chips at each Stargate site, right? And so therefore this is how much OpenAI would have to spend to rent it. And then you price that out. And we were able to predict Oracle's revenue with pretty high certainty. We matched pretty dead-on what they announced for 25, 26, 27, and we were pretty close on 28. The surprise for us was that they announced some 28, 29 data centers that we haven't found yet, but we'll find them, right? Of course. And this methodology lets you see, hey, what data centers are they getting, how much power, what are they signing, and how much incremental revenue that is when it comes online. And so that's the basis of our Oracle bet.
Obviously in the newsletter we included a lot less detail, but it was that thesis, right? They have all this capacity, they're going to sign these deals. And in our newsletter we talked about two main things: the OpenAI business and the ByteDance business. And presumably tomorrow, on Friday, there's going to be an announcement about TikTok and all this. But the ByteDance business, huge amounts of data center capacity that Oracle is also going to lease out to ByteDance, right? And so we did the same methodology there. With ByteDance, it's pretty certain they'll pay because they're a profitable company. With OpenAI, it's not. And so there's got to be some error bars as you go further out, in terms of will OpenAI exist in 28, 29, 30, and will they be able to pay the 80-plus billion dollars a year that they've signed up for with Oracle? That's the only risk here. And even then, Oracle's downside is somewhat protected, because they only sign the data center, which is a minority of the cost. The GPUs are everything, and the GPUs they purchase one to two quarters before they start renting them. So the downside risk is pretty low for them. If they don't get the deal, well, they don't get the revenue, but it's not like they're stuck with a bunch of assets they bought that are worthless. Yeah, yeah.
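The per-megawatt arithmetic Dylan walks through can be sketched in a few lines. The numbers (2,000 W and $50K all-in per GPU, roughly $2.70 per GPU-hour on a long-term volume deal) are the illustrative ones from the conversation, not model outputs, and the function name is mine:

```python
# Back-of-envelope: turning secured data center power into GPU rental economics,
# using the rough GB200-class numbers quoted in the conversation. All inputs
# are illustrative assumptions, not SemiAnalysis model figures.

HOURS_PER_YEAR = 8760

def megawatt_economics(megawatts: float,
                       watts_per_gpu: float = 2000.0,    # whole system, per GPU
                       capex_per_gpu: float = 50_000.0,  # GPU plus peripheries
                       rate_per_gpu_hour: float = 2.70): # long-term volume rate
    """Return (gpu_count, capex, annual rental revenue) for a given power envelope."""
    gpus = megawatts * 1_000_000 / watts_per_gpu
    capex = gpus * capex_per_gpu
    annual_revenue = gpus * rate_per_gpu_hour * HOURS_PER_YEAR
    return gpus, capex, annual_revenue

gpus, capex, revenue = megawatt_economics(1.0)
print(f"{gpus:.0f} GPUs, ${capex / 1e6:.0f}M capex, ${revenue / 1e6:.1f}M/year rental")
```

One megawatt at 2,000 W per GPU is 500 GPUs, $25M of capex, and just under $12M a year of rental revenue at $2.70 per GPU-hour, which matches the "$12 million per megawatt" figure in the conversation.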
Guido Appenzeller
Is that another angle here? I mean, OpenAI and Microsoft have effectively filed divorce papers, and OpenAI just wants to diversify, and that's pushing them towards other providers.
Dylan Patel
Yeah. So Microsoft was the exclusive compute provider. It got reworked to a right of first refusal, you know, and then, and then Microsoft.
Guido Appenzeller
Is it now last choice or something like that?
Dylan Patel
No, it's still right of first refusal. But those two are not mutually exclusive. OpenAI is like, we're going to sign an $80 billion contract, or a $300 billion contract for the next five years, you guys want it? And Microsoft's like, no. What? Okay, cool. And then they go to Oracle, right? OpenAI needs someone with a balance sheet to actually be able to pay for it, because whoever it is will make tons of money off of OpenAI on the margins on the compute and the infra and all these things, but someone's got to have a balance sheet, and OpenAI doesn't have a balance sheet. Oracle does. Although, given the scale of what they signed, we also had another source of information, which was that they were talking to debt markets, because Oracle actually just needs to raise debt to pay for this many GPUs over time. Now, they won't do it immediately. They can pay for everything this year and next year from their own cash. But in 27, 28, 29, they'll start to have to use debt to pay for these GPUs, which is what CoreWeave has done, and many of the neoclouds; most of it's debt financed. Even Meta went and got debt for their Louisiana mega data center, just because it's literally better on a financial basis to do buybacks with your cash and take on debt, because the debt is cheaper than the return on your stock. It's a financial engineering thing. But who's out there? It could be Amazon, it could be Google, it could be Microsoft.
Guido Appenzeller
This is a very short list.
Dylan Patel
Or it could be Oracle or Meta. Right? Meta's obviously not. Microsoft's chickened out. Amazon, Google and Oracle, that's all that's left.
Guido Appenzeller
Google would be an awkward fit.
Dylan Patel
So yeah, Google would be an awkward fit. Amazon would be a fine fit. But, you know, exactly right. It's a very short list.
Sarah Wang
Yeah, well, I guess on the topic of these giant data center buildouts, you guys just released a piece on xAI and Colossus 2. Are you getting less impressed by these feats of building something this massive in six months, or is it still very impressive to you guys?
Dylan Patel
You know, this is the thing I've said about AI researchers: they're like the first class of humans to think about things on an order-of-magnitude scale. People have always thought about things in terms of percentage growth, ever since industrialization, and before that it was just absolute numbers, right? Sort of like humanity is evolving in terms of how we think, because things are changing faster.
Guido Appenzeller
Everything is on a log scale.
Dylan Patel
And so it was really impressive when GPT-2 was trained on so many chips, and then GPT-3 was trained on, you know, or sorry, GPT-4 on 20K A100s, and it's like, holy crap. And then it was like, oh, the era of 100K GPU clusters, right? And we did some reports around 100K GPU clusters, but now there's like ten 100K GPU clusters in the world. It's like, okay, this is kind of boring. But 100K GPUs is over 100 megawatts now. Literally, in our Slack, in some of these channels, it's like, oh, we found another 200 megawatt data center here, there. There's someone who puts the yawning emoji every time, and I'm like, dude, what? Now it's only exciting.
Sarah Wang
If you do gigawatt scale era.
Dylan Patel
Yeah, yeah. And I'm sure maybe we'll start yawning at that too. But the log scale of this, the capital numbers are crazy, right? It was crazy enough that OpenAI did like a $100 million training run, then they did a billion-dollar training run, and now we're talking about $10 billion training runs. It's crazy that we think in log scale, but yes, things are only impressive when they're like what Elon's doing. So what Elon's doing in Tennessee, in Memphis, the first time was crazy, right? 100K GPUs in six months. He bought a factory in like February of '24 and had models training within six months. And he did liquid cooling, the first large-scale AI data center at this scale doing liquid cooling, all these sorts of crazy firsts. Putting generators outside, like Cat turbines, all these different things to get the power, mobile substations, all these different crazy things. Tapping the natural gas line that's running alongside the factory. So he does this, and it's like, holy crap. And he did it for 100K GPUs, 200, 300 megawatts. Now he's doing it at gigawatt scale, and he's doing it just as fast. And so you would think this is obviously way more impressive, that he did it again.
Sarah Wang
Yeah.
Dylan Patel
But maybe I'm desensitized. It's like you've given the child too much candy, right? And now the child doesn't like apples. I don't know. So, a gigawatt data center. There were all these protests around his Memphis facility, people like, oh, you're destroying the air. And it's like, have you looked around that area of Memphis? There is a gigawatt gas turbine plant that's just generally powering that area. There's a sewage plant that's servicing the entire city of Memphis. And there's open-air pits, open-air mining, all sorts of disgusting shit around there which is needed, right? We need that stuff to have a country run.
Guido Appenzeller
Right.
Dylan Patel
Like to be clear. And you know, it's like people are complaining about like a couple hundred megawatts in there.
Guido Appenzeller
Yeah.
Dylan Patel
Of generation. So he got protests from all sorts of people. It got super into the politics side of things; the NAACP even protested him. And he really got some local municipalities to be like, oh, I don't like this. And so he couldn't do as much as he wanted to in Memphis, but he still needed the data center to be close, because he wanted to connect these data centers, super high bandwidth, super close, and he also already had a lot of infrastructure set up there. So he bought another distribution center, and it's still in Memphis. But the cool thing about Memphis is it's right across the border from Mississippi. So it's like 10 miles away from his original site, but his facility is like a mile away from Mississippi. And he bought a power plant in Mississippi and he's putting turbines there, because the regulation is completely different. And if the question is really who can galvanize resources and build really fast, maybe Elon is ahead of everyone. He hasn't made the best model yet, or he doesn't have the best model at least today; I think you could argue Grok 4 was the best for a little period of time. But it's truly amazing how fast he's able to build these things. And from first principles, most people are like, fuck, we can't build the power, we can't do power here anymore, I guess we have to find a new site. And it's like, no, no, just go across the border, go to Mississippi. And my favorite thing is, Arkansas is right there too, if Mississippi gets mad.
Guido Appenzeller
You know the regulations. All future data centers built in places where multiple states meet.
Dylan Patel
Is that the Four Corners? Yeah, the optimal regulatory arbitrage. I think there's one. There we go. Is there a point in the US with five? I know there's a point where four states intersect. Yeah, maybe that's where you put the data center.
Sarah Wang
All right, I'm going to buy real estate in that area. Front-run it. Well, on the topic of new hardware, you had this piece analyzing TCO for GB200s. And I'm going to ask this question on behalf of our portfolio companies, which it sounds like you're helping already. One of the findings I thought was really interesting was that TCO was sort of 1.6x that of H100s for GB200s. So obviously there's this point: that's the benchmark for the performance boost you're going to need to at least break even on performance per cost from switching over. Maybe just talk about what you've seen from a performance standpoint, and what do you recommend to portfolio companies, at a smaller scale than xAI, who are thinking about new hardware and trying to get it? There's capacity constraints, obviously.
Dylan Patel
Yeah, I mean, that's the challenge, right? With each generation of GPU, it gets so much faster that you end up wanting the new one. And on some metrics you could say GB200 is two or three times faster than the prior generation; on other metrics you can say it's way more than that. It depends if you're doing pre-training versus inference, right?
Guido Appenzeller
You can run everything at four bits, right?
Dylan Patel
Yeah. If you can run it at four bit, or just do inference and take advantage of the huge NVLink NVL72 domain. You could squint and say GB200 is only 2x faster than H100, in which case at 1.6x TCO it's still worthwhile, right? It's worth going to the next gen, but it's more marginal, it's not a big deal. Then there's other cases where it's like, well, if you're running DeepSeek inference, the performance difference per GPU is north of 6.7x, and it continues to be optimized for DeepSeek inference. And so then it's like, well, I'm only paying 60% more for 6x; that's a 3x or 4x performance-per-dollar gain. Absolutely, right? If you're running DeepSeek inference, and that can also include RL. And then the other question is, the GPU is new. There's GB200 and there's B200. B200 is much more simple from a hardware perspective; it's just eight GPUs in a box. So it's not as much of a performance gain, especially in inference, but you have all the stability, right? It's an 8-GPU box, it's not going to be unreliable. The GB200s are still having some reliability challenges. Those are being worked through; it's getting better and better by the day, but it's still a challenge. When you have an H100 or H200 box, eight GPUs, and one of them fails, you take the entire server offline and you have to fix it. Usually if your cloud's good, they'll swap it in, right? But if it's GB200, what do you now do with 72 GPUs? If one fails, does it break the whole thing? The blast radius of a failure is way bigger. And note, GPU failure rates at best are the same and likely worse gen on gen, because everything's getting hotter, faster, et cetera. So at best the failure rates are the same. Even if you model the failure rates as exactly the same, because you go from 1 out of 8 to 1 out of 72, it's a huge problem.
So now what a lot of people are doing is they run a high priority workload on 64 of them, and on the other eight you run low priority workloads. Which is this whole infrastructure challenge: I have to have high priority workloads and low priority workloads. When a high priority workload has a failure, instead of taking the whole rack offline, you just take some of the GPUs from the low priority one, put them in the high priority one, and you let the dead GPU sit there until you service the rack at a later date. And there's all these complicated infrastructure things that make it so, oh wait, actually that 3x or 2x performance increase in pre-training is lower, because the downtime is higher, slash I'm not using all the GPUs all the time, slash I'm not smart enough or I don't have the infra to have low priority and high priority workloads. It's not impossible. Yeah, the labs are doing it, right? It's just, I mean, if I'm.
Guido Appenzeller
Running a cloud, it's actually really hard, right? Because I probably have to rent the spot one, like the spares out of spot instances or something.
Dylan Patel
No, no, no, no. Because it's a coherent domain, it's NVLink. You don't want anyone else touching that. So it has to be the end customer leaving them as empty spares.
Guido Appenzeller
So it's even worse.
Dylan Patel
No, the end customer usually would just be like, I want them. And the SLAs and the pricing, everything accounts for that, right? So generally when you have a cloud, you have an SLA: hey, uptime is going to be 99%, blah, blah, blah, for this period. With GB200, it's 99% for 64 GPUs, not 72. And then it's like 95% for 72. Now it differs across every cloud. Every cloud has a different SLA.
Sarah Wang
Got it.
Dylan Patel
Yeah. But they've adjusted for this, because they're like, look, this hardware is just finicky. Do you still want it? We will credit you such that 64 of them will always work, right? Not 72. And so there's this whole finicky nature, and the end customer has to be capable of dealing with the unreliability. Or the end customer can just continue to use B200, right? The performance gains are not as much. The whole reason you want this 72-GPU domain is so you can have some of these gains.
Sarah Wang
Right.
Dylan Patel
But you have to be smart enough to be able to do it. And that's challenging for small companies. Totally.
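The blast-radius point from this exchange is just binomial math: with the same per-GPU failure rate, the chance that something in the coherent domain dies grows with domain size. A minimal sketch, comparing an 8-GPU box to a 72-GPU NVL72 rack; the per-GPU daily failure probability here is a made-up example, not a measured rate:

```python
# Probability that at least one GPU in a coherent domain fails in a day,
# assuming independent failures and the same per-GPU rate for both domains.

def p_domain_hit(per_gpu_daily_failure: float, domain_size: int) -> float:
    """P(at least one failure in the domain) = 1 - P(no failures)."""
    return 1 - (1 - per_gpu_daily_failure) ** domain_size

p = 0.0005  # assumed per-GPU daily failure probability (illustrative)
print(f"8-GPU box:   {p_domain_hit(p, 8):.2%} chance of losing the domain in a day")
print(f"72-GPU rack: {p_domain_hit(p, 72):.2%} chance of losing the domain in a day")
```

Even holding the per-GPU rate constant, the 72-GPU domain gets hit roughly nine times as often as the 8-GPU box, which is why the 64-plus-8-spares scheme and the 64-GPU SLAs Dylan describes exist.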
Guido Appenzeller
So Nvidia just announced the Rubin.
Dylan Patel
Prefill cards, the CPX.
Guido Appenzeller
There we go. What's your take on that? Does it cannibalize?
Dylan Patel
Dude, by the way, I don't know if this is brain rot or what, but I can't remember what I had for lunch yesterday, yet I know the model number of every fucking chip. It haunts you in your dreams. We're broken. We're broken.
Guido Appenzeller
Living the dream.
Dylan Patel
No, no, no.
Guido Appenzeller
You know, why do you pre-announce a product that's 5x faster for certain use cases? Is it that much?
Dylan Patel
I think it's like, historically AI chips were AI chips, right? And then we started getting a lot of people saying this is a training chip, this is an inference chip. Actually, training and inference are switching so fast in terms of what they require that it's still like one chip. There are still workload-level dynamics that differ, but the main workload is inference, even in training, because of RL. Most of RL is generating stuff in an environment and trying to achieve a reward, so it's inference still, right? Training is now becoming mostly dominated by inference as well. But inference has two main operations. There's calculating the KV cache for prefill: here's all these documents, do the attention between all of them, between all the tokens, whatever type of attention you use. Then there's decode, which is autoregressively generating each token. These are very, very different workloads. Initially the ML systems techniques were, okay, I will just make the batch size for every single forward pass this big. Let's call it a thousand tokens big, and maybe I'll run 32 users concurrently that way. Now I still have 960-something slots left, right? That 960 is actually doing the prefill: if a request comes in, it chunks it. It's called chunked prefill; you prefill chunks of it. Now you get really good utilization on GPUs, but that ends up impacting the decode workers. The users whose tokens are being autoregressively generated end up with slower tokens per second, and tokens per second is really important for user experience and all these other things, right? So then the idea is, okay, these two workloads are so different, and they are literally different, right? You prefill and then you decode; it's not like you're interleaving them. So why don't we split them entirely? And this is done on the same type of chip, right? OpenAI, Anthropic, Google.
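The chunked-prefill scheduling described above (a fixed token budget per forward pass, decode users taking one slot each, leftover slots packed with chunks of pending prefill requests) can be sketched as a toy scheduler. The budget of 1,000 tokens and 32 decode users are the illustrative numbers from the conversation; the function and its shape are mine, not any serving framework's actual API:

```python
# Toy chunked-prefill planner: one forward pass has a fixed token budget.
# Decode users each consume one token slot; whatever is left over is filled
# with chunks carved off the pending prefill queue.

def plan_forward_pass(token_budget, decode_users, prefill_queue):
    """Return (decode_tokens, prefill_chunks) for one forward pass.
    prefill_queue is a list of remaining-token counts, mutated in place."""
    decode_tokens = min(decode_users, token_budget)
    leftover = token_budget - decode_tokens
    chunks = []
    while leftover > 0 and prefill_queue:
        take = min(leftover, prefill_queue[0])  # chunk off the head request
        chunks.append(take)
        prefill_queue[0] -= take
        if prefill_queue[0] == 0:
            prefill_queue.pop(0)                # request fully prefilled
        leftover -= take
    return decode_tokens, chunks

queue = [64_000, 2_000]  # two pending prefill requests, in tokens
decode, chunks = plan_forward_pass(1000, 32, queue)
print(decode, chunks, queue)  # 32 decode slots, one 968-token chunk taken
```

With 32 decode users in a 1,000-token budget, 968 slots go to prefill each pass, which is exactly the "960-something left" Dylan gestures at; the trade-off is that those prefill chunks share the pass with decode and slow it down, motivating the full disaggregation he describes next.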
Guido Appenzeller
Pretty much everybody does that.
Dylan Patel
Everyone. Google, Together, Fireworks, all these guys do disaggregated prefill/decode. So they run prefill on one set of GPUs, decode on another set of GPUs. Why is this beneficial? Because you can autoscale them, right? Hey, all of a sudden I have a lot more long-context work, so I allocate more resources to prefill. Or, you know, not all of a sudden, but over time my traffic mix shifts: it's not long input, short output, it's short input, long output, so I run more decode workers. This way I can autoscale the resources differently, and I can also guarantee that my prefill time stays bounded. You know what's really important in search? How fast you get the page to start loading, not when every resource finishes loading. What do people do in games? The loading screen often has some sort of interactive environment, or it blends in over time, or it has tips and tricks, ways to distract you. Same thing here: there are studies and papers out there showing users prefer a faster time to first token, right? The first token gets streamed to me sooner, even if the total time to get all my tokens is a little bit longer.
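The autoscaling argument can be pictured with a toy pool-splitting rule. Everything here (the speedup factor, the traffic numbers, the function itself) is an illustrative assumption, not how any named provider actually allocates GPUs:

```python
# Toy sketch of disaggregated prefill/decode autoscaling: split a fixed
# GPU pool between prefill and decode workers in proportion to where the
# work is. Prefill processes tokens far faster per GPU than decode (one
# big parallel pass vs. token-by-token), so decode work is weighted more.

def split_pool(total_gpus, avg_input_tokens, avg_output_tokens,
               prefill_speedup=10.0):
    prefill_work = avg_input_tokens / prefill_speedup
    decode_work = avg_output_tokens
    frac = prefill_work / (prefill_work + decode_work)
    prefill_gpus = max(1, round(total_gpus * frac))
    return prefill_gpus, total_gpus - prefill_gpus

# Long-input / short-output traffic wants more prefill workers...
print(split_pool(100, avg_input_tokens=64_000, avg_output_tokens=500))
# (93, 7)
# ...short-input / long-output traffic wants more decode workers.
print(split_pool(100, avg_input_tokens=2_000, avg_output_tokens=8_000))
# (2, 98)
```

The point of the split is exactly this knob: as the traffic mix drifts, you re-run the allocation instead of over-provisioning one monolithic fleet.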
Guido Appenzeller
I can't read that fast anyways, right?
Dylan Patel
So, I mean, I like to skim. Yeah.
Guido Appenzeller
I mean, most models return above speed-reading speed, but you need that, right?
Dylan Patel
I think. But, like, you know, the idea is that you want to guarantee time to first token is at a certain level for user experience reasons. Otherwise people are like, screw this, I'm not using AI. The decode speed matters a lot too, but not as much as time to first token. And by having separate prefill and decode, you get this, right? But now, and this is all on the same infrastructure, you've already done this. So what's the next logical step? These workloads are so different. In decode, you have to load all the parameters and the KV caches in to generate a single token. You batch a couple users together, but very quickly you run out of memory capacity or memory bandwidth, because everyone's KV cache is different, you know, the attention over all their tokens, right? Whereas on prefill, I could even just serve one or two users at a time, because if they send me a 64,000-token context request, that is a lot of FLOPs, right? I'll use Llama 70B because it's simple to do the math on: 70 billion parameters, that's 140 gigaflops per token. 140 gigaflops times 64,000 tokens, that's many, many teraflops. You could use the entire GPU for, like, a second, right? Potentially, depending on the GPU, just to do the prefill. And that's just one forward pass. So I don't necessarily care about loading all the parameters or the KV cache in fast. All I care about is the FLOPs. And that leads us to, you know, I had to give this long-winded explanation because it's hard for people to understand what CPX is. Even with my own clients, we've sent, like, multiple notes explaining it, and they're like, I still don't understand this shit, okay.
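Dylan's back-of-envelope math checks out under the usual rule of thumb of roughly 2 FLOPs per parameter per token for a forward pass. The sustained-throughput figure at the end is an added illustrative assumption (real numbers vary by GPU and precision):

```python
# Back-of-envelope check of the Llama 70B prefill arithmetic:
# a forward pass costs roughly 2 FLOPs per parameter per token.

params = 70e9          # Llama 70B
context = 64_000       # tokens in the prefill request

flops_per_token = 2 * params             # 140 GFLOPs per token
total_flops = flops_per_token * context  # whole prefill pass

print(f"{flops_per_token / 1e9:.0f} GFLOPs/token")  # 140 GFLOPs/token
print(f"{total_flops / 1e15:.2f} PFLOPs total")     # 8.96 PFLOPs total

# At an assumed ~2 PFLOP/s of sustained compute (illustrative only),
# that is seconds of whole-GPU time for a single request's prefill:
sustained = 2e15
print(f"~{total_flops / sustained:.1f} s of GPU compute")
```

Which is the whole case for a compute-heavy, memory-light prefill part: the bottleneck here is raw FLOPs, not how fast you can stream weights and KV cache out of memory.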
Guido Appenzeller
Send them the Attention Is All You Need.
Dylan Patel
Paper? You can't expect that. I mean, think about, like, a networking person. They're like, I don't need to know about this Attention Is All You Need stuff, right? Or think about an investor, right? Or maybe a data center operator. They're like, oh, there are two chips, why should I build my data center differently? So, you know, I've got to explain everything, or just say, no, you don't have to build different things. But anyways, you get the idea.
Guido Appenzeller
At Stanford, 25% of all students, not just CS students, all students, have read that paper.
Dylan Patel
Read what paper?
Guido Appenzeller
Attention Is All You Need. That's low.
Dylan Patel
I mean, there are, you know, the philosophy majors and the like.
Guido Appenzeller
I find this amazing.
Dylan Patel
Anyway, sorry. Somewhere in the Middle East, I can't remember which country, they have AI education starting at, like, age 8, and in high school they have to read Attention Is All You Need. Wow. Someone told me that their kid had to read Attention Is All You Need. Which is, you know, I don't know. Look, top-down mandates for education, maybe they work, maybe they don't. Maybe people like homeschooling their kids, I don't know. I went to public school. But, like, back to your question.
Sarah Wang
Just on the topic of hardware cycles, I wanted to maybe have you actually explain what CPX is.
Dylan Patel
So CPX is a very compute-optimized chip for prefill, whereas decode, to put it simply, stays on the normal chips with HBM. HBM is more than half the cost of the GPU. If you strip that out, you end up with a much cheaper chip, and that gets passed on to the customer. Or, you know, if Nvidia takes the same margin, then the cost of this prefill chip is much, much lower, and now the whole process is way cheaper, more efficient. Now long context can be adopted.
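The cost logic here is simple enough to put in numbers. The dollar figure and HBM share below are made-up illustrations of the "more than half the cost" claim, not actual Nvidia pricing or BOM data:

```python
# Toy illustration of why stripping HBM cuts prefill cost: if HBM is
# more than half the bill of materials of a standard GPU, a prefill-only
# part that keeps the compute dies but drops the HBM stacks can be far
# cheaper per unit of compute. All dollar figures are hypothetical.

gpu_cost = 40_000          # hypothetical standard GPU price
hbm_share = 0.55           # "HBM is more than half the cost"

hbm_cost = gpu_cost * hbm_share
prefill_chip_cost = gpu_cost - hbm_cost   # same compute, no HBM

print(f"HBM portion:        ${hbm_cost:,.0f}")
print(f"Prefill-chip price: ${prefill_chip_cost:,.0f}")
print(f"Cost reduction:     {hbm_share:.0%}")
```

Since prefill is FLOPs-bound rather than bandwidth-bound (per the earlier arithmetic), the memory you removed was the part that workload wasn't using anyway.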
Guido Appenzeller
All right.
Sarah Wang
Yeah, so I love that we're actually going into all this detail, because I had a more 10,000-foot-view question for you, which is, I haven't been following the semis market as closely as you have. I probably started with the A100, and I remember helping Noam at Character.AI (this was June 2023) chase down GPUs, and the only thing that mattered at the time was delivery date, because there was a huge capacity crunch. And then to see that evolve over the last two years, where, let's say six to 12 months ago, people were doing these RFPs to 20 Neoclouds, right? And the only thing that mattered, to some degree, was price.
Dylan Patel
People actually do RFPs for GPUs?
Sarah Wang
Yes.
Dylan Patel
So, just to be clear, my opinion on how you buy GPUs is that it's like buying cocaine or any other drug. As this was described to me. Not me, I don't buy cocaine. Someone told me this, and I was like, holy shit, that's right. You call up a couple people, you text a couple people, you ask, yo, how much you got? What's the price?
Sarah Wang
It's like, exactly.
Dylan Patel
This is like buying drugs. Oh, sorry, sorry. No, I mean, it's the same way. We have Slack Connects with, like, 30 Neoclouds, as well as some of the major ones, and we just send them a message: hey, a customer wants this much, this is what they're looking for. And then they send quotes. And, you know, I know a guy.
Sarah Wang
"I know a guy." Well, I think that's actually a very accurate description. And I've sent countless portcos your ClusterMAX original post, because I thought it did a really good job breaking them down. But maybe one question to end on for me: what era are we in now, with Blackwells coming online? Are we sort of back to the summer 2023 era? Is that the cycle we've just entered? What's your view on where we are?
Dylan Patel
So, a question came in from one of your portcos. After their difficulties with Amazon, we tried to help. We were like, okay, the original deals we'd gotten were gone, but here are some other deals, right? It turned out that multiple major Neoclouds had sold out of Hopper capacity, and their Blackwell capacity comes online in a few months. So it's a bit of a challenge, right? Inference demand has been skyrocketing this year, reasoning models, and the revenue from these reasoning models has been skyrocketing this year. And then also, Blackwell comes online, but it's hard to deploy, so there's a learning curve to deploying it. Whereas with Hopper, you'd gotten to the point where you buy the Hopper, you install it in the data center, and it's running within, like, a month or two, right? For Blackwell it's a longer time frame because of reliability challenges. It's a new GPU, it's just growing pains. So there was this gap in how many GPUs were coming onto the market right as revenue started to inflect. And so a lot of capacity got sucked up, right? Prices for Hopper actually bottomed like three or four months ago, or five or six months ago, and they've crept up a little bit now. So I don't think we're quite in the 2023-2024 era of GPUs being tight, but certainly, if you want just a few GPUs, it's easy. If you want a lot, it's hard. You can't get capacity that instantly. Yeah. Wow. What a time. Shall we wrap on that? Dylan, this was another instant classic.
Guido Appenzeller
Thank you so much for coming.
Dylan Patel
The bike, it was like two hours, bro. No, I missed it.
Sarah Wang
Thank you.
Guido Appenzeller
We couldn't.
Host (Marc Andreessen or Chris Dixon)
I couldn't stop.
Guido Appenzeller
Thanks so much. It was great.
Dylan Patel
Thank you so much for having me.
Host (Marc Andreessen or Chris Dixon)
Thanks for listening to the A16Z podcast. If you enjoyed the episode, let us know by leaving a review at ratethispodcast.com/a16z. We've got more great conversations coming your way. See you next time. As a reminder, the content here is for informational purposes only, should not be taken as legal, business, tax or investment advice, or be used to evaluate any investment or security, and is not directed at any investors or potential investors in any A16Z fund. Please note that A16Z and its affiliates may also maintain investments in the companies discussed in this podcast. For more details, including a link to our investments, please see a16z.com/disclosures.
This episode explores the seismic shifts in the semiconductor industry caused by NVIDIA’s surprise $5 billion investment in Intel, and their newly announced collaboration on custom data centers and PC products. The panel – leveraging deep industry and technical insight – explores the implications for NVIDIA, Intel, AMD, ARM, and the US-China tech rivalry. They also dive into China’s AI chip ambitions, rapid hyperscaler infrastructure buildouts, and the ever-volatile GPU supply chain.
“It's kind of poetic that everything's gone full circle and Intel is crawling to Nvidia… but actually, it might just be the best device.”
— Dylan Patel ([01:22])
“If banning Nvidia chips to China is so good for China, why didn’t China do it for itself? And they’re finally doing it for themselves.”
— Dylan Patel ([13:40])
“The goal of playing is to win, and the reason you win is so you can play again. It’s only about now, next generation… a whole new playing field every time.”
— Jensen Huang (as recounted by Dylan Patel [35:00])
“How you buy GPUs is like buying cocaine. You call up a couple people, you text a couple people, you ask, you know how much you got, what's the price?”
— Dylan Patel ([00:00])
“If your two arch-nemeses suddenly team up, it’s the worst possible news you can have. I did not see this coming. I think it's an amazing development.”
— Guido Appenzeller ([00:07], [04:29])
“The bulls’ case is AI is actually so transformative and the world just gets covered in data centers… all of this is running on Nvidia for the most part.”
— Dylan Patel ([26:40])
If you want an immersive, detailed look at who’s winning the global AI chip race, how Nvidia’s culture keeps them ahead, and what’s next for building the computing world’s physical backbone—this is the episode to catch.