Ep18. Jensen Recap - Competitive Moat, X.AI, Smart Assistant | BG2 w/ Bill Gurley & Brad Gerstner - BG2Pod with Brad Gerstner and Bill Gurley


Podcast Summary: BG2Pod with Brad Gerstner and Bill Gurley

Episode: Ep18. Jensen Recap - Competitive Moat, X.AI, Smart Assistant
Release Date: October 13, 2024
Host/Author: BG2Pod
Guests: Brad Gerstner (@altcap), Bill Gurley (@bgurley), Sunny

1. Introduction and Context

The episode kicks off with Brad Gerstner reflecting on the recent Altimeter annual meetings, whose theme was scaling intelligence toward Artificial General Intelligence (AGI). Brad mentions key speakers, including Sunny, who spoke on accelerating inference, and Jensen Huang of Nvidia, whose extensive discussion forms the centerpiece of this episode.

Notable Quote:

Brad (00:29): "We recorded it on Friday. We'll be releasing it as part of this pod. And man, was it dense. I mean, he was, you know, he was on fire."

2. Highlights from Jensen Huang's Talk

Sunny shares his impressions of Jensen Huang's presentation, emphasizing Huang's strategic vision for Nvidia beyond being a GPU company.

Key Points:

Nvidia as an Accelerated Compute Company: Huang articulated that Nvidia's identity transcends GPU manufacturing, positioning the company as a leader in accelerated computing.
Data Center as the Unit of Compute: Huang emphasized the significance of data centers in Nvidia's compute strategy, underscoring their role in scaling AI workloads.
Integration and Self-Utilization of AI: Nvidia leverages its own AI extensively to enhance operational efficiencies, showcasing its "eating the dog food" approach.

Notable Quotes:

Sunny (02:29): "Nvidia is not a GPU company, they're an accelerated compute company."

Sunny (02:29): "He thinks about using and already utilizing so much AI within Nvidia and how that's a superpower for them to accelerate over everyone they're competing with."

3. Nvidia's Competitive Moat: Beyond GPUs

Brad and Bill delve into Nvidia's competitive advantages, dissecting the layers that constitute its robust moat in the tech industry.

Key Points:

Systems-Level Advantages: Nvidia has constructed a comprehensive technology stack over the past 15 years, integrating hardware and software to create combinatorial advantages.
CUDA Ecosystem: The CUDA library boasts over 300 industry-specific acceleration algorithms, tailored to diverse sectors like synthetic biology, image generation, and autonomous driving.
Integration with Cloud Service Providers: Nvidia collaborates closely with cloud giants to optimize and accelerate AI workloads, reinforcing its dominant position.

Notable Quotes:

Brad (04:32): "There's this idea that it's just a GPU and that somebody's going to build a better chip, they're going to come along and displace the business."

Sunny (06:27): "He really started going into what they're doing, very particularly on mathematical operations to accelerate their partners and how they work really closely with their partners."

4. Inference vs. Training in AI Workloads

A significant portion of the discussion revolves around the distinction between inference and training workloads in AI, and how Nvidia positions itself in both domains.

Key Points:

Inference Dominance: Huang expects inference workloads to grow by 100x, 1,000x, a million x, perhaps even a billion x, and to become a larger share of Nvidia's revenue (already about 40%).
CUDA's Relevance: While CUDA remains pivotal for training, its role in inference is diminishing as specialized competitors emerge.
Competitive Landscape: Companies like Groq, Cerebras, and SambaNova are leading in inference performance, challenging Nvidia's dominance in this area.

Notable Quotes:

Brad (03:00): "He thinks they can 3x the top line of the business while only adding 25% more humans because they can have 100,000 autonomous agents doing things like building the software, doing the security."

Sunny (22:46): "The three fastest companies in inference right now are not Nvidia."

5. Scaling Data Centers and Competitive Advantages

Brad and Bill explore the scalability of Nvidia's data centers and how this contributes to maintaining their competitive edge.

Key Points:

Largest Systems as Competitive Advantage: Nvidia's strength lies in deploying large-scale systems where networking and CUDA truly shine.
Customer Concentration: As demand for large AI workloads grows, Nvidia may see increased customer concentration, reinforcing its market position.
Integration with ARM: Discussions touch on ARM's role in edge computing, suggesting potential competitive challenges for Nvidia in decentralized AI deployments.

Notable Quotes:

Bill (15:00): "It appears to me that Nvidia's competitive advantage is strongest where the size of the system is largest."

Brad (12:12): "If you think about an orthogonal competitor, right? ... where can their competitive advantage, you know, be challenged a little bit?"

6. Impact of AI on Business Productivity and Margins

The conversation shifts to the broader economic implications of AI integration, particularly how it can drive productivity and margin expansion across businesses.

Key Points:

Productivity Gains: Nvidia's internal use of AI for design verification and other operations has led to substantial productivity improvements.
Margin Expansion: High operating margins (65%) and the potential for even higher margins (up to 80%) highlight Nvidia's extraordinary financial performance.
Industry Transformation: Companies adopting AI tools are poised for significant operational enhancements, while those that don't may falter.

Notable Quotes:

Bill (48:07): "Nvidia is a very special company... the companies that don't deploy these things are going to go out of business."

Sunny (51:06): "I actually think sort of that's an underestimate. I think you're talking multiple hundreds of percent improvement in productivity gains."

7. The Future of Intelligent Agents with Memory and Actions

Brad introduces a speculative wager on the development of intelligent agents capable of memory and autonomous actions, sparking a lively debate.

Key Points:

Wager on Intelligent Agents: Brad bets with Bill and Sunny on whether intelligent agents with memory and action capabilities will materialize within two years.
Technical Feasibility: Discussion centers on the existing capabilities of AI to handle tasks like booking a hotel, highlighting current technological constraints around reliability and trust.
Integration Challenges: Concerns about secure, scalable deployment of autonomous agents that can perform actions requiring trust, such as handling credit card transactions.

Notable Quotes:

Brad (43:42): "What really transforms people's lives... is that when we have an intelligent assistant that we can interact with, that gets smarter over time, that has memory and could take actions."

Bill (45:09): "What you really can't have is, like the hallucination, when your credit card gets charged 10 grand... you just can't have failure."

8. Consolidation and Future Prospects in AI

The discussion touches upon market consolidation, with expectations that only a few major players like xAI will dominate the AI landscape due to their comprehensive compute and systems capabilities.

Key Points:

Economic Models and Funding: Companies like OpenAI have secured substantial funding, raising questions about the sustainability of newer entrants in the AI space.
Competitive Dynamics: Nvidia's strategic partnerships and ability to scale data centers give it a formidable advantage, potentially leading to market consolidation.
Future of AI Deployments: The demand for AI workloads suggests that Nvidia and similar companies will continue to thrive, while others may struggle to keep pace.

Notable Quotes:

Brad (36:12): "Funding at this point, they've achieved escape velocity."

Bill (37:35): "Nvidia is a very special company... the real answer is the companies that don't deploy these things are going to go out of business."

9. Conclusion and Final Thoughts

The episode wraps up with a reflection on the rapid advancements in AI and the strategic positioning of companies like Nvidia and xAI. The hosts express optimism about the transformative potential of intelligent agents and the ongoing evolution of AI technologies.

Notable Quotes:

Sunny (51:06): "They have their arms around a lot of these very, very difficult problems."

Brad (53:51): "It was a special one to."

Final Remarks: Brad, Bill, and Sunny engage in a wager to bet on the timeline for the realization of intelligent agents with memory and action capabilities, highlighting the forward-looking nature of the discussion. The hosts emphasize the critical role of scaling intelligence and the integration of AI into various layers of technology and business operations.

Notable Highlights:

Nvidia's Evolution: Transition from a GPU-centric company to a comprehensive accelerated compute leader.
AI Workload Dynamics: The anticipated explosive growth of inference workloads surpassing training demands.
Competitive Landscape: Emergence of specialized companies challenging Nvidia in inference performance.
Economic Impact: Significant productivity gains and margin expansions driven by AI integration.
Future Speculations: The potential for intelligent agents with autonomous capabilities within a short timeframe.

Disclaimer: The views and opinions expressed in this podcast are those of the speakers and do not constitute investment advice.


Transcript

[00:00]
Bill
You may also be running up against the... even for the Mag 7, the size of CapEx deployment, where their CFOs start to talk at higher levels, for sure.
[00:12]
Brad
Totally. Sunny. Bill, great to see you guys.
[00:27]
Bill
Good to see you.
[00:28]
Sunny
Good to be back.
[00:29]
Brad
Thanks, man. It's great to have you. We literally just finished two days of the Altimeter annual meetings. I mean, we had hundreds of investors, CEOs, founders, and the theme was scaling intelligence to AGI. We had Nikesh talking about enterprise AI. We had Rene Haas talking about AI at the edge. We had Noam Brown talking about, you know, the Strawberry and o1 model and inference time reasoning. We had Sunny talking about, you know, accelerating inference. And of course we kicked off with Jensen talking about the future of compute. You know, I did the Jensen part, talk with my partner Clark Tang, who covers the compute layer and the public side. We recorded it on Friday. We'll be releasing it as part of this pod. And man, was it dense. I mean, he was, you know, he was on fire. He told me, I asked him at the beginning of the pod, what do you want to do? He said, grip it and rip it. And we did 90 minutes. We went deep. I shared it with you guys. We've all listened to it. I learned so much playing it back that I just thought it made sense for us to unpack it, right? To really analyze it, see what we agree with, what we may disagree with, things we want to further explore. Sunny, any high level reactions to it?
[01:45]
Sunny
Yeah, you know, first, it's the first time I've really seen him in a format where you got all that information out in one setting because you kind of get the, you get the tidbits. And the ones that really struck with me was when he said Nvidia is not a GPU company, they're an accelerated compute company. I think the next one which you'll touch on is where he really said the data center is the unit of compute. I thought that was massive and sort of just closing out. When he talked about he thinks about using and already utilizing so much AI within Nvidia and how that's a superpower for them to accelerate over everyone they're competing with. I thought those were kind of really awesome points. And him eating the dog food, as they say.
[02:30]
Brad
It was incredible. There's this thing we'll talk about later. But he said he thinks they can 3x the top line of the business while only adding 25% more humans because they can have 100,000 autonomous agents doing things like building the software, doing the security, and that he becomes really a prompt agent not only for his human direct reports, but also for these agents, which, you know, really is, is mind boggling. Bill, anything stand out for you?
[03:00]
Bill
Well, one, I mean, you should be pleased that you were able to get his time. You know, this is at points in time, the largest market cap company in the world, if not one, two. And so it was so, I think, kind of him to sit down with you for so long. And during the pod, he kept saying, I can stay as long as you want. I was like, doesn't he have something to be doing?
[03:25]
Brad
Like, it was incredibly.
[03:28]
Bill
And it's fantastic. But my other big, my, my. I mean, two, I had two big takeaways. One, I mean, it's obvious that this guy's, you know, rolling on all cylinders here, right? Like, you have a company at a 3.3 trillion market cap that's still growing over 100% a year. And the margins are insane. I mean, 65% operating margins. There's only like five companies in the S&P 500 at that level, and they certainly aren't growing at this pace. And when you bring up that point about getting more done on the increment with fewer employees, where is this going to go? Like 80% operating margin. I mean, that would be unprecedented. There's a lot that's already here that's unprecedented. But obviously Wall Street is fully aware of the unbelievable performance of this company. And the multiples reflect it and the market cap reflects it. But it's super powerful how they're executing. And you can see the confidence in every answer that he gives.
[04:33]
Brad
We spent about a third of the pod on Nvidia's competitive moat, really trying to break it down, really trying to understand this idea of systems level advantages, the combinatorial advantages that he has in the business. Because I think when I talk to people around the investment community, despite how well it's covered, Bill, right, there's still this idea that it's just a GPU and that somebody's going to build a better chip, they're going to come along and displace the business. And so when he said, again, it can sound like marketing speak, Sunny, when somebody says it's not a GPU company, it's an accelerated compute company. You know, we showed this, we showed this chart where you can see kind of the Nvidia full stack. And he talked about how he just built layer after layer after layer of the stack, you know, over the course of the last decade and a half. But when he said that, Sunny, I know you had a reaction to it. Right. Even though, you know, it's not just a GPU company. When he really broke it down, it seemed like, you know, he did break new territory here.
[05:36]
Sunny
Yeah. Like what was great to hear from him and really, you know, positive for, you know, folks thinking about where Nvidia lives in the stack right now is he kind of got into details and then the sub details below CUDA and he really started going into what they're doing, very particularly on mathematical operations to accelerate their partners and how they work really closely with their partners, you know, all the cloud service providers, to basically build these functions so that they can further accelerate workloads. The other little nuance that I picked up in there, he didn't focus purely on LLMs. He talked in that particular area about how they're doing that for a lot of traditional models and even newer models that are being deployed for AI. And I think just really showed how they are partnering much closer on the software layer than the hardware layer alone.
[06:28]
Brad
Right. I mean, in fact, he talked about the CUDA library now has over 300 industry specific acceleration algorithms where they deeply learn the industry. So whether this is synthetic biology or this is image generation or this is autonomous driving, they learn the needs of that industry and then they accelerate the particular workloads. And that for me was also one of the, one of the key things, this idea that every workload is moving from kind of this deterministic handmade workload to something that's really driven by machine learning and really infused with AI and therefore benefits from acceleration. Even something as ubiquitous as data processing.
[07:15]
Sunny
Yeah. And I shared this code sample with Bill as we were just preparing for this pod and I knew Bill processed it right away and ran it, which was. It really showed like every piece of code that's out there now that's related to or not every piece of. Many of the pieces have this sort of if device equals cuda, do X and if it's not do Y. And that's the level of impact they're having across the, you know, entire ecosystem of services and apps that are being built that are related to AI. Bill, I don't know what you thought when you saw that piece.
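Note: the "if device equals CUDA, do X, and if it's not, do Y" pattern Sunny describes often looks something like the following in PyTorch-based code. This is a minimal sketch and not the sample that was shared on the pod; the helper names pick_device and run_workload are hypothetical, and the torch import is treated as optional so the snippet still runs where PyTorch is not installed.

```python
def pick_device() -> str:
    """Return "cuda" when PyTorch can see an NVIDIA GPU, otherwise "cpu"."""
    try:
        import torch  # optional dependency (an assumption of this sketch)
    except ImportError:
        return "cpu"
    return "cuda" if torch.cuda.is_available() else "cpu"


def run_workload(device: str) -> str:
    # Hypothetical stand-in for the "do X" versus "do Y" branches.
    if device == "cuda":
        return "accelerated path (GPU kernels)"
    return "fallback path (CPU)"


device = pick_device()
print(device, "->", run_workload(device))
```

The branch itself is trivial; the point of the anecdote is how widely this kind of check is repeated across AI-related code.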
[07:47]
Bill
Yeah, I mean, I think there is a, there's a question for the long term that relates to CUDA and I want to go back to the system point you made later, Brad, but while we're on CUDA is what percentage of developers will touch CUDA and is that number going up or down? And I could see arguments on both sides. You could say the models are going to get more and more hyper specialized and performance matters so much that the models that matter the most, the deployments that matter the most, they're going to get as close to the metal as possible and then CUDA is going to matter. The other side you can make is those optimizations are going to live in PyTorch, they're going to live in other tools like that, and the marginal developer is not going to need to know that. And I don't. I could make both arguments, but I think it's an interesting question going forward.
[08:43]
Brad
I mean, I just asked ChatGPT how many CUDA developers there are today just to be on top of it: 3 million CUDA developers. Right. And you know, a lot more that touch CUDA that, you know, aren't specifically kind of developing on it. So it is one of these things that has become pretty ubiquitous. And his point was it's not just CUDA, of course. It's, you know, it's really full stack, you know, all the way from data ingestion all the way through, you know, kind of the post training.
[09:08]
Sunny
I think I'm on the latter of your point, Bill. Like I think there's going to be fewer people touching that. And I do think that's a point where the moat is not as strong longer term, as you say. And think about the analogy that I would go with is think about the number of iPhone iOS developers working at Apple building that versus the number of app developers. And I think you're going to have a 10 to 1 or 100 to 1 ratio of people building at layers above versus people building down closer to the bare metal.
[09:37]
Bill
That'd be something to watch. We can ask more people over time, obviously. It's a big lock today for sure.
[09:43]
Brad
And I think, Bill, to your point, I reached out to Gavin actually before I did the interview. Gavin Baker, who's a good buddy and who obviously knows the space incredibly well, has followed it at a deeper level for a longer period of time than I have. And when I asked him about the competitive advantage, he really said a lot of the competitive advantage is around this algorithmic diversity and innovation and why CUDA matters. He said if the world standardizes on Transformers on PyTorch, then it's less relevant for GPUs. In that environment, if you have a lot of standardization, then advantage goes to the custom ASICs. But I'll tell you this, and I've had this conversation with a lot of people when I asked Jensen, I pushed him on custom ASICs. I was like, hey, you've got accelerated inference coming from Meta with their MTIA chip. You got Inferentia and Trainium, you know, coming. He's like, yeah, Brad, like they're it, you know, they're my biggest partners. I actually share my three to five year roadmap with them. Yes, they're going to have these, these point solutions that are going to do these very specific tasks. But at the end of the day, the vast majority of the workloads in the world that are machine learning and AI infused are going to run on Nvidia. And the more people I talk to, the more I'm convinced that that's the case, despite the fact that there'll be a lot of other winners, including Groq and Cerebras, et cetera.
[11:09]
Bill
And they're acquiring companies, they're moving up the stack, they're trying to do more optimization at higher levels. So they want to extend, obviously, what CUDA's doing. Don't go to inference yet. That's a whole nother story.
[11:23]
Sunny
I'm actually on that bit about the deep integrations, right? Because really that's a playbook that I think Microsoft really had done well for a long time in enterprise software and you really haven't seen that in hardware ever. You know, if you go back to say Cisco or the PC era or you know, the cloud era, you didn't see that deep level integration. Now Microsoft pulled it off with Azure and when I heard him talking, all I could think about was, man, that was really smart. What he's done is he's gotten together, really understand what the use cases are and build an organization that deeply integrates into his customers and does it so well all the way up into his roadmap that he's much more deeply embedded than anyone else is. When I heard that part, I kind of gave him a real tip of the hat on that. But what did you, you know, Brad, what was your take on that?
[12:12]
Brad
You and I had this conversation after we first listened to it and you know, if you really telescope out, you know, he talks as a systems level engineer, right? Even if you hear like people, you know, people went to Harvard Business School, say, how can this guy possibly have 60 direct reports, right? But how many direct reports does Elon have? Right, these systems level. And he said, I have situational awareness, right? I'm a prompt engineer to the best people in the world at these specific tasks. I think when I look at this, the thing that I deeply underappreciated a year and a half ago about this company was the systems level thinking, right? That these are. That he spent years thinking about how to embed this competitive advantage and how it really, it goes all the way from power all the way through application. And every day they're launching these new things to further embed themselves in the ecosystem. But I did hear from somebody over the last two days who, you know Rene Haas, the CEO of ARM, right? Rene was also at our event and, and he's a huge Jensen fan. He, he worked eight years at Nvidia before becoming the CEO of ARM in 2013. And he said, listen, nobody is going to assault the Nvidia castle head on, right? Like the mainframe of AI, right, is entrenched and it's going to become a lot bigger, at least as far as the eye can see. He said, however, if you think about where we're interacting with AI today, right, on these devices, on edge devices, he's like, our installed base at ARM is 300 billion devices, and increasingly a lot more of this compute can run closer to the edge. If you think about an orthogonal competitor, right? Again, if he has a deep competitive moat in the cloud, what's the orthogonal competitor? The orthogonal competitor peels off a lot of the AI on the edge, and I think ARM's incredibly well positioned to do that. Clearly, Nvidia's got ARM embedded now in a lot of their, you know, in a lot of their Grace Blackwell, et cetera. But that to me would be one area. Like, if you looked out and you said, where can their competitive advantage, you know, be challenged a little bit? I don't think they necessarily have the same level advantage on the edge as they have in the cloud.
[14:35]
Bill
You started the pod by saying, everyone's heard this in the investment community. It's not a GPU company, it's a systems company. And I, in my brain, I think, had thought, oh, well, they've got four in a box instead of just one GPU or eight in a box. At the time I was listening to the podcast you did with Jensen, I was reading this Neocloud Playbook and Anatomy post by Dylan Patel.
[14:59]
Brad
Yes, that was a good one.
[15:01]
Bill
He goes into extreme detail about the architecture of some of the larger systems. You know, like the one that xAI, that we're going to talk about, that was just deployed, which I think is 100,000 nodes or something like that. And it literally changed my opinion of exactly what's going on in the world and actually answered a lot of questions I had. But it appears to me that Nvidia's competitive advantage is strongest where the size of the system is largest. Which is another way of saying what Rene said. It's flipping it on its head. It's not to say it's weak on the edge, but it's super powerful. When you put a whole bunch of them together, that's when the networking piece thrives. That's where NVLink thrives. That's where CUDA really comes alive in the biggest systems that are out there. And some of the questions that answer for me was one, why is demand so high at the high end? And why are nodes available on the Internet, you know, single nodes available on the Internet for at or below cost? And this starts to get at that because you can do things with the large systems that you just can't do with a single node. And so those two things can be simultaneously true. Why was Nvidia so interested in CoreWeave existing? Now I understand. Like, if the biggest systems are where the biggest competitive advantage is, you need as many of these big system companies as you can possibly have. And there may be, if that trajectory remains true, you could have an evolution where customer concentration increases for Nvidia over time rather than going the other way. Depending on how you know, if Sam's right, that they're going to spend 100 billion or whatever on a single model. There's only so many places that are going to be able to afford that. But a lot of stuff started to make sense to me that didn't before. And I clearly underestimated the scale of what it meant to be a non GPU company, to be a system company. This goes way, way up.
[17:12]
Brad
Yeah. And again, Bill, you touched on something that I think is really important here, and this is this question of whether their competitive moat is also as powerful in training as it is in inference. Right. Because I think that there's a lot of doubt as to whether their competitive moat is as strong in inference. But, you know, let's just flip to that. Well, no, but I asked him if it was as strong.
[17:40]
Bill
No, I didn't.
[17:41]
Brad
He actually said it was greater. Right. To me, when you think about that, in the first instance, I think it didn't make a lot of sense. But then when you really started thinking about it, he said there's a trail of infra behind the infrastructure that's already out there that is CUDA compatible and can be amortized for all this inference. And so he, like, for example, referenced that OpenAI had just decommissioned Volta, so it's like this massive installed base. And when they improve their algorithms, when they improve their frameworks, when they improve their CUDA libraries, it's all backward compatible. So Hopper gets better and Ampere gets better and Volta gets better. That combined with the fact that he said everything in the world today is becoming highly machine learned, right? Almost everything that we do, he said almost every single application, Word, Excel, PowerPoint, Photoshop, AutoCAD, like, it all will run on these modern systems. Sunny, do you buy that? Do you buy that? When people go to replace compute, they're going to replace it on these modern systems.
[18:47]
Sunny
So when I was listening to it, I was buying it. But then he said one thing that kept resonating in my mind, which he said inference is going to be a billion times larger than training. And if you kind of double click into that, these old systems aren't going to be sufficient enough. Right? If you're going to have that much more demand, that much more workload, which I think we all agree, then how is it that these old systems which are being decommissioned from training are going to be sufficient? So I think that's where that argument just didn't hold strong enough for me. If that grows as fast as he says it is, as fast as you guys have seen it in their numbers, then it's going to be a lot more net new inference related deployments. And there I don't think that argument holds on the transfer from older hardware to newer hardware.
[19:38]
Brad
Well, you said something pretty casually there, right? Let's underscore this. We were talking about the Strawberry and the o1 preview and he said there's a whole new vector of scaling intelligence, inference time reasoning, that's not going to be single shot, but it's going to be lots of agent to agent interactions, thinking time, as Noam Brown likes to say, right? And he said as a consequence of that, inference is going to a hundred x, thousand x, a million x, maybe even a billion x. And that in and of itself, right, to me was, you know, kind of a wow moment. 40% of their revenues are already inference. And I said over time does your inference go become a higher percentage of your revenue mix? And he said of course. Right. But again, I think conventional wisdom is all around the size of clusters and the size of training and if, if models don't keep getting bigger, then their relevance will dissipate. But he's basically saying every single workload is going to benefit from acceleration, right? It's going to be an inference workload and the number of inference interactions is going to explode higher.
[20:47]
Sunny
Yeah, one, one technical detail, which is you need bigger clusters if you're training bigger models, but if you're running bigger models, you don't need bigger clusters. It can be distributed. Right. And so I think what we're going to see here is that the larger clusters will continue to get deployed, and as Bill said, they'll get deployed for folks, maybe a limited number of folks that need to deploy it for $100 billion runs or even bigger than that. But you'll see inference clusters be large, but not as large as the training clusters and be a lot more distributed because you don't need it to be all in the same place. And I think that's what'll be really interesting.
[21:24]
Bill
It was interesting. He simplified it even more than you did there, Brad. He said, think about a human. How much time do you spend learning versus doing? And he used that analogy as to why this was going to be so great. But I, in a little different way than Sunny, I thought the argument that the reason we're going to be great at inference is because there's so much of our old stuff laying around wasn't super solid. In other words, what if some other company, Sunny's or some other one, decided to optimize inference? It wasn't an argument for optimization. It was an argument for cost advantage because it might be fully distributed or whatever. And of course, if you had maybe poked him back on that, he might have had another answer about why for optimization. But there are clearly going to be people, whether it's other chips companies, some of these accelerator companies, there are going to be people working on inference optimization, which may include edge techniques. I think some of the accelerators may look like AI CDNs, if you will, and they're going to be buying stuff closer to the customer. So all TBD, but just the argument that you've got it left over didn't seem super solid to me.
[22:46]
Sunny
And the three fastest companies in inference right now are not Nvidia.
[22:51]
Brad
Right. So who are they, Sunny? We'll post the leaderboard.
[22:55]
Sunny
Yeah. It's a combination of Groq, Cerebras and SambaNova. Right. Those are three companies that are not Nvidia, that are on the leaderboards of all the models that they run.
[23:05]
Bill
You're talking about performance. Performance is performance.
[23:08]
Sunny
Yeah, yeah, yeah. And I would argue even price.
[23:11]
Bill
Yeah.
[23:12]
Brad
And make the argument why. Why are they faster? Why are they cheaper, in your mind? But yet, notwithstanding that fact, Nvidia is going to do, let's call it 50 or 60 billion of inference this year. And these companies are still just getting started. Right? Why? Is their inference business just because of installed base?
[23:33]
Sunny
Yeah, I think it's a combination of installed base and I think it's because that inference market is growing so incredibly fast. I think if you were making this decision even 18 months ago, it would be a really difficult decision to buy from any of those three companies because your primary workload was training. And the first part of this pod, we talked about how they have such a strong tie-in integration to getting training done properly. I think when it comes to inference, you can see all the non-Nvidia folks can get the models up and running right away. There is no tie-in to CUDA that's required to go faster, that's required to get the models running. Right. Obviously none of the three companies run CUDA and so that moat doesn't exist around inference.
[24:13]
Bill
Yeah, CUDA's less relevant in inference. That's another point worth making. But I wanted to say one other thing to what Sunny just said. If you go back to the early Internet days, and this is just an argument that optimization takes a while. All of the startups were running on Oracle and Sun. Every single one of them was running on Oracle and Sun. And five years later, they were all running on Linux and MySQL, like in five years. And it was literally, it went from 100% to 3%. And I'm not making the, I'm not making that projection that that's going to happen here. But you did have a wholesale shift as the industry, you know, they went from developing and building it for the first time to optimizing, which are really two separate motions, it seems to me.
[25:04]
Brad
I pulled up this chart, right, that we shared. We made it, Bill, way earlier this year for the pod, which showed the trillion dollars of new AI workloads expected over the next four to five years and the trillion dollars of effectively data center replacement. And I just wanted to get his updated, you know, kind of reaction or forecast now that he's had, you know, six more months to think about whether or not, you know, he thinks that's achievable. And what I heard him say was, yes, the data center replacement is going to look exactly, you know, like that. Of course, he's just making his best educated guess, but he seemed to suggest that the AI workloads could be even bigger. Right, like that once he saw Strawberry and o1, that he thought the, you know, the amount of compute that was going to be required to power this. And, you know, the more people I talk to the more I, you know, I get that same sense. There is this insatiable demand. So maybe we just touch on this. You know, he goes on CNBC and he says the demand is insane. Right? And I kept trying to push on that. I was like, you know, yeah, but what about MTIA? What about custom, you know, inference? What about all these other factors? What if models stop getting so big? I said, will any of that change the equation? And he consistently pushed back and said, you still don't understand the amount of demand in the world because all compute is changing.
[26:33]
Sunny
I thought he had one nuance to that answer, which was when you asked him that, he said, look, if you have to replace some amount of infrastructure, whatever the number was was really big and you're part of that and you're a CIO somewhere tasked with doing this, what are you going to do? What are you going to replace it with? It's accelerated compute. And then immediately, once you make that choice because you're not going to traditional compute, then Nvidia is your number one choice. So I thought he kind of tied that back together in that, like, are you really going to, you know, get yourself in trouble by having something else there or are you just going to go to Nvidia? When he said it, I didn't want to say that, Bill, but it felt like the old IBM argument.
[27:14]
Bill
Yeah, look, I mean one thing, Brad, is this company's public. When a private company says, oh, the demand's insane, you know, I immediately get skeptical. This company's doing 30 billion a quarter, growing 122%. Like the demand is insane. Like we, we can see it, there's.
[27:34]
Brad
No doubt about it. And part of that demand was a conversation about Elon and xAI and what they did. And I thought it was also just incredibly fascinating. Right? I thought it was funny. I asked him a question about the dinner that he and Elon and, and Larry Ellison apparently had. And he's like, you know, just, just because that dinner occurred and they ended up with 100,000 H100s doesn't necessarily, you know, connect the dots. But listen, he confirmed that his mind was blown by Elon and he said he has an n-of-1 superhuman that could possibly pull off that, could energize a data center, that could liquid cool a data center. And he said, what would take somebody else years to get permitted, to get energized, to get liquid cooled, to get stood up, that xAI did in 19 days, you know, and you could just tell the immense respect that he had for Elon. It's clear, you know, he said it's the single largest coherent supercomputer in the world today, that it's going to get bigger. And if you believe that the future of AI is tied closely together with the systems engineering on the hardware side, you know, what hit me in that moment was that's a huge, huge advantage for Elon.
[28:56]
Sunny
Yeah, I think he, I forgot the exact number, but like he talked about how many thousands of miles of cabling that were just in there as part of the task. Look, you know, coming to it from a bit, you know, doing a lot of that ourselves right now, building data centers, standing them up, racking and stacking, you know, our nodes. It's impressive. It's impressive to do something at that scale in 19 days. You know, it doesn't even include how quickly they built that data center. I think it's all happened, you know, within 2024. And so that's part of the advantage. The interesting thing there is he didn't touch on it as much as when he talked about doing the integration with cloud service providers. What I'd love to kind of double click into is because, you know, Elon is in a unique situation where he's obviously bought this cluster. He has a ton of respect for Nvidia, but he, you know, is building his own chip, building their own clusters with Tesla. So I wonder how much, you know, cross correlation or information there is for them, for them to be able to do that at scale. And you know, you guys look at this, what, what have you kind of seen on their clusters?
[30:05]
Brad
I don't really have a lot of data on the non Nvidia clusters that they have. I'm sure Frida on my team does. I just, I just don't have it off the top of my head. If we have it all, you know, I'll pull a chart and I'll show it.
[30:15]
Bill
Sunny, you said you now think the xAI cluster is the largest Nvidia cluster alive today.
[30:21]
Sunny
Well, I'm saying because I believe Jensen said it in the pod, that he said it's the largest supercomputer in the world.
[30:27]
Bill
Yeah, I mean, I just want to spend 30 seconds on what you said, Brad, about Elon. I'm staring out my window at the Gigafactory in Austin that was also built in record time. Starlink's insane. When we were walking in Diablo, I just kept thinking, you know who I'd love to reimagine this place? Elon. Right. And I, I don't. The world should study how he can do infrastructure fast because if that could be cloned, it would be so valuable. Not really relevant to this podcast, but worth noting. The, the other thing that I thought about on the Elon thing, and this, this also was these pieces coming together in my mind about these large clusters and how important that was to Nvidia. He got allocation, right. This is supposed to be like the hottest company, the hottest product, backed, you know, backed up for years on demand. And he walks in and takes what looks like about 10% of the quarter's availability. And in my mind I'm thinking that's because, hey, if there's another company that's going to develop these big ones, I'm going to let them go to the front of the line. And that speaks to what's happening in Malaysia and the Middle East. And any one of these people, they're going to get excited. He's going to spend time with them, put them at the front of the line.
[31:52]
Brad
You know what, I'll tell you, I pushed him on this. I said, Elon's going to, rumor is he's going to get another 100,000, you know, H200s, add them to this cluster. I said, are we already at the phase of 200,000 and 300,000 cluster scale? And he said yes. And then I said, and will we go to 500,000, a million? And he's like, yes. Now I think these things, Bill, are already being planned and built. And what he said is beyond that. Beyond that, he said, you start bumping up against the limitations of base power. Like can you find something that can be energized to power a single cluster? And he said we're going to have to develop distributed training. And he said, but just like with Megatron that we developed to allow to occur what is occurring today, we're working on the distributed stuff because we know we're going to have to decompose these clusters at some point in order to continue scaling them.
[32:54]
Bill
You may also be running up against the, even for the Mag 7, the size of capex deployment, where their CFOs start to talk at higher levels.
[33:05]
Brad
For sure, totally.
[33:07]
Bill
And there's a super interesting article in The Information just now, it came out today, where Sam Altman is questioning whether Microsoft's willing to put up the money and build a cluster. And it may have been, that may have been kind of triggered by Elon's comments or Elon's willingness to do it at xAI.
[33:29]
Sunny
What I will say on the size of the model is we're going to push into this really interesting realm where obviously we can have bigger and bigger training clusters that naturally imposes that the models are bigger and bigger. But what you can't do is you can't take a single. You can train a model across a distributed site and it may just take you a month longer because you have to move traffic around. And so instead of taking three months, it takes you four months. But you can't really run a model across a distributed site because that inference is like a real time thing. And so we do, you know, we're not pushing it there. But when you start to get to models that become way too big to run in single locations, that may be a problem that we want to be aware of and we want to keep in our minds as well.
[34:10]
Brad
On this question of scaling, you know, our way to intelligence. One of the things I asked Noam Brown, you know, today in our fireside chat, he made very clear his perspective, although he's working on inference time reasoning, which is a totally different vector and a breakthrough vector at OpenAI, which we ought to spend a little bit of time talking about. He said, now there are these two vectors that again are multiplicative in terms of the path to AGI. He's like, make no mistake about it, we're still seeing big advantages to scaling bigger models. Right? We have the data, we have the synthetic data. You know, we're going to build those bigger models and we have an economic engine that can fund it. Right. Don't forget this company, you know, has over 4 billion in revenue, scaling probably most people think to 10 billion plus in revenue over the course of the next year. They just raised six and a half billion. They got a $4 billion line of credit from Citigroup. So among the independent players, Bill, right, like Microsoft can choose whether or not they're going to fund it. But I don't think it's a question of whether or not they're going to have the funding. At this point, they've achieved escape velocity. I think for a lot of the other independent players, there's a real question whether they have the economic model to continue to fund the activity. So they have to find a proxy because I don't think a lot of venture capitalists are going to write multibillion dollar checks into the players that haven't yet caught lightning in a bottle. That would be, that would, that would be my guess. I mean, you know, I just think it's hard. You know, listen, at the end of the day, we're economic animals, you know, and I've said before, you know, if you look at the forward multiple most of us underwrote to on OpenAI, it was about 15 times forward earnings, right?
If ChatGPT wasn't doing what it was doing, if the revenue wasn't doing what it was doing, right, this would have been massively dilutive to the company. It would have been very hard to raise the money. I think if Mistral or all these other companies want to raise that money, I think it'd be very difficult. But, you know, you never, I mean, you know, there's still a lot of money out there, so it's possible.
[36:12]
Bill
But I think this is, you know, 15 times earnings. I think you meant revenue or 15.
[36:17]
Brad
Times revenue for sure, which I said, you know, when Google went public, it was about 13 or 14 times revenue and Meta was like 13 or 14 times revenue. So I do think we're on the precipice of a lot of this consolidation among the new entrants. What I think is so interesting about X is, you know, when I was pushing him on this model consolidation, pushing Jensen on it, he was like, listen, with Elon, you have somebody with the ambition, with the capability, with the know-how, with the money, right? With the brands, with the businesses. So I, I think a lot of times when we're talking about AI today, we oftentimes talk about OpenAI, but a lot of people quickly then go into all of the other model companies. I think X is often left out of the conversation. And one of the things I took away from this conversation with Jensen is again, if the, if scaling these data centers is a key competitive advantage to winning in AI, right? You absolutely cannot count out xAI in this battle. They're certainly going to have to figure out something with the consumer that's going to have a flywheel like ChatGPT or something with the enterprise. But in terms of standing it up, building the model, having the compute, I think they're going to be one of the three or four in the game.
[37:35]
Bill
You touched on maybe wanting to close out on the, on the Strawberry-like models. You know, one, one thing we don't have exposure to, but we can guess at is cost. And that chart that they showed when they released Strawberry, the X axis was logarithmic. So the cost of a search with, with the, with the new preview model is probably costing them 20x or 30x what it does to do a normal ChatGPT search.
[38:09]
Brad
And so which I think is fractions of a penny, but figuring out which.
[38:14]
Bill
And it also takes longer, so figuring out which problems, it's acceptable. And Jensen gave a few examples. For it to take more time and cost more and to get the cost benefit right for that type of result is something we're gonna have to figure out like which problems tilt to that place.
[38:32]
Brad
Right. And you know, the one thing I feel good about there and again, I'm speculating, I don't have information from OpenAI on this, but what we know is that the cost of inference has fallen by 90% over the course of last year. What we, you know, what Sunny has told us and other people, you know, in the field have told us that inference is going to drop by another 90% over the course of the next, you know, period of months.
[38:56]
Bill
If you're, if you're racing logarithmic needs, that's, you're going to need that to happen.
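A rough back-of-the-envelope sketch of the arithmetic in this exchange. It assumes only the figures the speakers give (a 20x to 30x cost multiple for the preview model, and inference cost falling by 90% per period); the function name is hypothetical and nothing here reflects actual OpenAI pricing.

```python
def relative_cost(multiple: float, drops: int, decline: float = 0.90) -> float:
    """Cost of one reasoning query, relative to today's normal query,
    after `drops` successive inference-price declines of `decline` each."""
    return multiple * (1.0 - decline) ** drops


# Speaker-quoted multiples (20x and 30x) against zero, one and two 90% declines.
for multiple in (20, 30):
    for drops in (0, 1, 2):
        cost = relative_cost(multiple, drops)
        print(f"{multiple}x query after {drops} decline(s): {cost:.2f}x today's normal query")
```

Under these assumptions, a single 90% decline brings the 20x to 30x query down to roughly 2x to 3x today's normal cost, and a second decline brings it below today's normal cost, which is the sense in which the price drops need to keep happening.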
[39:00]
Brad
Right. And I, you know, and, and, and here's what I also think happens, Bill, is in this chain of reasoning, you're going to build intelligence into the chain of reasoning, right. So that, you know, you're going to optimize where you send these, you know, each of these inference interactions, you're going to batch them, you're going to take more time with them, because it's just a time-money trade-off. Right. At the end of the day, I also think that we're in the very earliest innings as to how we're going to think about pricing these models, right? So if we think about this in terms of System 1, System 2 level thinking, right? System 1 being, you know, what's the capital of France, right. You're going to be able to do that for fractions of a penny using pretty simple models on ChatGPT, right. When you want to do something more complex, if you're a scientist and you want to use o1 as your research partner, right. You may end up paying it by the hour and relative to the cost of an actual research partner, it may be really cheap. Right. So I think there could be consumption models, you know, for this. Like, I think we haven't even scratched the surface to think about how that's going to be priced. But I totally agree with you that it's going to be priced very differently. Again, I think this puts, I think OpenAI has suggested, you know, that the o1 full model may even be released yet this year. Right. One of the things that I'm kind of waiting to see is, I think, you know, listen, having known Noam Brown for quite a while now, he's an n-of-1, right. And he wasn't the only one working on this for sure at OpenAI. But you know, listen, whether it's Pluribus or winning at the game of Diplomacy, he's been thinking about this for a decade, right? It was his major breakthrough on how to win the game of six-handed poker. And so he brought this to OpenAI. I think they have a real lead here.
Which leads me back to this question, Bill, you and I talk about all the time, which is memory and actions, right? And so I have to tell you this funny thing that occurred at our investor day. So I had Nikesh on stage and you know, obviously Nikesh was instrumental at Google for a decade. And so I wanted to talk to him about both consumer AI as well as enterprise AI. And I asked him, I said, I want to make a wager with you. I knew of course he would take a bet. And I said, I want to make a wager with you, over/under. I'll set the line at two years until we have an agent that has memory and can take action. And the canonical use case, of course, that I used was that I could tell my agent, book me the Mercer Hotel next Tuesday in New York at the lowest price. And I said, over/under two years on getting that done. I said, I'll start, 5,000 bucks. I'll take the under. He snap calls me, he says, I'll take the over. And he said, but only if you 10x the bet. And of course we're doing it for, we're doing it for a good cause. So I had to call him because I, you know, I can't not step up to a good cause. So we're taking the opposite sides of that trade. Now what was interesting is over the course of the next couple days, I asked some other friends who took the stage, you know, where they would come down on, on the same bet, right? Our friend Stanley Tang took the under. A friend from Apple who will remain nameless kind of took the over. And then Noam Brown, who was there, pleaded the fifth. He says, I know the answer, so I can't say. And so, yeah, I was kind of provocative and I, you know, I texted Nikesh and I said, I think you better get your checkbook ready. So coming back to that, Bill, Strawberry o1 is an incredible breakthrough. Something that thinks, this whole new vector of intelligence, but it kind of makes us forget about the thing you and I focus so much on, which was memory and actions.
And I think that we are on the real precipice of, not only can these models spend more time thinking, not only can they give us fewer hallucinations, you know, and just scaled compute. But I also think, I mean, you already see the makings of this. I mean, use these things today, they already remember quite a bit. So I think they're sliding this into the experience. But I think we're going to have the ability to take simple actions. And I think this metaphor that people had in their minds, that they were going to have to build deep APIs and deep integrations to everybody, I don't think is the way this is going to play out. And let me just.
[43:41]
Bill
What do you think's going to play out?
[43:43]
Brad
Well, I mean, the Easter egg that I thought got dropped last week is they did this event on, you know, their voice API. Right. And it's literally your GPT calling a human on the telephone and placing an order. So why the hell can't my GPT just call up the Mercer Hotel and say, Brad Gerstner would like to make a reservation. Here's his credit card number and pass along the information.
[44:05]
Bill
There is a reason for that. I mean, look, scrapers and form fillers have existed for how long, Sunny? Fifteen years. You could write an agent to go fill out and book the Mercer Hotel 15 years ago. There's nothing impossible about that. It's the corner cases. And like the hallucination, when your credit card gets charged 10 grand, like you, you just can't have failure. And how you architect this so that there's not failure and there's trust. I'm sure you could demo this tomorrow. I have zero doubt you could demo it tomorrow. Could you provide it at scale in a trustworthy way where people are allocating their credit cards to it? That might take a little longer.
[44:49]
Brad
Okay, so over under Bill on two years. I mean, I'm going to get you. I'm going to get to action either way.
[44:55]
Bill
But what's the test? The demo. I think you do it today.
[44:57]
Brad
No, not this. Not the cheesy demo you just said. I'm talking about a release that allows me, you know, at scale, to book a month.
[45:05]
Bill
Where it's spending your credit card? And not just you, but everybody. Full release.
[45:09]
Brad
Yeah, we'll call it a full release just because I know that's the only way I can entice you to take the bet.
[45:16]
Bill
Which today is October 8, 2024.
[45:20]
Brad
I mean, Sunny, you already know what he's going to say. You'll take the over, right, Bill?
[45:25]
Bill
Yeah. Yes.
[45:26]
Brad
Okay, so Bill's in the Nikesh camp, Sunny. Where do you come down, over/under on two years? No, don't start hedging, Bill. Don't start hedging.
[45:32]
Sunny
Go ahead.
[45:33]
Bill
I already said it. Demo today. It's 15 years ago.
[45:37]
Sunny
Let me comment on what you're worried about, Bill. And I think people are still working their way through it. You don't need a single agent right now to book the Mercer and deal with all the scraping stuff you're talking about. You can have a thousand agents working together. You can have one that's making sure that the credit card charge is not too big. You can have another one to make sure that the address is right. You can have another one checking against your calendar. And so all of that's free. So I'm on the under. And Brad, I'll even go under one year.
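A minimal sketch of the pattern Sunny describes, where several narrow checkers each vet one aspect of a proposed booking before any charge goes through. Everything here is an illustrative assumption: the `Booking` fields, the checker names, the spending cap and the placeholder address are hypothetical, and each "agent" is reduced to a plain function rather than a model call.

```python
from dataclasses import dataclass
from datetime import date
from typing import Callable, Dict, List, Set


@dataclass(frozen=True)
class Booking:
    hotel: str
    address: str
    check_in: date
    total_usd: float


# Each "agent" is modeled as a function that accepts or rejects a booking.
Check = Callable[[Booking], bool]


def charge_within(limit_usd: float) -> Check:
    """Reject bookings whose total exceeds the spending cap."""
    return lambda b: b.total_usd <= limit_usd


def address_is(expected: str) -> Check:
    """Reject bookings whose address differs from the expected one."""
    return lambda b: b.address.strip().lower() == expected.strip().lower()


def calendar_free(busy_days: Set[date]) -> Check:
    """Reject bookings that start on a day already marked busy."""
    return lambda b: b.check_in not in busy_days


def failed_checks(booking: Booking, checks: Dict[str, Check]) -> List[str]:
    """Return the names of checks that reject the booking (empty means approve)."""
    return [name for name, check in checks.items() if not check(booking)]


checks = {
    "charge": charge_within(1500.0),             # hypothetical spending cap
    "address": address_is("1 Example Street"),   # placeholder, not a real address
    "calendar": calendar_free({date(2024, 10, 16)}),
}

good = Booking("Mercer Hotel", "1 Example Street", date(2024, 10, 15), 900.0)
bad = Booking("Mercer Hotel", "1 Example Street", date(2024, 10, 15), 10_000.0)

print(failed_checks(good, checks))  # nothing objects, so the booking could proceed
print(failed_checks(bad, checks))   # the charge check objects to the 10,000 total
```

The design choice is that approval requires every checker to pass, so the "credit card gets charged 10 grand" corner case Bill raises is caught by one narrow check rather than relying on a single agent to get everything right.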
[46:03]
Brad
Wow. Yeah, wow. Yeah, wow. So we got a little side action, you and I, Sunny. I'm not gonna go under a year, but I think it could. We could have limited releases in a year. But, Sunny, you and I now have action with Bill. What do you want, Bill? A thousand bucks?
[46:19]
Bill
Sure.
[46:19]
Brad
To a good cause. Okay. Thousand bucks each. To a good cause. And I'll just assume, Sunny, that we'll get action from Nikesh as well. And, you know, our friend Stanley Tang is definitely in the tank for some, so we're gonna. We're gonna give some good money to a good cause. And listen, I think this is the trillion dollar question. I know we're all focused on, you know, scaling models, and I know we're all focused on the compute layer, but what really transforms people's lives? What really disrupts 10 blue links, right? What really disrupts the entire architecture of the app ecosystem is when we have an intelligent assistant that we can interact with, that gets smarter over time, that has memory and can take actions. And when I see the combination of advanced voice mode, voice-to-voice API, Strawberry o1 thinking, combined with scaling intelligence, I just think this is going to go a lot faster than most of us think. Now, listen, they may pull on the reins, right? They may slow down the release schedule, you know, for a lot of business reasons. That's harder to predict. But I think the technology, I mean, even Noam said, I thought it was going to take us much, much longer to see the results that we have seen. Can I hit on one other thing? This is, you know, we started the pod a little bit talking about it. I just want to get your impression, Bill. This idea that Jensen can scale the business two or three times with, you know, increasing the headcount by, you know, 20 or 25%. Right. We know that Meta's done that over the course of the last two years. And you and I have talked about, are we on the eve of just a massive productivity boom and massive margin expansion like we've never seen before? Right. Nikesh said we ought to be able to get 20 or 30%, you know, productivity gains out of everybody in the business.
[48:08]
Bill
First of all, I think Nvidia is a very special company, and it's a company that's, even if it's a systems company, it's an IP company. And the demand is growing at such a rate that they don't need more designers or more developer engineers to create incremental revenue. That's happening on its own. And so their operating margins are at record levels. For the majority of companies, I've always just held this belief that, you know, you evolve with your tools. And the real, the real answer is the companies that don't deploy these things are going to go out of business.
[48:47]
Brad
Yeah.
[48:48]
Bill
And so I think margins get competed away in many, many cases. I think it's ridiculous to imagine, oh, every company goes to 60% operating.
[48:58]
Brad
No, no, no, no. I mean, listen, Delta Airlines is going to do all of these things with AI, and immediately, because it's in a commodity market, it'll get competed away by Southwest and United. Bad industries remain bad industries.
[49:12]
Bill
Yeah, yeah, yeah. So, so, but, but there might be some, you know, that figure it out. And, and I have another theory that, that I always keep in mind, which is hyper growth tends to delay what you learned in microeconomics class. You know, I, I remember when I was a PC analyst and there were five public PC companies all growing 100%. And so in moments of hypergrowth, you will have margins that may or may not be durable, and you'll have a number of participants in a market that may or may not be durable during periods of hypergrowth.
[49:49]
Brad
I have two more things on my mind. Sunny, do you have any reactions to that? I mean, I just have to get to a couple of these topics.
[49:55]
Bill
No, this is going to be a Lex Fridman-length podcast once you.
[50:03]
Sunny
No, look, I've really been thinking a lot about Jensen's point in the pod, about how much AI they're using internally for design, design verification, for all those pieces. Right. And I think it's not 30%. I actually think sort of that's an underestimate. I think you're talking multiple hundreds of percent improvement in productivity gains. And the only issue is that not every company can grasp that that quickly. And so I think he was kind of holding some cards back at that point when he made that comment. And it really got me thinking about how much are they doing there that they don't want everybody to know about. And you kind of see it now in the model development because if you've noticed the last couple of weeks, they put some models out there that are models trained on their own and they don't get as much noise as ones from Meta and the other players that are out there, but they're really doing a lot more than we think. And they, they, I think they have their arms around a lot of these very, very difficult problems.
[51:07]
Bill
Brad, why did they put their own model out?
[51:10]
Brad
Well, it's related to, to this topic of open versus closed. So Bill, you know, I hope you're proud of me. You know, I went back and I said, I have to, I have to ask this question, right? And you know, I thought Jensen, you know, I thought he gave a great answer, which is like, listen, we're going to have companies that for economic reasons, right, push the boundary toward AGI or whatever they're doing. And it makes sense to have a closed model that can be the best and they can monetize, but the world's not going to develop with just closed models. We're going to, you know, he's like, it's both open and closed. And you know, he said, because open, he's like, it's absolutely a condition required. It's going to be the vast majority of the models in the industry. He's right. Now, if we didn't have open source, how would you have all these different fields in science, you know, be able to be activated on AI? He talked about Llama models exploding higher. And then with respect to his own open source model, which I thought was really interesting, he said we focused on, right, a specific capability, and the capability that we were focused on is how to agentically use this model to make your model smarter, faster, right? So it's almost like a training, coaching model that he built. And so I think for them it makes perfect sense why they may, you know, put that out into the world. But I also, you know, a lot of times the open versus closed debate, you know, gets hijacked into this conversation about safety and security. And you know, and I think he said, you know, listen, these two things are related, but they're not the same thing. You know, one of the things he commented on that is just, he said there's so much coordination going on on the safety and security level. Like we have so, so many agents and so much activity going on on making sure, you know, just look at what Meta's doing, you know, on this.
He's like, I think that's one thing that's under celebrated that even in the absence of any, you know, platonic guardian sort of regulation, right, without any top down, you already have an extraordinary amount of effort going in by all of these companies into AI safety and security. That I thought was, I thought was a really important comment. Thanks for jumping in, guys, kicking this one around. It was a special one to.
[53:27]
Bill
Congrats on having that opportunity. That's pretty, that's pretty unique.
[53:31]
Brad
And, and now we got a little wager, so, I mean, listen, I am so looking forward to like doing a live booking at the Mercer on the pod, right? And then, Sunny, we can just drop the money from the sky. We can just collect. We can just collect. Exactly. Exactly. Good to see you guys. We'll talk soon.
[53:51]
Bill
All right, peace. Take care.
[54:02]
Brad
As a reminder to everybody, just our opinions, not investment advice.