Last week Jensen Huang shared the numbers from NVIDIA’s order book: AI compute demand has grown a millionfold in two years. Much GTC coverage focussed on chips, robots and data centers in space, but I think Jensen revealed something far more important in his keynote: “the inference inflection has arrived,” and this is about to transform how all companies should manage their budgets. The inference era is already the operating assumption of the world’s most valuable company.
A
Today I want to talk about what Jensen Huang, the boss of Nvidia, said at Nvidia's GTC jamboree, where the firm lays out its view of where we are and where we are going. Now, I want to focus on one particular dimension, and it may not have been obvious if you had watched any of the videos from GTC or read any of the reporting that came out of it. It's a dimension that will accelerate all the things that we are being promised about the future. And that dimension is AI inference. Inference is when an AI model responds to us, ideally with what we want it to respond with, whether it's a recommendation for a lawnmower, summarizing your boss's emails so you don't have to read every interminable word, or getting an image for a presentation you have to give. All of that arises because of inference. I've been peeling back the layers of what Jensen Huang said to figure out what it's really telling us about where AI is today and where it's heading. So do stick with me because I think the shape of the future is really starting to become visible. Nvidia is the world's largest, most valuable company. It's a $4 trillion behemoth whose value changes with the winds and the waves and the ins and the outs of the market. It is the leader in providing AI accelerator chips, the GPUs, upon which virtually all of our AI interactions reside.
B
Now, like me, Jensen has seen something in OpenClaw, that piece of open source software that lets you run an AI agent that's as close to the AI agents of 1980s science fiction as you can imagine. Now, a couple of weeks ago, I said that OpenClaw is the most exciting piece of technology that I have seen since the web browser back in 1992. Now, I really mean that. I remember really clearly using it back in that summer of '92. I had to go to a computer room that was in the basement. It was a VDU display, so green screen. There were about a dozen, maybe 20 websites at the time. And they were mostly university physics departments. And if you were using the Internet, the action was elsewhere. It wasn't on the web, it wasn't through the web browser back then. But it was really clear that the browser was offering a new way of interacting with the Internet. And just in that one year, from '92 to '93, the number of websites in the world expanded by a factor of 50 or 100. And just keep that in mind, like 100x back in 1992-1993. And in a sense my whole career has been defined by that experience, by falling into the Internet a year earlier and then making sense of the web browser. So when Jensen brought up that parallel, the most important thing since the web browser, of course it really spoke to me. He's had the same epiphany that I have, or perhaps I've had the same one that he has had. But he went on to say that every company now needs an OpenClaw strategy. And I think that that is the key that unlocks how to make sense of a lot of what was said by Nvidia earlier this week. Now, look, it's completely wild because before January 30th, nobody had heard of OpenClaw. I mean, it wasn't even called OpenClaw. But even on January 20th, 10 days before it got renamed, the project on GitHub, which is where open source projects live, had about 5,000 stars. Now that's not a nothing burger.
It's impressive, but it's not radical, it's not top of the leaderboard, which is where OpenClaw now sits. And it is wild, it is crazy, it is unprecedented. I am running out of words. By March 16, 45 days later, the CEO of the world's most valuable firm, the firm powering roughly 90% of all AI compute, said every company needs an OpenClaw strategy. Now, of course, he is largely right and I'll talk to you about our OpenClaw strategy a little bit later, so hang on in there. But to give you some sense of that, most of the team in Exponential View now has an OpenClaw agent. I've talked a lot about my AI chief of staff, the agent R. Mini Arnold. Now, these OpenClaw agents, they need somewhere to work and they work on new Mac hardware that we've bought for them. We've actually increased the compute the company owns by about 50% in a month. And if you look at the amount of RAM that Exponential View has on its balance sheet, the amount of memory, roughly the AI agents have the same amount of RAM dedicated to them as the humans working in the company. We've built shared repositories for skills and tools for the agents. And we've got dedicated services that they can call, for example, if they need to run a simulation to figure out whether an argument is strong enough. The investment that we're making, it's making real something that we've argued for a long time: that the demand for intelligence is essentially infinite. Any company, any person trying to solve problems doesn't really get to a point where you have too much intelligence in the room. I mean, in our day-to-day quotidian, if I'm brushing my teeth, maybe I don't need an enormous amount. But certainly in my work, in trying to solve knotty problems around my life, you really do want to sometimes be able to leverage lots of intelligence.
Now, if the demand for intelligence is essentially infinite, that means the demand for compute, which is what produces manufactured intelligence, that thing that we call artificial intelligence, well, that's also effectively infinite. And we're experiencing this within Exponential View: the more we use, the more we need to use, because it opens new horizons, new vistas, new avenues and new opportunities. And no one is experiencing this more, no firm is experiencing this more than Nvidia. Last year the CFO said we've got about $500 billion of committed orders. This is effectively promised revenue. It's orders that investors and the markets can see. And it's a really good measure of the health of a company like Nvidia. Right. Semiconductors are highly cyclical businesses. They're prone to booms and they're prone to busts. And being able to see an order book out like that is a real sign of health. Well, this year we learned at GTC that the scale of the committed orders was now a trillion dollars for Blackwells, for the new Vera Rubin products, just out through to 2027. And the thing to understand is, yes, Nvidia maintains a dominant market share, but it's not 100% of the market. There's also competition and supply from Google's TPUs, from Amazon's Trainium, from AMD and from others in the wings. But there is a trillion dollars of orders at least, and certainly more than that. Every dollar going on chips to build manufactured intelligence. And those chips are more power efficient and they're more powerful, delivering more for each dollar than the dollars spent last year and the dollars spent the year before that. So hold that thought for a moment. The reason that's happening is because the shape of AI is changing. The way AI is being used is shifting. When ChatGPT burst on the scene, the bulk of usage of compute for AI workloads was in what's known as training: building these large language models that we would then go off and use in the inference phase.
And these large language models really did require large amounts of compute. Astronomical is not an exaggeration. 10 to the 23, 10 to the 24 floating point operations or more to train. And data centers were being consumed by the training rather than the inference stage of the process, which is where, of course, people like you and I actually end up using these models. But that is changing. Now, to understand that, let's talk about how manufactured intelligence makes its way out there. The atomic unit of manufactured intelligence, of AI, is the token. Now, there are lots of limitations in the framing I'm about to give you, but this is a really helpful one to take away. Think of it this way. The token is effectively the unit of AI output. It's a little bit of something that you can do something useful with. A token is roughly three quarters of a word. So if you get a 500 word email summarized into 100 words, that back and forth is probably a couple of thousand tokens. And as we get more advanced, we move from back and forth models, we move from summarizing emails to doing much more sophisticated things like: please analyze these 10 vendor proposals and find what's common, what's different and why I should prefer one over another. I mean, that's going to be tens of thousands of tokens. When you move to the reasoning models, usage increases really, really significantly. And when you move to agentic systems, it moves even further. Now I discussed this a couple of weeks ago in an essay called Magnitudes of Intelligence. So you can go back to the archive on Exponential View and find that essay. I made the point there that R. Mini Arnold, which is my OpenClaw agent, the AI chief of staff, had just consumed 100 million tokens in a single day. And that was my running average. And when I look back to the summer of 2024, so less than two years ago, I was at about maybe 100 to 150,000 tokens a day. So that's a three order of magnitude increase in less than two years.
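To make the token arithmetic above concrete, here is a minimal sketch in Python. It assumes only the rule of thumb from the talk (a token is roughly three quarters of a word) and the daily figures quoted; the function names are illustrative, not from any library.

```python
import math

# Rule of thumb from the talk: one token is roughly 3/4 of a word.
WORDS_PER_TOKEN = 0.75


def words_to_tokens(words: int) -> int:
    """Estimate the token count for a given number of words."""
    return round(words / WORDS_PER_TOKEN)


def orders_of_magnitude(before: float, after: float) -> float:
    """Return log10 of the growth factor between two daily token counts."""
    return math.log10(after / before)


# A 500-word email summarised into 100 words, counting only the visible text.
email_tokens = words_to_tokens(500) + words_to_tokens(100)

# Summer 2024 (about 100,000 to 150,000 tokens a day) versus 100 million a day.
growth_low = orders_of_magnitude(150_000, 100_000_000)
growth_high = orders_of_magnitude(100_000, 100_000_000)

print(email_tokens, round(growth_low, 2), round(growth_high, 2))
```

The visible text alone comes to fewer tokens than the "couple of thousand" quoted; the gap would plausibly be instructions and other context sent along with the email, which this sketch does not model. The growth works out to just under three orders of magnitude from the top of the 2024 range and three from the bottom.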
So the shift to reasoning models, that so-called test time compute, which started with OpenAI's o1, was really the start of the increase in the amount of tokens that we as end users would use. It changed the compute workload. What Jensen said in the last week is that as people move to reasoning models and using reasoning much, much more, they saw a 10,000-fold increase in compute demand from each user, but at the same time usage increased 100 times. So that was a millionfold expansion in compute demand in just two years. And again, hold that idea in your head, that's a million X in two years. What other market could go out and serve that? So the question to think about is what does the next two years look like? Does it look faster or does it look slower? The trillion dollar backlog gives us the answer, tells us what Nvidia's customers actually think. They think at the very minimum, it's going to be the same, quite likely more than that. So in this changing market, as we move from a world where the compute is dominated by training and move to a world where there's a lot of inference, things do change. And at the end of 2025, Nvidia acquired a company called Groq. It was founded by Jonathan Ross, who was the original designer of Google's Tensor Processing Units, which are Google's specialist chips for serving AI workloads. He'd done that roughly a decade ago, maybe a bit longer. And it was a big, slightly weird acquisition of about $20 billion. People moved and IP was licensed. The company wasn't formally acquired. It's kind of complicated, but that Groq acquisition really, really pointed to the changing shape of the AI market. Up until that point, Nvidia had survived on a single, albeit evolving architecture, that is the GPU, the graphics processing unit. It was its heritage coming out of video games.
And GPUs are great at many things, but it had become clear over the last year or so that they might not be fantastic for the changing shape of AI use as we move towards inference. So now look, this is a technical bit of my discussion. So when you think about what happens in inference, I think it's worth just unpicking this because it'll explain what's going on. There are a couple of phases. The first phase is called prefill. That's when you send a prompt to a model. Whether it's a question or a document to summarize, or some complex instruction, the model reads and processes your input tokens simultaneously, in parallel. This is enormously compute intensive and it is where GPUs shine. They were built for graphics, throwing huge matrices of pixels at thousands of cores at once, doing it all in parallel. And that is the shape of the prefill problem. So when you're feeding context in, GPUs are doing what they were made to do. But the second phase of inference is called decode, and that is the generation of the responses. And you know exactly what that is like because you have used these tools: word after word after word, sequentially, one at a time. Like a slow teletype from the old days. The model is producing tokens one at a time, each depending on the previous one. This can't be parallelized, it's structurally sequential. The bottleneck here now is no longer raw compute, it's memory bandwidth. How fast can you stream the model's weights from memory for each individual token step? So now you have lots of GPU cores sitting largely idle, waiting on memory reads just to produce one token at a time. So GPUs are not fantastic here. They weren't designed for this. And as we move from the training era to the inference era, well, the workloads are shifting from building models to running them constantly at scale. For billions of users and lots of agents, that efficiency becomes a serious problem. That's why you acquire Groq. Do whatever deal you did with Groq.
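The memory-bandwidth point about decode can be put into a back-of-the-envelope calculation. The sketch below assumes the simplest possible picture: every token step streams all of the model's weights from memory exactly once, for a single request, ignoring caches, batching and everything else. The model size, weight precision and bandwidth figure are hypothetical round numbers, not the specifications of any particular chip.

```python
def decode_tokens_per_second(params_billion: float,
                             bytes_per_param: float,
                             bandwidth_gb_per_s: float) -> float:
    """Rough ceiling on single-stream decode speed when each token step
    must read every model weight from memory once."""
    # Billions of parameters times bytes per parameter gives gigabytes.
    model_gb = params_billion * bytes_per_param
    return bandwidth_gb_per_s / model_gb


# Hypothetical example: a 70-billion-parameter model stored at 2 bytes per
# weight, on hardware with 3,000 GB/s of memory bandwidth.
ceiling = decode_tokens_per_second(70, 2, 3000)
print(f"about {ceiling:.1f} tokens per second at most")
```

Under those assumptions the ceiling is set entirely by how fast the weights can be read, however many cores are available. Serving many requests in one batch spreads each weight read across several tokens, which is one reason real systems do better than this single-stream figure.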
It's a fast move, a high conviction move by an incumbent to ensure it can serve that changing market. A quick note, if you want to support us in bringing more of these conversations to the world, please consider subscribing to the show. So Nvidia is going to make some new chips or systems with Groq technology embedded later this year. The point is that that combined architecture uses Nvidia's homegrown Vera Rubin GPUs and Groq processing units, and it will result in a 35-fold improvement in the throughput per megawatt of power versus Nvidia's current Blackwells, which are the posh chips of the moment. This isn't the first time Nvidia has done something like this. They acquired Mellanox, which was a networking company, and it turned into a real advantage for Nvidia. And it's important to understand that Nvidia is more than just a chip company, it's actually a systems company. And it sells systems, not just chips; platforms, not just processors. But the inference market is really different to the training market, and that's really different to the graphics market that Nvidia came from. And I think it's really impressive to see the firm figure this out. Not just figure it out, frankly, actually deliver on it at the scale at which they have to deliver. We'd figured out, we'd thought, like Nvidia, like lots of other people, that inference was going to be an important part of the market, that workload volumes were going to shift and ratios were going to shift. And I've personally invested in a couple of startups that build chips to tackle this inference opportunity through much higher throughput or much lower energy usage. It was clearly going to be important. And I think that what we've seen with Nvidia is almost a case study of how a large firm can move really quickly with conviction, which is not something firms can normally do. At this point, let's go back to what we're doing with all that inference. What am I doing with it? What are you doing with it?
What are your colleagues doing with it? And this takes us back to Jensen's observation that every company needs an OpenClaw strategy. So I have to go back a layer still. What's driving all this growth? Well, it's really happened in the last few months, where agentic systems, and I think Claude Code, the coding tool that Anthropic has built, was a good example of this, started to work well, which means that you could effectively leave them to their own devices. And I think that that shift, and OpenClaw also symbolizes it, helps us understand the distinction between breakthrough research and development and the things you need to diffuse technologies into the market. The debate that's gone on about the AI investment wave and AI in general, I think, has confused this. It's often all about whether the biggest frontier models are hitting a wall. You know, the boosters and actually also the more sober-minded, calm people in the foundation model labs say they're not close to a wall. And the evidence does suggest they're probably right there. The doomers or the skeptics say there's obviously a wall and this thing can't go on or it's already stopped. But both of those miss the point because what we're doing there is just looking at a performance curve, the innovation curve. And you're arguing, especially if you're a doomer, that if the innovation curve can't be guaranteed to continue, the whole of this shebang is just a house of cards. It's a glass house in which you shouldn't throw stones. But what it does is it misses the importance of the systems that you build around a technology to make it useful. And these are the things that drive diffusion. They're the things that make technologies helpful. That real inflection point, I think, happened at the tail end of 2025, which was the maturity of Claude Code. It's almost independent of the model that sits below it. It's a harness that helps you get work done. That harness is traditional code.
It's also the rules around it, sometimes traditional programming, sometimes specifications. And it's what helps that AI system do so well in the particular context of coding or, as we've discovered at Exponential View, lots of other things. Maria, for example, uses Claude Code when she's editing essays, which is kind of a novel use for me. But that is the second curve, right? The first curve is a performance improvement curve. The second curve is the diffusion curve. The harnesses, the things that sit around a piece of technology that make it more applicable and make it more useful. Well, I'm going to treat you to a classic Azeem analogy here, and I want you to think about the internal combustion engine. Now, if you're a smart engineer, a mechanical engineer, and you get an internal combustion engine, it's useful to you. You can do something with it. You can put some diesel into it or some gas, and maybe you can connect it up to a generator and produce some electricity, or you could figure out how to connect it to a rotating blade and build a hedge trimmer. It's brilliant and well done, and I'm so glad you did mechanical engineering. But the rest of us can't do that. What really gets internal combustion engines to take off is when you put them in a harness, a harness that makes that technology a useful product. Now, historically, we've called the harness for an internal combustion engine an automobile or a car. When you put a combustion engine in a car, in a harness, it becomes useful. Let's call it open car. And now people will buy lots of them and they'll use them a lot. This is the full product solution. And you've got to remember that research companies don't start with the full product. Doing the research is hard enough. They start with the technologies. And those technologies are not that easy to use. And ultimately they or business partners figure out how to productize them.
And that harness is the productization, and it's the harness that makes these AI systems so incredibly useful. But you could look at an agentic harness, and I've seen this on X, when people have looked at OpenClaw or the zillions of clones like Picoclaw and Zeroclaw that are out there and they say, well, there's just not much in there. It's just a way of scheduling your AI system to run different prompts at different times and to look at the right files on your file system. It's a scheduler, it's a set of consistent rules. I mean, that is a lot of what something like OpenClaw does. But equally, you could look at a car and say, well, so what? It's kind of an increasingly cheap metal frame and there are some cheap seats in it, and there's a dodgy hi-fi sitting there. That's not very clever. But it's the harness that makes us use it, and it's the harness that makes us use them more and more and more. Would you rather have a race-tuned Ferrari engine sitting on your lawn at home, or a 10-year-old petrol engine, gas engine, actually in a car that you could go off and drive and do something useful with? And it's the harness that makes these things valuable. So I'll talk to you a little bit about how the OpenClaw harness has completely transformed my own personal experience. So on this show a few weeks ago, and in one of the essays, I talked about how R. Mini Arnold, which is my AI chief of staff, of course, and it is an OpenClaw agent, has been helping me with more and more of the things that I want to get done. And I talked about how it had got to 100 million tokens per day and that was really the average through the earlier part of March. And to give you a sense of what a hundred million tokens is, you know, if you use ChatGPT a few times a week and you do a bit of document summarization, you're probably at the tens of thousands of tokens per day. If you are doing some software development, you might be at a couple of million a day if you're doing this regularly.
So 100 million is sort of at a different level. But since then, and I think it's been two and a half weeks since I told you about 100 million tokens a day, I've bumped that up a little bit. My record day working with R. Mini Arnold: 870 million tokens consumed and applied to business problems that I'm dealing with. That's close enough to a billion tokens per day. I mean, there were lots of other workloads being done on other systems where we don't get the token counts. So maybe it was a billion tokens. You know, it feels like something, it's a billion a day and it was 100 million two or three weeks ago. My average days are now well above 200 million tokens per day, seven days a week. It's because if you have an internal combustion engine sitting on your lawn, you don't use it that much. When you stick it in a car with some seats and a windscreen and a terrible hi-fi, you use it quite a lot and you use it because it's useful. People always ask me, what am I doing with my 850 million tokens? I mean, how much value are you getting with that? Well, look, let's acknowledge, right? I experiment with these things. It's my job to experiment with these things. It's my job to make sense of them. But also I'm kind of rational about my choices. So I wouldn't be using them this heavily if I wasn't finding them useful, if I wasn't getting substantially more value from them than the cost of actually running them. And, you know, that cost might be my time or it might be real sort of commercial dollars and cents. I talked a little bit about this in a previous discussion where I explained how I work with these AI systems to improve my thinking. And this is true. This is where part of that 850 million tokens goes. So there are some really obvious things that we do. Of course, there is coding, right? There is building new bits of software.
You'll see some of this if you looked at our solar super cycle model, which I recommend you go and take a look at, at solar.exponentialview.co. That was built by my colleague Hannah. That's the kind of thing that I'm also building. But there are infrastructural things that we are building. The agents are helping with that. They're also helping with writing the documentation and going through and clearing technical debt and doing security audits and bug fixing. R. Mini Arnold, as of this week, now has four OpenClaw agents that it can hand work to. It's got R. Veblen, it's got R. Simons, it's got R. Bradlee, it's got R. Gulbenkian. Now, I'm going to regret some of those names soon, but there is some logic to it. So R. Veblen is of course named after the Norwegian-American economist Thorstein Veblen. He was an outsider who never quite fit in either the academy or in business. Most of his work was really about technology as the primary driving force of social change. He was an interdisciplinarian. I recognise Veblen in my work and R. Veblen is the agent that is helping me with my book research. R. Simons is named after one of the world's greatest investors, Jim Simons. So R. Simons is the agent that will help me manage my portfolio in different ways. To help my writing and analysis for Exponential View, I turned to R. Bradlee. So Ben Bradlee was the editor of the Washington Post, and he's one of the greatest newspaper editors ever. He backed people, he valued friendship and he remained skeptical, not ideological. And I hope that R. Bradlee will help me do a better job in Exponential View. R. Gulbenkian is perhaps the most controversial name. So Calouste Gulbenkian was a business architect who, at the turn of the 20th century, catalyzed the formation of the oil economy. Now, I have a few misgivings about Gulbenkian, but his role was essential to shaping the 20th century.
And R. Gulbenkian's focus is on the data and the frameworks that help make sense of the burgeoning AI economy. But equally, I've started to do some really large scale simulations to figure out difficult questions. So we can simulate any kind of audience of readers or of potential customers or of clients who I might be talking to. And we can test ideas before we get there. Doing it in silico, in much the same way as the aerospace industry has used CFD, computational fluid dynamics, to test the shape of wings and planes and other surfaces before they actually build them. A large scale simulation for us can take 150, 200, 250 million tokens. And there is some cost to something like that. You know, 250 million tokens isn't free, but it's about 10 to 50 bucks. And it's not nothing. But it's certainly 100 times cheaper than doing it in the real world and perhaps priceless if we get to a better outcome for an argument we're making for our community. But in doing this, I've also started to notice a shift in the way that I use the AI models. And it's something that we have pointed to and argued as sort of theoretically being the case. But now I'm seeing it in our own behavior a couple of years later, which is that now that I have an AI chief of staff in R. Mini Arnold, I'm also cognizant of using the right model for the right task. I mean, we never thought that a single monolithic model would win the AI race. And my core agent, R. Mini Arnold, uses the Anthropic models of Sonnet and Opus, and for some really lightweight things it uses Haiku. But it also has access to other systems, so it can pull on Manus. So Manus was that Chinese-Singaporean company that Meta acquired. It was pretty good at certain types of research. It could do quite long workflows, actually. It's also quite good at making PowerPoint presentations. I thought part of the point of AI was we wouldn't have to deal with PowerPoints. But what have I done?
And of course, R. Mini Arnold also uses OpenAI models through Codex, which is its code development product. But I found that Qwen 3.5 as a model, or the new Huawei model, whose name I forget, it was called Hunter Alpha, are really efficient and cost effective when I'm doing really large scale simulations. So I tend to prefer them for that work. And so one of the things I've learned is that as you start to use more AI, you start to use more of a portfolio of models that you want to access. And R. Mini Arnold really helps there. It maintains a model registry, so every couple of hours it goes off and checks whether there are new models that we need to update and upgrade to, and it automatically upgrades our model registry. And that lives in a configuration file: if you're using an AI tool in the team, it should be checking that configuration file. And if we've gone from GPT 5.2 to 5.4, it should seamlessly switch you over. That kind of automatic evaluation is sort of churning through not that many tokens, but consistently churning through some usage. But it means that we're never going to be more than a couple of hours late if there is a radical shift in model capabilities. And I think that you can see that in some of the things that happened at Nvidia's GTC, because Jensen talked a lot about other models and Nemotron and open source models they're working with. So what happens is when you're using AI agents, usage increases. And of course it's in Jensen's interest to say that everybody should use agents, that every company needs an OpenClaw strategy, because he is selling the thing that produces the tokens. And these things are Hungry Hungry Hippos and they love eating tokens. So it's the same way that it's in the interests of an airline to tell you about beautiful destinations that they can fly you to. But I think he's right.
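The model registry mentioned above can be pictured with a small sketch. This is an assumption-laden illustration, not how R. Mini Arnold actually works: the registry is a plain dictionary mapping a role to a model name, names are assumed to look like `family-major.minor`, the `sim-1.0` entry is made up, and the list of available models is passed in by hand rather than fetched from any provider.

```python
from typing import Dict, Iterable, Tuple


def parse_model(name: str) -> Tuple[str, Tuple[int, ...]]:
    """Split a name like 'gpt-5.2' into ('gpt', (5, 2))."""
    family, _, version = name.rpartition("-")
    return family, tuple(int(part) for part in version.split("."))


def refresh_registry(registry: Dict[str, str],
                     available: Iterable[str]) -> Dict[str, str]:
    """Return a new registry in which each role points at the newest
    available version of the model family it already uses."""
    newest: Dict[str, Tuple[int, ...]] = {}
    for name in available:
        family, version = parse_model(name)
        if version > newest.get(family, ()):
            newest[family] = version

    updated = {}
    for role, current in registry.items():
        family, version = parse_model(current)
        # Never downgrade: keep the current version if nothing newer exists.
        best = max(version, newest.get(family, version))
        updated[role] = f"{family}-{'.'.join(str(n) for n in best)}"
    return updated


# Hypothetical registry and hypothetical list of available models.
registry = {"coding": "gpt-5.2", "simulation": "sim-1.0"}
available = ["gpt-5.2", "gpt-5.4", "gpt-5.3"]
registry = refresh_registry(registry, available)
print(registry)
```

Run on a schedule, something like this would move the coding role from 5.2 to 5.4 without anyone touching the file by hand. It compares version numbers only; a setup that also evaluates new models before switching, as the talk implies, would need a check in between that is not shown here.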
I mean, I think that it is really important that you start to think about what kind of OpenClaw strategy or agentic strategy or token application strategy you have in the firm. So my inference loads have gone up hugely using these agentic systems. And we're just starting to touch on the question of what happens when agents themselves trigger inference workloads. Because right now what happens is that I am triggering them. I'm either saying kick something off if these conditions happen, or do this at this point in time, or do this now. And right now my sub-agents, those OpenClaw agents that report into R. Mini Arnold, they have access to a wide range of tools when I task them to do something. But they don't have access to the really token-expensive complex tools, for example, the large scale simulations, those 200 million token jobs. So when they want to call on that particular skill, they have to issue a request to R. Mini Arnold that gets relayed to me and then I will approve the request. I haven't yet turned a request down. I have sometimes said that's a terrible specification, go back and do it again. But equally, that may change. It may change because at some point I will have enough trust in the processes I've set up that I'm not going to worry too much about the agents going off and kicking off 50, 100, 200, 250 million token workloads. I already do that in a whole range of other areas. You know, I do that with checking the code quality on our software repositories overnight. They can go off and make the changes themselves if they're of a certain size. I certainly do that when I'm doing signals detection for themes that I'm investigating. There's a lot of discretion to run a crawler or do whatever an agent needs to do.
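The approval step described above, where cheap jobs run at the agent's discretion and token-expensive ones are relayed for sign-off, can be sketched as a simple routing rule. Everything here is hypothetical: the threshold, the tool names and the token estimates are invented for illustration and do not describe the actual setup.

```python
from dataclasses import dataclass


@dataclass
class ToolRequest:
    agent: str
    tool: str
    estimated_tokens: int


# Hypothetical threshold: jobs at or above this size need a human sign-off.
APPROVAL_THRESHOLD_TOKENS = 50_000_000


def route_request(request: ToolRequest,
                  threshold: int = APPROVAL_THRESHOLD_TOKENS) -> str:
    """Return 'run' for cheap jobs and 'needs_approval' for expensive ones."""
    if request.estimated_tokens >= threshold:
        return "needs_approval"
    return "run"


# A small crawl runs straight away; a large simulation is held for approval.
crawl = ToolRequest(agent="R. Veblen", tool="crawler",
                    estimated_tokens=2_000_000)
simulation = ToolRequest(agent="R. Veblen", tool="large_simulation",
                         estimated_tokens=200_000_000)
decisions = [route_request(crawl), route_request(simulation)]
print(decisions)
```

Raising the threshold over time is one way to express the growing trust mentioned above: the rule stays the same while more of the larger jobs fall under the agents' own discretion.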
And that's a really, really important shift because, you know, if you're in a big company and you're thinking about an agent strategy, an OpenClaw strategy, well, this is all great news for people who sell compute, for Nvidia and its competitors and all the companies in the supply chain sort of upstream from them. But if you're a boss trying to figure out how much compute you're going to use, how many tokens you're going to consume, well, you're going to need to come up with some sort of governance over the underlying agents as they go off and do their work and maybe send messages to each other about needing to do X or Y to achieve Z objective. Back at Exponential View, I've been badgering the team to increase their usage of AI, to put more workflows and capabilities into Claude Code and into Cowork and OpenClaw and the other systems that we use, just to help them achieve more for themselves more than anything else. And in thinking through that process internally, I realized that token budgets are really, really important. And in a small, flat company like Exponential View, the decision making is pretty easy. Generally you can do what you need to do. Maybe you go and talk to Maria, maybe there are things you need to come and talk to me about, but that's not most companies. And I was thinking that probably a lot of token budgets are actually owned by the IT department and not by the line of business. That is a bit of a problem, right? If you think of tokens as an IT function, you're effectively saying that one class of cognition, that is manufactured intelligence, is owned by IT, which has traditionally been a cost center, and that's the wrong place for it to sit. And it makes me think about Moderna, the biotech company that last year put its AI resourcing and its HR resourcing under the same C-suite leader, recognizing that these were essentially a portfolio of capabilities that needed to have one single line. Back at EV, the conversation we have.
Well, I can share one with you. I had a discussion with Hannah, who's one of our researchers. She worked on the brilliant, brilliant solar super cycle piece we put out a couple of weeks ago. And I said, look, you can quintuple the amount of tokens you're using, just quintuple that AI budget, and if you really need to go much above that, come back and check, but it'll probably be okay. What's implicit, of course, in our values is that we're kind of sensible about when we use these things. We don't throw everything out to the most advanced, most expensive model. But you don't limit your capacity or what you could achieve by having an arbitrary budget that's set somewhere else. So Jensen says this much better than I do. He says that if you've got a well-paid engineer, half of their salary should also be allocated to their token budget. And if not, I mean, by implication, they're probably not going to be able to do their job well enough. And so that other side of the OpenClaw strategy for the firm is the OpenClaw strategy for you as an individual. And I think you can take that message from Jensen and frankly, you can take it from me as well, which is that, you know, token budgets and your ability to use them for business problems, for achieving better solutions, more optimal solutions, more practical project plans, are going to really, really help you. So for me, that's the thing that really came out of listening to GTC, that shift to the inference economy and how large Nvidia and its customers think it's going to be. We've already seen a millionfold growth in two years and we can probably see a similar rate over the next couple of years. And I'll do more formal work on what we think that's going to look like. And that growth emerges as we move towards agentic systems, from engines to affordable cars, to things that people actually want to use. Of course, these are just indications of the strength of demand. It's not a forecast, it's a process.
It's a set of incentives that may lead us to that future. And there are so many second-order consequences and conversations to explore with such demand. Will supply catch up? Where will the bottlenecks be? When will those bottlenecks hit? So let's get back to Jensen and what to take away from GTC. Behind the numbers and the razzmatazz, what there is is a really strong signal that the AI economy is changing, from that training economy, the build-it-and-they-will-come economy, to the inference economy. We don't need to infer that, we can see it in the data. Thanks for listening all the way to the end. If you want to know when the next conversation is released, just hit subscribe wherever you're listening. That's all for now and I'll catch you next time.
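The two budgeting heuristics from the conversation (Jensen's "half of their salary" rule, and a quintupled allowance with a check-in above it) reduce to simple arithmetic. The sketch below is illustrative only: the function names are hypothetical, and the salary and price per million tokens are assumed inputs, not figures from the episode.

```python
def token_budget_from_salary(annual_salary: float, share: float = 0.5) -> float:
    """Money set aside for tokens as a share of salary (0.5 = 'half of their salary')."""
    if annual_salary < 0 or share < 0:
        raise ValueError("annual_salary and share must be non-negative")
    return annual_salary * share


def tokens_affordable(budget: float, price_per_million_tokens: float) -> int:
    """How many tokens a budget buys at an assumed blended price per million tokens."""
    if price_per_million_tokens <= 0:
        raise ValueError("price_per_million_tokens must be positive")
    return int(budget / price_per_million_tokens * 1_000_000)


def needs_check_in(requested_tokens: int, baseline_tokens: int, multiplier: int = 5) -> bool:
    """True when a request goes beyond the quintupled baseline and should be discussed."""
    return requested_tokens > baseline_tokens * multiplier


if __name__ == "__main__":
    salary = 200_000.0  # assumed salary, in any currency
    price = 5.0         # assumed price per million tokens, same currency
    budget = token_budget_from_salary(salary)
    print(budget)
    print(tokens_affordable(budget, price))
    print(needs_check_in(requested_tokens=600, baseline_tokens=100))
```

Real prices vary widely by model, which is the point made in the episode about being sensible and not sending everything to the most expensive model.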
Episode: What NVIDIA’s bet on OpenClaw means for the future of AI and your token budget
Host: Azeem Azhar
Date: March 25, 2026
In this episode, Azeem Azhar unpacks Nvidia’s recent pronouncements at their GTC event—focusing on CEO Jensen Huang’s endorsement of OpenClaw, a rapidly ascendant open-source agent framework. Azeem moves beyond the headlines, delving into how Nvidia’s strategic moves reveal deep shifts in the AI industry: the transition from an era dominated by model training, to an “inference economy”—where agents perform real-world tasks at massive, ever-increasing scales.
The episode contextualizes these changes through Azeem’s own company, Exponential View, and his personal use of OpenClaw agents—illustrating what this future will mean for businesses, professionals, and the very concept of a “token budget.”
“It's a dimension that will accelerate all the things that we are being promised to about the future. And that dimension is AI inference.” (00:22)
“OpenClaw is the most exciting piece of technology that I have seen since the web browser back in 1992.” (01:53)
“The demand for intelligence is essentially infinite...if the demand for intelligence is infinite, the demand for compute...is also effectively infinite.”
“Our Mini Arnold, which is my OpenClaw agent, had just consumed 100 million tokens in a single day...a three order of magnitude increase in less than two years.”
“That was a million fold expansion in compute demand in just two years.” (16:33)
“It's the harness that makes these things valuable...Would you rather have a race-tuned Ferrari engine sitting on your lawn or a 10-year-old petrol engine actually in a car you could use?” (28:35)
“That's close enough to a billion tokens per day...My average days are now well above 200 million tokens per day, seven days a week.” (32:35)
“As you start to use more AI, you start to use more of a portfolio of models that you want to access.” (41:05)
“If you've got a well-paid engineer, half of their salary should also be allocated to their token budget.” (52:10)
“The demand for intelligence is essentially infinite...the more we use, the more we need to use.” (07:06)
“This year...the committed orders was now a trillion dollars...And every dollar is going on chips to build manufactured intelligence.” (09:05)
“The atomic unit of manufactured intelligence...is the token.” (11:30)
“My record day working with our Mini Arnold: 870 million tokens consumed and applied to business problems.” (32:35)
The episode closes with a call to reflect: The era of the “inference economy” is here—are you and your organization ready with your own OpenClaw (agentic) strategy?