Inside Google’s massive AI capex (live) - Catalyst with Shayle Kann

Summary

Catalyst with Shayle Kann
Live from Transition-AI 2026: Inside Google’s Massive AI CapEx
Episode Date: April 23, 2026
Guest: Amin Vadat – Chief Technologist for AI Infrastructure at Google
Host: Shayle Kann

Episode Overview

This live episode of Catalyst, hosted by Shayle Kann at Transition-AI 2026, explores the unprecedented scale of Google’s capital expenditures (CapEx) in AI infrastructure, with an in-depth conversation featuring Amin Vadat, Google's Chief Technologist for AI Infrastructure. The discussion covers the rapidly growing demand for AI compute, the interplay between energy, infrastructure, and data center scaling, reliability requirements, power procurement strategies, and innovation in data center design. Vadat offers a rare, candid look at what it takes to build the world’s largest, most advanced computing infrastructure — and how those ambitions are tightly intertwined with energy markets, grid planning, and climate tech.

Key Discussion Points & Insights

1. Big Picture: Google’s Astronomical AI CapEx

[02:00] Shayle sets the context: Google has announced its intent to spend $175-185B in CapEx this year (not all of it on AI infrastructure). That is roughly the GDP of Hungary, five to seven times annual U.S. transmission capex, five or six Vogtle-scale nuclear plants, and about seven times NASA’s annual budget.
- “We spend $25 or $35 billion a year in capex on transmission... This is five to seven times that amount just from Google, just in one year.” (Shayle Kann, 02:00)

2. Scale in Data Center Design: Training vs. Inference

Scale Evolution:
- [04:36] Early 10 MW Google data centers were once cutting-edge; now gigawatt-scale is the norm for leading-edge AI training.
- Amin: “Now you've got a gigawatt of capacity that used to be used for training. What are you going to do with it? Probably going to serve on it.” (Amin Vadat, 05:34)
Inference Compute – Is Bigger Always Better?
- [06:20] For inference, enormous individual data center scale isn’t strictly necessary (smaller deployments can suffice), but some scaling is required due to co-location demands (compute, storage, networking).
Future Mix:
- [10:23] Likely future: medium number of medium-sized data centers, plus a few very large ones — driven by geographic locality needs and management overheads.
  - “It's easier to build a smaller number of larger sites... But having 1,000 each with 0.1% of your capacity has other overheads.” (Amin Vadat, 10:23)

3. Reliability Paradigm Shift

“Four Nines” is Out?
- [11:08] Data centers have historically demanded ultra-high reliability (up to 99.99% uptime), incurring major capex for UPS, backup generation, and redundancy.
- Amin: “No, it is not intrinsic and we should be thinking about lower reliability power delivery overall.” (Amin Vadat, 12:10)
Practical Trade-Offs:
- [12:10] As compute becomes a much larger portion of cost, many internal customers now prefer greater total capacity at modestly lower uptime (e.g., 99% availability but double the compute).
- “If the other 51 and a half weeks I get twice the capacity, many people would say sign me up.” (Amin Vadat, 13:10)
Real-World Implementations:
- [13:55] “Without saying too much, it’s happening.” — Google is already co-designing for flexibly lower reliability with some customers.

4. Behind-the-Meter Power, Bridge Power & Demand Response

Grid Preference, With a Caveat:
- [14:57] “At Google, we prefer grid connected capacity... if you're behind the meter, you're going to have to do all that provisioning yourself.”
Agile Power Strategies:
- [15:00] Google achieved a gigawatt of demand response agreements, trading flexibility for more grid integration. Willing to “brown down” (reduce load) during grid peaks for system-level gains.
  - “For the utility, for the one week of the year where they have maximum demand, we're willing to brown down... In exchange for, in the end, more availability of power, less cost, both for us, but also for the ratepayers.” (Amin Vadat, 15:00)
Bridge Power Reality:
- [16:40] Bridge power (onsite generation, sometimes mobile, before a grid connection is ready) is a necessity due to lead times but is seen as temporary.
- “We're always going to look to invest with the utilities to bring the transmission. Maybe it's a year after, maybe it's two years after.” (Amin Vadat, 15:44)
Energy Abundance Dream:
- [18:33] Amin on stranded generation risk: “I'd love to have that problem... I think that the world would be a better place if energy were abundant... I think we're so far away from that world.” (Amin Vadat, 18:33)

5. Data Center as Microgrid, Software-Driven Optimization

[21:49] “The microgrid and the software control here is going to be absolutely key... This is a place where I think we as a community are under invested today.” (Amin Vadat, 21:49)
Dynamic power management, workload differentiation, and real-time optimization are critical as facilities become more like microgrids, especially under demand response scenarios.
Workload Sensitivity:
- [22:55] Inference and training each have vastly different power and latency profiles — batch vs. real-time workloads, etc.
Google’s Edge:
- [23:52] Google is highly vertically integrated; they co-design everything, from the racks and chips to data centers and power sources, supporting unique system-level optimization.
  - “For us...we co design (TPUs) with the building...with the power generation source...with the DeepMind team that builds Gemini models.” (Amin Vadat, 23:52)

6. The True Bottlenecks in AI Growth

Rate Limiter Debate:
- [24:53] If demand is infinite, what’s the constraint? Chips, power, labor, or construction?
- Amin: “At 10am it's labor, at noon it's power. And at 2pm it's chips every single day.” (Amin Vadat, 26:01)
  - All three are simultaneous bottlenecks; there is no “easy” one to scale faster than the others.

7. Edge vs. Cloud vs. Physical AI (Robotics & More)

Physical AI is Lagging:
- [28:14] Digital AI’s demand is much larger and has more mature infrastructure; robotics and physical AI raise new reliability and safety challenges, often requiring edge or on-device solutions.
On-Device vs. Edge Compute:
- [29:49] “I believe that a lot of it is going to have to be on device and dedicated to that use case.” (Amin Vadat, 29:49)
  - For critical real-time needs, computation must occur on the physical device (e.g., a self-driving car).
  - In factory automation, some edge/localized data centers may become economical for non-safety-critical tasks.

8. Driving Down Data Center CapEx: Toward Efficiency and Specialization

Density & Specialization:
- [32:10] The rising cost of compute makes density and task-specific optimization far more important: “What is the ratio of power to space?... Now if we could actually co design and optimize and say, you know what, this building is going to be a GPU building, that building is going to be a TPU building... huge opportunity.” (Amin Vadat, 32:47)
Flexibility vs. Efficiency:
- [34:46] Historically, multipurpose design won (for flexibility). Now, with 100x more power use for GPUs vs. disks, hyper-optimization becomes key.
Radical Power Differentials:
- “The difference between storage and accelerators are approaching 100x. But even within an accelerator, if you look at the power footprints...of serving versus training, radically different.” (Amin Vadat, 34:46)

Notable Quotes & Memorable Moments

“If you went to your internal customers…would you rather have four nines of availability and half the capacity or two nines of availability and twice the capacity…many people would say sign me up.” (Amin Vadat, 13:10)
“I would say that when delivering the end to end, we unfortunately don’t have the luxury of focusing on a single limiter. At 10am it’s labor, at noon it’s power, and at 2pm it’s chips every single day.” (Amin Vadat, 26:01)
“The microgrid and the software control…are absolutely key. And this is a place where I think we as a community are under-invested today.” (Amin Vadat, 21:49)
“I'd love to have that problem [of energy abundance]. I think we’re so far away from that world.” (Amin Vadat, 18:33)

Timestamps for Important Segments

02:00 – 04:36 | Setting the scale: Google’s CapEx compared to global megaprojects
04:36 – 08:01 | Data center scaling: training vs. inference, life cycle of infrastructure
09:45 – 11:08 | Future siting: gigawatt mega-sites vs. distributed smaller sites
11:08 – 14:07 | Reliability: legacy requirements vs. future flexibility
14:07 – 18:33 | Behind-the-meter power, bridge power, demand response, and energy abundance
21:14 – 23:52 | Data center “microgrids”, workload optimization, and Google’s vertical integration
24:53 – 27:31 | Bottlenecks in AI scaling: chips, power, labor
27:31 – 31:24 | Physical AI (robotics), edge vs. cloud compute
32:10 – 36:22 | Driving future CapEx efficiency: design specialization, density, power profiles

Tone & Final Thoughts

The conversation is frank, technical, and introspective, reflecting both the immediacy of industry pressures and long-term strategic thinking at the highest levels. Amin is refreshingly direct about the limits, lessons, and opportunities of operating at hyperscale, while Shayle challenges assumptions about where the future might bend.

Bottom line:
Google's AI ambitions — and their infrastructure’s energy demands — are changing the rules of the game across cloud, AI, and power sectors. Expect continued pressure on grid planning, persistent bottlenecks in chips and power, and rapid innovation toward both flexible software and specialized hardware as the AI era matures.

Transcript

[00:01]
EnergyHub Announcer
When utilities need flexible capacity they can count on, they turn to EnergyHub. EnergyHub works with more than 170 utilities, coordinating over 2.5 million devices to manage 3.4 gigawatts of flexibility. Built for the moments when utilities can't afford uncertainty, EnergyHub builds and operates virtual power plants that utilities actually stake their grid planning on, coordinating EVs, batteries, thermostats, and more through a single platform built for utility scale: predictive, verifiable, and designed to perform when it counts. Learn more at energyhub.com.
Trillions of dollars are flowing into clean and critical infrastructure. But those investments aren't driven by technology alone. They're shaped by markets, by policy, by capital, and by the institutions that connect them. I'm Alfred Johnson, CEO of Crux and host of a brand new podcast, Critical Capital. Each episode I talk with people deploying capital, shaping policy and building the clean economy. Tune in as we unpack how progress is actually made. Listen to Critical Capital on Spotify, Apple, or wherever you get your podcasts.
[01:05]
Fischtank PR Announcer
Catalyst is supported by Fischtank PR, an award-winning PR firm focused on climate and energy tech, renewables and sustainability. Fischtank is known for generating prominent and effective media coverage for the brands they work with. If you want a PR partner that's thoughtful, shoots straight and gets results, you'll like Fischtank PR. To learn more about Fischtank's approach, visit fischtankpr.com. That's F-I-S-C-H-tankpr.com.
Latitude Media: covering the new frontiers of the energy transition.
[01:39]
Shayle Kann
I'm Shayle Kann. Welcome to Catalyst Live. Thank you so much. Okay, I am here with Amin Vadat, who's sitting next to me here. Amin is the Chief Technologist for AI Infrastructure at Google. Amin, welcome.
[01:58]
Amin Vadat
Thank you for having me. Excited to be here.
[02:00]
Shayle Kann
Okay, I want to provide a little bit of context for the conversation we're about to have here. I know this is why everybody is here in this room at this conference, but there's a lot going on in AI infrastructure at the moment, particularly as it pertains to energy. Amin leads the infrastructure team at Google. So in the Q4 2025 earnings report, Google announced its intent to spend somewhere between $175 and $185 billion in capex this year. It's not all for AI infrastructure, but let's assume a decent portion of it is just for this purpose right now. Let me offer you some context for that number. We had a big election in Hungary this week. That number is roughly the GDP of Hungary. Numbers that are more relevant to this audience, probably: we spend about $25 or $35 billion a year in capex on transmission, electricity transmission infrastructure, in the United States. This is five to seven times that amount just from Google, just in one year. If you want to talk about big infrastructure projects, let's talk about Vogtle. Vogtle is the notoriously expensive, extremely expensive nuclear plant that's the first nuclear project built in the United States in decades. Vogtle cost about $30 billion. So this is five or six Vogtles per year. If you want to move outside energy, just for one fun one: I was in San Diego last week, which happened to be when the lunar mission dropped down. So I looked up NASA. NASA's annual budget is $25 billion. So this is seven NASAs that Amin is responsible for spending each year, or at least this year, on infrastructure. So with a lot of infrastructure and with great capex comes a lot of great questions. I have many. Let's dive into some of them. The first one, I mean, I guess is one of the big ones that's been on my mind and I want your perspective on it. We clearly have been living in a world where scale of individual data centers has been a driving force. Right? We've gone from, you guys were probably building tens of megawatts per data center years ago, to hundreds of megawatts, to now gigawatts. And I think probably everybody here appreciates that for training purposes, for model training purposes, scale is really important. This is why we're getting these huge data centers. But for inference, I've heard mixed things. As we shift more into inference world, it may or may not be true that you need that level of individual scale. So in your mind, how much does scale matter when it comes to inference compute? When I say scale, I mean scale of the individual data center.
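The comparisons in this opening are simple ratios. A minimal sketch that reproduces them, using only the dollar figures quoted in the conversation (the variable and function names are illustrative, not from the episode):

```python
# All figures are in billions of US dollars, as quoted in the conversation.
GOOGLE_CAPEX_RANGE = (175, 185)  # announced capex range for the year

BENCHMARKS = {
    "US transmission capex (high estimate)": 35,
    "US transmission capex (low estimate)": 25,
    "Vogtle nuclear plant": 30,
    "NASA annual budget": 25,
}


def multiples(capex_range, benchmark):
    """Return the (low, high) number of benchmarks covered by the capex range."""
    low, high = capex_range
    return low / benchmark, high / benchmark


for name, cost in BENCHMARKS.items():
    low_x, high_x = multiples(GOOGLE_CAPEX_RANGE, cost)
    print(f"{name}: {low_x:.1f}x to {high_x:.1f}x")
```

Dividing the low end of the capex range by the high transmission estimate and the high end by the low estimate brackets the "five to seven times" figure.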
[04:37]
Amin Vadat
Yeah. So great question, and I think you have it spot on. I remember when Google announced its first data center in The Dalles, Oregon. This was 23, 24 years ago, before I was at Google. 10 megawatts, and people were just stunned that that little company would go build a 10 megawatt data center. That was a big number. And actually no one else was building data centers for their own compute infrastructure at the time. And it's just grown from there: 100 megawatts, gigawatt, et cetera. It's a really good question in terms of the split between training and serving. And so here's where it, to me, gets perhaps most interesting. At the scale that we're operating, we want the latest, greatest, most efficient, most capable training cluster, essentially on an annual basis. If you look at our announcements for TPUs, Nvidia's announcements for GPUs, the latest greatest is coming out every year. And every year the latest greatest is by definition better than last year. Pick this gigawatt number. Let's say you buy the latest greatest and you put a gigawatt somewhere, and maybe you put a couple of these down. After a few years, one, two, probably not much more than that, whoever is doing the training is going to want the new latest greatest, and then they're going to want a gigawatt somewhere else. Now you've got a gigawatt of capacity that used to be used for training. What are you going to do with it? Probably going to serve on it. Now the question is, could you get away with lower scale? Yes, absolutely. In fact, we have lots of smaller deployments, lots of data centers with much less than a gigawatt of capacity. 10 megawatts in certain places, that serves
[06:19]
Shayle Kann
equal value for inference.
[06:21]
Amin Vadat
Inference in general. Now for our largest, most capable models, they are going to run on many chips simultaneously. It's not just one chip. But you don't strictly need a gigawatt of capacity to be able to do useful work. You probably don't even need 100 megawatts of capacity. It gets a little bit more interesting than that because of, let's say, co-located compute and storage and networking and everything else. In other words, it's not just the accelerator. But no, strictly speaking, you could go to much smaller deployments and still be able to do inference. The lifecycle aspect of it that I just described, as people cycle workloads over the capacity, is the more interesting one in terms of the footprint for serving.
[06:59]
Shayle Kann
So there's two interesting pieces to that. One is, as you're saying, just intrinsically for inference, you don't need the same scale effect, but there is probably some minimum scale that's viable, as you said, because you are collocating it with other things. So you're probably not doing 10 kilowatt deployments. No. Okay, so we're in the tens of megawatts or hundreds, but not gigawatts necessarily.
[07:21]
Amin Vadat
And these racks today are trending toward hundreds of kilowatts. Just this single rack.
[07:25]
Shayle Kann
Right.
[07:25]
Amin Vadat
For the rack, for the rack with multiple chips in it. But I mean it's absolutely, you're going to need some minimum scale.
[07:32]
Shayle Kann
Okay, and then the second interesting piece is what you said about repurposing. And there, I guess it's a question of demand, right? You put a few gigawatts down for training, you move on to the next few gigawatts for training of whatever the next TPU or GPU is. But is that enough to serve the booming... I think the assumption has been, look, we're training now, but that is going to result in inference demand shooting upward. And so then that would imply it's not nearly going to be enough.
[08:01]
Amin Vadat
Exactly. And I think we're at that transition point. I mean, we said last year that we're entering the age of inference. I think with agents exploding today, that's well, well happening. So probably, I mean, the analogy I would use is from Google's early days with web search. It used to be that most of the compute at Google was dedicated to building the search index. Pretty quickly, you hoped, and it unfortunately turned out to be true, that most of the capacity needed to be used to serve that index. Same thing here. Most of our capacity maybe earlier on was used for building the model, but you would hope that it transitions to serving the model pretty quickly. And you're absolutely right that we're there. So I do think that over time also, as the efficiency and latency of these models improves, more disparate deployments are going to be valuable. So what I mean by that is today each individual token that is generated by the model takes a reasonable amount of latency. So much so that actually you might not be able to tell the difference here, let's say in San Francisco, if you're accessing content on the East Coast, maybe even Europe sometimes, relative to San Francisco. In general, for, let's say, Maps or Search or Ads, that's not true. The computing is sufficiently efficient and the latency sufficiently low that you will notice, because of the speed-of-light propagation delay of the network, if you're going to a faraway site. So as these services become more interactive, as they become more efficient, and that is still going to be a journey, we're not there today, you're going to want to have geographic locality. That's also going to impact reliability because, again, you can think of it as a highway system. The less distance you have to go, the more likely it is that you're going to find the capacity you need for your request.
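The latency argument can be made concrete with a back-of-the-envelope calculation. The sketch below is an illustration under stated assumptions, not figures from the conversation: light in optical fiber is taken to travel at roughly two thirds of its vacuum speed, the route distances are rough great-circle values, and the two service times are hypothetical.

```python
SPEED_OF_LIGHT_KM_S = 299_792.458
FIBER_FACTOR = 2 / 3  # assumption: light in fiber moves at about 2/3 of c

# Assumption: rough great-circle distances; real fiber routes are longer.
ROUTES_KM = {
    "San Francisco to New York": 4_100,
    "San Francisco to London": 8_600,
}

# Hypothetical service times, chosen only to contrast the two regimes.
SERVICE_MS = {
    "multi-second model response": 2_000,
    "fast lookup (search-like)": 50,
}


def round_trip_ms(distance_km: float) -> float:
    """Minimum network round trip over fiber, ignoring routing and queueing."""
    one_way_s = distance_km / (SPEED_OF_LIGHT_KM_S * FIBER_FACTOR)
    return 2 * one_way_s * 1000


def network_share(rtt_ms: float, service_ms: float) -> float:
    """Fraction of total response time spent on network propagation."""
    return rtt_ms / (rtt_ms + service_ms)


for route, km in ROUTES_KM.items():
    rtt = round_trip_ms(km)
    for service, ms in SERVICE_MS.items():
        share = network_share(rtt, ms)
        print(f"{route}, {service}: RTT {rtt:.0f} ms, {share:.0%} of total")
```

Under these assumptions the propagation delay is a small slice of a slow model response but a large slice of a fast lookup, which is the reason geographic locality matters more as serving gets more efficient.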
[09:45]
Shayle Kann
So I guess wrapping up this piece of it, the core question that I've been trying to think about, and I think a lot of folks in this world that intersects energy and AI have been thinking about as well, is this: as we shift more and more into inference, where you can make an argument for smaller pixel sizes making sense for data centers, does it end up being easier in three years, five years, something like that, to go build a new gigawatt data center and find a site on the grid where you can interconnect the gigawatt data center, or does it become easier and/or faster to build 50 20-megawatt data centers or something like that?
[10:23]
Amin Vadat
Yeah, that's a good question. In general, we found over the years that it's easier to build a smaller number of larger sites. There's still asterisks there. You don't want to be too concentrated. Again from a fault tolerance and geographic locality perspective. In other words, the argument of build as big a site as you can in one place breaks down rather quickly. But having 1,000 each with 0.1% of your capacity has other overheads associated with it in terms of management. So I think that it'll really come down to geographic locality and probably a medium number of medium sized data centers. Sorry for the whatever lack of precision there, but medium number, medium sized data centers augmented with a small number of large data centers.
[11:08]
Shayle Kann
Right, which makes sense. Okay, so then the next question that's been on my mind about the future of this infrastructure, that has a lot of direct relevance to the energy side of the equation, is about reliability. Data centers historically, it's just been gospel, I would say, that data centers require the highest reliability, three nines or whatever the number is, to the extent where the standard footprint of a normal data center in cloud world pre-AI, but even the early AI data centers as well, has a UPS system and backup generators and all this kind of stuff, just to make sure that reliability is that high. Two questions for you. One, why? Why is the reliability requirement so high? And two, is there any argument for that changing in the future? Because that reliability requirement causes so much challenge and capex. Why is it such a problem that we have lead times on gas generators, all this kind of stuff? It is because of the reliability requirement. So is it intrinsic to something about what you're doing or is it just a function of how the business has evolved?
[12:11]
Amin Vadat
Yeah, a fantastic question. And I think that if I were to probably send one message here, it is: no, it is not intrinsic, and we should be thinking about lower reliability power delivery overall. I'll tell you why it has been, but I think that I'll also get to why it has changed substantially. So for most modern software services, the compute is actually a relatively small fraction of your cost. So it makes sense to over-provision it. You want to have 99.999%, five nines, reliability for your software services. You don't need quite that, but many of our data centers aim for four nines, minutes of downtime a year maximum, which, as you said, has a large amount of cost associated with it. Now if you think about it, though, as of now, given how constrained resources are and how costly they are, a much larger fraction of your overall service cost is in the compute. So if you went to your internal customers, if I were to go to my internal customers and said, would you rather have four nines of availability and half the capacity, or two nines of availability and twice the capacity, which do you pick? Very often, not always, very often they'll say, oh my gosh, give me 2x the capacity if I need to have 99%. And 99% sounds good. You all know the math. That's 3.65 days of downtime a year. That's a lot. We're saying three and a half days, half a week, every year you're down, you don't have the capacity. But if the other 51 and a half weeks I get twice the capacity, many people would say sign me up.
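The "nines" arithmetic referenced here is straightforward to check. A minimal sketch follows; the function names are illustrative, and the delivered-capacity comparison is a simplification that assumes downtime removes all capacity for its duration and uses relative capacity units.

```python
MINUTES_PER_YEAR = 365 * 24 * 60  # 525,600 minutes in a non-leap year


def downtime_minutes_per_year(availability_pct: float) -> float:
    """Expected minutes of downtime per year at a given availability."""
    return MINUTES_PER_YEAR * (1 - availability_pct / 100)


def delivered_capacity(relative_capacity: float, availability_pct: float) -> float:
    """Capacity weighted by the fraction of the year it is available."""
    return relative_capacity * availability_pct / 100


for label, pct in [("two nines", 99.0), ("four nines", 99.99), ("five nines", 99.999)]:
    minutes = downtime_minutes_per_year(pct)
    print(f"{label} ({pct}%): {minutes:.1f} min/year = {minutes / 1440:.2f} days/year")

# The trade described in the conversation, in relative capacity units.
print("four nines, half capacity:", delivered_capacity(0.5, 99.99))
print("two nines, twice capacity:", delivered_capacity(2.0, 99.0))
```

Two nines works out to 3.65 days of downtime per year and four nines to under an hour, matching the figures in the discussion.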
[13:53]
Shayle Kann
And yet I don't see that happening. Is it happening?
[13:55]
Amin Vadat
And I'm not seeing it... Without saying too much, it's happening. I would say that actually the co-design there with our customers at Google has been one of our sources of significant efficiency.
[14:07]
Shayle Kann
Okay, so that's a good segue then into my next question, which is behind-the-meter power: generation, storage, whatever it might be. There are multiple reasons that one might put something behind the meter, right? And it can be for reliability purposes, that is one. But oftentimes now people are talking about bridge power and things like that. What is your view on this? There's an enormous amount of planned behind-the-meter power. Is that the direction of travel? Will it be the direction of travel for an extended period of time?
[14:39]
Amin Vadat
It's a very important opportunity for us and it is one of latency, again, a different kind of latency. In other words, what is the time to delivery of capacity? What I'll say though, before going down that path, is that we would actually at Google prefer grid connected capacity.
[14:57]
Shayle Kann
Why? I was going to say, why? Is it reliability?
[15:00]
Amin Vadat
It is, in the end, provisioning for a given level of reliability. If you're behind the meter, you're going to have to do all that provisioning yourself. Now, an aspect of this that's actually quite powerful for us, and to give an example, going back to the reliability question: in March we actually hit a significant milestone in agreements with utilities for a gigawatt of demand response across our fleet. What does demand response mean? It means that for the utility, for the one week of the year where they have maximum demand, we're willing to brown down. That also goes to the availability commitment that we make to our customers. Why? Because that allows them to provision not for their worst, coldest, hottest, whatever it is, week of the year, but to then provision for the 90-whatever-it-is, 98th percentile. And we'll give up that capacity in exchange for, in the end, more availability of power, less cost, both for us, but also for the ratepayers in the region. So now if we have to do all that reliability work ourselves, rather than being able to shift capacity back and forth when we're not using it... Like, let's say that we actually have behind-the-meter power generation, and we'll put "behind the meter" in quotes. What if we can, when we're not using it, give it back to the utility? In general, the way we look at it is we like behind the meter if it means that we get the capacity up most quickly. But we're always going to look to invest with the utilities to bring the transmission. Maybe it's a year after, maybe it's two years after. But the point is this gets us the capacity we need. And maybe we need some bridge power in the interim, but that bridge power actually, in the limit, could be mobile.
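The link between "one week of the year" and the 98th percentile can be sketched numerically. The load curve below is synthetic and purely hypothetical (a seasonal swing plus a daily cycle); it is not utility data, and the function names are illustrative. The sketch compares provisioning for the absolute peak hour with provisioning for the level that is exceeded in only one week's worth of hours.

```python
import math

HOURS_PER_YEAR = 8760
HOURS_PER_WEEK = 168


def synthetic_hourly_load_mw() -> list[float]:
    """Hypothetical utility load in MW: seasonal swing plus daily cycle."""
    loads = []
    for hour in range(HOURS_PER_YEAR):
        seasonal = 150 * math.cos(2 * math.pi * hour / HOURS_PER_YEAR)
        daily = 100 * math.sin(2 * math.pi * (hour % 24) / 24)
        loads.append(1000 + seasonal + daily)
    return loads


def provisioning_levels(loads, flexible_hours=HOURS_PER_WEEK):
    """Return (peak, level exceeded in at most `flexible_hours` hours)."""
    ranked = sorted(loads)
    peak = ranked[-1]
    flexible_level = ranked[-flexible_hours - 1]
    return peak, flexible_level


loads = synthetic_hourly_load_mw()
peak, flexible_level = provisioning_levels(loads)
percentile = 100 * (1 - HOURS_PER_WEEK / HOURS_PER_YEAR)

print(f"One week out of the year corresponds to the {percentile:.1f}th percentile")
print(f"Provision for peak: {peak:.0f} MW")
print(f"Provision with one flexible week: {flexible_level:.0f} MW")
print(f"Capacity that flexible load would need to cover: {peak - flexible_level:.0f} MW")
```

One week is about 1.9% of the hours in a year, which is where a figure near the 98th percentile comes from.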
[16:41]
Shayle Kann
Tying these two things together. One thing I haven't fully wrapped my head around with bridge power is the reliability question, right? If you're still in this world where you're demanding, let's say it's not four nines, let's say it's two nines of reliability, but you need two nines of reliability with just on-site generation for some period of time, however long that bridge is, you've got to build a lot of on-site stuff, right? You end up over-provisioning really heavily, and then eventually you get the grid connection, and now what do you do with all this stuff? So during that bridge power period, are you offering a different level of service somehow? Or are you actually provisioning for your two nines, whatever your ultimate reliability requirement is going to be, but from day one with on-site resources?
[17:20]
Amin Vadat
It's both. I mean, one way to look at it is that most people have trouble, unless they've operated at scale, thinking in terms of these numbers, like what's the difference between 99.9 and 99.5 or 99.99 in a given year. And in a given year they might actually be identical. So some people are just going to say, I'm going to roll the dice. I hope I get lucky. And sometimes they will, and they actually won't experience any issues. What I would say, though, is that we also look to seeing, okay, beyond some of this bridge power that we're going to need, what are the more permanent sources? Would we use solar, wind, nuclear, other sources that will be permanent but might not be able to get us all the way to the power capacity that we might need? And then we have to augment with whatever it might be: turbines, gas, or something else.
[18:06]
Shayle Kann
Which could be mobile, as you said.
[18:07]
Amin Vadat
Which could be mobile.
[18:08]
Shayle Kann
Yeah. I guess the question for me then is, do you feel that we're going to end up with all this stranded on site generation as a result of this? Are we going to end up with, is there any world where we build excess generating capacity or are we just so far underwater now that it doesn't matter?
[18:33]
Amin Vadat
I'd love to have that problem. I'd love to have that problem. I think that one of the things that we aim for at Google, and I think you all as well, is a world of energy abundance. And I think that the world would be a better place if energy were abundant. It's not. I'm not just talking about AI or data centers or anything. Energy is a limiter. I think we're so far away from that world that I'd love to have the conversation. I don't think it's the next few years where we have too much.
[21:14]
Shayle Kann
Let's talk about the different resources that you might put behind the meter. Right? You mentioned you can build on-site solar, wind, or whatever. You can do nuclear. You can get your generation that way. You can get your generation with gas as well, and then you can build batteries to buffer. The data centers you're going to build that do have on-site infrastructure beyond just the UPS and the backup generator, do they end up looking like little microgrids, and are you co-optimizing against a bunch of resources? Or is it generally going to be, like some of the data centers, I don't know, the xAI data center that got built, just a bunch of gas generators?
[21:49]
Amin Vadat
Basically, the microgrid and the software control here are going to be absolutely key. And this is a place where I think we as a community are underinvested today. So if you think about that demand response scenario I talked about, if we need to do a brown-down, it's not going to be that the whole site goes away. It's like, okay, maybe we need to give up 20%, 30%, 40% of our capacity. Okay, which 20%, 30%, or 40%? What's the signal to the software? What do we drain from where? What SLOs do we shift? Do we say, you know what, for the next week we're going to need to fail over 20% of requests from this location to somewhere else? Maybe a whole building actually gets powered down for a week, or most of the building. It is going to look exactly like a microgrid. Now, can you distribute the power dynamically? Also, by the way, in response to the workload: we talked about training versus serving, and the power footprints of the two are very different.
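[Editor's illustration] The question raised above, which share of capacity to give up during a partial power reduction, can be sketched in a few lines of Python. This is only a toy under stated assumptions: the workload names, the megawatt figures, and the greedy rule of draining latency-tolerant work first are all hypothetical and are not a description of Google's control software.

```python
from dataclasses import dataclass


@dataclass
class Workload:
    name: str                # hypothetical workload label
    power_mw: float          # assumed power draw in megawatts
    latency_sensitive: bool  # True if a human is waiting on the result


def plan_brown_down(workloads, shed_fraction):
    """Pick workloads to drain until the requested share of site power is shed."""
    total = sum(w.power_mw for w in workloads)
    target = shed_fraction * total
    # Assumed policy: drain latency-tolerant work first, largest draw first.
    candidates = sorted(workloads, key=lambda w: (w.latency_sensitive, -w.power_mw))
    shed, freed = [], 0.0
    for w in candidates:
        if freed >= target:
            break
        shed.append(w.name)
        freed += w.power_mw
    return shed, freed


# Hypothetical 100 MW site.
site = [
    Workload("chat-serving", 30.0, True),
    Workload("batch-serving", 20.0, False),
    Workload("training", 40.0, False),
    Workload("storage", 10.0, True),
]

shed, freed = plan_brown_down(site, 0.30)
print(shed, freed)
```

A real system would also have to decide where the drained requests fail over to and which service-level objectives to relax, which this sketch leaves out.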
[22:45]
Shayle Kann
I assume the latency sensitivity is super different as well. Even within inference, as you said, there are some things that are going to be super latency sensitive and some that very much will not.
[22:55]
Amin Vadat
If you've got your overnight agent running, then it might be all serving, but it might be batch serving that's not sensitive from a human-in-the-loop perspective. But then others, for your chat interactions or whatever, might be very latency sensitive.
[23:08]
Shayle Kann
Is there an extent to which you are sort of uniquely capable of executing on this, in the sense that Google's certainly the most vertically integrated player? You go from the TPUs through the cloud service, you have Gemini, you're running your own workloads, and so on. Part of what is required in order to reach this future where data centers are flexible and can operate at slightly lower reliability, and all those kinds of things, is that you have to differentiate amongst the workloads such that some can operate as necessary at really low latency and others at higher latency. Google can kind of do all that in house. I mean, you have customers for Gemini, so you have to serve those customers, but you have more capability than most. How do you think it disseminates out beyond Google?
[23:52]
Amin Vadat
So I think that it's a good question. It's something that we think a lot about. In other words, what we want to do is design end-to-end systems that, taken together, create capabilities. This word "capability" is actually essential to what we discuss internally a lot, so I appreciate the question. Create capabilities that otherwise wouldn't be possible. And I do think that it comes down to this vertical integration. In other words, for us, for let's say our TPUs, we co-design them with a building, we co-design them with the power generation source, we co-design them with the DeepMind team that builds Gemini models. So it's the software above the models; above that, the chip design, which we do in my team as well, that's integrated with the rack, that's integrated with the data center, that's integrated with the power source. And if between each of these boundaries you have a custom-optimized interface that gets you a few percent, those few percents, up and down, start adding up, multiplying out in fact, to something meaningful. And that is exactly what we go after.
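[Editor's illustration] The point that a few percent at each boundary "multiplies out" rather than simply adding up can be made concrete with a short Python sketch. The boundary names follow the layers mentioned above; the 3% per-boundary figure is purely an assumption for illustration, not a number from the conversation.

```python
from math import prod


def compounded_gain(layer_gains):
    """Overall fractional gain when per-boundary gains multiply rather than add."""
    return prod(1.0 + g for g in layer_gains) - 1.0


# Hypothetical: a 3% improvement at each of five co-designed boundaries.
boundaries = {
    "software/model": 0.03,
    "model/chip": 0.03,
    "chip/rack": 0.03,
    "rack/data center": 0.03,
    "data center/power source": 0.03,
}

total = compounded_gain(boundaries.values())
print(f"additive: {sum(boundaries.values()):.1%}, compounded: {total:.1%}")
```

Under these assumed numbers the compounded gain is roughly 15.9%, slightly more than the 15% you would get by adding, and the gap widens as the per-boundary gains or the number of boundaries grow.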
[24:53]
Shayle Kann
Okay, so I'm going to ask you to rank some things. There's been a little bit of a debate publicly that I have found interesting about what is the rate limiter on the growth of AI. Let's assume for the moment it's not demand relative to supply today, that there's essentially infinite demand. Maybe that changes at some point in the future, and I'd be interested in your perspective on if and when that might happen, but it's certainly not the case today. So it's going to be something else. There has been an argument that it is chips and the chip supply chain, particularly some of the things upstream in the chip supply chain, like EUV tools for lithography and so on. This is a room full of power-oriented people, and there's certainly also an argument that it is power. I think there's a third argument that it could be labor at some point. You can add a fourth to that if you want. But if you had to rank order, what is the biggest rate limiter to growth among power, chips, and labor? How would you rank them?
[25:56]
Amin Vadat
Yeah, and I would add sort of data center construction and delivery as so
[26:01]
Shayle Kann
EPC is a broad category, not just labor, you mean?
[26:04]
Amin Vadat
Yes, labor is one component of it, but I think even the supply chain associated with it, electricals, mechanicals, cooling, et cetera, is another aspect of it beyond the chip supply chain. I would say that when delivering the end-to-end, we unfortunately don't have the luxury of focusing on a single limiter. I would say very sincerely and honestly, at 10 a.m. it's labor, at noon it's power, and at 2 p.m. it's chips, every single day.
[26:35]
Shayle Kann
All right, I'm going to force you to answer the question a different way. You're supposed to spend, whatever it is, $175 to $185 billion this year building out new infrastructure. If you woke up tomorrow and Sundar said, "You've got to spend 300 now," what would you go try to solve?
[26:54]
Amin Vadat
You know, I'm not trying to dodge the question, but I very sincerely feel that we'd actually have to go scale all of them, and that every single one of those is at the limit of what we can do for the envelope that we have. Is one of them inherently easier to scale than the others? Honestly, no. All three of those are major, major issues for us. I'm sure that there is an answer, but I'm not relaxed about any of them. This is a real thing. I couldn't pick one. I would say, "Sundar, wow, 300. Okay, I'll get back to you as to what the exact issues are going to be."
[27:31]
Shayle Kann
On the labor and EPC one, I'm curious about your perspective, not just related to data center construction, but in general. The rise of physical AI as a category, the rise of robotics in who knows what form factor it ends up taking, has been a sort of second wave, right? There was an LLM wave of excitement in the public. I'm sure in your world it's been going on longer, but I would say we had this wave of sort of digital AI excitement and now a physical AI wave as well. Do you have a heuristic in your head for how demand shapes up between those two, or how infrastructure will get built relative to those two?
[28:14]
Amin Vadat
Yeah, it's a good question. I mean, I think that in terms of the digital side rather than the physical side, the demand obviously today is much, much, much larger. The architecture for the physical side is still in development. I would say the best examples of it right now are with self-driving cars. In other words, if you think about it, these self-driving cars really are robots on four wheels. You can imagine this is actually one of the hardest use cases. Safety is absolutely paramount. What that means is that you actually give up some capability, some scale, for certainty and reliability. To my knowledge, without speaking about any of the specifics, this means more of the edge use cases are relevant there, because the multiplexing associated with cloud is probably less desirable if you have a blip and you're really counting on some computation. If you're doing a chat and your chat app is down for five seconds, fine, do something else for five seconds and come back. If the robot can't get its answer in five seconds, depending on the use case, that could be catastrophic.
[29:33]
Shayle Kann
How much of that happens on device or, in the case of a Waymo, in car? How much of the compute that occurs in a Waymo, or in a humanoid robot in the future, is going to happen inside that instantiation of the physical AI device versus getting pulled from the cloud?
[29:50]
Amin Vadat
Without talking about any specific use case, I believe that a lot of it is going to have to be on device and dedicated to that use case. Not all. Again, there are going to be different kinds of use cases. If it's "What kind of music do I want to play for my passenger?", I don't know, maybe that's okay if that blips for a few seconds. But if it's "Which turn do I take now in an evasive maneuver?", it seems like you want that on device, right?
[30:13]
Shayle Kann
Which then makes the argument for the edge infrastructure stuff a little bit weaker. Going back to the edge versus medium-size, medium-number versus hyperscale thing, I think that people thought the strongest argument for edge, small localized compute, was things like Waymo. Right? But if the really safety-oriented stuff or the really latency-sensitive stuff is all going to happen on device, then maybe when you pull, you can handle the latency of going to the East Coast.
[30:48]
Amin Vadat
It's a very good question, and I would need to think through it more. But if you think about some other related use cases like factory automation, in that case, would you have something that looks more like an edge deployment, provisioned for handling 100 or 1,000 or however many robots for that particular use case at the edge? Again, good question. I'm giving you an authoritative no.
[31:11]
Shayle Kann
And in that case, you might do that for cost-saving reasons.
[31:13]
Amin Vadat
Right.
[31:14]
Shayle Kann
Putting all that compute in every individual robot might be expensive or power.
[31:18]
Amin Vadat
Right. Because putting that much compute into every one of these robot arms may be prohibitive.
[31:25]
Shayle Kann
Right. For the infrastructure inside the robot, yes. Okay, I want to finish up by asking you something that I feel like I don't hear as much talk about as you'd expect there should be in the long term, which is CapEx and cost savings in data center infrastructure. Right now we're just in the world of "we need to build as much as we possibly can," and it seems like speed is the only thing that matters. But in the long arc of history, one presumes the cost of that CapEx is ultimately going to be important. Where do you see the biggest opportunities? If you think out into the future, if you were to build the same amount of capacity in megawatts in five years as you are today, is there a world where you turn that $175 to $185 billion into $100 billion? And what are the things that could get you there?
[32:11]
Amin Vadat
We're looking at this all the time. In other words, it is probably one of the biggest focus areas in my team. I won't say biggest, but it's top three for sure. It might be biggest. So in other words, when we say we're spending X dollars, we're saying that if we had had to do this work last year, we would have had to spend 1.2x (I'm making up the number). In other words, every year we're looking to deliver substantial efficiency such that if we had to do it again, it would be way more efficient. This starts with software, and again, lots of opportunity on the software side, but lots of opportunity on the hardware side too. Let me give a very simple example. What is the ratio of power to space in your data center? In other words, if you have, let me pick a number, 100 megawatts, how big a building do you build, and how big a building do you build for the 25-year lifetime of that building? Not just one generation of TPU or GPU or whatever, but maybe five or six generations of them. Now, you could be conservative and build an infinitely sized building and say, okay, whatever comes next, I'm going to be set. Or maybe I have to. Now, if you think about the watts per linear foot of a disk rack versus a GPU rack, they're radically different, like, I don't know, 100x between disk and GPU. What are you going to assume? So if we could actually co-design and optimize and say, you know what, this building is going to be a GPU building, that building is going to be a TPU building, and that building is going to be a disk building, that's a huge opportunity. But now I've limited my fungibility. If I change my mind in five years' time and I have a disk building and I want to put some TPUs there, there's going to be a lot of empty space. So I think we've figured these things out, not perfectly, but every year, every generation, we're looking to drive that co-design for that optimization, managing the flexibility while optimizing the cost.
[34:14]
Shayle Kann
You know, it's interesting. On the outside, I think I would have assumed that you had already basically optimized to the T for linear area density. Right? Everybody talks so much about that density inside these data centers, for a variety of reasons. Some of that is because for training, it's actually a performance thing. But for cost reasons as well, I would have assumed you're already at the maximum possible density given today's technology. It sounds like you're saying that hasn't always been the case, in part because we've been designing data centers to be more, I don't know, multipurpose tools.
[34:47]
Amin Vadat
Exactly. Back, I would say, five years ago, 10 years ago, it didn't pay to have that hyper-optimized, because you would lose flexibility. If compute is not the dominant portion of your cost, you actually want to have flexibility and fungibility. When compute becomes a more dominant portion of your cost, you're now actually thinking, okay, what am I going to do for this year, next year, and the year after to make sure that I optimize it super well? The difference between storage and compute was at most 10x. The difference between storage and accelerators is approaching 100x. So the gap just got wider, and it's getting wider: disks aren't consuming any more power every generation; the accelerators are. So, these kinds of problems. But even within an accelerator, if you look at the power footprints of serving versus training, and this is where the microgrids also come in, they're radically different. If you just look at how much power we draw from the utility or from our batteries or from whatever for one workload versus another, it could be a factor of two.
[35:55]
Shayle Kann
Well, and the profile of those workloads is very different, right? Training is sort of notoriously very on-off, spiky. And you solve for that, I don't know if this is still true within Google's data centers, by basically running blank workloads to try to make it smooth.
[36:11]
Amin Vadat
We don't do this, but yes, some do.
[36:14]
Shayle Kann
Yeah. The profile of those workloads ultimately impacts what other infrastructure you need on site, what your buffer system is, all the power infrastructure, all those kinds of things.
[36:22]
Amin Vadat
Yes.
[36:23]
Shayle Kann
All right, Amin, this was very fun. Very informative, as I expected. Thank you so much for being here.
[36:28]
Amin Vadat
Thanks for having me. This was great.
[36:35]
Shayle Kann
Amin Vadat is the chief technologist for AI infrastructure at Google. This show is a production of Latitude Media. You can head over to latitudemedia.com for links to today's topics. This episode is produced by Max Savage Levenson. Mixing and theme song by Sean Marquand. Stephen Lacey is our executive editor. I'm Shayle Kann, and this is Catalyst.