
With Google’s expanded demand response and EPRI’s DCFlex initiative, the industry is putting its early demand-shifting capabilities to the test. So how does data center flexibility actually work?
A
Latitude Media: covering the new frontiers of the energy transition.
B
I'm Shayle Kann, and this is Catalyst.
C
You might slow down a job. You might change the resource allocation of how many chips, for example, are instantaneously being used for a job. You might also go all the way down to the underlying silicon and you might change what we call the clock frequency of the chip to change the rate at which computations happen.
B
Coming up: what does it actually look like to make a data center flexible?
B
I'm Shayle Kann. I invest in early stage companies at Energy Impact Partners. Welcome. So the conventional wisdom about data centers is that, from an electricity perspective, they look like totally flat loads, that is, operating 24/7/365 without much willingness to change that. But as power increasingly becomes the choke point for more data center infrastructure development, the world is waking up to a bunch of ways in which that's not entirely or necessarily true. First, you can put generation or batteries on site to shave peak load. That's the physical solution. But there are also digital solutions, it appears: first, because data centers aren't actually operating at nameplate peak most of the time anyway, and second, because you might actually be able to make the workloads themselves a little bit flexible. Google actually made a big announcement about doing this at their data centers just a few weeks ago. They announced that they've partnered with two utilities, Indiana Michigan Power and TVA, to introduce demand response via workload flexibility in their data centers. But our guest today is my old friend Varun Sivaram, who's also working on this problem. His company, Emerald AI, is building a software platform that is intended to make data centers flexible. As with many things in electricity, the devil is in the details. And in this case, the details involve: what do we mean by flexibility? How do we actually get it? What are the SLAs between the data center operators and their customers? How are the grid operators going to think about it? There are a lot of nuances to this, so let's get into it. Here's Varun. Varun, welcome back.
C
Shayle, thanks for having me back.
B
All right, new topic for us to talk about here, which is what you are spending your time on these days: data center flexibility. I want to start by having you walk me through what you understand to be the way that compute translates to electricity load in AI data centers today. I think this is something that is actually commonly misunderstood. So what does the electricity load profile of an actual AI data center look like today?
C
Yeah, great question. First of all, from a planning perspective, the grid has absolutely no idea what your load profile is going to look like. And that's the way that they study you as a new AI data center load. But let's just back up here. AI data centers nowadays, or AI factories, as Nvidia CEO Jensen Huang calls them, fundamentally are in the business of transforming electricity into what we call tokens, which are the fundamental input or output unit of AI. And they're doing it increasingly well. So a data center will try very efficiently to take electricity and turn it into compute outputs. And you'll have losses along the way. You'll have losses because of the cooling load, for example, and all the other non-computational loads in a data center. Historically, a data center might use 33% of its power for non-IT (information technology) uses, with the remaining 66 or 67% going into actual computations. Nowadays, with the increasingly customized design of these AI factories and some of the amazing efforts of the hyperscalers such as Google, those overheads are falling, and therefore you can get 80 or 90% of the power being turned directly into AI computations. What does that look like to the grid? Well, if you're running a large language model training run, you might see the power use of that AI data center spike as the training run commences, then have brief dips as the training run undergoes what are called synchronized checkpoints. So there's this kind of very difficult to predict transient behavior that's wildly swinging. And then after the training run concludes, hours or days later, you might have a large reduction in demand. If you have an AI data center that's fully committed to doing what's called inference, or using these AI models, you might see smoother but still relatively unpredictable usage patterns from the grid's perspective. So that's one of the reasons that AI data centers appear so scary to grids today. 
You can't really plan for what you expect to see. And these loads look fundamentally different from anything they've ever seen. They're extraordinarily energy dense.
B
Yeah. And you know, it's not dissimilar from everything else in electricity, in that the result is you have to plan for the peak, right? So the data center says, I need, let's invent a number, 400 megawatts of capacity. I think from a grid operator perspective, you basically have to plan for 8,760 hours of 400 megawatts. That is essentially what you're planning for, right?
C
You're actually planning for even worse than that, Shayle. You're right: over 8,760 hours, which is one single year, you want to predict or plan for a worst case scenario where the data center, let's say, as you suggested, a 400 megawatt data center, shows up with that 400 megawatts at the absolute worst time of the year. But you're actually planning for even more years than that. When you're running this interconnection study to determine whether this data center can connect to your system, you're looking at the next seven or 10 years in an absolute worst case scenario. So not just 8,760 hours, but 8,760 times 10, or 87,600 hours. When a transmission line goes down somewhere and it's a record hot day and air conditioning demand is super high on that particular day, will my 400 megawatt data center request its full 400 megawatts and overload a circuit? And if so, I can't connect it today; I have to upgrade the system before we do that. So that's how data centers are studied today.
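The planning arithmetic described here can be sketched in a few lines of Python. This is only an illustration, not how any particular utility runs an interconnection study: the function names, the 1,000 megawatt circuit limit and the 700 megawatt existing worst-case load are hypothetical values chosen for the example. Only the 8,760 hours per year, the 10 year horizon and the 400 megawatt request come from the conversation.

```python
HOURS_PER_YEAR = 8_760  # hours in a non-leap year


def study_hours(years: int) -> int:
    """Number of hourly snapshots in a planning horizon of `years` years."""
    return HOURS_PER_YEAR * years


def can_interconnect(circuit_limit_mw: float,
                     worst_case_existing_mw: float,
                     requested_mw: float) -> bool:
    """Worst-case screen: assume the full request shows up in the worst hour.

    The circuit limit and existing load are hypothetical inputs; a real
    study models contingencies across the whole network.
    """
    return worst_case_existing_mw + requested_mw <= circuit_limit_mw


print(study_hours(10))                    # 87600
print(can_interconnect(1_000, 700, 400))  # False: 1,100 MW exceeds the limit
print(can_interconnect(1_000, 700, 300))  # True: 1,000 MW just fits
```

Under these made-up numbers, the same site fails the screen at 400 megawatts but passes at 300, which is the kind of gap that flexibility is meant to close.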
B
Okay, but that is a different question. You said it right: that is how data centers are studied today. There is a separate question of how they are operated, generally speaking, which does not align perfectly with how they are studied. In other words, it is not always true that they are operating at full 400 megawatt capacity if it's a 400 megawatt rated data center. So what do we know about the actual operational profile from an electricity perspective, assuming you're doing nothing clever like the things we're about to start talking about?
C
And let me say, Shayle, before you do anything clever, I actually don't think it's irresponsible or analytically incorrect for the grid to study these data centers in that extremely risk averse way that I just described. Because you're right, Shayle, data centers do sometimes take years to ramp up their capacity. They'll proceed in phases as you build out the buildings, fill the data halls with the equipment, and begin to actually run the workloads that you'd like to run. And there may also be quite a bit of buffer that you leave on top. Even if you're running an intensive training run, you may only be utilizing this data hall at 75%, let's say. And so it may very well be the case that that 400 megawatt data center in the foreseeable future does not hit 400 megawatts. And yet I don't think it's incorrect for system operators to plan for a hyperscaler who comes to town and says, I want a 400 megawatt data center, to actually use that full entitlement once it's granted. And there are certainly examples of data centers running absolutely full tilt, large data centers running full tilt, to the point where unless, Shayle, as you mentioned, you do one of these clever things to intelligently control the consumption when the grid needs you to, the grid is correct and justified to assume that you may use your full consumption at the absolute worst time.
B
Yeah, I mean, my understanding of the basic state of affairs is this: the grid says, okay, I'm going to plan for the worst case scenario, as I need to do to deliver reliable service. And so I'm going to assume you need 400 megawatts all the time for 10 years. Meanwhile, the data center actually operates differently from that. And data center load profiles, AI load profiles, as I understand it, particularly for training, but in inference as well, at least in the current iteration of inference, are surprisingly spiky, so loads can go up and down quite a lot. So maybe you're pulling 400 megawatts some of the time, and maybe you're pulling 200 megawatts some of the time. It's kind of a weird load profile, but to the grid operator, it's unpredictable, which is, I guess, the key point here: if you don't know when that load is going to spike or not spike, then again, all you can do is operate as if it is 8,760 hours of 400 megawatts of load. And so that's what people are starting to wake up to: wait a second, there is this mismatch here. Clearly there is headroom, because the data center does not need to operate all the time at full capacity. But taking advantage of that requires doing some things differently, because otherwise the grid operators can't do anything different. Their hands are tied, basically.
C
Yeah, precisely. I think that's really well said. And if I can just take one more moment to set the table here: Shayle, earlier you said, hey, look, this isn't dissimilar to what we see from other loads. And I don't disagree with you fundamentally, but I do think there are some very peculiar things about AI that are truly dissimilar. One is the extraordinary rate of growth. The power demand from data centers has more than doubled every year for the last several years, and that trend shows no sign of abating. A lot of people talk about data center efficiency and the increasing efficiency of the new generations of GPUs, these graphics processing units. Nvidia's Blackwell is much more efficient than Hopper, which is much more efficient than the previous generation, the A100s, et cetera. But that efficiency gain is currently being eaten up by the tremendous growth in computing demand. So even as power demand is more than doubling every year, the reason it's more than doubling is because compute demand is more than quadrupling every year, a 4x increase every year. And the second thing that's truly dissimilar is what I mentioned earlier, the power density. AI's power density is increasing by orders of magnitude, which I don't think any other electricity application has seen in this short a span of time. A rack is a set of servers stacked in a single cabinet. That rack might have used 5 kilowatts just a few years ago. Today, I was just in a data center in Silicon Valley seeing a brand new deployment of Nvidia GB200s, the Blackwell generation. The rack is 132 kilowatts, it's liquid cooled, and we're headed toward 1 megawatt racks. So think of that: that's a two orders of magnitude increase in density. These massive data centers occupy a tiny footprint and look like small cities. 
So both of these trends, the exponential increase in power demand and the shrinking footprint of massive power demand, are stressing grids out in ways we haven't seen before.
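The two trends can be put into rough numbers with a short Python sketch. It treats the "more than doubling" and "more than quadrupling" figures as exactly 2x and 4x per year, which is a simplifying assumption for illustration, and the variable and function names are invented for this example. The rack figures (5 kilowatts, 132 kilowatts, 1 megawatt) are the ones quoted in the conversation.

```python
import math

# Simplifying assumption: treat "more than" 2x and 4x as exactly 2x and 4x.
power_growth_per_year = 2.0    # power demand multiplier per year
compute_growth_per_year = 4.0  # compute demand multiplier per year

# If compute grows 4x while power grows 2x, compute per unit of power
# must be improving by the ratio of the two.
implied_efficiency_gain = compute_growth_per_year / power_growth_per_year
print(implied_efficiency_gain)  # 2.0


def orders_of_magnitude(old_kw: float, new_kw: float) -> float:
    """Base-10 logarithm of the density ratio between two rack generations."""
    return math.log10(new_kw / old_kw)


print(round(orders_of_magnitude(5, 132), 2))    # 1.42 (5 kW to 132 kW)
print(round(orders_of_magnitude(5, 1_000), 2))  # 2.3  (5 kW to 1 MW)
```

On these figures, today's 132 kilowatt rack is a bit under one and a half orders of magnitude denser than a 5 kilowatt rack, and the "two orders of magnitude" claim corresponds to the 1 megawatt rack that is still ahead.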
B
Okay, so last question on the current state of affairs before we talk about the clever stuff. I mentioned this, but I'm curious whether you have visibility into what it actually looks like: is there a meaningful distinction in the current operating profile of AI data centers for a training data center versus an inference data center? Do they look different from a load profile perspective?
C
Oh, absolutely. These loads do look different, right? Training loads have a very characteristic profile, and inference workloads have a different characteristic profile. And we talked a little bit about this earlier. With a training run, you ramp it up, and it can ramp up by tens or hundreds of megawatts. Seemingly at random, you'll have dips in the power as checkpoints happen. It'll ramp all the way back down when the synchronized GPUs stop at the end of the training run. Inference, depending on the set of use cases and the diversity of the applications, can look much more smoothed out. It might in some cases look more like what you've seen in traditional cloud computing. For example, a Meta data center might have a load profile that looks like people opening their phones in the morning and going to Instagram, and so you see a spike. Similarly, today, people open their phones and go to ChatGPT. And so that's a more familiar load profile. But nevertheless, you can certainly infer the workload type from the power signature today. It's one of the things, by the way, that we at Emerald AI have been training an AI model to do. However, an important distinction here is that a data center will not do a single thing for its lifetime, right? A massive data center, for example, may initially be configured and specified to train a large language model. And then you'll finish training the large language model and then you'll do other things with those GPUs. Those same Nvidia GPUs can then be used for smaller research training workloads. They can be used for inference and for fine tuning large models for specific applications. A single data center may be used for one model, and then it's separated out into multiple different types of workloads. So I wouldn't count on any given data center having the same load profile for its lifetime or even more than a year.
B
Which presumably complicates things even a little bit further from the electricity perspective. All right, so let's talk about the clever stuff then, or at least start to. So the key concept here is: can we make data centers look to the grid like flexible assets, which means introducing some measure of predictability and planning into when the load from the data center is below peak, basically. And there are various ways you could do that, from basic demand response that says we will tone down demand a few hours a year just at peak, to daily flexibility, where you're shifting intraday all the time. So there are lots of different versions of it. But from a simple mechanical perspective, just to start: say you want to introduce some measure of load flexibility into an AI data center. What are you actually doing?
C
So you can achieve flexibility through multiple routes. You can, of course, achieve flexibility through what I'll call the physical infrastructure route. If you have a lot of backup generation, you might fire up the backup generation. Often you're not allowed to: your diesel generator will violate its air permit if you use it regularly. And so what we do at Emerald AI, the company I founded to solve this problem of data center flexibility, is computational workload orchestration. We want to attack the beating heart of AI's energy demand, which, as I mentioned, increasingly is just the computers, as AI factories become much more efficient and honed at converting electricity into tokens. And to achieve that on demand flexibility, you take advantage of some of the inherent or latent flexibility that the different AI workloads have. You might, for example, orchestrate a workload that is flexible in time, one that can be slowed down or paused for a certain amount of time. Something, for example, that looks like a fine tuning operation that doesn't need to terminate immediately on time. If what you're doing is taking a large language model and tuning it to a particular enterprise application, that enterprise might not mind if that model is paused for a minute or an hour. And in other cases, you may be taking a model or an AI use case that has flexibility spatially. You might move it from one location to another to save power in one particular data center location while keeping that application running as you move it to a different location. So there are a lot of different ways within this broad framework of achieving spatiotemporal flexibility. And what Emerald AI takes advantage of is that there is inherent workload flexibility in the use cases of AI today.
B
Maybe let's walk through that in a little bit more detail. So let's focus on the temporal component, right? On the spatial component, if you have multiple data centers, you shift load from one place to another. Google's actually been talking about doing that for years for the purpose of lower carbon, right? They've been saying, one of the ways we're going to reduce the carbon intensity of our computation is by shifting from location to location. That feels to me like it is more readily available to the hyperscalers, who have lots and lots of data centers, probably within one region, than it is to others. The temporal one is, in theory, available to anybody. So what does it look like? So you have some workload that the data center is supposed to undertake. Is it as simple as saying we delay this workload by a few hours? Presumably there's more to it than that.
C
It absolutely can be as simple. And let me first give credit where credit's due. You mentioned Google. Google, by the way, has also exploited temporal flexibility. There was a post they put out a couple of years ago, which a friend of mine, Varun Mera, wrote, about moving video indexing operations to nighttime in order to reduce load during periods, as you mentioned, Shayle, when that computation would not be renewables intensive or would be carbon intensive. So exactly as you said, one simple thing to do would be to simply pause a workload. However, that's not going to work for all workloads. And the reason this is tricky and sophisticated is because there are many things you could do and many different requirements that users are going to have for you. And you want to precisely meet a grid target, and you want to make sure that your performance is not approximate, but that you can guarantee to the grid that if they need you to achieve a particular demand reduction, you can certainly do that while respecting the constraints that the users of the AI compute put on you. That dual optimization problem is what makes this complicated. So in addition to pausing and then later resuming a job that can tolerate a delay, you might slow down a job. You might change the resource allocation of how many chips, for example, are instantaneously being used for a job. Some instances of this are known as auto scaling, where you scale up and down the resource allocation for particular kinds of queries. You might also go all the way down to the underlying silicon, for example the Nvidia chips, and you might change what we call the clock frequency of the chip to change the rate at which computations happen. And so depending on the workload type, a customer may be comfortable with that workload being slowed a little bit or slowed a lot. And there are some other technical limitations as well. 
And I'll stop talking in a moment about the complexities because they're fractally complex. But I'll mention, for example, that different workload types can tolerate different amounts of clock frequency changes or power caps. And so you need to know something about these workloads in order to determine, hey, what's the best set of operations that I can do to preserve what the user wants, which is great performance for their AI workload, whether it's training a model, fine tuning a model, et cetera, and precisely what the grid needs, which is not a megawatt more than this limit that we promised to achieve for them. And that is a non trivial problem that's far harder than just eh, I'll just pause a bunch of jobs.
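To make the shape of that dual problem concrete, here is a minimal Python sketch of one possible approach: a greedy allocator that meets a grid reduction target by cutting power from the most tolerant jobs first. It is not Emerald AI's method, which the conversation does not describe in detail. The `Job` class, the `plan_power_caps` function, the job names and every megawatt figure are hypothetical, and a real orchestrator would also model deadlines, checkpointing and how throughput responds to clock frequency or power caps.

```python
from dataclasses import dataclass


@dataclass
class Job:
    name: str
    power_mw: float          # current power draw of the job
    max_cut_fraction: float  # largest cut its owner tolerates:
                             # 0.0 = non-preemptible, 1.0 = can be paused


def plan_power_caps(jobs: list[Job], target_cut_mw: float) -> dict[str, float]:
    """Return {job name: MW to cut}, taking from the most tolerant jobs first.

    Raises ValueError if user constraints make the grid target unreachable,
    so the operator never promises a reduction it cannot deliver.
    """
    available = sum(j.power_mw * j.max_cut_fraction for j in jobs)
    if available < target_cut_mw:
        raise ValueError(
            f"only {available} MW of flexibility for a {target_cut_mw} MW target"
        )
    plan: dict[str, float] = {}
    remaining = target_cut_mw
    for job in sorted(jobs, key=lambda j: j.max_cut_fraction, reverse=True):
        if remaining <= 0:
            break
        cut = min(job.power_mw * job.max_cut_fraction, remaining)
        if cut > 0:
            plan[job.name] = cut
            remaining -= cut
    return plan


# Hypothetical 100 MW cluster asked for a 25% (25 MW) reduction.
cluster = [
    Job("inference", power_mw=40.0, max_cut_fraction=0.0),   # untouchable
    Job("training", power_mw=40.0, max_cut_fraction=0.25),   # can be slowed
    Job("fine_tune", power_mw=20.0, max_cut_fraction=1.0),   # can be paused
]
print(plan_power_caps(cluster, 25.0))  # {'fine_tune': 20.0, 'training': 5.0}
```

In this toy case the pausable fine tuning job absorbs most of the cut, the training run is slowed slightly, and the latency sensitive inference job is left alone. A 35 MW request would be rejected, because only 30 MW of tolerated flexibility exists.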
B
Yeah, that differentiation amongst types of workloads I think is important here, because if you think just historically, pre AI wave, there was already the same problem of lots of data centers, way, way fewer, but lots of data centers that needed what looked to the grid like 24/7 load, et cetera. And the explanation you would always get as to why those loads couldn't or wouldn't be flexible was, well, these are mostly hyperscaler data centers, and the hyperscalers are making a commitment to their customers, the ones on whose behalf they're doing this work, that they will deliver with low latency or whatever it is. And so it's just not worth it to them to try to shift this stuff around. They just want to deliver as quickly as they possibly can. So I can imagine there being cases here where that's going to be true too. Certain inference workloads in particular, I can imagine, don't really have flexibility. But then others, maybe like training a model, are certainly not as time sensitive. So how do you think about the workloads and types of compute for which this is especially well suited?
C
Well, first of all, necessity is the mother of invention, or of changing your business model. And this is one of those cases, Shayle, where, look, we've got 50 to 100 gigawatts of latent AI demand in the pipeline. It's just not going to get built unless you have this capability of flexibility. Tyler Norris's viral paper (he's an advisor to Emerald AI, I should note) said, hey, there's 100 gigawatts of spare capacity lying around on grids if we can just make data centers modestly flexible: up to 200 hours a year, they're able to reduce consumption by around 25% for around two hours on average per event. And so if it weren't the case that there was this extraordinary demand for energy, severe limitations and kind of this golden ticket to get it, I don't think we would be changing business as usual, which, over the last two decades of SLAs, or service level agreements, has been, as you said, Shayle, that you simply get 24/7 uptime agreements on your power. Given the necessity, I now think there's a range of AI customers, and we've talked to hundreds, who are willing to tolerate small levels of changed power availability today. There are different kinds of ways that you can reserve compute capacity. You can have a guaranteed instance where you get that 99.99999% uptime guarantee. You can also have a spot instance where you can basically just get kicked out anytime, or preempted. What Emerald AI's spatiotemporal flexibility technology offers is an almost firm guarantee. It's a guarantee that, look, 99% of the time you're going to be left alone. But every so often, up to that 100 or 200 hours, there might be a mild power cap, in which Emerald Conductor is going to gracefully orchestrate your workloads. And based on what kind of workloads you're running, we're going to make sure to protect the performance and tolerate delays only where you're willing to tolerate them.
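The scale of what is being asked can be checked with simple arithmetic. The Python sketch below converts the 100 and 200 hour figures into a share of the year, and bounds the energy a data center would forgo if it cut 25% in every one of those hours. The function names are invented for this example, and the energy figure assumes a flat load at nameplate, which, as discussed earlier, overstates what most sites actually draw.

```python
HOURS_PER_YEAR = 8_760


def uncapped_share(capped_hours: float) -> float:
    """Fraction of the year with no power cap in effect."""
    return 1 - capped_hours / HOURS_PER_YEAR


def curtailed_energy_share(capped_hours: float, cut_fraction: float) -> float:
    """Upper bound on the share of annual energy forgone.

    Assumes a flat load at nameplate and the full cut in every capped hour.
    """
    return capped_hours * cut_fraction / HOURS_PER_YEAR


print(f"{uncapped_share(100):.2%}")                # 98.86%
print(f"{uncapped_share(200):.2%}")                # 97.72%
print(f"{curtailed_energy_share(200, 0.25):.2%}")  # 0.57%
```

So 100 hours of caps leaves a workload untouched roughly 99% of the time, 200 hours closer to 98%, and even at the 200 hour limit the energy given up is well under 1% of the annual total.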
B
So that sort of answers one of my implicit questions from earlier. So you're focused on the 100 to 200 hours a year. So this is a demand response type application. It's not like a daily load shifting thing. This is like: in periods of extreme grid stress, we will dial down your power consumption a little bit.
C
You know, to be clear, I think that's where we enter. It's the most pressing need of the hour today, no pun intended. But I think that the same toolkit that harnesses spatiotemporal flexibility, that allows you to provide this demand response for those 100 or 200 hours, is also the capability set that would allow data centers to flex on a weekly or even daily basis one day, again, if the prices are right, if the incentives are well calibrated. And I think, Shayle, you and I both believe in a grid that is fundamentally abundant, cheap, affordable, and that's going to require a lot of both dispatchable and also intermittent, non-dispatchable energy. And I personally view data centers as a potential holy grail, if not the silver bullet, to enable a generation mix like that, one that's far more clean and one that's far more intermittent. So down the road you can imagine that data centers, which today are about 4% of American energy consumption (AI data centers are about 5 gigawatts of load), grow to 12% by the end of the decade (AI data centers could be anywhere up to 50 or even more gigawatts), and to 25% of American load by 2035 and beyond. They suddenly become by far the biggest user of electricity in this country. And if they have this flexibility toolkit, they can be doing all of these operations: the up to 100 hours of demand response, potentially daily shifting. That's what a truly co-optimized AI infrastructure and electricity grid infrastructure, one massive system, would look like. And I think step one is solving this 100 to 200 hour problem, just getting data centers onto the grid and getting grids comfortable that they can perform when called upon.
B
So I think the big question then here is, how much flexibility can you actually offer? It's going to vary, I understand, but I don't think anybody's proposing that the 400 megawatt nominal data center turns to 0 megawatts 200 hours out of the year, right? Because you still have HVAC load and all that kind of stuff. And my presumption is you also don't want to. I mean, you mentioned this, right? Some of the techniques that you want to employ are things like slowing down the clock speed of a GPU. That doesn't dial the load down to zero, it just dials it down some. So what do we know about how much flexibility, how much demand response capacity, is realistically latent within, say, a 400 megawatt data center?
C
You know, we set out to demonstrate one example of this in Phoenix, Arizona earlier this summer, and we published the results along with our partners Nvidia, the Electric Power Research Institute and Salt River Project, at an Oracle data center. And we said, look, let's take a large cluster of GPUs and let's see what we can get. Can we achieve a 25% demand reduction, which the Tyler Norris Duke paper suggested would be a kind of minimum threshold to achieve this massive amount of headroom? So: a 25% reduction, sustained for what the Arizona grid needed, which was a three hour demand reduction, and done with representative AI workloads. And so we worked with our partner Jonathan Frankle, the chief AI scientist of Databricks, who specified for us, look, this is what a representative set of workloads could look like. It was surprising to me, by the way, to hear that he anticipated that just 10% of the workloads on a representative Databricks cluster were non-preemptible. In other words, they absolutely could not be paused or delayed in any way. That gives us a lot of flexibility to work with. And so we worked with them to develop four representative ensembles of workloads of varying levels of flexibility, some which could be just delayed or slowed down by a little bit, and some which could be delayed a little more. Using those representative workloads, we've published a preprint of our academic paper on arXiv showing that a 25% reduction is definitely feasible. We even have one of our runs which showed a 40% reduction and still met all of the performance requirements for this representative set of users and AI workloads. So there is, I think, a lot of inherent flexibility in the system. And then, Shayle, you can think about layering on other interventions. You can get computational load flexibility alongside, let's say, some limited deployment of batteries. 
And together you can get much of the data center's consumption to go offline for a small amount of time.
B
When you say it still met the performance requirements, is that like, there's something in the SLA, they're giving you a representative SLA and you're saying, okay, I still need to meet this? Or is it, you know, who defines what the performance requirements are? Because isn't that the key thing? Obviously you can get kind of as much as you want, presuming that the performance requirements allow for it. And so a lot of this to me seems to come down to, what is the SLA between the data center operator and the customer?
C
You're nailing it. This is the key central question going forward: can we define a new kind of SLA that looks almost like the previous kind of SLA but has, again, less than 1% of the time, the chance that your workloads might get power capped in the most graceful way possible? And again, in talking with hundreds of AI companies, our conclusion is this is definitely doable. It is definitely possible for us to find a large set of customers who are willing to tolerate this kind of disruption, especially because, first, AI customers today struggle to get access to compute. You hear OpenAI's Sam Altman talking often about how GPU capacity is a limiting constraint on the expansion of OpenAI's GPT-5 model, for example. And others say, hey, the costs of compute, because of the scarcity of compute, are really the limiting factor for popularizing and democratizing AI, even for applications that are extremely time or latency sensitive. You know, I recently talked to the CEO of a company that makes a very real-time interactive world model. You know, you can step into this world, and the data center needs to be quite close to you in order for you to have a good experience at 30 frames per second. Even they can tolerate geo-shifting a workload less than 1% of the year, geo-shifting some of the workloads within a 500-mile radius, because it's only going to incur less than a 50 millisecond latency penalty. That's acceptable if what that leads to is a much larger set of GPU deployments and therefore better access to compute and maybe even cheaper access to compute. So I think yes, Shayle. The central question is, can there be a new Powerflex SLA that's slightly different from today's SLAs? And I think the answer is probably yes.
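As a rough sanity check on the claim that geo-shifting within a 500-mile radius costs less than 50 milliseconds, the following Python sketch estimates the round-trip propagation delay over fiber. It assumes light travels through fiber at about two thirds of its vacuum speed, roughly 200 km per millisecond, and it uses a hypothetical `route_factor` to account for cable paths being longer than the straight-line distance. Switching, queueing, and processing delays are not modeled, so this is a lower bound on added latency, not a measurement.

```python
KM_PER_MILE = 1.609344
# Assumption: signal speed in optical fiber is about 2/3 of c,
# i.e. roughly 200,000 km/s or 200 km per millisecond.
FIBER_KM_PER_MS = 200.0


def round_trip_ms(distance_miles, route_factor=1.0):
    """Estimate round-trip fiber propagation delay in milliseconds.

    route_factor is a hypothetical multiplier for how much longer the
    real cable route is than the straight-line distance.
    """
    path_km = distance_miles * KM_PER_MILE * route_factor
    return 2 * path_km / FIBER_KM_PER_MS


if __name__ == "__main__":
    print(f"straight line: {round_trip_ms(500):.1f} ms round trip")
    print(f"1.5x route:    {round_trip_ms(500, route_factor=1.5):.1f} ms round trip")
```

Under these assumptions, propagation alone accounts for roughly 8 ms of round-trip delay at 500 miles, or about 12 ms with a route 1.5 times the straight-line distance, which leaves considerable room inside a 50 ms budget for the network and processing overheads the sketch leaves out.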
B
All right, so final question for you then. The holy grail here is if you and others can convince the grid operators, you mentioned this before, right, that they can rely upon this type of flexibility, as you said, perhaps in combination with physical flexibility assets as well, such that they know there is a data center that has a nominal 400 megawatt capacity, but actually we're going to interconnect it at 300 megawatts or whatever it is. What do you think it's going to take to get that level of comfort from the grid operators? It's been a long road to get traditional demand response there, and this is like a whole other level of complexity. Now, as you said, necessity is the mother of invention. But what's your sense of, like, what are you going to have to prove to get grid operators to trust it?
C
That's a really great question. To answer it, I recently was invited to speak at the Electric Power Research Institute's summer seminar. There were 100 utility and grid operator CEOs in the audience, and I asked all of them for the same thing. I said, please participate alongside the AI companies in an escalating series of demonstrations approaching commercial scale. And we at Emerald AI plan to hit commercial scale early next year. We're very excited to have whole data centers be power flexible in partnership with our collaborators such as Nvidia, which is our biggest investor, because that data, that ground truth reliability information, is what's needed for grid operators and utilities to believe that this is actually a thing, that AI, far from being the scariest liability that's getting added to grids, could actually be the most promising asset that we can add to grids. They've got to see it to believe it. So we're working with a range of partners. I mentioned the collaboration with EPRI and Oracle and Nvidia and SRP in Phoenix. But now we have upcoming demonstrations all over the United States and increasingly around the world, which I'm very excited about, to showcase that data centers can be flexible and get grid operators very comfortable. One last thing I'll mention is, in order for a grid operator or utility to bank on the fact that, hey, when I call this resource, it's actually going to perform the way I need it to, Emerald has developed something called the Emerald Simulator, which is a digital twin that imagines what would happen if we did certain orchestration operations: we moved some workloads around, we paused or slowed workloads. And as we've submitted in our academic paper, it's extremely accurate. And that accuracy, built out over many more demonstrations, is going to be critical to prove to utilities and grid operators that in fact the system is going to work the exact way you expect it to. And if it doesn't, in that absolute worst case, there will be some fail-safe mechanism to make sure that it does work. So there's a lot of convincing work to do, but I sometimes feel we're pushing on an open door. You know, when I talk to the chairman of a regulatory commission, you know, you pick your large East Coast state, that chairman said, I've got the governor knocking on my door every month and saying, what have you done for me to bring data centers to my state, because I want to economically compete with all the other states? Regulators, utilities, system operators are all balancing this trade-off between providing reliable and affordable electricity, but also bringing economic development and this extraordinary new source of demand, the greatest economic opportunity humanity's ever seen, to their state. Data center flexibility is a way to end the trade-off between those two halves. You can have it all at the same time. It's the reason I left everything I've been doing in my career and founded this company to do just this for the next decade of my life. So really excited about it.
B
Varun, this was fun. Thank you again for coming back.
C
Really appreciate the time, Shayle. Thank you so much for having me.
B
Varun Sivaram is the founder and CEO of Emerald AI. This show is a production of Latitude Media. You can head over to latitudemedia.com for links to today's topics. Latitude is supported by Prelude Ventures. This episode was produced by Daniel Woldorff. Mixing and theme song by Sean Marquand. Stephen Lacey is our executive editor. I'm Shayle Kann and this is Catalyst.
Release Date: August 28, 2025
Host: Shayle Kann
Guest: Varun Sivaram (Founder & CEO, Emerald AI)
This episode of Catalyst explores the evolving concept of data center flexibility and its implications for the electricity grid amid massive growth in AI (artificial intelligence) infrastructure. Host Shayle Kann sits down with Varun Sivaram, CEO of Emerald AI, to discuss how data centers—historically seen as rigid, inflexible electricity consumers—can adapt their operations to become dynamic, flexible assets for grid reliability and decarbonization. The conversation dives into the technical, market, and regulatory nuances as data centers’ energy demand skyrockets and pressures power systems.
“AI’s power density is increasing by orders of magnitude, which I don’t think any other electricity application has seen in this short span of time...These massive data centers occupy a tiny footprint and look like small cities.”
— Varun Sivaram [10:54]
“You want to predict or plan for a worst-case scenario where the data center...shows up at the absolute worst time of the year...If so, can’t connect it today, have to upgrade the system before we do that.”
— Varun Sivaram [07:00]
“You might slow down a job...change how many chips...are instantaneously being used...or go all the way down to the underlying silicon and...change the clock frequency of the chip to change the rate at which computations happen.”
— Varun Sivaram [20:00]
"This is one of those cases where...We've got 50 to 100 gigawatts of latent AI demand...it's just not going to get built unless you have this capability of flexibility."
— Varun Sivaram [23:30]
"It was surprising...that just 10% of the workloads...were non-preemptible. That gives us a lot of flexibility to work with."
— Varun Sivaram [28:19]
"That data, that ground truth reliability information is what's needed for grid operators and utilities to believe that this is actually a thing...They've got to see it to believe it."
— Varun Sivaram [33:22]
This episode lays out a vision for AI data centers as not just an unprecedented demand on the grid, but potentially its most valuable and responsive asset. Data center operators, AI firms, and grid managers will need to move beyond outdated assumptions, develop new SLAs and partnership models, and invest in trust-building pilots. According to Varun Sivaram, flexibility is not only technologically feasible but increasingly economically and operationally essential to the future of both AI and clean electricity in the US.