The mechanics of data center flexibility

Thu Aug 28 2025

With Google’s expanded demand response and EPRI’s DCFlex initiative, the industry is putting its early demand-shifting capabilities to the test. So how does data center flexibility actually work?

Summary

Catalyst with Shayle Kann

Episode Summary: “The mechanics of data center flexibility”

Release Date: August 28, 2025
Host: Shayle Kann
Guest: Varun Sivaram (Founder & CEO, Emerald AI)

Brief Overview

This episode of Catalyst explores the evolving concept of data center flexibility and its implications for the electricity grid amid massive growth in AI (artificial intelligence) infrastructure. Host Shayle Kann sits down with Varun Sivaram, CEO of Emerald AI, to discuss how data centers—historically seen as rigid, inflexible electricity consumers—can adapt their operations to become dynamic, flexible assets for grid reliability and decarbonization. The conversation dives into the technical, market, and regulatory nuances as data centers’ energy demand skyrockets and pressures power systems.

Key Discussion Points & Insights

1. Data Centers and Their Growing Energy Challenge

Conventional grid view: Data centers are perceived as flat, unchanging loads, operating 24/7 at peak demand. However, actual operations differ and are often more flexible than assumed.
Growth & density: AI data center power demand is more than doubling annually, while compute demand is more than quadrupling (10:54). Rack power density has surged from 5 kW a few years ago to 132 kW today, heading toward 1 MW per rack.
Quote:

“AI’s power density is increasing by orders of magnitude, which I don’t think any other electricity application has seen in this short span of time...These massive data centers occupy a tiny footprint and look like small cities.”
— Varun Sivaram [10:54]
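
The arithmetic behind these growth and density figures can be sketched in a few lines of Python. This is only an illustration: the inputs are the round numbers quoted in the episode (2x power growth, 4x compute growth, 5 kW, 132 kW, and 1 MW racks), and the function names are hypothetical, not anything from Emerald AI.

```python
# Round figures quoted in the episode, treated as illustrative inputs.
POWER_GROWTH_PER_YEAR = 2.0    # "more than doubled every year"
COMPUTE_GROWTH_PER_YEAR = 4.0  # "more than quadrupling every year"


def implied_efficiency_gain(compute_growth: float, power_growth: float) -> float:
    """Yearly growth in compute delivered per unit of power.

    If compute grows 4x while power grows 2x, each unit of power
    must be doing 2x as much computation as the year before.
    """
    return compute_growth / power_growth


def density_ratio(new_rack_kw: float, old_rack_kw: float) -> float:
    """How many times denser the new rack is than the old one."""
    return new_rack_kw / old_rack_kw


efficiency = implied_efficiency_gain(COMPUTE_GROWTH_PER_YEAR, POWER_GROWTH_PER_YEAR)
today_vs_past = density_ratio(132.0, 5.0)     # 132 kW rack vs. 5 kW rack
future_vs_past = density_ratio(1000.0, 5.0)   # 1 MW rack vs. 5 kW rack
```

Under these round numbers, efficiency gains absorb half of the compute growth each year, and a 1 MW rack would be 200 times denser than a 5 kW rack, which matches the "two orders of magnitude" described in the transcript.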

2. How AI Workloads Translate to Electricity Loads

Training vs. Inference:
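- Model training: power spikes when a run starts, dips briefly at synchronized checkpoints, and drops sharply when the run ends, which makes it energy-intensive and hard to predict.
- Inference workloads are smoother but still relatively unpredictable.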
- Data center repurposing: Workload types can change over a center’s lifespan, making historical data a poor predictor of future patterns. [13:10]
Grid complexity:
- Grid operators must plan for worst-case scenarios (e.g., 10 years at full demand), which leads to overbuilding and delays interconnection until the system is upgraded.
- Quote:
  
  “You want to predict or plan for a worst-case scenario where the data center...shows up at the absolute worst time of the year...If so, can’t connect it today, have to upgrade the system before we do that.”
  — Varun Sivaram [07:00]
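
The worst-case check described in this section can be sketched as a simple comparison. This is a simplified illustration, not how any utility actually runs an interconnection study: the function names, the circuit limit, and the existing peak load are hypothetical, and only the 400 MW request comes from the episode. The optional `curtailable_fraction` shows how a firm flexibility commitment would change the same check.

```python
def worst_case_loading_mw(existing_peak_mw: float,
                          datacenter_mw: float,
                          curtailable_fraction: float = 0.0) -> float:
    """Circuit loading if the data center draws its maximum at the system peak.

    curtailable_fraction is the share of the data center's request that the
    operator has committed to shed on demand (0.0 means fully inflexible).
    """
    if not 0.0 <= curtailable_fraction <= 1.0:
        raise ValueError("curtailable_fraction must be between 0 and 1")
    return existing_peak_mw + datacenter_mw * (1.0 - curtailable_fraction)


def fits_without_upgrade(circuit_limit_mw: float,
                         existing_peak_mw: float,
                         datacenter_mw: float,
                         curtailable_fraction: float = 0.0) -> bool:
    """True if the worst-case loading stays within the circuit limit."""
    loading = worst_case_loading_mw(existing_peak_mw, datacenter_mw,
                                    curtailable_fraction)
    return loading <= circuit_limit_mw


# Hypothetical circuit: 1,000 MW limit, 650 MW existing peak, 400 MW request.
inflexible_ok = fits_without_upgrade(1000.0, 650.0, 400.0)
flexible_ok = fits_without_upgrade(1000.0, 650.0, 400.0, curtailable_fraction=0.25)
```

With these made-up numbers the inflexible request fails the check (1,050 MW against a 1,000 MW limit), while the same request with a 25% curtailment commitment passes at 950 MW.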

3. The Flexibility Opportunity: "Clever Stuff" Explained

Physical vs. Digital Flexibility:
- Physical: Onsite generation, batteries (often limited by regulation).
- Digital: Orchestrating workloads—slowing, pausing, or shifting computational jobs in response to grid needs. [15:43]
Temporal vs. Spatial Flexibility:
- Temporal: Pausing, delaying, or slowing workloads.
- Spatial: Moving workloads between locations/data centers (often for cost or carbon benefits).
- Quote:
  
  “You might slow down a job...change how many chips...are instantaneously being used...or go all the way down to the underlying silicon and...change the clock frequency of the chip to change the rate at which computations happen.”
  — Varun Sivaram [20:00]
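
One way to picture the temporal orchestration described above is a greedy planner that takes power from the most flexible jobs first until a grid target is met. This is a minimal sketch under stated assumptions, not Emerald AI's method: the `Job` fields, the `min_fraction` notion of SLA tolerance, and the greedy ordering are all hypothetical simplifications of what the episode calls a dual optimization problem.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Job:
    name: str
    power_kw: float      # current power draw of the job
    min_fraction: float  # lowest share of current draw its SLA tolerates:
                         # 0.0 = may be paused, 1.0 = non-preemptible

    def __post_init__(self) -> None:
        if not 0.0 <= self.min_fraction <= 1.0:
            raise ValueError("min_fraction must be between 0 and 1")


def plan_curtailment(jobs: list[Job], target_reduction_kw: float):
    """Return (new power per job, reduction actually achieved in kW).

    Jobs with the lowest min_fraction (most flexible) are throttled first.
    If the target exceeds the total headroom, the achieved reduction is
    smaller than the target and the caller must handle the shortfall.
    """
    plan = {job.name: job.power_kw for job in jobs}
    remaining = target_reduction_kw
    for job in sorted(jobs, key=lambda j: j.min_fraction):
        if remaining <= 0:
            break
        headroom = job.power_kw * (1.0 - job.min_fraction)
        cut = min(headroom, remaining)
        plan[job.name] = job.power_kw - cut
        remaining -= cut
    return plan, target_reduction_kw - remaining


# Hypothetical 1,000 kW cluster asked for a 25% (250 kW) reduction.
cluster = [
    Job("training_run", 600.0, 0.5),     # can be slowed to half power
    Job("batch_indexing", 300.0, 0.0),   # can be paused entirely
    Job("live_inference", 100.0, 1.0),   # non-preemptible
]
plan, achieved_kw = plan_curtailment(cluster, 250.0)
```

In this example the whole 250 kW comes from the pausable batch job, so the training run and live inference are untouched. A real orchestrator would also have to decide how each cut is delivered (pausing, reallocating chips, or lowering clock frequency) and account for how each workload type responds to power caps.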

4. Service Level Agreements (SLAs) & Types of Workloads

Historical barrier:
- Rigid SLAs promising near-perfect uptime made flexibility unattractive.
New model:
- Customers increasingly willing to tolerate minor, well-defined interruptions—e.g., allow power capping 100–200 hours/year.
- Spot vs. Guaranteed Instances: Multiple tiers of compute availability give workaround options.
- Quote:
  
  "This is one of those cases where...We've got 50 to 100 gigawatts of latent AI demand...it's just not going to get built unless you have this capability of flexibility."
  — Varun Sivaram [23:30]

5. The Scale and Limits of Data Center Flexibility

Demonstrated results:
- Oracle/Nvidia/EPRI/Emerald AI pilot in Phoenix: 25% demand reduction for 3 hours with representative AI workloads (some cases up to 40% reduction) while maintaining user performance [28:19].
- About 10% of workloads may be completely non-flexible.
  
  "It was surprising...that just 10% of the workloads...were non-preemptible. That gives us a lot of flexibility to work with."
  — Varun Sivaram [28:19]
Limits:
- Not all power (e.g., HVAC) is deferrable.
- Amount of shift is dictated by the mix of workloads and agreed SLAs.
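
A rough way to see how these limits interact is to bound the facility-level reduction by the share of power that is actually deferrable. The sketch below is a back-of-the-envelope model with loud assumptions: it treats non-IT power (such as cooling) as fixed, treats the 10% non-preemptible share as a share of IT power rather than of job count, and uses a hypothetical 85% IT share, which sits inside the 80 to 90% range mentioned in the transcript. The function names are illustrative.

```python
def max_facility_reduction(it_share: float, flexible_share: float) -> float:
    """Upper bound on facility demand reduction if every flexible job is paused.

    it_share:       fraction of facility power that goes to computation
    flexible_share: fraction of IT power drawn by preemptible workloads
    Assumes non-IT power (e.g., cooling) does not fall when compute does.
    """
    for value in (it_share, flexible_share):
        if not 0.0 < value <= 1.0:
            raise ValueError("shares must be in (0, 1]")
    return it_share * flexible_share


def required_depth(target_reduction: float, it_share: float,
                   flexible_share: float) -> float:
    """Average fraction by which flexible workloads must be cut to hit a target."""
    ceiling = max_facility_reduction(it_share, flexible_share)
    if target_reduction > ceiling:
        raise ValueError("target exceeds what the flexible workloads can deliver")
    return target_reduction / ceiling


# Assumed: 85% of power is IT load, 90% of that is preemptible.
ceiling = max_facility_reduction(0.85, 0.90)
depth_for_25 = required_depth(0.25, 0.85, 0.90)
depth_for_40 = required_depth(0.40, 0.85, 0.90)
```

Under these assumptions the ceiling is about 76% of facility demand, a 25% reduction needs flexible workloads cut by roughly a third on average, and a 40% reduction needs them cut by roughly half. The real figures depend on the workload mix and the agreed SLAs, as noted above.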

6. Market and Regulatory Acceptance

Barriers:
- Grid operators need rigorous, real-world demonstrations and robust verification before recognizing flexible data centers as reliable grid resources.
Path forward:
- Emerald AI’s “Emerald Simulator” (digital twin) forecasts the effects of orchestration, helping to build confidence for grid operators.
- Ongoing and planned demos with utilities and research institutes are essential to mainstream adoption.
- Quote:
  
  "That data, that ground truth reliability information is what's needed for grid operators and utilities to believe that this is actually a thing...They've got to see it to believe it."
  — Varun Sivaram [33:22]

Notable Quotes & Memorable Moments

[06:34] On grid planning:
“You want to predict or plan for a worst-case scenario...When a transmission line goes down somewhere and it's a record hot day...Will my 400 megawatt data center request its full 400 megawatts and overload a circuit and if so, can't connect it today, have to upgrade the system before we do that.”
[12:49] On rapid AI growth:
"The power demand from data centers has more than doubled every year the last several years...compute demand is more than quadrupling every year."
[14:48] On workload volatility:
"A data center will not do a single thing for its lifetime...A single data center may be used for one model, and then it's separated out into multiple different types of workloads."
[20:00] On mechanisms for flexibility:
“You might slow down a job. You might change the resource allocation...You might also go all the way down to the underlying silicon and...change the clock frequency of the chip..."
[23:30] On market need:
“We've got 50 to 100 gigawatts of latent AI demand in the pipeline...it's just not going to get built unless you have this capability of flexibility.”
[28:19] On real-world results:
“It was surprising...that just 10% of the workloads on a representative Databricks cluster were non-preemptible. In other words, they absolutely could not be paused or delayed in any way.”
[33:22] On convincing grid operators:
"They've got to see it to believe it...that ground truth reliability information is what's needed for grid operators and utilities to believe that this is actually a thing..."
[34:30] On the stakes for states:
“…That chairman said, 'I've got the governor knocking on my door every month and saying what have you done for me to bring data centers to my state because I want to economically compete with all the other states?’ Regulators, utilities, system operators are all balancing this trade off..."

Important Timestamps

03:49: Data centers as “AI factories”; core concepts of electricity-to-token conversion
06:34: How grid planning for data centers is performed—risk aversion, long-term peak scenarios
10:54: What makes AI data center loads unique: fast growth and extreme density
13:10: Distinction between AI training and inference; workload profiles
15:43: Introduction to demand flexibility—physical vs. digital
20:00: Practical methods for temporal workload flexibility
23:30: Market drivers: why flex is now essential, not optional
28:19: Case study—demonstrating data center demand reduction in Phoenix
30:43: The critical role of SLAs and customer willingness
33:22: The path to regulatory acceptance and trust

Conclusion

This episode lays out a vision for AI data centers as not just an unprecedented demand on the grid, but potentially its most valuable and responsive asset. Data center operators, AI firms, and grid managers will need to move beyond outdated assumptions, develop new SLAs and partnership models, and invest in trust-building pilots. According to Varun Sivaram, flexibility is not only technologically feasible but increasingly economically and operationally essential to the future of both AI and clean electricity in the US.

Transcript

A (0:02)

Latitude Media covering the new frontiers of the energy transition.

B (0:07)

I'm Shayle Kann, and this is Catalyst.

C (0:11)

You might slow down a job. You might change the resource allocation of how many chips, for example, are instantaneously being used for a job. You might also go all the way down to the underlying silicon and you might change what we call the clock frequency of the chip to change the rate at which computations happen.

B (0:29)

Coming up: what does it actually look like to make a data center flexible?

B (2:13)

I'm Shayle Kann. I invest in early stage companies at Energy Impact Partners. Welcome. So the conventional wisdom about data centers is that, from an electricity perspective, they look like totally flat loads, that is, operating 24/7/365 and without much willingness to change that. But as power increasingly becomes the choke point for more data center infrastructure development, the world is waking up to a bunch of ways in which that's not entirely or necessarily true. First, you can put generation or batteries on site to shave peak load. That's the physical solution. But there are also digital solutions, it appears. First, because data centers aren't actually operating at nameplate peak most of the time anyway, but also second, because you might actually be able to make the workloads themselves a little bit flexible. Google actually made a big announcement about doing this at their data centers just a few weeks ago. They announced that they've partnered with two utilities, Michigan Power and TVA, to introduce demand response via workload flexibility in their data centers. But our guest today is my old friend Varun Sivaram, who's also working on this problem. His company, Emerald AI, is building a software platform that is intended to make data centers flexible. As with many things in electricity, the devil is in the details. And in this case, the details involve: What do we mean by flexibility? How do we actually get it? What are the SLAs between the data center operators and their customers? How are the grid operators going to think about it? There are a lot of nuances to this, so let's get into it. Here's Varun. Varun, welcome back.

C (4:16)

Yeah, great question. First of all, from a planning perspective, the grid has absolutely no idea what your load profile is going to look like. And that's the way that they study you as a new AI data center load. But let's just back up here. AI data centers nowadays (as Nvidia CEO Jensen Huang calls them, "AI factories") fundamentally are in the business of transforming electricity into what we call tokens, which are the fundamental input or output unit from AI. And they're doing it increasingly well. So a data center will try very efficiently to take electricity and turn it into compute outputs. And you'll have losses along the way. You'll have losses because of the load of cooling, for example, all the other non-computational loads in a data center. Historically, a data center might lose 33% of the power, or use 33% of that power for non-IT (information technology) uses, and the remaining 66 or 67% goes into actual computations. Nowadays, with the increasingly customized design of these AI factories and some of the amazing efforts of the hyperscalers such as Google, these numbers are falling, and therefore you can get 80 or 90% of the power being turned directly into AI computations. What does that look like to the grid? Well, if you're running a large language model training run, you might see the power use of that AI data center spike as the training run commences, and have brief dips as the AI training run undergoes what's called synchronized checkpoints. So there's this kind of very difficult to predict transient behavior that's wildly swinging. And then after the training run concludes, hours or days later, you might have a large reduction in demand. If you have an AI data center that's fully committed to doing what's called inference, or using these AI models, you might see more smooth but still relatively unpredictable usage patterns from the grid's perspective. So that's one of the reasons that AI data centers appear so scary to grids today.
You can't really plan for what you expect to see. And these loads look fundamentally different from anything they've ever seen. They're extraordinarily energy dense.

C (10:54)

Yeah, precisely. I think that's really well said. And if I can just take one more moment to set the table here. Shayle, earlier you said, hey, look, this isn't dissimilar to what we see from other loads. And I think, you know, I probably don't disagree with you fundamentally, but I do think there are some very peculiar things about AI that are truly dissimilar. One is the extraordinary rate of growth. The power demand from data centers has more than doubled every year the last several years, and that trend shows no sign of abating. A lot of people talk about data center efficiency and the increasing efficiency of the new generations of GPUs, these graphics processing units. Nvidia's Blackwell is much more efficient than Hopper, which is much more efficient than the previous generation, A100s, et cetera. But that efficiency gain is currently being eaten up by the tremendous growth in computing demand. So even as power demand is more than doubling every year, the reason it's more than doubling is because compute demand is more than quadrupling every year. A 4x increase every year. And the second thing that's truly dissimilar is what I mentioned earlier, the power density. AI's power density is increasing by orders of magnitude, which I don't think any other electricity application has seen in this short span of time. We went from 5 kilowatt racks (the rack is, you know, a set of servers stacked in a single cabinet). That rack might have used 5 kW just a few years ago. Today I just was in a data center in Silicon Valley seeing a brand new deployment of Nvidia GB200s, the Blackwell generation. The rack is 132 kilowatts, it's liquid cooled, and we're headed toward 1 megawatt racks. So think of that. That's two orders of magnitude increase in density. These massive data centers occupy a tiny footprint and look like small cities.
So both of these trends, the exponential increase in power demand and the shrinking footprint of massive power demand, are stressing grids out in ways we haven't seen before.

C (20:00)

It absolutely can be as simple. And let me first give credit where credit's due. You mentioned Google. Google also, by the way, has exploited temporal flexibility. There was a paper, a post they put out a couple years ago. A friend of mine, Varun Mera, wrote it, about moving video indexing operations to nighttime in order to reduce load during periods, as you mentioned, Shayle, when that computation would be not renewables intensive or would be carbon intensive. So exactly as you said, one simple thing to do would be to simply pause a workload. However, that's not going to work for all workloads. And the reason this is tricky and sophisticated is because there are many things you could do, many different requirements that users are going to have for you. And you want to precisely meet a grid target, and you want to make sure that your performance is not sort of approximate, but that you can guarantee to the grid that if they need you to achieve a particular demand reduction, you can certainly do that while respecting the constraints that the users of the AI compute put on you. That dual optimization problem is what makes this complicated. So in addition to pausing and then resuming later on a job that can tolerate a delay, you might slow down a job. You might change the resource allocation of how many chips, for example, are instantaneously being used for a job. Some instances of this are known as auto scaling, where you scale up and down the resource allocation for particular kinds of queries. You might also go all the way down to the underlying silicon, for example the Nvidia chips, and you might change what we call the clock frequency of the chip to change the rate at which computations happen. And so depending on the workload type, a customer may be comfortable with that workload being slowed a little bit, or slowed a lot. And there are some other technical limitations as well.
And I'll stop talking in a moment about the complexities because they're fractally complex. But I'll mention, for example, that different workload types can tolerate different amounts of clock frequency changes or power caps. And so you need to know something about these workloads in order to determine, hey, what's the best set of operations that I can do to preserve what the user wants, which is great performance for their AI workload, whether it's training a model, fine-tuning a model, et cetera, and precisely what the grid needs, which is not a megawatt more than this limit that we promised to achieve for them. And that is a nontrivial problem that's far harder than just, "eh, I'll just pause a bunch of jobs."

C (28:19)

You know, we set out to demonstrate one example of this in Phoenix, Arizona earlier this summer, and we published the results along with Nvidia, the Electric Power Research Institute, and our partner Salt River Project, at an Oracle data center. And we said, look, let's take a large cluster of GPUs and let's see what we can get. Can we achieve a 25% demand reduction, which the Tyler Norris Duke paper suggested would be a kind of minimum threshold to achieve this massive amount of headroom? So 25% reduction, sustain it for what the Arizona grid needed, which was a three hour demand reduction, and do so with representative AI workloads. And so we worked with our partner Jonathan Frankle, the chief AI scientist of Databricks, who specified for us, look, this is what a representative set of workloads could look like. It was surprising to me, by the way, to hear that he anticipated that just 10% of the workloads on a representative Databricks cluster were non-preemptible. In other words, they absolutely could not be paused or delayed in any way. That gives us a lot of flexibility to work with. And so we worked with them to develop four kind of representative ensembles of workloads of varying levels of flexibility, some which could be just delayed by a little bit or slowed down a little bit, and some which could be delayed a little more. Using those representative workloads, we've published a preprint of our academic paper on arXiv showing that a 25% reduction is definitely feasible. We even have one of our runs which showed a 40% reduction and still met all of the performance requirements for this representative set of users and AI workloads. So there is, I think, a lot of inherent flexibility in the system. And then, Shayle, you can think about layering on other interventions. You can get computational load flexibility alongside, let's say, some limited deployment of batteries.
And together you can get much of the data center's consumption to go offline for a small amount of time.

C (33:22)

That's a really great question. To answer it, I recently was invited to speak at the Electric Power Research Institute's summer seminar. There were 100 utility and grid operator CEOs in the audience, and I asked all of them for the same thing. I said, please participate alongside the AI companies in an escalating series of demonstrations approaching commercial scale. And we at Emerald AI plan to hit commercial scale early next year. We're very excited to have whole data centers be power flexible in partnership with our collaborators such as Nvidia, which is our biggest investor, because that data, that ground truth reliability information, is what's needed for grid operators and utilities to believe that this is actually a thing: that AI, far from being the scariest liability that's getting added to grids, could actually be the most promising asset that we can add to grids. They've got to see it to believe it. So we're working with a range of partners. I mentioned the collaboration with EPRI and Oracle and Nvidia and SRP in Phoenix. But now we have upcoming demonstrations all over the United States and increasingly around the world, which I'm very excited about, to showcase that data centers can be flexible and get grid operators very comfortable. One last thing I'll mention is, in order for a grid operator or utility to bank on the fact that, hey, when I call this resource, it's actually going to perform the way I need it to, Emerald has developed something called the Emerald Simulator, which is a digital twin that imagines what would happen if we did certain orchestration operations: we moved some workloads around, we paused or slowed workloads. And as we've submitted in our academic paper, it's extremely accurate. And that accuracy, built out over many more demonstrations, is going to be critical to prove to utilities and grid operators that in fact the system is going to work the exact way you expect it to.
And if it doesn't, in that absolute worst case, there will be some fail-safe mechanism to make sure that it does work. So there's a lot of convincing work to do, but I sometimes feel we're pushing on an open door. You know, when I talk to the chairman of a regulatory commission, you know, you pick your large East Coast state. That chairman said, "I've got the governor knocking on my door every month and saying, what have you done for me to bring data centers to my state? Because I want to economically compete with all the other states." Regulators, utilities, system operators are all balancing this trade-off between providing reliable and affordable electricity, but also bringing economic development and this extraordinary new source of demand, the greatest economic opportunity humanity's ever seen, to their state. Data center flexibility is a way to end the trade-off between those two halves. You can have it all at the same time. It's the reason I left everything I've been doing in my career and founded this company to do just this for the next decade of my life. So really excited about it.