
Alan Rozenshtein
It's the Lawfare Podcast. I'm Alan Rozenshtein, associate professor of law at the University of Minnesota and a senior editor and research director at Lawfare. Today we're bringing you something a little different: an episode from our new podcast series, Scaling Laws. It's a creation of Lawfare and the University of Texas School of Law, where we're tackling the most important AI and policy questions, from new legislation on Capitol Hill to the latest breakthroughs happening in the labs. We cut through the hype to get you up to speed on the rules, standards, and ideas shaping the future of this pivotal technology. If you enjoy this episode, you can find and subscribe to Scaling Laws wherever you get your podcasts, and follow us on X and Bluesky. Thanks for listening. When the AI overlords take over, what are you most excited about?
Kevin Fraser
It's not crazy, it's just smart.
Alan Rozenshtein
And just this year, in the first six months there have been something like a thousand laws.
Kevin Fraser
Who's actually building the scaffolding around how it's going to work, how everyday folks are going to use it?
Alan Rozenshtein
AI only works if society lets it work.
Kevin Fraser
There are so many questions that have to be figured out, and nobody came to my bonus class. Let's enforce the rules of the road.
Kevin Fraser
Welcome back to Scaling Laws, the podcast brought to you by Lawfare and the University of Texas School of Law that explores the intersection of AI, policy, and, of course, the law. I'm Kevin Fraser, the AI Innovation and Law Fellow at Texas Law and a senior editor at Lawfare. Artificial intelligence is sometimes framed as a magic bullet for solving big problems, from discovering new drugs to planning smart cities. But the infrastructure that powers these models uses electricity and water, and a lot of it. A December 2024 Department of Energy report found that AI data centers already account for 4.4% of U.S. electricity consumption, a figure estimated to double or triple by 2028. That's a trend that seems likely to continue even after this decade, given that OpenAI announced the construction of five new data centers as part of Project Stargate, and other labs seem poised to follow suit. So why exactly does AI use so much energy, and is it cause for alarm or merely a fact of technological advance? Today on Scaling Laws, to explore these questions and more, we have Mosharaf Chowdhury, who is an associate professor at the University of Michigan and one of the directors of the ML Energy Lab, and Dan Zhao, an AI researcher at MIT, Google X, and Microsoft who focuses on AI for science and sustainability and energy-efficient AI. Giddy up for quite the ride. To get in touch with us, email scalinglaws@lawfaremedia.org. And with that, we hope you enjoy the show.
Kevin Fraser
Thank you to both of you for coming on.
Mosharaf Chowdhury
Thank you for having us.
Dan Zhao
Thank you for having me.
Kevin Fraser
Awesome. Mosharaf, what would you say is the common understanding of AI's energy consumption, to the extent there is one?
Mosharaf Chowdhury
So AI's energy consumption has been in the news a lot for the last three or four years. At the beginning, there were no good tools to precisely measure how much energy these models consume for training and inference, so as a placeholder, for lack of better tools, people were using estimations to get a sense of the rough order of magnitude of energy consumption. Which essentially means: take how much a GPU consumes at its peak, then how many GPUs you might have, then assume that many GPUs are needed for training or inference, multiply all these big numbers together, and assume they are always running all the time. Then you end up with a very large number, which has been reported as bigger than the Netherlands, or Ireland, and so on. It's full of holes in that sense; there used to be a lot of overestimation, and that led to many news articles which, honestly, if you see those numbers, it makes sense that people would be concerned. So I think that's what the quote is referring to. Right.
Kevin Fraser
So we've seen all these stats, as you mentioned, from the early days of AI in particular, where seemingly every day we were learning: oh, now it's the energy of 100,000 homes for three months at the height of summer, and now it's the amount of energy of Ireland, and now it's New York during the Super Bowl, and so on and so forth. Which are great for tweets, I'm sure they get a lot of viral traction, but maybe not the most empirically driven. And Mosharaf pointed out the critical difference between training and inference that comes into this picture, helping us get a sense of where this energy use is actually coming from. Dan, can you explain for listeners who perhaps aren't as steeped in AI vocab as the three of us presumably are: what is the difference between training and inference, and how is that relevant to this energy conversation?
Dan Zhao
Sure, I think that's a very good question. So a while ago, and when I say "a while," I contextualize that in AI space, so just a year or two ago, right, in AI time.
Kevin Fraser
That's at least like 15 years ago.
Dan Zhao
Exactly. Right. And so basically, if you think about it this way: everyone was very occupied with the energy costs of training, right? These large behemoth models needed to undergo something called pre-training, which basically means you take a very large model, you feed it tokens, and it trains very slowly, depending on your hardware, the generation of GPUs you have, the networking, the clusters you have. So nothing really accessible to your common folk with a single consumer-grade GPU, for example. But these large companies, large labs, they're basically training these models on tons and tons of tokens, and basically getting this sort of large language model out of it. So you can think of your ChatGPT, for example, or your Llama models that are released. And so during these times, you are actually trying to train your model to get up to a certain level of performance, and then that model should be good enough for all sorts of downstream tasks, from chatting to tool use and so on and so forth. Inference is a little different, because inference has kind of allowed people to think a bit more flexibly about taking these models. They don't necessarily need as many computational resources for as long or as intensively. You basically think about taking a model that's already pre-trained and then really setting it up for deployment downstream, right? So when you think about inference, it's basically the thing that people nowadays are much more familiar with when they engage with ChatGPT, for example: you're basically throwing in a query, and out the other end comes a response. It seems easy, but under the hood there's a whole bunch of complicating factors. There are ways to route different requests. If you think about it, millions of requests maybe every five minutes go to ChatGPT. They need to find a way to route them through their servers and make sure their servers don't crash.
They need to make sure that things are maintained properly and quality doesn't degrade. They need to think about toxic stuff, or harmful content they need to filter out. And so inference is really: you take a trained model, you throw things in, and you hopefully get desirable things out. And that's much faster and much easier, in some senses of the word, than training is, for example.
Kevin Fraser
And so one of the things that I got called out on in preparation for this podcast, thanks to excellent research from Leo Wu, was the assumption, which I assume many people hold as one of those common understandings, that the energy use has to be this training function. Right? Whenever we hear about training these large language models, and in particular the frontier models, we think about these massive data centers being built in the middle of nowhere with tens of thousands of GPUs and even more CPUs. They're running 24/7, 365 days a year, and they need water to stay cool, they need tons of energy sources. Surely it's got to be the training that's driving these energy costs. Mosharaf, why is that not the case? Where are we seeing the preponderance of energy use coming from?
Mosharaf Chowdhury
So that's a very good question. Thank you. Training essentially happens, say, only once. It's not really once, but just to get a sense of it: you train a model once, and then when it gets deployed, as Dan was saying, millions of people, at this point hundreds of millions of people, are using the same model. And of course you retrain the model, you fine-tune it in many different ways, but none of that comes close to how many people are actually interacting with the model. So a single training instance consumes a lot of energy, tens of thousands of watt-hours based on some of the smaller and open-source models that have been published; we don't really know how much energy is actually consumed by training ChatGPT-scale models. It looks big because we can point to a model and say it took this much energy to train, but it gets much, much bigger when you think about all the millions of people sending tens of queries each. Each inference request consumes a small amount of energy, but when you multiply it by hundreds of millions, they add up to a very big number. And all these big data centers are not necessarily just serving training; they are also serving inference requests. So the ratio between the energy consumption of training and inference depends on what company you are talking about and what model, but many of the numbers I have heard from different providers range from 30/70 to 40/60, the smaller share being training and the bigger one being inference. So at the beginning, when these models were not very popular, you could easily think that training consumes the lion's share. But as more and more people use it, training easily gets dwarfed by all the people and all of their requests.
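The training-versus-inference tradeoff described above is easy to see in a back-of-envelope calculation. All figures below are illustrative assumptions, not measured numbers for any real model:

```python
# Hedged back-of-envelope sketch: every figure here is an assumption chosen
# purely for illustration, not a measured value for any real system.

TRAINING_ENERGY_KWH = 1_000_000   # assume a one-time training run of ~1 GWh
ENERGY_PER_QUERY_WH = 0.3         # assume ~0.3 Wh per inference query
QUERIES_PER_DAY = 100_000_000     # assume 100 million queries per day

# Aggregate daily inference energy, converted from Wh to kWh.
daily_inference_kwh = QUERIES_PER_DAY * ENERGY_PER_QUERY_WH / 1000

# How many days of inference it takes to equal the one-time training cost.
days_to_match_training = TRAINING_ENERGY_KWH / daily_inference_kwh

print(f"Inference per day: {daily_inference_kwh:,.0f} kWh")
print(f"Days of inference to equal one training run: {days_to_match_training:.1f}")
```

Under these assumed numbers, roughly a month of serving already matches the entire training run, which is why the 30/70 or 40/60 splits quoted above tilt toward inference as usage grows.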
Kevin Fraser
And I think what's critical to point out about the fact that inference is driving so much of this energy use, the calls you're making to the LLM, the prompts you're sending, is that presumably this is only going to skyrocket as we see AI agents become more and more ingrained in our daily lives. These AI tools are able to act autonomously on our behalf; if they are taking all these actions, pursuing all these tasks for us, the inference energy costs should presumably only go up. Which raises a lot of questions that we'll get to in a second, such as whether we should say thank you to our AI or not. But I'll leave that aside for just one second. Dan, we have talked a little bit about the fact that we need a lot of water, a lot of energy and electricity, to make sure that we can train these models and engage in inference as well. What does it actually look like in terms of where the water's coming from, what sources of energy we're relying on, and why might that be a concern for folks from a sustainability standpoint?
Dan Zhao
Sure. So if you think about it, and I'll probably talk very simply without going into too many of the details, just so folks get the big picture, right? Everyone appreciates the finer details, but I think the big picture here is probably more important. So if you think about, let's say, when I was at MIT, for example, at Lincoln Labs, at CSAIL and the supercomputing center: you walk in and you basically see racks upon racks of these GPUs that are basically just sitting there. And when you walk in, the first thing you notice is that it's quite hot, right? People can think about these GPUs as small black boxes. Essentially, they consume power because they need the power to run calculations, to do a very large number of electronic operations. And there's this common trope for me that GPUs go brrr. They indeed do go brrr. And to do so basically requires switching power through billions of transistors that are on these GPUs, to carry out these mathematical operations for either training or inference or whatever it might be, right? So obviously the GPUs sit at the center of that. Now, even when the GPUs aren't active, when they're idle, these transistors still have static power leakage. So there's still power being consumed there, even if they're not being used. And one thing I should note is also that not many people know how to efficiently get every single drop of efficiency out of GPU usage, right? When people load a model and run it on a GPU, they don't know things like: oh, I need to basically overlap my communication costs with compute, I need to keep my GPUs busy. If you didn't know what that meant, forget I said it, it's not a big deal. But GPUs are an expensive resource and we need to keep them busy and running.
Other things involve, say, wanting to saturate the memory of your GPU, because otherwise a lot of the power that's going to be consumed anyway goes to waste, right? And so there are these considerations that are restricted to the GPU as well. So that's one aspect. Then there's also...
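The idle-power point above can be sketched with a toy energy accounting. The power figures are assumptions for illustration; a real GPU's idle and peak draw vary widely by model and generation:

```python
# Toy accounting of energy spent at idle vs. under load.
# IDLE_POWER_W and ACTIVE_POWER_W are assumed illustrative values.

IDLE_POWER_W = 70      # assume ~70 W static draw while idle
ACTIVE_POWER_W = 700   # assume ~700 W under full load
HOURS = 24

def daily_energy_kwh(utilization: float) -> float:
    """Energy over one day given the fraction of time the GPU is busy."""
    active = utilization * HOURS * ACTIVE_POWER_W        # Wh while busy
    idle = (1 - utilization) * HOURS * IDLE_POWER_W      # Wh while idle
    return (active + idle) / 1000                        # convert to kWh

busy = daily_energy_kwh(0.9)    # a well-utilized GPU
slack = daily_energy_kwh(0.2)   # a poorly-utilized GPU
print(f"90% utilized: {busy:.3f} kWh/day; 20% utilized: {slack:.3f} kWh/day")
```

The point of the sketch: even a mostly-idle GPU keeps drawing power, so low utilization means a meaningful share of its daily energy buys no computation at all.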
Kevin Fraser
Sorry, so you're essentially saying that because GPU time is so scarce and so valuable, right? We hear all the time about new startups that are just renting out GPU space, for example, just so that folks can train new tools or allow for greater inference. So the economic incentive here is to run these as much as possible for as long as possible, rather than saying, oh, you know, let's take a break and let them breathe and stretch before their next training.
Dan Zhao
There is a human dynamic here, as there is to all human systems. And as a former economist, I tend to appreciate the frustrations that go with that. Let me put it this way. Democratizing AI is great: more people have access to resources and to compute to run experiments, and hopefully innovation will be born out of that as well. But on the flip side, you have more and more people using GPUs inefficiently, and those GPUs are going to be run one way or another. It's in some form a tragedy of the commons, if you think about it, right? Because, suppose, this is what we did at MIT, for example: we came up with a system that caps the power draw of GPUs. What that basically meant was we limited the amount of power these GPUs could draw, and this is currently in practice at the MIT Supercomputing Center. Now, what this did was save a lot of energy in aggregate, and we found there's a sweet spot where users don't notice any perceptible difference in their job performance. The problem there is, well, if they realize that we've capped their power and they were still able to get their jobs done in shorter amounts of time while saving energy, people might still order more jobs to run anyway, just because. Right? And so that energy or that effort you spare might still get used up anyway, because people are going to say, oh, those GPUs are still going to be running, I might as well be running something on them. Right? And so as more people come into the deep learning world and they're running GPUs, learning how to run GPUs efficiently is very important. And sometimes it's the job of big labs and other places to do the behind-the-scenes magic to make sure that you squeeze every drop of efficiency out, if that makes sense.
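The "sweet spot" idea described above can be sketched as a small search over candidate power caps. The throughput curve below is an assumed, illustrative model (real curves must be measured per workload and per GPU), but the selection logic is the general shape of the approach:

```python
# Sketch: choose the lowest power cap whose throughput loss stays within a
# tolerance. The throughput model is an assumed diminishing-returns curve,
# not a measured characteristic of any real GPU.

def relative_throughput(cap_w: float, peak_w: float = 700.0) -> float:
    """Assumed curve: throughput rises sub-linearly with allowed power."""
    return min(1.0, (cap_w / peak_w) ** 0.5)

def best_cap(caps, max_slowdown: float = 0.05) -> int:
    """Lowest cap keeping slowdown within max_slowdown of uncapped speed."""
    acceptable = [c for c in caps if 1 - relative_throughput(c) <= max_slowdown]
    return min(acceptable)

caps = range(300, 701, 50)   # candidate caps in watts
chosen = best_cap(caps)
print(f"Chosen cap: {chosen} W, throughput: {relative_throughput(chosen):.3f}")
```

Because the assumed curve has diminishing returns, the last watts of power buy very little speed, which is exactly why a cap can save real energy at a barely perceptible slowdown.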
Kevin Fraser
Kevin Fraser
Right? So essentially you could think of a novice farmer going off and trying to gather all the wheat. They may do it in a horribly inefficient manner: they're expending all these man-hours, they're using the wrong tools, and so on and so forth, way more labor intensive. Whereas the trained farmer, the big labs, for example, they know how to harness these resources most efficiently. Is that a fair characterization?
Dan Zhao
It is fair, but it's also clear that, you know, there's this knowledge and information asymmetry, right? Which is natural. You can't really blame one side or the other. For example, even if you gave folks full access to the GPU, to the scheduler behind the GPU resourcing, if you gave people access to the networking behind the cluster, they wouldn't really know what to do with that anyway, right? And so it's not necessarily a fault to be assigned to either side. It's just a natural consequence of any new technology that needs to be used, especially if this new technology comes with these sorts of costs that need to be borne by one side inevitably.
Mosharaf Chowdhury
Right.
Dan Zhao
Costs that then spill over, in aggregate, into society. So it is a bit of a natural circumstance as well.
Kevin Fraser
Right. So you have to learn how to farm before you start planting stuff. That's going to be a natural process. So Dan, I cut you off. You were mentioning that we have the GPU kind of incentives in terms of running frequently or using them to the greatest extent possible. What's this second kind of driver of energy consumption?
Dan Zhao
Yeah. So when you think about a lot of these things that come up nowadays, and I think Mosharaf had mentioned this as well: demand. It's simply a matter of demand as well. More and more people are wanting to use these things, more and more people might use them inefficiently, and as more people use them and as this technology and its span of capabilities grow, the more we can do with these models. I'm not sure if you saw Sora 2 come out the other day; its video generation capabilities are amazing, and the more that's going to take. I'll give you a more concrete example. As capabilities grow, modalities grow. Video generation, for example, is something we didn't really think about as much as text generation back in the day. And again, back in the day is like two years ago. And this also plays into what you were mentioning earlier, Kevin, about agents. So for example, most recently, when I was a senior research scientist at Microsoft: when you think about agents, the number of iterations they take in a single action is far more than a single round of chat between you and ChatGPT. The fact that you can also use images matters too; images can spend quite a bit more tokens than text sometimes, depending on the number of images you're using to do a single task, which is much more complex than answering a single question on a computer. That's going to drive things. So the increase in capabilities and the increase in demand, along with what I mentioned at first, I think those three make a powerful combination.
Kevin Fraser
Thanks, Dan. And you're affirming one of my bad jokes, and I have many of them, which is: if a picture is worth a thousand words, then a generated image is worth 10,000 tokens. So you're welcome to steal that from me anytime you want. Mosharaf, what are you doing about this? What is your team doing to help solve this issue? Because we mentioned at the outset that there are these holes in this conversation, and yet, you know, Google released a report saying that about 0.24 watts get used per median query when you're using Gemini, and apparently that's the same as operating a microwave for a second. I don't really know what that means in terms of how I should change or alter my approach to using Gemini or ChatGPT, but the labs are sharing information. Sam Altman sent out a tweet about some of OpenAI's energy uses. What gaps are you filling, or how is your team working on this issue?
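The microwave comparison is easy to sanity-check. The microwave wattage below is an assumed typical value, and the per-query figure is the 0.24 watt-hours Google reported (quoted in the question as "watts," a units slip addressed in the answer that follows):

```python
# Sanity check of "one median Gemini query ~ one microwave-second."
# MICROWAVE_POWER_W is an assumed typical wattage, not a cited figure.

QUERY_ENERGY_WH = 0.24      # Google's reported median energy per query (Wh)
MICROWAVE_POWER_W = 1000    # assume a ~1 kW microwave

# Convert watt-hours to seconds of microwave runtime: Wh / W gives hours.
seconds_equivalent = QUERY_ENERGY_WH / MICROWAVE_POWER_W * 3600
print(f"One median query ~ running the microwave for {seconds_equivalent:.2f} s")
```

Under that assumed wattage the query works out to a bit under one microwave-second, so the comparison is in the right ballpark.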
Mosharaf Chowdhury
Thank you for asking this question. So we have to go back to earlier, when I said that there were gaps a few years ago because people simply didn't have tools. For the last almost five years at the University of Michigan, with many of my colleagues across the U.S., I'm leading the ML Energy Initiative, where we are trying to build tools to precisely measure, understand, and then optimize the energy consumption of AI. One of those tools is Zeus, which we built from the ground up. It can interact with different kinds of GPUs and CPUs and memory and collect, over time, how much energy is being consumed by the workload that is running, the workload being AI inference or training. Using Zeus, we have measured, as precisely as possible within the GPU, how many watt-hours or joules of energy are being consumed, and we produce this output, which I will refer to as the ML Energy leaderboard, where we measure the energy consumption of open-source models. So it led to two things. One is it gave people a tool, but also a methodology in terms of how to think about energy measurement. The MIT Tech Review article you mentioned at the beginning of the podcast: they actually used our tools and worked with us for almost six months to measure all the models we did, plus a couple of other questions they had. So these tools allow us to help journalists collect all these numbers, and that also caught a lot of attention from different companies. For example, if you read this article or white paper from Google, you will see that they also refer to the ML Energy benchmark, and they talk about how we measured and how they have expanded the methodology to make it even more accurate by considering idle machines in their data centers. And they came up with this median number, which, by the way, should be watt-hours instead of watts, because it's energy. But that's okay. So that's one part of what we are doing: making tools available.
So nobody can say that it's impossible, it's very hard, we don't have means to measure it, it's difficult. The other part of what we are doing is in the optimization space. Zeus measures, but Zeus also optimizes. Part of how it optimizes is figuring out the precise power cap to set in a job-specific way. Dan mentioned earlier that at MIT they were capping all of the servers, but Zeus has an optimization method that sets individual GPUs' power in a unique fashion that together works well for the training job to actually perform better. It can be applied to small models that fit in a single GPU, but it can be applied perhaps even more effectively to bigger models, because of the distributed computation structure of the model. So let me simplify and give an overview of why it works for distributed, bigger models. Because they are too big, they don't fit in a single device; oftentimes you need 16 or 32 GPUs just to hold one copy of the model to start training, fewer for inference. Essentially what happens is that the computation flows through all of the devices, and that leads to multiple computation paths. Some of them, the ones that dictate the runtime, we will call critical, and some are non-critical. If the computations happening on the critical path slow down even a little bit, your entire training run or your inference request will slow down. But all the ones outside the critical path, the non-critical ones, can be slowed down as long as they don't become slower than the critical path. And so what we have built is a tool that automatically finds this critical path and precisely computes how to slow down everything outside the critical path, by setting the precise frequency each GPU should be operating at, at precise points in time.
So it's a coordinated dance across thousands of GPUs that happens at tens-of-milliseconds granularity, and that allows us to save up to 20 to 30% of the energy consumption of training. So where it was 300,000 or however many homes, now you can get the same thing done with 200,000 homes' worth, and save all of this energy. Which, as Dan mentioned, will still be used for doing more training instead of being saved. But at least our work is making sure we are effectively using the energy we are paying for.
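The critical-path idea described above can be sketched in a deliberately simplified form: for a set of parallel branches, only the longest one dictates total runtime, so every other branch has slack it can absorb by running slower. This is a toy model (real schedulers like the one described plan actual GPU frequencies over time, which this does not attempt):

```python
# Toy critical-path analysis: branches execute in parallel; the longest
# (critical) branch sets the overall runtime, so the others can be slowed
# without any end-to-end cost. Branch durations are illustrative.

def slowdown_factors(branch_times_ms):
    """Factor by which each parallel branch could be slowed without
    extending the overall runtime (1.0 = critical path, no slack)."""
    critical = max(branch_times_ms)
    return [critical / t for t in branch_times_ms]

times = [100, 80, 60]            # ms per branch; branch 0 is critical
factors = slowdown_factors(times)
print(factors)                    # the critical branch keeps factor 1.0
```

The non-critical branches here could run 1.25x and about 1.67x slower for free; lowering their GPU frequency accordingly is where the energy savings come from.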
Kevin Fraser
Well, Mosharaf, I'm still trying to get the two-step down here in Austin, so I think the dance you just described is far beyond my skills. But I'm impressed that we're seeing just how nuanced we can be in making these training runs more efficient and making inference more efficient. And I wonder how we can try to get more transparency around the fact that these efficiency gains exist, and what it would look like to sort of mandate or encourage the adoption of those mechanisms. But I'll get there in a second, because I think it's also important to flag that your benchmarking work is pivotal in terms of getting a more holistic picture of where and how energy is actually being used by these models. But you noted something critical, which was that you were analyzing open-source models. And as I'm sure all good Scaling Laws listeners know, the big boys, for lack of a better phrase, with the exception of Llama from Meta, are all closed source. And so Dan, how are we trying to remedy the lack of information and transparency around some of the biggest companies, the biggest models, and the energy uses here? And what work is your team doing to amplify some of the efficiency gains that Mosharaf was talking about?
Dan Zhao
Yeah, so our work at MIT focused a lot on energy efficiency from a very early time period. But, you know, at the time we didn't think this would actually go anywhere. This was back in 2021, 2022. We were looking at training, measuring the energy of training and inference for, geez, back in the day, ResNet. So these were old, old CNNs that still see some use nowadays for computer vision. We were doing things for BERT-like models, simple BERT-like models, GNNs for molecular interatomic potential measurements, and things like that. We eventually moved on to work where we tried to benchmark the inference costs or inference energy of LLMs. But back then Llama had just come out, and no one was really paying attention to inference energy. The main difficulty is that it's almost near impossible, unfortunately, because there are several factors at play here. The most reasonable way people have gone about it has been trying to find models of a similar size, be it in terms of the number of parameters or FLOPs, comparable to whispers on the wind as to what you would think the big companies are doing, or in some ways tracking alternative data about measurements of certain emissions. But then again, energy does not equal emissions, because where the energy is drawn from translates into different emissions. If you're using energy from wind farms and windmills versus energy from coal versus energy from nuclear, those emissions are going to be vastly different, right? So all these approaches are imperfect, but they try to get at a scale of measurement. And that makes things very difficult when it comes to trying to understand it from the angle that you're describing. So what most people have tended to do has been just to try and offer improvements, right? Because at the end of the day, dollars and cents are more likely stronger incentives to move things around.
So advances in both hardware and on the model and algorithmic side of research, which we do, are just trying to put out there that, hey, if you adopt this, you can reduce energy, which means you can push more throughput or save on dollars and cents. GPUs go brrr, and basically everyone's happy, right? But like I said at the beginning, there is this information asymmetry that does make things difficult. And so proxies are what we're using, as open data, especially with our work at MIT; that's essentially what we can do at this point, basically using these open models as testbeds. So for example, if you look at Llama 1, Llama 2, Llama 3, 3.1, 3.2, et cetera: using these techniques, adapting them, and applying them to these models and saying, hey, look at how much we've reduced model FLOPs this way, look at how much latency we've saved this way, or more specifically, benchmarking on a V100 or A100 or whatever else and saying, okay, in our setting we show that energy efficiency increases this way. But only the large labs themselves, when they do it on their own hardware, their own network stacks, and so forth, will know the precise numbers. All we can do is offer solutions that we see work on open benchmarks and open models, and then basically take it from there. So that's how I see the current state of things today, at least.
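The point above that energy does not equal emissions can be made concrete with grid carbon intensities. The intensity figures below are rough lifecycle estimates used purely for illustration (real values vary by study, region, and plant):

```python
# Same energy, very different emissions depending on the grid mix.
# Intensities are rough illustrative lifecycle estimates in gCO2e per kWh,
# not authoritative figures for any specific grid.

CARBON_INTENSITY = {
    "coal": 820,
    "gas": 490,
    "solar": 48,
    "wind": 11,
    "nuclear": 12,
}

def emissions_kg(energy_kwh: float, source: str) -> float:
    """Convert an energy draw to kilograms of CO2-equivalent."""
    return energy_kwh * CARBON_INTENSITY[source] / 1000

ENERGY_KWH = 10_000  # assume a 10 MWh workload
for src in ("coal", "wind"):
    print(f"{src}: {emissions_kg(ENERGY_KWH, src):,.0f} kg CO2e")
```

Under these assumed intensities, the identical 10 MWh workload emits on the order of 70x more on a coal-heavy grid than on wind, which is why emissions estimates built from energy proxies alone are so uncertain.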
Kevin Fraser
Yeah. So this will come as no shock to regular listeners, which is to say I'm generally pretty bullish on AI. And notwithstanding my optimism around its use cases and its potential to help ameliorate many social woes, I think there's a sense that, economically, culturally, and so on, AI is here to stay. Obviously it's not going anywhere. The billions, if not soon trillions, of dollars invested in this space suggest that the momentum will continue. And so I think your point, Dan, of asking how we are going to help this process be more efficient, because we know it's going to continue, makes a heck of a lot of sense. But if you don't mind, I want to personalize this for just a second, because I think there are some folks, you can read the New York Times or just observe it, anytime there's an AI conversation, someone will say something along the lines of: I don't use AI, or I rarely use AI, because I think the environmental ramifications are too high to justify whatever Studio Ghibli meme I'm going to get out of this, or whatever silly new joke I'm going to have it generate. For the two of you, and Musharaf, I'd love to start with you if you don't mind answering: I'm guessing you use AI pretty regularly in your day-to-day life. Or are you an AI vegan who tries to limit your AI use as much as possible?
Musharaf Chaudhary
I mean, I use AI when needed. I was talking to my student yesterday and they were asking what AI I use. I said I don't pay for AI; I just use it for very simple things, like when I need something to be quickly written or proofread, that type of work. But not too much in terms of idea generation. Not because I am for or against it, it's just that, I don't know, I feel like ideation is one of the fun parts of being a professor, and I want to keep it to myself and take some time to think about ideas. But it makes sense to use it as a tool like any other tool when you feel like it's going to make things faster. At least that's the way I see it. To me it's just a tool that is very good at doing certain things, and the things I think it is good at, I'm going to use it for. Right?
Kevin Fraser
When you see a nail, you grab a hammer. When you need some improved editing, you grab ChatGPT. Makes total sense; I get that. Dan, how about yourself? Are you a frequent AI user, and in what sense?
Dan Zhao
I was going to say, technically I have to be. As a researcher, you kind of have to tinker with these models. When you work on, for example, developing agentic models or trying to make models more efficient, you have to use them, although probably not in the way most people would use them, in terms of communicating solely via text. I was the largest holdout for a very long time, simply because, number one, I didn't want them to have my data for training. Perhaps a bit paranoid or futile in the end, but you tell me. Number two, it was also because I was always suspicious of: oh, if I use this, and it does reduce friction and save time for me, I wonder whether certain sets of gears in my head will no longer click as quickly as they did in the past. Right? So I've tried to be at least very conservative in terms of my usage. Although I will admit, in certain things... for example, when I write papers, I use LaTeX.
Musharaf Chaudhary
Right.
Dan Zhao
Overleaf has come in; that's helped. And now I don't have to spend half an hour trying to correct table formatting because I can't figure out why the table won't render a certain way. I'm sure Musharaf knows exactly what I'm talking about as well. I give it to ChatGPT, it figures it out for me, and I don't ask questions. Feel free to blame me for that. But that's how I see it.
Kevin Fraser
I will not blame you for that, and I will not detail all the images I make for my students to try to render, you know, boring case law a little more exciting. But it sounds like, for the two of you, the main driver behind your personal habits isn't necessarily a concern about the energy consumption of using AI. So can you steelman and really bolster the case for the people who say, I don't want to use AI because of these energy costs? What is the most persuasive argument for saying, yes indeed, you should refrain from, or perhaps scale back, your AI use because of how energy-intensive it is? Dan, I know this is a toughie, but I'm going to start with you.
Dan Zhao
Sure. I would say it all harks back to what I was saying at the very beginning about the tragedy of the commons. Right? If you won't submit that single query, someone else will. And so that energy, that opportunity cost, is very, very small in the aggregate setting. From a purely economic point of view, you're weighing the productivity gain against a very small sliver of energy cost. For example, one of our works on benchmarking large language model energy costs showed that a single query is not really worth a lot in terms of energy from a single person. Over time, over demand, over usage and utilization, it will grow, absolutely. But as an individual submitting a query once or twice in a session, it's a negligible difference. It's really the tragedy of the commons and the coordination issue that comes up in aggregate, as an externality, that produces these effects. And that's something to worry about, but it's also something that can only really be addressed effectively in aggregate, at a higher level. So, given the productivity benefits of using these tools... for me, for example, formatting LaTeX tables saves me a whole bunch of time. I'm not claiming I could use that time to save the world, but I can use that time to continue research on energy efficiency, like the work at MIT. That probably would be my first and foremost argument when I think about why I myself am not cutting back. Am I just being a hypocrite? I'm sure I am. But when I think about the actual concrete effects, that's the calculus that goes on in my head.
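The back-of-envelope arithmetic behind this "negligible individually, enormous in aggregate" point can be sketched in a few lines of Python. The per-query and household figures here are assumed round numbers for illustration only, not measurements from the episode or from the MIT work.

```python
# Illustrative arithmetic for the tragedy-of-the-commons point: one user's
# queries are a rounding error, but the same per-query figure multiplied
# across all users is a city-scale load. Both constants are assumed
# ballpark values, not measured numbers.

PER_QUERY_WH = 0.3            # assumed watt-hours per LLM query
HOUSEHOLD_KWH_PER_DAY = 30.0  # assumed daily use of one US household

def total_kwh(queries: int, per_query_wh: float = PER_QUERY_WH) -> float:
    """Total energy in kilowatt-hours for a given number of queries."""
    return queries * per_query_wh / 1000.0

# One person's daily handful of queries.
individual = total_kwh(10)                       # 0.003 kWh

# A billion queries across all users in a day.
aggregate = total_kwh(1_000_000_000)             # 300,000 kWh
households = aggregate / HOUSEHOLD_KWH_PER_DAY   # ~10,000 household-days

print(f"10 queries: {individual:.3f} kWh")
print(f"1B queries: {aggregate:,.0f} kWh ≈ {households:,.0f} household-days")
```

Swapping in any other per-query estimate only rescales both numbers; the eight-orders-of-magnitude gap between the individual and the aggregate is what drives the coordination problem Dan describes.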
Kevin Fraser
And Musharaf, I wonder, when you pick up the paper and see things like "using AI for this number of queries at this time of night is akin to turning on the microwave for eight seconds," do you think: can we please stop doing this? This is so annoying; this isn't really helpful to anything. If you were the AI information czar for a day, how would you change how we're talking about AI and energy? What do you think are the most important things that policymakers should be talking about, that the public should be aware of, that labs should be disclosing?
Musharaf Chaudhary
It's a very hard question, because people want something they can relate to, so they can understand: okay, this is how much we are talking about. Instead of a microwave, some people have used how many light bulbs, some people have used how many miles in an electric vehicle, how many homes; sometimes I must have also used how many pounds of coal to burn. People just want something physical they can relate to, and I think that's why things like a microwave or a teaspoon of water come up. They're household things everybody can picture: okay, how much is it? It's nothing; I'm opening the microwave so many times a day. And it's to get across this sense that, as Dan was saying, an individual user's single query is so small. It is only a problem because all of us are using it, and it would only be solved if most of us stopped using it. A few people adding or deleting queries is not going to make a dent one way or the other, because each individual query is not infinitesimally small, but really, really, really small. In terms of how to express this, one path would be, I think, changes in culture and the education system. It's very easy for us to understand what one second versus 10 seconds versus 100 seconds means. And in the U.S., of course, we use how many miles you have to drive; it doesn't have to be time, it can go the other way around, depending on who you talk to. So similarly, for energy, we have to create and cultivate a language that everybody understands. All of these different comparisons show up because people are still searching for the right way of posing it so everybody understands the same thing, and we'll keep looking for different examples.
Personally, I don't really have a suggestion. People have suggested many different ways to me, because every time I give these types of talks, for years now, I just say, okay, this many joules. And they say, okay, you should use this or that instead, because who knows what these joules even mean, how can I relate to that? But none of them seem perfect, so I end up not using any of them. And unfortunately, as the information czar, I don't really have a solution I can provide right away.
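The "translation layer" problem described here, turning raw joules into something an audience can picture, is easy to prototype. The appliance wattages below are assumed round numbers for illustration; a real converter would need vetted figures for each reference appliance.

```python
# A toy joules-to-household-analogy converter, in the spirit of the
# microwave/light-bulb comparisons discussed above. Wattages are assumed
# illustrative values, not figures cited in the episode.

ANALOGIES = {
    # appliance name: assumed power draw in watts
    "microwave (1 kW)": 1000.0,
    "LED bulb (10 W)": 10.0,
    "laptop (50 W)": 50.0,
}

def relate(joules: float) -> dict[str, float]:
    """Express an energy amount as seconds of running each appliance."""
    return {name: joules / watts for name, watts in ANALOGIES.items()}

# Example: 3,600 J (one watt-hour) in relatable terms.
for name, seconds in relate(3600.0).items():
    print(f"3600 J ≈ {seconds:g} s of {name}")
```

The arithmetic is trivial; the hard part Musharaf identifies is the choice of reference, since no single appliance means the same thing to every audience.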
Kevin Fraser
Musharaf, I'm sorry, you're definitely fired from your job, but it's okay; you can still put on your LinkedIn that you were the information czar for all of two minutes. So Dan, I wonder, for the folks who are concerned about this space and want to make sure we are using and training AI models as efficiently as possible, what are the primary bottlenecks that you and your team may be facing? Is it a matter of a lack of information from the labs? Is it a matter of resources for your own work? What's holding this research back?
Dan Zhao
Yeah, I'm probably going to give a rather unconventional answer compared to some of the answers I've given in similar places. I think public education is probably going to be very important. Given the places I've been, for example, in big research labs, public academic labs, industry, and so forth, everyone wants to do the sexy thing. They want to say, oh, I want to build the next agent that gets 99% performance on OSWorld or Windows Agent Arena, which are the agentic benchmarks built nowadays to determine what's state of the art in computer-use capabilities. Right? No one's really thinking, oh, how do I save energy? Part of this is because, per Musharaf's point, it's a bit abstract to think about what that means, and measuring it is so difficult as well. And so as a result it becomes very murky. So I think public education is important: having people understand what a GPU does, and making decisions based on that understanding of how GPUs work. Understanding, oh, this is why I want to fill up my memory; pick a batch size, along with the model size, so that the GPU memory is filled or saturated. People tend to focus on the sexy things they understand: oh, this is how LLMs work, these are the components that go into them, this is what self-attention is, and so on. But they rarely think about what's actually happening on device, on the GPU. Or, when this gets sent to a data center, what's potentially happening there. Or how a GPU's architecture maps to, let's say, a loop in deep learning training or inference.
A deeper understanding and appreciation of this will not only make sure those individual effects add up in aggregate, but people will also get a bigger bang for their buck when they run these GPUs. They're going to get more efficiency; they're going to get better performance. So there's a benefit on both sides. It's just that there's a fixed cost to overcome for people to learn, because it's not easy. If it were easy, everyone would be doing it. And there's also that initial cost of inefficiency required to actually learn and get there, like I mentioned at the beginning. So at least that's my somewhat optimistic take at this point.
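As a toy version of the batch-size reasoning above, one might estimate the largest batch that saturates GPU memory from the device's capacity, the resident model size, and per-sample activation memory. All three figures below are assumed illustrative values; real sizing must also account for optimizer state, KV caches, memory fragmentation, and framework overhead.

```python
# A simplified sketch of "pick a batch size that saturates GPU memory."
# This is a teaching aid with assumed numbers, not a production memory
# planner: it ignores optimizer state, KV caches, and fragmentation.

def max_batch_size(gpu_mem_gb: float,
                   model_mem_gb: float,
                   per_sample_gb: float) -> int:
    """Largest batch whose activations fit beside the model weights."""
    free = gpu_mem_gb - model_mem_gb
    if free <= 0:
        return 0  # the model alone does not fit on this device
    return int(free // per_sample_gb)

# Example: an 80 GB GPU, ~14 GB of weights (roughly a 7B model in fp16),
# and an assumed ~0.5 GB of activations per sample.
print(max_batch_size(80.0, 14.0, 0.5))  # 132
```

Even this crude model captures the intuition Dan describes: once the weights are resident, every unit of unused memory is throughput (and energy efficiency) left on the table.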
Kevin Fraser
Yeah, it's really interesting, because for me, and again, I'm just a lowly law professor, the thing that comes to mind is: you don't need a Ford F-150 to drive in downtown Austin, right? The roads are stable, you're not going over any massive cliffs, you're just driving on a paved road. You can get by with a Fiat, or, like I do, you can ride around on your bike. But in the context of using AI, there's this idea of, oh well, why not just use the reasoning model? Why not go to deep research to answer the question of what I should eat for dinner tonight? You're going to use that model in a far more energy-intensive way, and if we're not talking about this in layperson's terms, then users may not understand the difference in how they're engaging with these models. But to your point, Dan, there's obviously also an education point on the startup side: if you are getting into this space, thinking about how you can pick up those more efficient training mechanisms sooner rather than later is super fascinating. So Musharaf, given that you all have the world to save, I don't want to take up too much of your time. What is driving your optimism or your pessimism in this space? What is top of mind for you when you put your head on the pillow?
Musharaf Chaudhary
I'm optimistic; let's start with that. I think there is a lot of efficiency to be extracted. So I have this vision of what I call energy-optimal AI. As came up multiple times, if we don't do it, someone else will. So AI is going to happen. What we want to do is make the AI happen within the same amount of time, getting to the same level of accuracy or more, while figuring out the minimum amount of energy we need to spend to get to that point. And to do that we need what I call a full-stack approach. Starting at the very top, with all the models and algorithms people are creating, there is a lot of innovation happening. In the middle, at the software layer that translates all of those models to be executed on the hardware, there's a lot of work we are doing, Dan is doing, a lot of our colleagues are doing. And then at the bottom, at the hardware level, there's a massive amount of progress happening. New kinds of accelerators are coming up, and existing ones are being updated, made more efficient, given more cores, and so on. So as AI becomes commoditized and democratized, it will also get cheaper, because everybody at every layer is working to make it more efficient and cost-effective, even for the big boys. Because at the end of the day, everybody wants to give better service at lower cost, either to make it cheaper for everybody or at least to make more profit for themselves. So for a fixed amount of AI, the energy cost, I think, will keep going down. The only thing is that we are still at the beginning of AI. I think there is a lot more so-called AI to be had. Dan mentioned earlier Sora 2, and Google has Veo 3. We are going toward world models, which are much bigger and more expensive than whatever text and other things we are doing now.
Soon we will say: back in the days we used to do text, it was so much better and didn't consume anything, whereas now we are living in this, I don't know, world-model era and whatever else comes up. So that's what is going to drive energy consumption. But as we go through all of these stages, I think there will always be opportunities to optimize and to get as close as possible to energy optimality for that particular type of AI. So I am quite optimistic that we will continue to find ways to keep the cost low, so it doesn't go out of bounds. Yeah, sorry for the long answer.
Kevin Fraser
No, that's great. It sounds like you sleep well at night, which is good to hear. And Dan, I'll tee you up with one final question. We've talked quite a bit about the market being pretty effective here, where there's a huge driver for the labs themselves to become more efficient. What is your commentary for the state legislators who are introducing, for example, AI energy-related bills? Is this a moment of saying, hey, just let the market do its thing, let's let this play out? Or do you think this is a time of saying, no, we need to mandate the sort of efficiency gains we're seeing from lab A and make sure that labs B through Z are applying them as well?
Dan Zhao
Yeah, that's a very complex question. It's a very multifaceted question too, and it depends, because the differing incentives in this complex whirlpool make a clear-cut answer very difficult. But I'll give it a shot anyway. First and foremost, obviously, innovation is still going to be key. If you pursue energy efficiency gains blindly, taking those strategies and imposing them on places where they may not work, that probably won't take priority. In reality, people are probably going to put energy efficiency considerations aside in favor of chasing pure performance improvement, whether measured from a model performance perspective in terms of throughput in tokens per second, improved loss, or model FLOPs, and so forth, or in terms of more product-oriented metrics when it comes to actually putting these LLMs and agents into products: better user engagement, and so on. So that's one consideration. The other consideration is that we probably don't want a one-strategy-fits-all approach at this stage. I feel like we're still very early on in trying to understand which strategies work, because everyone might be working with different model components. Architectures are still quite different, though a lot of them are still bounded in LLM land, all these autoregressive decoder-style models, multimodal and so forth, with common motifs. But each lab is most likely operating under different constraints: different power constraints, different networking infrastructure, different compute clusters, different data centers, and so forth. So these operations are very different, and opex and capex are likely very different too.
So, at least for state legislators, understanding what's in their own backyard first is probably most important, right? If the operations concentrated in your backyard fit a specific profile, then you might want to target that profile first. But that also requires understanding a lot of what's going on there, and that understanding, I think, might be lacking in state legislatures at this moment, especially because of the lack of subject-matter expertise, for better or worse. The other thing I'll say, alluding back to an earlier point, is that in terms of further efficiency gains, I don't know how long LLMs will be the main focus. We've seen slight differences in model architectures. A lot of the current ecosystem is indeed based on LLMs, these autoregressive decoder-style models that are multimodal, et cetera, as far as we know. But one thing I do think is universal, or near universal, is human behavior. If we can somehow induce human behavior to change, in terms of how people operate or collectively work together toward something, that would be great. I'll give you an example. Machine learning conferences, I'm sure, are the bread and butter that make everyone's day, right? The places I typically submit to are NeurIPS, ICML, ICLR, as many researchers do. Other folks prefer other venues, like different IEEE venues, Supercomputing, and so forth. If you look at usage, as we did at MIT, you'll see it spike around these deadlines, because everyone's panicking, everyone delays, and everyone procrastinates. Fine. But my point in saying this is that even if we do eventually fall off the LLM wagon and move toward a new architecture, we'll have to rebuild from scratch, or from some basics, what efficiency techniques mean for these new model architecture types rather than LLMs.
And we've already seen this happen a little bit. Back in the day it used to be dense LLMs. Then MoEs, mixture-of-experts models, became more popular and found their way into LLMs. Efficiency techniques that worked solely on dense LLMs and didn't account for these MoE additions don't really work as well; they need some reworking. But human behavior is always universal. So if we can get human behavior on board, then I'm probably happy. That's a long-winded way of saying it depends.
Kevin Fraser
All we need to do is change human nature, which is easy peasy, right? I'm sure we're all ready to do that right after we close this episode. But you all have some work to do. Thank you for doing it. I find it fascinating, and surely I'm going to be sending you a note asking you to please come back soon. But for now, Musharaf, Dan, thanks so much for joining.
Dan Zhao
Yeah, thank you for having us.
Podcast Host
Scaling Laws is a joint production of Lawfare and the University of Texas School of Law. You can get an ad-free version of this and other Lawfare podcasts by becoming a material supporter at our website, lawfaremedia.org/support. You'll also get access to special events and other content available only to our supporters. Please rate and review us wherever you get your podcasts. Check out our written work at lawfaremedia.org. You can also follow us on X and Bluesky. This podcast was edited by Noam Osband of Goat Rodeo. Our music is from Alibi. As always, thanks for listening.
Date: October 17, 2025
Host: Kevin Fraser (AI Innovation and Law Fellow, Texas Law; Senior Editor, Lawfare)
Guests: Musharaf Chaudhary; Dan Zhao
This episode explores AI’s rapidly increasing impacts on global energy and resource consumption. The host and expert guests discuss public misconceptions about AI’s energy use, clarify the distinction between training and inference, and cover efforts in research, industry, and policy for creating more energy-efficient AI systems. They consider whether rising AI-related energy demands should cause alarm, debate the limits of individual impact, and envision the future of energy use in AI as both a technical and policy challenge.
“At the beginning there were no good tools to precisely measure... so people were using estimations to get a sense of rough order of magnitude… multiplying all these big numbers… you end up with a very large number, which have been reported [as] bigger than Netherlands and Ireland... There is a lot of overestimation.”
“Large behemoth models… undergo something called pre-training… not accessible to your common folk… large companies, large labs, basically training these models on tons and tons of tokens.”
“Training… happens only once… when it gets deployed… hundreds of millions of people are using the same model… each inference request consumes a small amount of energy, but when you multiply it by hundreds of millions, they add up to a very big number… the ratio between energy consumption of training and inference depends… but many of the numbers… range from 30/70 to 40/60… the bigger one being inference.”
Why AI Uses So Much Power and Water
“You walk in [a supercomputing center], and the first thing you notice, it’s quite hot… GPUs go brrr...even when not active, transistors have static power leakage… not many people know how to efficiently get every single drop of efficiency out of GPU usage.”
“Tragedy of the Commons” Dynamic
“If [users] realize we’ve capped their power… they might still order more jobs… energy you spare might still get used up anyway.” (14:46)
Growing Modalities & Complexity
Filling the Data Gaps
“We have built… a tool that automatically finds this critical path and precisely computes how to slow down everybody outside the critical path… [saving] up to 20, 30% of energy consumption of training.” (23:30)
Transparency & Closed vs. Open Models
“Only really the large labs themselves, when they do it on their own hardware, on their own network… will they know the precise numbers.”
“If you won’t submit that single query, someone else will… as an individual submitting a query… it’s a negligible difference. It’s a tragedy of the commons and a coordination issue…”
“People want something that people can relate to… but really, each individual query is so small…” (40:55, Musharaf Chaudhary)
“Having people understand what a GPU does… making based on that understanding… they’re going to get more efficiency… but it takes that fixed cost to overcome for people to learn because it’s not easy.”
“I have this vision… energy-optimal AI… AI is going to happen. What we want to do is to… get to the same level of accuracy… but with minimum energy… At every layer—model, software, hardware—innovation is happening to make it more efficient.”
“Innovation is still going to be key… we probably don’t want a one-strategy-fits-all. [Labs use] different model components, architectures, constraints… When it comes to state legislators, what’s in their own backyard… might be most important… and that understanding might be lacking right now.”
On Overblown Energy Stats
“It led to many news articles which are... [understandably concerning].”
– Musharaf Chaudhary (04:03)
On the Inference Boom
“As more and more people use it, [training energy] easily gets dwarfed by all the people and all of their requests.”
– Musharaf Chaudhary (09:12)
On Inefficiency in AI Use
“It’s in some form a tragedy of the commons… even if you efficiently allocate, demand will fill the vacuum.”
– Dan Zhao (14:46)
On Benchmarking Efforts
“We have built… a tool that automatically finds this critical path… and that allows us to save up to 20-30% of energy consumption of training.”
– Musharaf Chaudhary (23:30)
On Individual Impact
“If you won’t submit that single query, someone else will. It’s a negligible difference… It’s the tragedy of the commons and the coordination issue that comes up in aggregate…”
– Dan Zhao (38:39)
On the Communication Challenge
“People want something physical so they can relate to… but none of [the analogies] seem perfect.”
– Musharaf Chaudhary (40:55)
On the Path Forward
“For the fixed amount of AI that we want, the energy cost I think will keep going down.”
– Musharaf Chaudhary (47:59)
On Policy & Human Behavior
“If we can somehow induce human behavior to change… I’m probably happy with that.”
– Dan Zhao (54:39)
This episode dismantles headline-grabbing myths about AI’s energy demands and replaces them with data-driven perspective. Training does consume enormous resources, but as AI applications proliferate, routine inference by millions worldwide is the larger, growing factor in total consumption. Technical innovation—across hardware, algorithms, and measurement—shows promise for major efficiency gains. Yet, individual restraint is negligible in the tragedy-of-the-commons dynamic: Large-scale progress will hinge on transparency, industry incentives, smart policy, and better public/technical education around the true costs and tradeoffs of AI adoption.
Host Contact: scalinglaws@lawfaremedia.org
Subscribe: Find Scaling Laws wherever you get your podcasts.