
David A. Patterson
David A. Patterson is a pioneering computer scientist known for his contributions to computer architecture, particularly as a co-developer of Reduced Instruction Set Computing, or RISC, which revolutionized processor design. He has co-authored multiple books, including the highly influential Computer Architecture: A Quantitative Approach. David is a UC Berkeley Pardee Professor Emeritus, a Google Distinguished Engineer since 2016, the RIOS Laboratory Director, and the RISC-V International Vice Chair. He received the 2017 Turing Award together with John L. Hennessy for pioneering a systematic, quantitative approach to the design and evaluation of computer architectures with enduring impact on the microprocessor industry. In this episode, he joins Kevin Ball to talk about his life and career. Kevin Ball, or K. Ball, is the Vice President of Engineering at Mento and an independent coach for engineers and engineering leaders. He co-founded and served as CTO for two companies, founded the San Diego JavaScript Meetup, and organizes the AI in Action discussion group through Latent Space. Check out the show notes to follow K. Ball on Twitter or LinkedIn, or visit his website, Kball LLC.
Kevin Ball
It is my absolute honor today to welcome Turing Award winner David Patterson to the show.
David A. Patterson
Thanks for having me.
Kevin Ball
I'm excited to have you, and I have a bunch of questions. I just love getting to geek out with people on stuff. But I'm curious, before we start: you have a long, illustrious background. You've been in a whole bunch of different domains of the tech industry. How do you introduce yourself these days? What do you bring forward?
David A. Patterson
I think I usually start with I was a Berkeley professor of computer science for four decades and eight and a half years ago I started working for Google. So I've almost got a decade at Google. So I've got a half century of experience in the field.
Kevin Ball
I love the encapsulation as a half century. And the field isn't that much older than that.
David A. Patterson
Yeah, the field's about as old as I am. It goes back to the late '40s, the very beginning, with the ENIAC and things like that. And when I first decided to study computer science, that got no reaction from my relatives. So it's not like in the 1960s everybody said computing was the future. It was just this kind of side thing that few people were interested in. And now my relatives think I was very wise in the field I picked.
Kevin Ball
Yeah, absolutely. So the subject that you won the Turing Award for was related to RISC and MIPS and that sort of domain. And I saw that you've been deeply involved with the RISC-V project, or at least you wrote the book, or one of the books, on it. So I'd love to dig in a little bit and get your perspective on RISC-V, what it means, and what people are trying to do with it.
David A. Patterson
All right, so when I explain it to my relatives, what I say is: when software talks to hardware, there's a vocabulary. The technical name for that vocabulary is the instruction set. And the words of that vocabulary are like the keys of a calculator: add, subtract, multiply, divide, and that stuff. And so the so-called reduced instruction set computer debate, which we had in the 1980s, was: what's the best instruction set, or vocabulary, for microprocessors? The prevailing philosophy was that it should be very sophisticated, with this idea of being closer to software. John Hennessy and I, who shared the Turing Award, and people at IBM argued that that was the wrong model, that we should instead keep the instructions relatively simple, that's the reduced, or simplified, instruction set, and that the compiler would be able to map programs onto it. And if we kept the instruction set simple, we could iterate on them faster, it could potentially be more power efficient, and things like that. So that was the debate. And what it came down to is this: for the more sophisticated instruction sets, which you can think of as polysyllabic words in the vocabulary, you needed fewer of them to execute the program, but they might run more slowly. So what was that ratio? It turned out the RISC style tended to execute about 30% more instructions, but we could execute them about four or five times faster. So that was the net win. So that's RISC in a nutshell.
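The arithmetic behind that net win can be sketched in a few lines of Python. This is only an illustration of the figures quoted above (about 30% more instructions, each executing four to five times faster); the function name `risc_speedup` is made up for this sketch.

```python
def risc_speedup(instruction_ratio: float, per_instruction_speedup: float) -> float:
    """Net speedup when a program needs `instruction_ratio` times as many
    instructions but each instruction runs `per_instruction_speedup` times faster."""
    return per_instruction_speedup / instruction_ratio


# Figures quoted in the interview: ~30% more instructions, 4x to 5x faster each.
for faster in (4.0, 5.0):
    net = risc_speedup(1.3, faster)
    print(f"{faster:.0f}x faster instructions -> {net:.2f}x net speedup")
```

Under those figures the net gain works out to roughly three to four times, which is why executing more (but simpler) instructions still came out ahead.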
Kevin Ball
Yeah, absolutely. And it has continued to evolve. I would say that if we look at processors shipped today, the debate is over. Essentially, RISC has more or less won.
David A. Patterson
Yeah. There have been new instruction sets over the decades, but nobody's tried to do these very sophisticated ones. At the core, they're still similar to the very RISC I that we did at Berkeley in the 1980s.
Kevin Ball
Yeah. So looking at RISC-V now: relatively recently, the first version was finalized and put out in the world. What is different? I think I saw in a brief that it learned a set of key lessons from previous generations and avoided some key mistakes. So what do those look like?
David A. Patterson
Well, we did four RISC architectures at Berkeley in the 1980s, and Hennessy did a couple at Stanford that are called MIPS. The origin story for RISC-V is that in 2010 we were going to do research around parallel computing, sponsored by Intel and Microsoft: the switchover from single core to multicore. That was what was funded. And we were going to need an instruction set to do our research. We could see that Moore's Law was slowing down, and we thought the future would look like this: you'd have a core instruction set and then you'd add special purpose instructions for specific domains. That's what we thought was going to happen. So we needed a core, and it would have made sense to use x86, based on who our sponsor was. But x86 is ghastly hard to extend, and Intel wouldn't let us use it. Nor could we use ARM. You know, ARM was popular, but you couldn't extend it. So we had to invent our own. And so, led by my colleague Krste Asanović and two grad students, Yunsup Lee and Andrew Waterman, they said, let's do a brand new RISC architecture, using what we've learned over the last 30 years about things that were probably mistakes back then and what we've learned since then. And because we did four RISC architectures in the 1980s, they called it RISC-V. So I think the fact that they call it RISC-V gets me more credit than I deserve. But the idea was, as researchers, we thought everybody would want to use it: everybody in academia, other researchers. So we made it available, in kind of the Berkeley tradition. Things are open source, and that's how it got started. About four years later we were using it in our classes and our research. Other universities didn't really pick it up, but it was out there. But we started getting comments like, why did you change the instruction set from the fall semester to the spring? They said, why did you make some of those changes?
And we thought, why do you care what we're doing in our classes and our research? And then in talking to them we found out there was this thirst for an open architecture that people could use rather than having to get a proprietary architecture. Once we realized that, we thought it was a great idea and wrote a paper, kind of inspired by the Linux idea that software should be free. We called it "Instruction Sets Should Be Free," and then soon started a foundation around 2014, 2015. And now, 10 years later, fortunately, it's actually caught on. It's like the enthusiasm around open source software, which has this kind of religious fervor, a philosophical attraction. Same thing for the open architecture. Most of the people involved really like the idea that potentially we could have a lingua franca across all computing, from the biggest to the smallest, and if we could have one, it had better be open. As we've seen, instruction sets that are proprietary are tied to the fortunes of their companies. And, you know, who would have thought Intel would ever be vulnerable? Right. That just seemed impossible. But x86 is tied to that company, which is having difficulties. And there have been many other instruction sets that have gone away because of the fortunes of their companies. So the idea is an open architecture that's community oriented. It's a standard. It's not like Linux, which is an implementation; it's a standard like USB or something like that. And there's a lot of enthusiasm around it. So if you look at the very core of that instruction set, it's similar to RISC I. But since we have so many transistors today, and we had that idea of adding features when you need them, RISC-V allows optional features for all kinds of applications, like encryption or machine learning, options that you can add. But at its core is that same RISC philosophy.
Kevin Ball
Let's maybe talk some about that extensibility, because I think one of the things going on right now, with Moore's Law ending and a bunch of these other things, is that we're moving into a world of domain specific accelerators. And probably the biggest one of those is around machine learning, with people using GPUs or TPUs and things like that. So what does that look like for people using RISC-V if they want to tap into a domain specific accelerator? You said it's open, so they can do that. How does that actually end up playing out? And where are you seeing this happen?
David A. Patterson
Yeah, so what I've ended up doing at Google for almost a decade is working on domain specific accelerators for machine learning. When I retired from Berkeley and wanted to keep my hands in the technology, I thought Google would be an interesting place. I had gone on sabbatical recently, and because of my experience, and I think just because of my stature, they had me report to Jeff Dean, who's kind of a famous software engineer. He was in the machine learning part of Google. It's not that I thought this was something I wanted to do, but I didn't have any strong opinions. He was in Google Brain, which he founded. Jeff was a big and early believer in the potential of machine learning and AI, and was one of the first movers in that. And so Google was the first mover in that. So I ended up spending the last eight years learning all about domain specific accelerators for ML and AI. Now, RISC-V in particular: its first foothold has been in embedded computing. So there are DSP extensions, digital signal processing extensions, there are compression extensions to keep the instruction size smaller, and then the extensions for machine learning. Kind of to my surprise. I was involved in the IEEE floating point format standard. You know, before that, in the 1980s, different computers had different floating point formats. So imagine porting floating point programs when the floating point didn't do the same thing on different machines. It was a real mess on big computers. And the floating point standard was set up around the thought, wow, microprocessors are starting to have floating point, let's standardize it. And thank God we did. But I thought the data types were settled, which was, you know, single precision, double precision, maybe down the line there'd be bigger things, but that was it.
Well, to the surprise of many, in machine learning, because it's doing, you know, symbolic processing, it's not the, whatever it is, 50 bits of precision that matters. The range is very important, but the precision isn't all that important. So Google created a new floating point format, which I never expected to happen in my lifetime given the standard. At first it was a 16-bit format, a different format, that they called brain float, because Google Brain is where it was done. And since then, Google and Nvidia and other companies have made it even smaller. So there's 8-bit floating point, and 4-bit floating point, where there are only 16 values. How can that be floating point? But amazingly enough, people are figuring out ways to get AI done with extremely narrow data types. So data types have been a significant area of innovation in architecture. And I think when Bill Dally of Nvidia talks about the giant gains that we've made in machine learning, he credits data types as one of the significant ones. So if you had an instruction set that you couldn't change, you'd be in big trouble. You need to expand these data types. So those are examples of the type of extensions going on.
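To make the range-versus-precision trade-off concrete, here is a minimal Python sketch of the 16-bit brain float layout: the top 16 bits of an IEEE single-precision value (1 sign bit, 8 exponent bits, 7 mantissa bits). It uses plain truncation for simplicity, which is an assumption of this sketch; real hardware typically rounds. The helper names are hypothetical.

```python
import struct


def float32_bits(x: float) -> int:
    """Return the 32-bit IEEE 754 single-precision encoding of x as an integer."""
    return struct.unpack(">I", struct.pack(">f", x))[0]


def to_bfloat16(x: float) -> float:
    """Keep only the top 16 bits of the float32 encoding (sign, 8 exponent bits,
    7 mantissa bits). This truncates instead of rounding, to keep the sketch short."""
    truncated = float32_bits(x) & 0xFFFF0000
    return struct.unpack(">f", struct.pack(">I", truncated))[0]


# Precision is coarse: only about 2 to 3 significant decimal digits survive.
print(to_bfloat16(3.14159))   # 3.140625

# Range is preserved: the exponent field is the same width as float32,
# so very large magnitudes remain representable.
print(to_bfloat16(1e38))

# A 4-bit format can encode only this many distinct bit patterns:
print(2 ** 4)                 # 16
```

Keeping the 8-bit exponent is what preserves the range of single precision while giving up most of the mantissa, which matches the point above that range matters more than precision for this workload.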
Kevin Ball
That's a great example. There are all these interesting papers around trading off the number of parameters against the precision of the parameters. I saw somebody doing essentially, what do they call it, like one and a half bit, where they just basically had one, zero, or not set, and each parameter was just that. And it makes a ton of sense that being able to bake that in at the instruction set level, because this is extensible, allows you to experiment in those domains.
David A. Patterson
Yeah. So if you're a computer designer, what you love is people who say, I can't get my work done, I need a much faster computer. You're not happy with people who say, yeah, things are fast enough, my laptop is good enough, I'm going to keep it until it breaks, I'm going to keep it a decade. Right. You're not a fan of those people. The machine learning people are voracious. If you gave them a factor of 10 tomorrow they'd say thank you, and then, what's next? Right. They can use up everything. And it's a brand new area. We don't know the best architecture for machine learning and AI. It's a wide open space. And this numerical work has these two phases: training, and what's called serving or inference. Training is kind of like sending your kid to college. It takes a long time, it's expensive, you spend a lot of money. But once they're educated, they go out into the world and then hopefully answer questions much more quickly than it took them to learn that material. So there are these two phases, and the numerical precision can be different between what you train in and what you serve in, through what they call quantization. Maybe if you train using 8 bits, you'll be able to serve, or do inference, at narrower bit widths. And there are papers that talk about ternary, three states, like one-and-a-half-bit serving. It's amazing. So what's exciting, compared with maybe the 1990s, when Intel x86 dominated everything and it was kind of boring, is that there's this wide open area on both the software side and the hardware side where it's not clear what's going on. And it obviously has gigantic commercial impact, as those of us who watch Nvidia stock can see. So it's pushing the state of the art in both software and hardware, and simultaneously being delivered to people and having impact in people's lives. So it's a very interesting time.
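As a rough illustration of the "one and a half bit" idea, the sketch below quantizes trained weights to three states. The interview does not say which values or threshold the papers use; mapping to -1, 0, and +1 with a fixed threshold is an assumption made here, and `ternary_quantize` is a hypothetical helper.

```python
import math


def ternary_quantize(weights, threshold):
    """Map each weight to one of three states: -1, 0, or +1.
    Weights with magnitude below `threshold` become 0 (assumed scheme)."""
    return [0 if abs(w) < threshold else (1 if w > 0 else -1) for w in weights]


trained = [0.9, -0.05, -0.7, 0.2]          # made-up example weights
print(ternary_quantize(trained, 0.3))      # [1, 0, -1, 0]

# Three states carry log2(3) bits of information per parameter,
# which is where the "about one and a half bits" figure comes from.
print(f"{math.log2(3):.3f} bits per ternary parameter")
```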
Kevin Ball
I'd love to dig in a little bit on what you're talking about there in terms of the separation of inference and training. Because I think the vast majority of the industry is using off-the-shelf GPUs, and they're using the same hardware for both. I think I saw a paper about Google's TPUs, which y'all custom designed, with some actually focused on inference versus training. What are the different parameters that you're optimizing there?
David A. Patterson
So what's happened in the machine learning community? The big thing, kind of a breakthrough, was in 2017, when Google came up with this new model that's called the Transformer, and a specific idea: if you think of an image, you can pay attention to different pieces of the image and do more computation there rather than uniformly. So the actual title of the paper that introduced the Transformer is "Attention Is All You Need." This has proven to be a breakthrough model. And what's happened in the last seven or eight years is people pushing that model and expanding the number of parameters dramatically, and all that stuff. So what's happening simultaneously, besides the split of training and serving, is this rapid increase in the size of these models. And what that typically comes down to is memory capacity. I think the original Transformer model might have had 100 million parameters, some number like that. People have already gone to billions, and people are talking about hundreds of billions of parameters. And when we were talking about the data types, these parameters could be 16 bit, this brain float 16, or 8 bit, or people would like to get them into 4 bits, to shrink the memory capacity needed. Not only is it smaller, so it takes up less memory space, but you also get more out of the memory bandwidth: you can fetch more of them per second. So that's the shift that's going on, driven by this increase in these so-called large language models. Now, training is really computationally intensive. If you look at our textbooks, there's a phrase called arithmetic intensity, which is the number of operations for every byte fetched. For training, it could be hundreds of operations per byte fetched, that is, hundreds of floating point operations for every byte of data. For serving, it tends to be not that high an arithmetic intensity.
So what that means is it's more memory bound. Serving tends to be more memory oriented. Training tends to be more compute bound. Well, first of all, you can do serving on a training chip. If you can do training, you've got everything you need to do serving. But by specializing you can reduce the cost, the energy, and the carbon footprint. So you can have a smaller chip. All these chips have big matrix multiply units, so you could get away with a smaller matrix multiply unit. You can get away with a lower power chip and try to focus on memory centric architecture design for serving. I would say right now, you know, Nvidia rules the world on training, and I think their favorite solution to how you do serving is: well, you use our old training chips to just serve. That's their advice. That may benefit them economically, but that's their advice.
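The compute-bound versus memory-bound distinction can be sketched with a simple bound: attainable throughput is the smaller of the chip's peak arithmetic rate and its memory bandwidth multiplied by the arithmetic intensity. The chip figures below (100 TFLOP/s peak, 1 TB/s bandwidth) and the intensities are hypothetical numbers chosen for illustration, not specifications of any real accelerator.

```python
def attainable_flops(peak_flops: float, mem_bandwidth: float,
                     arithmetic_intensity: float) -> float:
    """Upper bound on throughput (operations/second).
    arithmetic_intensity is operations per byte fetched from memory."""
    return min(peak_flops, mem_bandwidth * arithmetic_intensity)


def is_memory_bound(peak_flops: float, mem_bandwidth: float,
                    arithmetic_intensity: float) -> bool:
    """True when memory traffic, not arithmetic, limits throughput."""
    return mem_bandwidth * arithmetic_intensity < peak_flops


PEAK = 100e12        # hypothetical: 100 TFLOP/s
BANDWIDTH = 1e12     # hypothetical: 1 TB/s

for name, intensity in (("training", 300.0), ("serving", 10.0)):
    bound = "memory" if is_memory_bound(PEAK, BANDWIDTH, intensity) else "compute"
    tflops = attainable_flops(PEAK, BANDWIDTH, intensity) / 1e12
    print(f"{name}: {intensity:.0f} ops/byte -> {tflops:.0f} TFLOP/s, {bound} bound")
```

With these assumed numbers, the low-intensity serving workload can use only a tenth of the arithmetic units, which is the argument for a smaller matrix unit and a more memory-centric design on a serving chip.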
Kevin Ball
It might be just talking their book a little bit there.
David A. Patterson
Yeah. If you think of it historically, the PC was a duopoly between Intel and Microsoft, whereas right now it's pretty much Nvidia and Nvidia. But they've focused on the training side, and on very high powered, very big chips. So on the serving side, I think there's more opportunity for hardware people to innovate, with these trade-offs I talked about: maybe you get away with a smaller chip, or you sit closer to the memory system so you get good memory bandwidth. So those are examples there. And again, this sliding era. Certainly Nvidia is the dominant commercially available thing. Some companies like Google have built their own training chips, and Google has also built special versions of them for serving.
Kevin Ball
You mentioned something there that I'd like to dive into a little bit further: the piece around becoming memory bound. I know this is a domain where the ratios of compute to memory have been shifting over time. Even on my laptop, if I'm running into trouble, it's almost always because I'm running into memory constraints. And I saw a paper that you were involved with recently around memory centric computing and redesigning the way we set up, at least, cloud servers to be focused on memory centricity rather than processor centricity. Can we maybe talk about what that looks like and what that means?
David A. Patterson
Yeah, I think the original tagline of that paper was "the CPU is not central anymore." You know, we're kind of used to measuring software by arithmetic operations. Very computation centric. But increasingly we are bound by memory, either by memory capacity or by memory bandwidth. And what's happened over time, with the slowing of Moore's Law, is that the rapid improvements in DRAM, which we saw for decades, have slowed. In the last century it was like clockwork: four times the capacity every three years. Today it's going to be more than a decade between the 8 gigabit DRAM and the 32 gigabit DRAM, so we've gone from 4x every three years to 4x in more than a decade. So it's really slowing down. And along with that slowdown in capacity, the bandwidth isn't improving as rapidly. One of the famous admonitions in computer architecture was by Gene Amdahl. He wrote a kind of one-and-a-half-page paper that stated what he thought was fairly obvious, and it has since come to be called Amdahl's law. If you have a pie, and there's a part of the pie that you accelerate and the rest of it you don't, then the part you don't accelerate limits how much faster you go. If you make two thirds of the pie go infinitely faster, you're only going to go three times faster, because there's one third you don't touch. People call it this sad law because architects run into it all the time. They get very excited: oh, look what I figured out, I can make matrix multiply go 10 times faster. Very exciting. And then all the rest of it doesn't go any faster. So what's happening is, as Moore's law is starting to slow down, the logic is still getting pretty good. You know, the actual arithmetic units are okay, but the cache memory technology, which is called static RAM, is not improving very much. And then the DRAM, which is a separate kind of technology, is also improving much more slowly.
So what they've done to try and boost the memory, especially for these accelerators, is go to a novel packaging scheme, and it's aptly named high bandwidth memory. It's physically a small memory where you stack the dies on top of each other. There's a stack of four or eight dies, and they're trying to get to 12 or 16 dies, and you have multiple stacks right around the computation unit. So they're very close. And they've got a thousand wires. Standard DRAMs would have 64 wires. This is a thousand wires in these little stacks. So it's very specialized memory, and it's at the heart of all these accelerators, to try and provide the bandwidth that you need, particularly for training and for inference. And again, not only is this technology right in this area, it's also in the business pages as we go along. SK Hynix can successfully make it with 8 chips and is delivering it. Samsung, which was, I think, the original creator of high bandwidth memory, is having difficulties delivering on this technology. So as a result, SK Hynix is doing well in the market. So all this AI stuff is tied directly to the business pages and the stock prices that you see.
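The DRAM slowdown described in this answer can be restated as an annual growth rate. The sketch below assumes exactly ten years for the recent period; since the interview says "more than a decade," that makes the second figure an upper bound. The function name is made up for this example.

```python
def annual_growth(total_factor: float, years: float) -> float:
    """Compound annual growth factor that yields `total_factor` over `years`."""
    return total_factor ** (1.0 / years)


historic = annual_growth(4.0, 3.0)    # 4x capacity every three years
recent = annual_growth(4.0, 10.0)     # 4x over a decade (assumed; source says "more than")

print(f"historic: {100 * (historic - 1):.1f}% per year")   # 58.7% per year
print(f"recent:   {100 * (recent - 1):.1f}% per year")     # 14.9% per year
```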
Kevin Ball
Absolutely, yeah. It's really driving it. And yeah, Amdahl's law is, I think, intimately familiar to any software engineer who's been told premature optimization is the root of all evil. Right. Like, oh, I got this loop going really fast. Why is my code not going any faster?
David A. Patterson
Yes. You know, I think it's the law of diminishing returns. When Gene Amdahl wrote it, he was talking about parallel computing, right? He says there's a part that you can make parallel, great. The part that you don't make parallel will limit how much performance you're going to be able to deliver. And, you know, it was obvious to him as a really smart computer designer, but he just had to write it out because people were getting very excited about these parallel processors and ignoring the part that they weren't speeding up. Yeah, but it's this law you keep running into over your career. Like, oops. Yep, screwed up again. Amdahl's law.
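Amdahl's law as described here has a standard closed form: if a fraction f of the work is sped up by a factor s, the overall speedup is 1 / ((1 - f) + f / s). A short Python sketch reproduces the pie example from earlier in the conversation; the 50% matrix multiply share in the second example is a made-up figure.

```python
import math


def amdahl_speedup(accelerated_fraction: float, factor: float) -> float:
    """Overall speedup when `accelerated_fraction` of the run time is made
    `factor` times faster and the rest is left untouched."""
    untouched = 1.0 - accelerated_fraction
    return 1.0 / (untouched + accelerated_fraction / factor)


# Two thirds of the pie made infinitely fast: the limit is 3x.
print(f"{amdahl_speedup(2 / 3, math.inf):.2f}x")

# Matrix multiply made 10x faster, assuming (hypothetically) it is half the run time.
print(f"{amdahl_speedup(0.5, 10.0):.2f}x")
```

The second case gives less than 2x overall despite the 10x gain on the accelerated part, which is the "oops" moment described above.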
Kevin Ball
Yep. Okay, so with the accelerators, that's really interesting, right. They're packaging huge amounts of memory. Going close to it. I think I saw.
David A. Patterson
Yeah. What's interesting is that you can actually plug a lot more DRAM into your PC, in the so-called dual inline memory modules, than you can put next to an accelerator. You can put in dozens of them. But because of the physical distance needed to get that thousand wires, high bandwidth memory is on a special package. So the downside of the stacks is that the capacity isn't very high, because there aren't that many dies and there aren't that many stacks, relative to the DIMMs. So it's incredibly fast but has very limited capacity. So how does that constrained capacity for HBM go with what I said was billions of parameters or hundreds of billions of parameters? Yeah, that's the problem. You need a lot of GPUs to get enough capacity to be able to solve these big problems, because of the limited memory capacity of the HBM stacks. So, well, it's challenging. I guess if you're selling GPUs, it's not challenging: just buy more.
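A back-of-the-envelope sketch shows how limited HBM capacity turns parameter counts into device counts. The 500-billion-parameter model and the 80 GB of HBM per device are hypothetical figures chosen for illustration, and the estimate counts weights only (no activations or other working memory).

```python
import math


def model_bytes(num_params: int, bits_per_param: int) -> float:
    """Memory needed just to hold the parameters, in bytes."""
    return num_params * bits_per_param / 8


def accelerators_needed(num_params: int, bits_per_param: int,
                        hbm_bytes_per_device: float) -> int:
    """Minimum number of devices whose combined HBM can hold the parameters."""
    return math.ceil(model_bytes(num_params, bits_per_param) / hbm_bytes_per_device)


PARAMS = 500_000_000_000   # hypothetical model size
HBM = 80e9                 # hypothetical HBM capacity per device, in bytes

for bits in (16, 8, 4):
    gigabytes = model_bytes(PARAMS, bits) / 1e9
    devices = accelerators_needed(PARAMS, bits, HBM)
    print(f"{bits}-bit parameters: {gigabytes:.0f} GB -> at least {devices} devices")
```

Under these assumptions, going from 16-bit to 4-bit parameters cuts the minimum from 13 devices to 4, which ties the narrow data types discussed earlier directly to the capacity problem.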
Kevin Ball
There you go.
David A. Patterson
Yeah, not a problem.
Kevin Ball
Continuing to dig into this memory and data centricity, I think I saw something around building out database architectures around a memory pool rather than, once again, being CPU centric. What does that end up looking like? I'm interested in this shift. If our constraints are now memory, which they've been on again, off again, but increasingly, data has so much inertia that it's the fundamental constraint, the part that's not getting faster in Amdahl's law. How do we shift our hardware architectures and our software to better manage that?
David A. Patterson
Yeah. So the paper that you're talking about is actually being presented at CIDR, which is a database conference. I can't remember what it stands for. I was at a workshop in Germany where they had a few architects and a lot of database people, and we got together and talked about this memory centric approach. And among the authors of the paper, you know, there are a lot of database people who are better informed about this than I am. But one of the enabling technologies is this new successor to the PCIe bus. It's called CXL. And the idea is to be able to create a coherent address space across several servers. What we argue in that paper is that one of the downsides of database accelerators in the past was that you needed memory bandwidth for these accelerators, so you would have to put a lot of physical DRAM with each of them, and if you didn't use it well, it would be very expensive. So what we argue is that CXL, which is standard on all the new servers, very easily allows you to have pools of DRAM that can be shared across CPUs, rather than having to put a lot of DRAM in every physical one. So given a pool of DRAM, maybe database accelerators make more sense, in that you only need to use memory from the pool rather than having to justify the cost of DRAM that can only be used in these narrow situations. So I'd say the bottom line of that paper is that pooling of DRAM is realistic. We should be more memory focused, thinking of the problem as how we get access to the data rather than thinking of computing as the focus of what's being done.
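The cost argument for pooling can be illustrated with a toy provisioning model. This is not taken from the paper: the demand traces below are invented, and the model simply compares giving every server enough DRAM for its own peak against sizing one shared pool for the largest combined demand at any moment.

```python
def dedicated_dram(demand_by_server: dict) -> int:
    """DRAM needed when each server must hold its own peak demand locally."""
    return sum(max(trace) for trace in demand_by_server.values())


def pooled_dram(demand_by_server: dict) -> int:
    """DRAM needed when servers draw from one shared pool: the peak of the
    summed demand across all servers at any single time step."""
    return max(sum(step) for step in zip(*demand_by_server.values()))


# Hypothetical demand in GB per time step; each server peaks at a different time.
demand = {
    "server_a": [64, 512, 64, 64],
    "server_b": [64, 64, 512, 64],
    "server_c": [512, 64, 64, 64],
}

print(dedicated_dram(demand))   # 1536
print(pooled_dram(demand))      # 640
```

The gap between the two numbers is DRAM that would sit idle most of the time under per-server provisioning, which is the expense the answer above says pooling avoids. The saving shrinks if the peaks coincide.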
Kevin Ball
Yeah, well, I wonder, and you said you're not as familiar with the software side, so redirect me if I'm going off into an area that isn't in your domain. But does that require the software to explicitly manage the memory? Because I feel like right now a lot of software, at least higher-level software, can optimistically ignore the memory hierarchy, and you only have to start being aware of it when your performance falls down and you say, oh, I've broken cache locality or something like that.
David A. Patterson
Well, this is a little different. So how do computers work? Right? How could you have gigahertz processors and this DRAM that takes 100 nanoseconds? How do you make that work? Well, caches were invented, and we just keep sticking in levels of caches to try to give the illusion that you have this incredibly large memory and that it's incredibly fast. And it's up to the hardware memory hierarchy to hide that from the programmers. For some tasks the illusion works pretty well, but for others it doesn't. And so programmers do need to be aware. But I think this one isn't so much about the memory hierarchy as it is just about memory capacity, which I think programmers have had to worry about for a long time. I think the argument is that if you re-architect your software to be aware of the pooling of memory across different servers, you can get a lot better cost performance for data intensive applications.
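How well the cache "illusion" works can be estimated with the textbook average memory access time calculation: each level costs its hit time plus its miss rate times the cost of going one level further out. The hit times and miss rates below are hypothetical; only the 100 ns DRAM latency comes from the conversation.

```python
def average_access_time_ns(levels, dram_latency_ns: float) -> float:
    """Average memory access time for a cache hierarchy.

    `levels` is a list of (hit_time_ns, miss_rate) pairs ordered from the
    cache closest to the core outward. A miss at the last level goes to DRAM.
    """
    penalty = dram_latency_ns
    # Work from the outermost cache back toward the core.
    for hit_time_ns, miss_rate in reversed(levels):
        penalty = hit_time_ns + miss_rate * penalty
    return penalty


DRAM_NS = 100.0                      # figure quoted in the interview
caches = [(1.0, 0.05), (10.0, 0.2)]  # hypothetical L1 and L2 parameters

print(f"{average_access_time_ns(caches, DRAM_NS):.2f} ns with caches")
print(f"{average_access_time_ns([], DRAM_NS):.2f} ns without caches")

# A workload with poor locality (much higher miss rates) sees the illusion break down.
print(f"{average_access_time_ns([(1.0, 0.5), (10.0, 0.8)], DRAM_NS):.2f} ns with poor locality")
```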
Kevin Ball
Yeah, that would make sense just in terms of being able to trade off more memory versus CPU and not having to stack them. I was also wondering if you make memory ownership visible to the software, could you hand off between different processes without having to do a memcpy where you basically say like here's your address, go.
David A. Patterson
Yes, that's kind of this distributed shared address space. There's a whole bunch of CXL protocols, but I think the latest CXL protocols will allow that type of access to the shared memory. But it's different. You know, it's shared between different servers, and that used to be an impossible thing to do. The way that servers are constructed, there's just a limit to the amount of DRAM you can stick into one server. But using CXL you can get the illusion of having much more memory for individual pieces, so it's a practical way to get a tremendously bigger memory footprint without the gigantic cost of some kind of supercomputer.
Kevin Ball
Yeah, I remember earlier in my career I was involved with a lot of high performance computing stuff and shared memory models. If you wanted to do it across lots of CPUs you had to go to like SGI had these mega things or something like that. Anything else you were passing messages essentially over a network bus in some way. So it sounds like in some ways this could enable shared compute, shared memory style programming models, but across some number of servers.
David A. Patterson
Yeah, so that was John Hennessy and I. John Hennessy had this project called DASH, which was a big shared memory processor. This kind of switch to parallel computing was in the air. And we had a contrasting project at Berkeley called the Network of Workstations. I guess this was in the '90s. I think that's right. And John thought, well, the hardest problem with parallelism is programming, and if we keep a coherent shared address space, that's going to make it easier. And we at Berkeley said, well, that's one way to go. But we think independent machines are so cost effective. Rather than have these big servers with this, you know, interconnect to be able to provide that, it's tremendously more cost effective if we could use, you know, we said workstations, but PCs, put a bunch of them together, and use local area networks to connect them. The cost performance of that is going to be amazing. And so how did that settle? Well, what happened is, you know, the Internet came along, and for Internet services the parallelism was based on the number of people, not on a single program that you had to parallelize across 100 processors. So it was a huge throughput demand. So what ended up happening is, you know, the network of workstations won, right? All Internet services standardized around that. And, you know, John's model was more efficient in DRAM, but the price of DRAM at that time was so low, and you could scale up, and it was also very reliable, in that a single server could fail and the software could keep working around it. So it was much more reliable, much more scalable, and much cheaper than the coherent address space model that those SGI machines used. But it's an example. How do we settle debates in computer architecture?
Well, we get companies to spend hundreds of millions of dollars or billions of dollars to put it in the marketplace and then we fight it out and oh that's I won. That's how. That's how we do it.
Kevin Ball
Absolutely.
David A. Patterson
Well.
Kevin Ball
And as you highlight a huge amount of the demand in compute growth for many, many years was Internet driven essentially embarrassingly parallel. You just split it out across things. Stateless servers that maybe access some shared state and then you've got databases or things like that that have to actually have that coherent view of state.
David A. Patterson
Yep.
Kevin Ball
Now we're at a scale where that's not super cost effective. Needing this pooled memory for databases. Are there other domains? We've talked about machine learning, particularly at inference time being very memory intensive. Databases are another one. What other domains do you think this type of memory centricity is likely to make a lot of sense?
David A. Patterson
Hmm, yeah, that's kind of the question when people talk about domains. So my examples have always been, besides ML, data analytics and databases. I don't think I have any other areas that are obviously memory intensive. My guess is that'll be increasingly the problem going forward for lots of applications. But it's hard to say. I'm not sure. I don't think I have any others.
Kevin Ball
So looking at this, so much is being driven right now by machine learning. And as you highlighted, folks doing machine learning, they'll take as much as you can throw at them. I saw something recently where one of these machine learning coding tools, Cursor, was saying, hey, Anthropic's throttling us because they are literally out of GPUs. They can't run enough inferences. Everybody is struggling on that. So I'm curious, what do you see as, like, the big unsolved problems for the next five or ten years?
David A. Patterson
Well, one of the questions I had in 2016, when I joined this machine learning organization, was like, well, we'll see how significant this is. And it's very hard to see when you're in the middle of it whether this is a paradigm shift or not. Retroactively, it's much easier. Like a decade later, you look back, wow. And in my career, there was the microprocessor, the Internet, maybe mobile phones, smartphones, and when you look back, you say, wow, that was a giant change in our technology base. So now that I've been there eight years, I think this is one of those things.
Kevin Ball
So agree. This is a big paradigm shift. I think that's becoming more and more clear. What are the still big unsolved problems here?
David A. Patterson
Yeah, well, I'd say a couple of things. We're not at an upper bound on intelligence. We're not at, you know, "that's good enough," right? With intelligence, I don't know if there is going to be an upper bound, but we're certainly not there yet. There are still things where it screws up. There are things where we'd like it to be a lot smarter. So I think on the machine learning side, can we figure out how to deliver something useful that people can depend upon? You can ask it questions and trust its answers. And can we deliver that economically, and can we reduce the carbon footprint of our solutions as well? That's a topic I ended up being involved in quite a bit, the carbon footprint of this. So I think if we only focus on the part of the industry that's machine learning, there are giant challenges there. What's the best architecture? Can we improve the algorithms to reduce the cost of training and serving? Can we come up with architecture ideas and hardware ideas that can reduce the cost of serving? There's just a huge set of problems there. So it's this very exciting time. Like I said, in my career there have been kind of boring times. This is not one of them. And what's great is, if you're a researcher and you've got a good idea, people are anxious to hear it. There are times when everybody's making a lot of money and it's kind of boring. It's hard to get people's attention, because they're making a lot of money and they don't need to change anything. That's not where we are today. So it's a very exciting time for people interested in getting into this space, in hardware or in algorithmic advances.
Kevin Ball
I'd be curious to dig a little bit deeper on the carbon footprint side. I saw something recently that Microsoft is projecting. They're spending 80 billion on data centers this year. I haven't seen similar numbers from Google and Amazon, but I know that they're also deeply investing in capacity for this. And there's a lot of worry out in the world around like, what's the environmental impact, what's the energy impact? So can you talk a little bit about the research you did in that domain?
David A. Patterson
Yeah, I've worked a lot on this topic. I got started in it because, I guess in 2021, there were papers coming out that were making alarming claims. And I would ask my friends in machine learning at Google, like, is this true? So there was a paper that came out in IEEE Spectrum, which is the flagship of IEEE, one of the big organizations in the world. It was 2021, and it said that by, I think it's by 2024, training a model would cost $100 billion and produce as much emissions as the city of New York in one month. Like, oh my God, is that true? So we started investigating it, and we found that there was a particular paper that inspired these concerns. It was actually a paper by a group at the University of Massachusetts who were trying to guess what it cost for one of the Google projects. So in machine learning, there's a thing that you do to try and find better models, more efficient models, and that's called neural architecture search. So you're using kind of machine learning to find better models. And it was actually a more efficient version of the Transformer model called the Evolved Transformer. So they tried to estimate what the carbon emissions of that search were. And they didn't have internal Google information. There's a thing that in that paper we call the four M's, which affect the cost. There's the model itself. There's the machine it's running on. The third M is mechanization, which is an M word that means how efficient your data center is. And the fourth one, which was the big surprise, was map, because the cleanliness of the energy is highly dependent on geography, where you are. If you're near a hydroelectric dam, or near solar or wind, you know the energy is going to be much cleaner. So they didn't have access to the four M's, and so they did an estimate based on averages. So that was fine.
So they were about a factor of 5 higher than Google was, because we optimized a bunch of those things. But unfortunately they misunderstood how we did the neural architecture search. We used a small proxy model to search the space, and they assumed otherwise. So they were off by another factor of 18. So the paper itself that everybody based their work on was off by a factor of about 90, too high. But then people misunderstood what the paper was about. They thought it wasn't about searching for a new model, which is something you do occasionally. Then you put the model out, you publish it and put it on GitHub, and people download it and use it thousands of times. The people who read that paper thought that was the training of the model. And, not surprisingly, it takes more than a thousand times as much energy to find a model than to train one. So when you multiply those together, the conclusion was too high by a factor of 120,000. And so you end up with, you know, claims like it's going to cost $100 billion, like New York. So now the problem was, now that we knew that, how do we get the word out? There's no real good mechanism, because this paper appeared in a conference, to say, hey, by the way, remember that paper everybody's citing? It's off by 100,000. So I tried to go around and give talks, but it's basically an unsolved problem. So that's what got me into the space. I think what happens today is people just have a hard time, because when you talk about tons of emissions, it's hard to put it in perspective. So there's something called the International Energy Agency, which is like an organization of some 50 countries. And what they asked recently was, how much energy is going into data centers? Well, it's about 1%. 1% of all electricity is going into data centers. That doesn't include the Internet, doesn't include crypto. This is kind of Amazon and Google data centers. And then AI is only a piece of that. Our measurements were that it was about 15%.
So less than a quarter of 1% is AI. That's where we are today. And so the IEA said, looking into the future, even with strong growth, you know, it'll get bigger. But compared to the other things that are going on, it's not that big a deal. So, like, air conditioning. Air conditioning is going to grow a lot in the next five or 10 years, and it'll be a much bigger energy driver. And they also talked about high electrical things like aluminum plants. Just plain old economic growth was going to be another thing. So when you put things in perspective, it's going to grow, but it's small relative to other things that are going on. But it's very hard to communicate that kind of message today, because people will notice things. What happens is that's the worldwide average. But in particular regions, if somebody builds a lot of data centers in the same region, then that utility company could be taxed. And that's new. So you'll see some cities where, wow, they want to build a lot of data centers here, we have a limited energy supply there, and that's going to tax it. So it's kind of a local problem. But if you look at the big picture, it's probably not a global problem. Nevertheless, the growth is so great that companies want to be able to build these data centers. They're investing in, or have made plans for, nuclear energy. So these so-called small modular reactors, or even fusion, which is hard to believe.
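As a quick check on the arithmetic in this answer, the figures Patterson quotes can be multiplied out in a few lines of Python. This is only an illustrative sketch: the variable names are invented here, and every number is the rounded value spoken in the conversation, not a figure taken from the underlying papers or the IEA report.

```python
# Rounded figures as spoken in the conversation (assumed, not from the papers).
AVERAGES_FACTOR = 5            # estimate built on averages vs. Google's setup
SEARCH_FACTOR = 18             # misreading of how the proxy-model search ran
TOTAL_FACTOR_QUOTED = 120_000  # overall overestimate quoted in the answer

# The original estimate: 5 * 18 = 90, matching "a factor of about 90".
paper_factor = AVERAGES_FACTOR * SEARCH_FACTOR

# The search-vs-training multiplier implied by the quoted total. The answer
# only says "more than a thousand times", so this is a derived value.
implied_search_vs_training = TOTAL_FACTOR_QUOTED / paper_factor

# Share of electricity: about 1% goes to data centers, about 15% of that is AI.
DATA_CENTER_SHARE = 0.01
AI_SHARE_OF_DATA_CENTERS = 0.15
ai_share = DATA_CENTER_SHARE * AI_SHARE_OF_DATA_CENTERS

print(f"Original estimate overshoot: {paper_factor}x")
print(f"Implied search vs. training multiplier: {implied_search_vs_training:.0f}x")
print(f"AI share of all electricity: {ai_share:.2%}")
print(f"Below a quarter of 1%? {ai_share < 0.0025}")
```

The last line checks the "less than a quarter of 1%" remark against the two quoted percentages.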
Kevin Ball
I saw that. Yeah, I'll believe it when I see it, but that would be incredible.
David A. Patterson
Well, yeah, the other thing that's in the press all the time is quantum computing. And one of my friends said, you know, there's a chance that fusion's going to work before quantum computing.
Kevin Ball
Fusion's been one of those that we're going to solve it in the next 10 years for about the last 50. Right, right.
David A. Patterson
But people have been more optimistic in the last five years that we're five years away, whereas for decades we've been 10 years away. And they are using ML to help figure this out some. But it does seem like, I can't believe it's going to be real. Obviously if fusion happens, it'll be amazing. But the small, I have a, one of my nieces is a nuclear engineer. The small modular reactor thing, you know, we've had reactors in submarines for decades. They're training 18 year olds to operate, you know, nuclear reactors. So there's another part of the space that technically could be very helpful. But, you know, sociologically, putting reactors into neighborhoods may be a bridge too far.
Kevin Ball
Yeah, no, absolutely. And I think, you know, in that power domain. So actually first off, let me restate in case anybody missed it. Right. If you're worried about the environmental impact, the paper that you're probably tracing your worries back to is off by 100,000.
David A. Patterson
And for papers in academia, we count citations. Its citation rate's gone way up. Our paper said, by the way, there's a little flaw there. We're way behind, we're not catching up. I was talking to a guy who does data center stuff, and he says that's how it works. If somebody makes a mistake in a calculation and people read about it, wow, I didn't know it was that bad, then that's news. And once it's news, it is very hard to fix. Even today, if you were to search Google about what's the cost of training, you might get this so-called "same as five car lifetimes." That's that original paper, that's off by 120,000. But Google will help you find that erroneous result.
Kevin Ball
So another topic that I like to talk about with folks who you mentioned, right. You have five decades of experience here, almost as long as the industry, and you just said something of you've seen boring times, you've seen exciting times, and we're in one of those exciting times. I'm kind of curious with that perspective. What do you see going on in the evolution of the tech industry right now? What is changing the most? What's staying the same? What are you excited about?
David A. Patterson
Well, right now it'd be like asking about right after the microprocessor was invented, which I asked some of my grad student friends, I was a grad student when it was invented. They told me, I said I thought it was a big deal. Which I'm very happy that I said that.
Kevin Ball
Got that one right.
David A. Patterson
Yeah, because it wasn't, it was kind of a toy. You know, actually in the early years there were real computer conferences, and they also had these kind of pretend computer conferences where these toy computers were. They weren't real computers. But yeah. So because we're right at that point, it's kind of hard to see around the AI stuff and what's got nothing to do with AI that's very exciting. I think the quantum computing stuff isn't really about AI. Quantum computing is like 0 degrees Kelvin almost. And AI's success is machine learning, lots of data. So quantum computing and lots of data don't really fit together. But I think what Jensen said was pretty plausible. What did he say it was? 15 years is too early, 30 years is too late, 20 years. And I think so. It may have a big impact on pieces of computing a decade from now or so, but we're not going to have quantum cell phones. Right? This is big science, and there'll be some problems that it can solve that we thought were unsolvable. But as far as we know right now, we'll still need regular general purpose computers for many other things, including, as far as we know, AI and machine learning. I'm not a person who thinks that's going to be this gigantic paradigm shift. It'd be an amazing accomplishment to do it, an amazing scientific accomplishment, but I don't see it as changing the industry. We just don't know how far AI is going to go. Right? How much of our own technology will we throw away and replace with AI? So for the people closer to AI, like in the vision community, I have friends in the vision community, there was that tipping point with AlexNet and the ImageNet competition, where it won, and within two years everybody had abandoned what they were doing. They completely changed all their courses. Everything went away and it became machine learning. So it was absolutely revolutionary.
And how much of computing technology will that happen to? Or, you know, where are the walls where it can happen? Because when it does happen, it's hard to see the limits, or it's hard to see how far it can go. So we don't know how far that wave of AI is going to spread into the computing field, but it's definitely affecting lots of pieces. People are talking about using large language models to make it easier to do what's called the register transfer language of hardware design, which is, you know, not something I would have seriously thought about. So it's hard to ignore how pervasive AI is going to be, but we don't know how wide. If we knew the answer to that, I think we'd have a better understanding. But there's this chance it's just going to infect all of our underlying technologies, doing things very differently than we've done in the past. And so it's hard to predict, given how extensive the change is that we might be undergoing in the next few years.
Kevin Ball
Yeah, absolutely. I feel like it's already dramatically changing the way people do software development. You can write software with this thing and it does a pretty good job and can speed you up dramatically, but it also shifts the types of architectures you want to write, which then changes the education path of what type of software is good software and how does it work.
David A. Patterson
Yeah, I was actually involved in another paper, kind of out of my depth, that some friends wanted to write. Shaping AI for the public good is the theme. I think the title is Shaping AI for Billions, and it's an arXiv paper. And I actually did an article in The Economist about that topic. But we think a great use of the technology is to act as kind of an assistant with the human involved. So there's a human expert, and it's working in conjunction with the human expert, which makes the human more valuable economically. So rather than getting rid of hundreds of programmers, let's make hundreds of programmers tremendously more productive. And in that paper it talks about, you know, people worry about job loss. Well, in economics, there's this question of whether the product you're making is elastic or inelastic. And what that means is, if you make it more efficient and cheaper, what's going to happen? Well, if it's agriculture, you're making food, there's only so much food that people can eat, so probably the number of jobs will go down. But in topics like software, in the past it's made the number of jobs go up, because even though you bring the price down, there's tremendously more demand for it. So we think, you know, if research focuses on elastic fields and improving human productivity, that could have this very positive effect. But I certainly see it right now: if you're an expert and you can use AI to make yourself much more productive, and you can see when it screws up, it's a very powerful technology. On the other hand, if you're a novice, you know, how do you avoid the hallucinations or the mistakes it makes from having you do something that's embarrassing? So it looks like that's where it is right now. Who knows where it's going to be in a few years?
But I think this reinventing of how we write software and how we design hardware is an example of the extensiveness of where AI might go.
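The elastic versus inelastic distinction can be made concrete with a toy constant-elasticity demand curve. The sketch below is an assumption-laden illustration, not something taken from the paper Patterson mentions: the function name, the baseline price and quantity, and the elasticity values 1.5 and 0.3 are all made up, and total spending is used only as a rough stand-in for how much work a field can support.

```python
def total_spending(price, elasticity, base_price=1.0, base_quantity=100.0):
    """Total spending (price * quantity) under constant-elasticity demand.

    Hypothetical model: quantity = base_quantity * (price / base_price) ** -elasticity
    """
    quantity = base_quantity * (price / base_price) ** (-elasticity)
    return price * quantity


# Halve the price in an elastic market (elasticity above 1, assumed for software)
# and in an inelastic market (elasticity below 1, assumed for food).
for label, elasticity in [("elastic", 1.5), ("inelastic", 0.3)]:
    before = total_spending(1.0, elasticity)
    after = total_spending(0.5, elasticity)
    direction = "grows" if after > before else "shrinks"
    print(f"{label}: spending {direction} from {before:.1f} to {after:.1f}")
```

In this toy model, cutting the price raises total spending only when elasticity is above 1, which is the case Patterson describes for software.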
Kevin Ball
Yeah, and I think you're spot on. Right. We're entering a period of software abundance, but there does not seem to be. Software is like dreams made life. We have no shortage of dreams. There's always more software to write.
David A. Patterson
To me it's this. I talked about this: there's no, like, "that's smart enough, we don't need to go any further." I think it's the same thing for software quality. I mean, over my career, the embarrassing stuff is how insecure our technology is, you know, enabling people to hire people in Lithuania to steal money from your grandmother. Right? We helped make that happen. So if we could make a serious dent in the security problem, that would be amazing, and maybe AI can help us. I mean, that's an example of a quality improvement that would be wonderful for our field.
Kevin Ball
So we're coming to the end of our time together. Is there anything we haven't talked about that you would like to share with folks upcoming?
David A. Patterson
I think there's one thing in the embodied carbon space I'm very proud of. I think within a couple of weeks Google's going to publish a paper on this. The two parts of carbon footprint are what it emits when it's operating and what it costs to manufacture. And people have speculated a lot about how expensive it is to manufacture chips. We're going to release a paper or a blog. We've done what's called a life cycle analysis, and the title of the paper is Cradle to Grave. So for the whole lifetime, we'll have the data out there that shows how expensive it is to build these AI accelerators, like GPUs. What are the carbon emissions associated with it? How does the operational piece compare to the manufacturing piece? And that'll be the first time that data is out. So, you know, it was nice that Google has both environmental experts and computer experts, and we collaborated. This, I hope, will come out over the next month. And then, you know, we tried to make a thoughtful paper, under the assumption that there are people who are open minded and would like to hear about AI. It's less clear to me now, based on the kind of Twitter-like instant reactions to one sentence in the paper, how many open minded, thoughtful people are out there. But I'm hopeful that if you were to read it, it'd give you something to think about. You know, we try to talk about the upsides and the downsides, not just the upsides of AI or only the downsides of AI.
Kevin Ball
Yeah, there are a lot of outrage merchants out there.
David A. Patterson
Yeah, yeah. Andy Konwinski is a former PhD student. We did this paper a long time ago on cloud computing, where it was not as controversial, but it was controversial. When we wrote the paper, it was like, everything's cloud computing, or there's nothing there. And so we kind of explained it. What are the upsides? What are the downsides? How could researchers make it better? He wrote that paper, and he said we need to do that again, but for AI, like you said, because of the outrage merchants. So I don't know if we're making a dent in the conversation, but we gave it a shot.
Kevin Ball
I hope it gets better uptake than the carbon footprint correction.
David A. Patterson
Yeah, that would be a goal. Can we get as many citations as the paper that's off by a hundred thousand? If we got there, that would be successful.
Kevin Ball
Awesome.
Podcast Summary: Software Engineering Daily
Episode: Turing Award Special: A Conversation with David Patterson
Release Date: April 10, 2025
Host: Kevin Ball
Guest: David A. Patterson, Turing Award Winner
In this special episode of Software Engineering Daily, host Kevin Ball engages in an in-depth conversation with David A. Patterson, a pioneering computer scientist and co-recipient of the 2017 Turing Award. Patterson, known for his seminal work in computer architecture and as a co-developer of Reduced Instruction Set Computing (RISC), shares insights from his illustrious career, his contributions to the tech industry, and his perspectives on current and future technological trends.
Kevin Ball opens the discussion by acknowledging Patterson's extensive career spanning over half a century in the tech industry.
Kevin Ball [01:32]: "You've been in a whole bunch of different domains of the tech industry. How do you introduce yourself these days?"
David Patterson responds by highlighting his roles:
David A. Patterson [01:51]: "I was a Berkeley professor of computer science for four decades and eight and a half years ago I started working for Google. So I've almost got a decade at Google. So I've got a half century of experience in the field."
He reminisces about the early days of computer science, noting how it was not widely recognized as the future by his contemporaries.
The conversation delves into Patterson's foundational work on Reduced Instruction Set Computing (RISC) and its evolution into RISC-V.
David A. Patterson [02:57]: "RISC in a nutshell keeps the instructions relatively simple... RISC-V allows optional features for all kinds of applications like encryption or machine learning or things like that."
Notable Quote:
David A. Patterson [05:06]: "We made it available. Kind of the Berkeley tradition. Things are open source and that's how it got started."
RISC-V's flexibility allows for domain-specific extensions, making it a cornerstone for modern computing needs, including machine learning and embedded systems.
Patterson discusses the shift towards domain-specific accelerators, particularly in the realm of machine learning (ML) and artificial intelligence (AI).
David A. Patterson [09:18]: "Instruction sets that are proprietary are tied to the fortunes of those companies... you need to expand these data types. So those are examples of the type of extensions going."
Notable Quote:
David A. Patterson [12:19]: "It's a wide open space. This idea of doing this numerical analysis has these two phases, training and what's called serving or inference..."
The discussion moves to memory-centric computing, addressing the challenges posed by the slowing of Moore's Law and the increasing demand for memory bandwidth.
David A. Patterson [20:06]: "We're not at an upper bound on intelligence... Can we deliver something useful that people can depend upon?"
Notable Quote:
David A. Patterson [25:54]: "Pooling of DRAM is realistic. We should be thinking of being more memory focused, thinking of the problem of how do we get access to the data rather than thinking of it as computing, as the focus of what's being done."
A significant portion of the conversation addresses the carbon footprint associated with AI training and deployment.
David A. Patterson [38:40]: "We found that there was a particular paper that inspired these concerns... the claims like it's going to, you know, cost $100 billion like New York."
Notable Quote:
David A. Patterson [38:40]: "It's less than a quarter of 1% is AI. That's where we are today."
He emphasizes the importance of accurate reporting and understanding of AI's environmental impact to inform sustainable practices.
Looking ahead, Patterson shares his views on the future challenges and opportunities in AI and technology.
David A. Patterson [35:45]: "We're not at an upper bound on intelligence... it's still, there's things where it screws up."
Notable Quote:
David A. Patterson [52:24]: "If research focuses on elastic fields and improving human productivity, that could have this very positive effect that's going on."
As the conversation wraps up, Patterson hints at forthcoming research and publications addressing AI's lifecycle carbon footprint.
David A. Patterson [53:19]: "Within a couple of weeks Google's going to publish a paper that talks about the two parts of carbon footprint... Cradle to Grave."
He underscores the necessity for balanced discourse on AI's benefits and challenges, advocating for comprehensive evaluations to guide responsible development.
Notable Quote:
David A. Patterson [55:31]: "I hope if you were to read about that it'd give you something to think about."
This episode provides a comprehensive exploration of David Patterson's contributions to computer science, the evolution of RISC architectures, the burgeoning field of AI and machine learning, and the critical considerations surrounding the environmental impact of modern computing. Patterson's insights offer valuable perspectives for engineers, researchers, and enthusiasts navigating the rapidly advancing tech landscape.