
A
Welcome to Generative Now. I am Michael Mignano. I am a partner at Lightspeed. Nvidia is undeniably a cornerstone of the AI revolution. Their groundbreaking GPUs are the workhorses of modern AI research and development. Nvidia also made some major announcements at CES this year, and that's why I'm revisiting a conversation I had with Bill Dally. Bill is the Chief Scientist and Senior Vice President for Research at Nvidia. He is one of the most forward-thinking minds when it comes to computer hardware and architecture. His decades-long career started in academia at Caltech, then MIT, before he became chair of the computer science department at Stanford and then transitioned to Nvidia. We talk about his early experiences in the 1980s playing around with neural networks at Caltech, the pace of the AI evolution, and why he believes that AI is the technology that will revolutionize all human endeavors. So check out this conversation I had with Chief Scientist and Senior Vice President for Research at Nvidia, Bill Dally.
B
Hey Bill, how's it going?
C
It's going well, Michael.
B
Thank you so much for doing this. Really, really appreciate the time. Very excited to talk to you. I've been looking forward to this one for a while. You obviously have an incredibly impressive background and role at Nvidia, and there's so much we could get into about Nvidia and the state of AI and GPUs and all of the research that you and your team of, I believe, hundreds of researchers are working on. But maybe before we get there, like I said, you have such an impressive career. I think the audience would love to hear a little bit about what you've done over the past several decades across academia, entrepreneurship, and this role as Chief Scientist of Nvidia. Give us a little bit of the background of your story.
C
Okay. What's relevant to AI and the like probably started when I was a graduate student at Caltech. This is 40 years ago, in the 1980s. I took a course on neural networks and I thought that was just a really cool technology. We built little multilayer perceptrons and convnets and these things called Hopfield nets that were little associative memories. But it also impressed me that it was a toy: it was a great technology, but the compute wasn't there at the time. That was a formative thing. And then later on I was a professor at MIT and I was building parallel computers, and it kind of struck me that parallelism was the technology. It was a way to scale performance in a way that you couldn't do with serial processors. But at the same time, existing software was a huge source of inertia. With Moore's Law in effect then, and I mean the Moore's Law about serial processors, not about transistors, people could just wait, and every 18 months or so the performance of their computers would double. So why rewrite all your software? If you went with parallel computing, your performance would go up by a factor of four. If you just wait, it goes up by a factor of two. And that's just too easy a path to compete with. So it wasn't really until that ended that parallel computing took off.
B
Until Moore's Law ended, yeah.
C
And so then in the early 2000s, when I was on the faculty at Stanford, we developed this technology called stream processing, which is a way of really making parallel processing more accessible by managing the data movement in a very effective way. And we partnered with Nvidia on the development of the NV50, which came to market as the G80, to make that technology broadly available in the form of CUDA. Now, another thing that was going on about the same time: when I was on the faculty at Stanford, I was chair of the computer science department when Sebastian Thrun won the Grand Challenge for an autonomous car to drive itself across the desert from, what is it, Barstow to Las Vegas or something like that. And the technology that made that work, I remember going to one of Sebastian's meetings, and they were talking about how they were having trouble getting their car to tell the difference between the road and the desert. And it's actually harder than it seems, because those dirt roads, they're dirt, and the desert is dirt. And how do you tell one dirt from the other dirt? They had the smartest graduate students trying to code up manual feature detectors to do that, and it wasn't working. And so they just acquired a lot of data and they used statistical methods to do it. It wasn't neural networks at the time. Again, the compute wasn't quite there for that. In fact, I don't remember exactly what it was, but it was a way of automatically discovering features by mining lots of data. And it just struck me that that was a very powerful technology. A few years after that, after I'd left Stanford and joined Nvidia, about 2010, I had a breakfast with Andrew Ng. At the time he was working at Google Brain, finding cats on the Internet using 16,000 CPUs. And it struck me, okay, you know, that's a lot of compute power. I should say it's a lot of expense for that compute power. But we've gotten there.
These neural networks that I played with back in the 1980s, we finally have the technology to make them real. And it also struck me that CPUs aren't the way to do this. What we should do is get this stuff running on GPUs. So I got somebody at Nvidia Research to port his cat-finding code to GPUs, and that code ultimately became cuDNN. That's kind of the path I took, starting in the academic world at Caltech and MIT and Stanford, sort of seeing all the pieces come together: the original neural network technology, the parallel computing, evolving that into stream processing, into GPU computing, and ultimately converging on where we are today, building the engines that are basically powering this revolution in AI.
B
And for these engines, obviously there's enormous demand, unlike anything I could have ever imagined even just 18 months ago. One of the questions I often like to ask people sitting in your seat on this podcast is, did you expect what has happened over the past 12 to 18 months? Obviously you've been thinking about and working on this stuff for the past couple of decades, but did you even know what we were about to experience through the explosion of generative AI?
C
I didn't expect it to happen this quickly. I was convinced that this was a technology that was going to revolutionize all of human endeavor, right? How we play, how we work, how we educate, how we get medical care. Everything about life would be profoundly affected by AI. And I knew that was going to happen, but I thought the change was going to be more gradual and not quite as frenetic as it's turned out to be. And it's interesting, because it was a slow start. You know, things were moving along. You were seeing lots of applications of AI, starting with the convnets back 10 years ago, maybe 12 years ago. People were starting companies in agriculture to tell what is plant and what is weed, and squirt the herbicide on the bad one. And it was happening. There was growth. But then when ChatGPT came out, it was like somebody turned the rate knob way up and things just became a lot faster. I was not expecting that.
B
Could data be a limitation at some point in the near future?
C
It certainly is one of the key ingredients that you need to make this work. But there's so much that can be done, both to mine the private repositories of data that many companies have, which have not been tapped yet, and to create synthetic data, which we've found very effective in many applications, that I don't see it as an immediate concern. I think there's going to be plenty of data, both on the private side and on the synthetic side. The usual thing people have done is scrape the web for data, and they may be nearing the limits of what can be done with that, but there's a whole lot more data out there.
B
Yeah, makes sense. So talk us through it. What does it mean to be the chief scientist at Nvidia? Give us a day in the life of Bill Dally and your team at Nvidia.
C
It's got to be the world's most fun job, I think. I get to do a lot of interesting things. My role is chief scientist and senior vice president of research, and it's actually two distinct jobs. As chief scientist, my job is really to kind of poke my nose into everything going on in the company and try to make the technology better. So I'll attend the meetings on planning for the next generation of GPUs, for our autonomous vehicle projects, for robotics projects, and try to stay up to date and connect people: oh, there's somebody at this university doing something really exciting that could make this better, let's take a look at that. Or, you know, maybe we should be pushing harder on a new packaging technology for the next-generation GPUs. Just trying to push people out of their comfort zones a little bit, get them to try things that could make stuff better. And then the flip side is I run the research organization, which is like a giant playground. We get smart people from all over doing exciting things, ranging from circuit design on what I call the supply side of the research lab, supplying technology to make the GPUs better, to people doing all sorts of AI, autonomous vehicles, graphics, and robotics on the demand side. And it's fun to meet with these people. They're smart people, and I talk to them about what they're doing. My job is to get obstacles out of their way. I try to enable them by finding out what's blocking them and removing the blockages so they can do amazing things.
B
Yeah, so it almost sounds like the chief scientist part of your job is really about thinking about the future and planning for the future. And then the research org is about, hey, what is the research we can be doing now to make the technology and the product offering better for our customers? Is that a good summary?
C
Yeah, it's a really great way of summarizing it so the two fit together. And very often for the chief scientist job, thinking about the future, we try to identify gaps, we try to have this vision of, you know, where we want things to go with, you know, both the GPU hardware, the software, the applications. Then we say, why can't we do that today? And on the research side, we try to fill those gaps. We try to, you know, what technology can we develop that will make that possible?
B
Got it. So maybe starting with the latter, the research org: what are some of the things that are most exciting to you and the team right now? Either areas of focus or specific papers or bits of research that you're working on?
C
Yeah, well, generative AI has to be kind of the most exciting thing going on. We're trying to develop new technologies for that and trying to develop some fundamental understanding of it as well. Our research group in Finland wrote a paper a little while back that really laid out how diffusion models work, and in the process made how they are applied much more efficient. That's the kind of thing we do: try to look at the technology that everybody has jumped on, sometimes without really understanding how it works, and dig down and understand what makes it tick and how we can make it better. We're doing really exciting things across the generative space, with language models and with vision and video models. And probably most exciting is multimodal models that combine all of that stuff together. It's fun to watch it happen, and there's a lot of energy there. People are pretty excited about it.
B
Yeah. I mean, on the topic of multimodal models, you know, we're recording this just a few days after the recent announcement from OpenAI of GPT-4o. I think that's an example of what you're talking about. Super impressive to see these things come together.
C
Right.
B
You know, again, I'm not an expert on the technology, but my understanding is this is the same approach that we've seen to these large language models and other types of models over the past couple of years. But now, when you put these things together into one, it just enables a whole other type of interaction and experience. That's pretty mind-blowing.
C
Yeah. And it also opens up the data to orders of magnitude more. You asked this question about data.
B
Yeah.
C
Where do you get it? When you're just dealing with language, there's so much data, but then all of a sudden, when you say, okay, let's add in videos and images and audio, now there's an enormous amount more data. And if you think about how people learn and experience the universe, a little bit is by reading books, but a lot of our experience is visual. We really experience it through seeing things. And now our models can do that as well.
B
Yeah. How does the research to understand what's happening with these models, whether it's a diffusion based model or a transformer, the sheer size of these things, how do you go in and understand what's actually happening under the hood?
C
Yeah, I mean, it's a case-by-case thing, and you have to do pilot studies. In fact, very often when we build our big models internally, it actually stops being research and becomes a production task, right? Because a lot of resources are being applied, a lot of people are being applied, we're curating lots of data. But before we set up for that, we'll try to do little pilot studies, and we'll do, say, ablation studies: let's take this away and see what happens, take that away and see what happens. And then we also try to develop some math behind what's going on, so we can predict that if we do something, what will happen. From that we start developing an understanding of what's going on, what's really represented by this embedding, by this latent space, and we become able to anticipate and predict what would happen if we make a change to the model or to the process or something like that.
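The "take this away and see what happens" loop can be sketched in a few lines. This is a toy illustration only, not Nvidia's tooling: `ablation_study`, `toy_evaluate`, and the component names and scores are all hypothetical stand-ins for training and scoring a small pilot model with one ingredient disabled.

```python
from typing import Callable, Dict, FrozenSet, Iterable, Tuple


def ablation_study(
    components: Iterable[str],
    evaluate: Callable[[FrozenSet[str]], float],
) -> Tuple[float, Dict[str, float]]:
    """Score the full setup, then re-score with each component removed."""
    full = frozenset(components)
    baseline = evaluate(full)
    # A large drop means the removed component mattered a lot.
    drops = {name: baseline - evaluate(full - {name}) for name in sorted(full)}
    return baseline, drops


# Hypothetical contributions; a real study would train and evaluate a pilot model.
TOY_CONTRIBUTION = {"augmentation": 3.0, "attention": 5.0, "extra_layer": 0.5}


def toy_evaluate(enabled: FrozenSet[str]) -> float:
    """Stand-in metric: a base score plus a fixed gain per enabled component."""
    return 80.0 + sum(TOY_CONTRIBUTION[c] for c in enabled)


baseline, drops = ablation_study(TOY_CONTRIBUTION, toy_evaluate)
print(baseline, drops)
```

In practice the components interact, so single-component ablations like this are a first pass before committing to a large training run.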
B
Right. So like you said, it's a challenge of production. So even just this research to understand what's happening, just like building any of these models must require enormous scale in terms of compute, in terms of data. I mean, it's like you're building these models on your own from scratch.
C
Yeah, well, we are actually in some cases, but we try. When you put that many resources into something, you have to be pretty sure it's going to work. Or you have a difficult conversation with Jensen at the end of the day. And so you try to do the little pilot experiments in advance of that, so that when you run off a big training run, you have a very high probability of success. Right.
B
What else on the research side is getting you excited right now? I've heard you talk a little bit about autonomous vehicles. Is that an area where your team is spending a lot of time on research?
C
Yeah, we have one group doing autonomous vehicle research, and they're collaborating very closely with our autonomous vehicle product team. There are some exciting things going on there, actually applying foundation models to autonomous vehicles, both as a way of creating a training environment, being able to write a prompt and as a result get a scenario that you can then simulate and run your car through, and also having a model that you can use for the perception and planning and prediction of what the other actors in the scene are going to be doing. So there are a lot of very exciting things coming together at that nexus of generative AI and autonomous vehicles. Now, there's also a bunch of exciting things going on on the supply side. I mean, we're constantly pushed to say, how can we stay ahead? In my opinion, we have the best platform for AI today. But as soon as we announce a GPU, other people can sort of copy what we've done, and in four years' time they'll have a platform that's probably as good as ours today. So how can we continue to stay ahead? There are some pretty exciting things we're doing on that front as well, in terms of new number representations for AI, new ways of handling sparsity of both the weights and activations in these models, and just ways of making the platform more efficient. So for a given amount of silicon area and a given amount of power, how can we get more out of it?
B
Yeah, I've seen other companies recently announce really large chips, or lots of different takes on the architecture in them. And it often makes me wonder what Nvidia is going to do next. It sounds like you're thinking multiple steps ahead, and maybe you already have the roadmap planned out on the supply side for multiple iterations of chips and designs.
C
Yeah, no, we have to stay ahead. I mean, the things we're going to do for the next couple generations are already pretty much in the bag. And I can't talk very much about that, of course.
B
Yeah, of course.
C
But in research, we're trying to look beyond that and say, what are the things three, four, five generations out? And it's fun. In some sense, it makes computer design even more fun than when Moore's Law was in place, because back then you made small tweaks, ran stuff in the new process, and got a faster CPU. Now we're getting maybe 10% out of a new generation of technology. So most of what makes it better is better computer architecture, better design. Things from the creative process, not from the semiconductor process.
B
So maybe applying what we talked about earlier, the explosion of generative AI over the past couple of years, back to autonomous vehicles: what has impressed you about some of the advancements in autonomous vehicles over the past year or two? It seems like on the consumer side there have been some pretty big breakthroughs. Waymo is now doing, I think, tens of thousands of trips. Tesla Full Self-Driving seems to be getting more and more reliable. Are we getting closer to this, and how does the work that Nvidia does contribute?
C
Yeah, it's hard to say. I mean, it's a very difficult problem. In fact, I'm on the record a decade ago saying that we were almost there, and a decade later we're not. The reason is it's one of these things where you've got to get that long tail. It's really a problem of chasing the rare cases and making sure they're handled well. I think the leaders in this field, the Waymos of the world, are doing a great job of that. It involves an enormous amount of data. It involves having great discipline and a real safety culture to make sure that, under situations you have not anticipated, the vehicle is going to respond correctly and everybody's going to remain safe. We've done some really exciting things that I find interesting from a technology point of view with these generative models. And we've played around with various approaches to the architecture of the vehicle, where we have the classic stages of perception and planning and control and the like. Then we also have versions that are end to end. And sometimes we try to combine those, where we have the usual stages so we can reach in and both observe and control, but we have also trained them together. So we've actually trained our perception with a loss function that is conditioned by what the planning is. It's perception tuned to what it's going to be used for. So there are exciting things in the technology there. But ultimately it's a tough game of chasing down the rare cases and making sure you handle them well. And it would be a much easier task if it was only autonomous vehicles on the road. You have to deal with these pesky humans that are out there doing difficult-to-predict things.
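One common way to write down a perception stage that is "tuned to what it's going to be used for" is a joint objective like the one below. This is an illustrative assumption, not the loss described in the interview: the symbols $f_\theta$ (perception network), $g_\phi$ (planner), $y$ (perception labels), $\tau^*$ (reference trajectory), and the weight $\lambda$ are all introduced here for the sketch.

```latex
\mathcal{L}_{\text{total}}(\theta, \phi)
  = \mathcal{L}_{\text{perc}}\bigl(f_\theta(x),\, y\bigr)
  + \lambda \, \mathcal{L}_{\text{plan}}\bigl(g_\phi(f_\theta(x)),\, \tau^*\bigr)
```

Because the planning term depends on $\theta$ through $f_\theta(x)$, its gradient flows back into the perception weights, so perception errors that change the plan are penalized more than errors that do not, while the intermediate output $f_\theta(x)$ remains available to observe and control.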
B
Right. So you're saying the autonomous vehicles will do predictable things, and what humans will do is completely unpredictable. And if we see something, or if...
C
If it's a machine, you can predict it. But every human is different, and they may act in a different way. So you're predicting what the most likely human would do, and you even try to develop technology which, by observing the actors, both the cars and the pedestrians and the like, tries to characterize them: that one is distracted, this one is aggressive, that one is about to fall asleep. And then it predicts what they will do based on that characterization. But even then it can be hard.
B
That's interesting. So the models will characterize drivers down to the individual car and say, this driver is this type of driver, that driver is that type of driver.
C
Yeah, you need to characterize what a particular car will do, because they're not all going to do the same thing. And you can observe, just like you would out on the road, and see what you think they're going to do.
B
Right. That's fascinating. You mentioned something earlier about autonomous vehicles, how you're even using large language models to invent scenarios and test cases. I think I read somewhere that Nvidia has basically been applying generative AI to many stages of chip design, even. Talk to me a little bit about that. How is Nvidia actually leveraging this technology to help make chip design more efficient?
C
Yeah, that's a great question. We have a bunch of projects to apply AI to chip design, to basically eat our own dog food in some sense. Probably the most exciting one is one where we've taken large language models and then specialized them with what's called domain-specific pre-training. We basically take a lot of data that we have, our entire repository of previous GPU designs, tests, and design documentation, and train up the model on this. What we found is that we can then get a model which is much better than a general model, even a huge model like GPT-4. We can take a Llama 70B or something, train it up on our own data, and it's better than a larger model at a number of tasks. The most important ones are tasks that assist a designer to make them more productive. One thing we found is that junior designers tend to use a lot of senior designers' time asking questions. It's part of the process of becoming part of the team and learning how GPUs work and all that. But now we can have them ask the model a question, and it gives them pretty good answers, which not only makes them more productive but makes the people whose time they were using to answer the question more productive. These models have also been very good at summarizing bugs. You get a bug report that may be many pages long, and it's a bunch of logs out of some test case where the test went awry. The model can summarize that bug, and in many cases also ARB somebody, that is, say that there's an action required by a particular designer to now fix the bug. That makes the process go better as well. In some cases we have the models writing code, but we more often will have them writing test code, or code that configures a particular design tool to do something, rather than writing the code for the GPU itself. And then there are also applications where we take this technology and use it as part of the design process.
One that I particularly like is a graph neural net we've developed that can take a circuit design and predict what the parasitics are going to be. This is a huge productivity gain, because it used to be that the circuit designer would draw the circuit and hand it off to a layout designer, and a couple of days later the layout designer would finish the layout, you'd extract the parasitics, and the circuit designer would find, "It doesn't work, because the parasitics are worse than I thought they would be," and they would have to try something different. So the design cycle was a couple of days around that loop. But with this tool, which doesn't get it exactly right but is very good at predicting what the parasitics will be, it's now seconds around that loop. The designer draws a schematic, predicts the parasitics, runs the simulation. Now they can iterate quickly while they still have everything in their head about what they were working on. Another really cool one is that we apply reinforcement learning to designing the adders in our GPUs. This is a critical circuit, and it's also something that people have been thinking about really hard since the 1950s. There are textbooks written about how to design good adders. It boils down to this problem of structuring a tree that does what's called a parallel prefix calculation. It's doing a running sum across the carries of bits to decide whether you have a carry into a particular bit of the adder. On this problem that people have been beating on since the 1950s, we applied reinforcement learning. We treated it like an Atari game of where you put the next carry-lookahead node in the tree, and it wound up beating the best known techniques by a substantial amount. So this is something that's applied now to the design of the arithmetic circuits in our GPUs.
Another neat one is a productivity increase. Every time we move to a new technology, say we go from 5nm to 3nm to 2nm, we have to redo the entire standard cell library. Or even within a particular node, if we are targeting a different foundry, we have to redo the standard cell library for that foundry. That used to take a team of about 10 people about 9 months, so think of 90 person-months. Now we have a reinforcement learning program that basically designs the standard cells, and it achieves a higher quality, so that the average cell is smaller than the ones designed by the humans, and better in a few other metrics as well. But it does it in an overnight run on one GPU. So that's a great example of applying this AI technology to making the GPUs better.
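To make the "parallel prefix calculation" over carries concrete, here is a minimal software sketch of one textbook tree shape, the Kogge-Stone layout. It is an assumption chosen for illustration, not the structure found by the reinforcement learning agent described above; that agent's job is to choose where the prefix nodes go, trading off depth, node count, and wiring. The function name `prefix_add` and the 8-bit default width are hypothetical.

```python
def prefix_add(a: int, b: int, width: int = 8) -> int:
    """Add two unsigned integers using a Kogge-Stone style prefix carry tree."""
    # Per-bit signals: generate (both inputs 1) and propagate (exactly one input 1).
    g = [((a >> i) & 1) & ((b >> i) & 1) for i in range(width)]
    p = [((a >> i) & 1) ^ ((b >> i) & 1) for i in range(width)]

    G, P = g[:], p[:]
    dist = 1
    while dist < width:
        # Each level combines a bit's group with the group `dist` positions below.
        # Nodes within a level are independent, so hardware evaluates them in parallel.
        new_G, new_P = G[:], P[:]
        for i in range(dist, width):
            new_G[i] = G[i] | (P[i] & G[i - dist])
            new_P[i] = P[i] & P[i - dist]
        G, P = new_G, new_P
        dist *= 2

    # G[i] is now the carry out of bit i; the carry into bit 0 is taken as 0.
    carry_in = [0] + G[:-1]
    total = 0
    for i in range(width):
        total |= (p[i] ^ carry_in[i]) << i
    total |= G[width - 1] << width  # final carry-out becomes the top bit
    return total


print(prefix_add(200, 100))
```

The loop runs about log2(width) levels, which is why prefix adders are fast; the design question the interview describes is which nodes in those levels are actually needed.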
B
It's really cool. I mean, you're obviously hearing stories like this from all different types of companies about how they're leveraging AI. So it makes sense that Nvidia, the company that is in many ways inventing and creating AI, is leveraging AI to make AI more efficient. It's really cool to hear some of those examples. Maybe let's get into the team a little bit. You mentioned, I think, that you have several hundred researchers with PhDs at Nvidia, is that right?
C
I read that somewhere, yeah, it's about 400.
B
How do you recruit and build a team like that? I mean, these are some of the smartest and brightest people in the world. How does assembling a team like that happen?
C
You know, it took a long time. I came to Nvidia in 2009 and inherited a team of, I think, about 15 people, most of whom were doing ray tracing, you know, computer graphics. From there I created groups doing architecture and circuits and AI. When we first started in any given area, it was very hard, because no one wants to come to a place where they're the only one doing something. But by getting some really good people to anchor each place and then hiring really good people, it becomes easier to recruit talent, because people like to join a team where there are other fun people to talk to and everybody is as smart as you are. And so we found that we had to really set the bar high and hold it there. If we were to let that bar drop and start hiring mediocre people, they would be hiring more mediocre people. So we've had to keep it high. And we try to create an environment where people like to be. We have very little turnover. People come and they stay, because they get to do what they want to do. They have the resources to do fun experiments, they get to work with fun people, and they get to have an enormous amount of impact. One great thing about Nvidia is that because we supply the whole industry, if you develop something, whether it's a piece of new hardware for AI or a new type of model or a new training technique, it winds up benefiting everybody, benefiting the whole world. Whereas at some of the places we're competing with for talent, if they develop something, their company will use it, but it won't be spread as widely as the things that we develop.
B
It strikes me that you, and we talked about your amazing career, and this team of several hundred researchers have been through several big platform shifts, this one maybe being the biggest and maybe the most relevant to your work. How does your team stay prepared and stay ahead of these shifts? And what advice would you have for entrepreneurs or startups that are also building in this space?
C
Yeah, so that's a really good question. The best way to stay ahead of the revolution is to be the revolutionaries, to create the shift. But we can't create all of them ourselves. Although we actually have developed many of the fundamental technologies along the way, some of them have come from outside. So the other thing we tend to do is have a set of core technologies, core expertise, and then be very agile in applying that to different things. I would say at Nvidia, our core expertise is parallel processing and acceleration. We build processors that have hundreds of thousands of elements working in parallel, and then we specialize them. Over the years, we specialized them for raster graphics, polygon-based rendering. That was the core technology from the early days of Nvidia. We specialized them for ray tracing when we added our RT cores. We've specialized them for bioinformatics with the dynamic programming instructions that came out in the Hopper generation. And starting really in the Pascal generation, but especially when we introduced the Tensor Cores in Volta, we added specialization for AI. So those two technologies, parallel processing and domain-specific acceleration, are very powerful. What we have to do then is anticipate the next big application shift that is going to demand a different type of specialization. The parallel processing is quite universal. You can apply that to everything. Key applications need different domain-specific acceleration, and even for AI, what that domain-specific acceleration is has shifted over time. So we need to be agile in taking those two core technologies, anticipating the applications, and getting ahead of them. And I think that's what anybody should be trying to do. They should have a core expertise and get ahead of the applications.
B
Right, right, right. You were in a very interesting place, in that you were able to see a lot of this stuff coming. As you said, maybe you didn't expect it to happen as big and as quickly as it has. But what are the things that you think entrepreneurs are going to need to prepare themselves for over the coming years?
C
Yeah, you know, you're talking about new technologies coming along?
B
New technologies that they're going to... So we know what everyone's building for today. What might they be building for two years from now, three years from now?
C
Well, if I look back and try to use that as a way of predicting forward: a decade ago we were worried about convnets and recurrent neural networks, and then transformers came around and all of a sudden nobody cares about recurrent neural networks anymore. And then you were doing GANs for image synthesis, and diffusion networks came along. So what we have to be able to do is anticipate new models coming along. People are always developing new models. It's just that most of them are no better than the old models, so they don't get adopted. And so we're constantly on our toes trying to figure out what's next. And it's hard to say. We'll look at half a dozen things and none of them will pan out. But we're prepared, and we're agile enough that if any of those had taken off, we could have tracked them. Another thing is that I personally spend a lot of time visiting universities and talking to people who are working on the ideas for the next thing, to try to at least get a feel for what the candidates are, what's out there and what might come into play.
B
You mentioned there are new models and architectures being developed all the time, and you don't necessarily know which one is going to work. Maybe taking the transformer architecture as an example, when was it obvious that that was going to matter?
C
Yeah, pretty soon. What was it? The paper "Attention Is All You Need" came out in something like 2017. And even at that point it was pretty clear that transformers were winning. And you have to take the title of that paper in context. The reason for that title was that what was considered to be the right model at that point was a hybrid of a transformer with a recurrent network, because the idea was that each was giving you something. And then the point of that paper was you didn't need the recurrent network if you had the transformers. That was all you needed. The evidence then was that, yeah, that was working better than the recurrent networks. Now, at that point in time, it was being applied to models like BERT, which was, I forget, a couple hundred million parameters, maybe that's even large. And I think what people hadn't anticipated is how that would scale up. And as that scaled, it just got better and it won even more. Because the real problem with the recurrent networks is that it was a difficult training process to propagate things back through those recurrent cells. And so as you got more data and built larger models, it took way longer to train them and the scaling didn't work out as well. But for that one, I think it was pretty clear even in the early days that it was a win. And one thing that's impressed me about the whole evolution of AI over the past 15 years or so has been how rapidly people have adopted shifts. I spent a lot of time in the supercomputing world, where people would have these codes, and if you were a supplier (I worked with Cray for a while) you had to run everybody's codes, and they would have codes that were 20 years old and they didn't want to change them. And so you had this huge inertia in the field from legacy codes that were slow to change. And a lot of the enterprise computing world works that way as well. I mean, banks are still running code written in COBOL, but in the AI world, people throw stuff away overnight and tomorrow they have a new model. They don't care about the... It's fun, it moves really quickly.
B
Are there any new or upcoming model architectures that you're particularly intrigued by or excited by?
C
I'm very excited by these state space models, and it's not clear that they're going to win yet, but there's some ideas in there that ultimately are probably going to pan out. In some sense, it's going back to recurrent networks.
B
Interesting. Yeah. What about the state space models do you find so interesting?
C
People have at least done a couple of studies showing that, for those studies, smaller models with less training get better results. And if that actually pans out in general, they'll wind up replacing transformers. I don't think they've gotten to that state yet.
B
Got it. I believe you're an adjunct professor at Stanford, and you were formerly chair of the computer science department there. And you give talks at universities and colleges all across the country. I guess two questions. What is a piece of advice that you find yourself giving to people who are about to enter the industry? And also, what are you learning from students in academia who are growing up in this age of AI?
C
Okay, those are both really good questions. So you know that first one, I was actually asked that after a talk I gave at Georgia Tech. It was probably about a month ago. And my comments were, first of all, to realize that new graduates, really, what they have is a license to learn. They've been sort of prepared with a lot of theory and a lot of basics, but are not yet really useful or dangerous in a way. And so what's important for their first job is to pick the job where they're going to learn a lot and to learn the right set of things. And so I think the two characteristics there is to pick a place that has a lot of really smart people to work with because you'll learn from them, and that is working on really leading edge problems, because you want to learn stuff about leading edge problems, not about the stuff that is no longer at the cutting edge. And it's probably also important to have a culture that's good because if it's not, then you can just wind up getting caught up in a lot of nasty politics and have an unpleasant experience. So I like to think that Nvidia is really the ideal place for all these people to come because we check all three of those boxes. We've got lots of really smart people working on leading edge problems. We've got a great culture. So what was the other question? What am I learning?
B
What are you learning from students who are growing up with AI, Right. Who are living through this stuff?
C
Yeah. So it's interesting. This is both students and also the new college grads we hire at Nvidia. I really enjoy talking with them because they have a different perspective. They kind of almost take some of this stuff for granted. But then on the other hand, from that point of view, when a new technology comes out, they see it differently. And hearing their perspective makes me think about it differently. And so anytime there's a new technology, I actually like to talk to some of the new hires about it and see what they think. And I also enjoy just going to the universities and talking to current students, the graduate students who are working on this stuff, and just getting their perspective, because that perspective about what is coming next can sometimes be clearer than what I'm seeing, because they're seeing it without a bunch of baggage from what we're doing now.
B
What are students of a decade from now not going to be learning, because of AI, that students are actively learning right now? What's going to go away?
C
Yeah, that's an interesting question. You know, when I was chair of the computer science department, one of the things I did is I streamlined our curriculum so we had very few required courses, and a whole bunch of people got really unhappy with me because when their course was no longer required, nobody was taking it. And maybe there should be no required courses and we should just let people take what they want. But I would hope that some things are just the core to computer science. You know, algorithms, automata theory, you know, the. The basic theory of computer science, how to program, how to think computationally. But, you know, a lot of what, you know, we learned about sort of, you know, structuring things by, you know, writing classical code is no longer how people build applications. Right now they build applications by, you know, getting an API to an LLM and, you know, you know, piping their data into that. And so I think the generation of students that's coming out even now, but certainly it'll be much more in five or ten years will be thinking about how to plug together AI through a bunch of APIs, and that's how you're going to build things. Something's going to have to go away to make more time to learn that, and it's hard to see what that will be, but I think it'll be the more classical ways of programming.
B
So it sounds like you're saying coding and programming almost go away, in a sense. And people...
C
Somebody's got to write some code. So we have, you know, we have PyTorch and things like that.
B
Right, right. Super interesting, Bill. This has been fascinating. I've learned so much, and I'm really sure that the audience will as well. So I want to thank you so much for being very, very generous with your time. We know you're a busy guy.
C
Thank you very much. This was fun, Michael, and I look forward to hearing the podcast.
B
Thank you so much for listening to Generative Now. If you liked this episode, please do us a favor and rate and review the podcast on Spotify and Apple Podcasts. That really does help. And if you want to learn more, follow Lightspeed at LightSpeed VP on YouTube, Twitter X, LinkedIn, everywhere else. Generative Now is produced by Lightspeed in partnership with Pod People. I am Michael Mignano and we will be back next week with another conversation. See you then.
Episode: Bill Dally: NVIDIA’s Evolution and Revolution of AI and Computing (Encore)
Host: Michael Mignano, Lightspeed Venture Partners
Guest: Bill Dally, Chief Scientist and Senior VP for Research, NVIDIA
Air Date: January 16, 2025
This encore episode features a conversation between Michael Mignano and Bill Dally—NVIDIA’s Chief Scientist and Senior VP for Research—exploring the company’s pivotal role in the ongoing AI revolution. The discussion covers Bill Dally’s personal journey from academia to industry leadership, NVIDIA’s groundbreaking advances in hardware and AI research, the rapid evolution of generative AI, and how NVIDIA harnesses its own technology for chip design and innovation. The episode also offers insights into future trends, advice for AI entrepreneurs, and reflections on how AI is transforming both education and work.
Early Fascination with Neural Networks:
Bill’s interest began 40 years ago at Caltech, experimenting with multilayer perceptrons and Hopfield nets when compute power was limited.
“We built little multilayer perceptrons and convnets...but it also impressed me that it was a toy…the COMPUTE wasn't there at the time.” (02:08, Bill Dally)
Pioneering Parallel Computing:
As a professor at MIT, Bill saw parallelism as the way to scale performance, even as Moore’s Law made people complacent about sticking with serial processors.
Stream Processing to CUDA:
At Stanford, Bill developed stream processing and partnered with NVIDIA to create the NV50 (marketed as G80), making parallel processing accessible via CUDA.
Early Signs of AI Potential:
Reflected on Stanford’s Grand Challenge win for autonomous vehicles, noting the power of data-driven feature extraction and the era’s compute limits.
Joining NVIDIA and Enabling Modern AI:
Breakfast with Andrew Ng (Google Brain) in 2010 was a turning point; Bill realized GPUs could unlock neural networks’ promise, leading to the development of cuDNN.
The AI Explosion:
Bill always believed AI would revolutionize every aspect of life (“how we play, how we work, how we educate, how we get medical care”), but the pace post-ChatGPT surprised him:
“I thought the change was going to be more gradual and not quite as frenetic as it's turned out to be…when ChatGPT came out, it was like somebody turned the rate knob way up…” (06:31, Bill Dally)
On Data as a Limitation:
While web data might soon plateau, synthetic and private data repositories provide ample room for growth (07:37–08:16).
Dual Role:
Culture and Leadership:
Bill's mantra: “My job is to get obstacles out of their way. I try to enable them by finding out what's blocking them and remove the blockages so they can do amazing things.” (09:22)
Generative AI and Multimodal Models:
Major focus on advancing generative models—language, vision, video, and especially multimodal approaches.
Improving Model Understanding and Efficiency:
The Finland research team’s notable work on diffusion models exemplifies deep dives into the fundamentals of new tech.
On Data Enabled by Multimodal:
Incorporating videos, images, and audio exponentially increases the training data pool and brings AI closer to human-like experiential learning.
"Now our models can do that as well." (12:47, Bill Dally)
Pilot and Ablation Studies:
NVIDIA performs small-scale studies to understand model mechanics before investing in massive training runs, blending “math behind what’s going on” with empirical validation.
Production vs. Research:
Scaling from research to production requires confidence—a failed big run means “a difficult conversation with Jensen at the end of the day.” (14:23, Bill Dally, referencing Jensen Huang, NVIDIA CEO)
Foundation Models for Autonomous Vehicles:
Staying Ahead in Hardware:
Working on new number representations, sparsity handling, and efficiency in GPUs—they plan several chip generations ahead.
Creativity over Moore’s Law:
Architectural and design innovation matters more now as hardware improvements alone offer diminishing returns.
The ‘Long Tail’ Challenge:
Handling rare driving scenarios remains tough:
“A decade ago saying that we were almost there, and a decade later we're not…ultimately it's a tough game of chasing down the rare cases and making sure you handle them well.” (18:00, Bill Dally)
Modeling Human Behavior:
Autonomous systems are learning to characterize and predict individual drivers’ behaviors (“this one is aggressive, that one is about to fall asleep”).
Large Language Models for Hardware Design:
Domain-specific pretrained LLMs help new designers learn faster, summarize bugs, and write code for tool configuration.
Graph Neural Networks & Reinforcement Learning:
Standard Cell Library Automation:
Designing standard cells now takes an RL model overnight (vs. 90 person-months), producing higher-quality designs.
“...in an overnight run on one GPU…it achieves a higher quality so that the average cell is smaller than the ones designed by the humans…” (25:34, Bill Dally)
Growth and Recruiting:
From 15 graphics-focused researchers to ~400 across fields, with deliberate hiring of top talent to create a high-bar, low-turnover culture.
Industry Impact:
NVIDIA’s breakthroughs spread industry-wide, making recruitment attractive compared to companies where new work stays siloed.
Core Competencies & Agility:
NVIDIA’s strengths: parallel processing and domain-specific acceleration. The team continually anticipates new applications and adapts the platform accordingly.
Advice to Builders:
“...have a core expertise and get ahead of the applications…” (30:58, Bill Dally)
Staying Nimble:
New neural network architectures constantly emerge—some (like transformers) replace established methods rapidly post-breakthrough.
Academic Outreach:
Bill spends time with universities to scout new ideas, keeping NVIDIA close to early-stage innovation.
Adoption Trends:
Notably, “in the AI world, people throw stuff away overnight and tomorrow they have a new model.” (34:45, Bill Dally)
State Space Models:
Bill is intrigued by these as a possible successor to transformers if their promise holds.
Advice to Recent Graduates:
“New graduates…what they have is a license to learn…what's important for their first job is to pick the job where they're going to learn a lot and to learn the right set of things.” (36:21, Bill Dally)
Learning from New Generations:
Young talent brings fresh perspectives; engaging with students helps veterans see beyond established paradigms.
The Future of Computer Science Education:
On the Speed of AI’s Rise:
“I was convinced…AI…was going to revolutionize all of human endeavor…But then when ChatGPT came out, it was like somebody turned the rate knob way up…” (06:28–07:16, Bill Dally)
On Multimodal Models and Data:
“A lot of our experiences is visual…now our models can do that as well.” (12:52, Bill Dally)
On Building a Research Team:
“We had to really set the bar high and hold it there…People come and they stay because they get to do what they want to do. They have the resources to do fun, fun experiments…impact…the whole world.” (26:44, Bill Dally)
On the AI vs. Supercomputing World:
“In the AI world, people throw stuff away overnight and tomorrow they have a new model. They don't care about the. It's fun, it moves really quickly.” (34:45, Bill Dally)
On Shifting Role of Programming:
“Right now they build applications by, you know, getting an API to an LLM and…piping their data into that…I think it'll be the more classical ways of programming [that go away].” (39:17, Bill Dally)
Turning Andrew Ng’s Cat Detector Code into cuDNN:
A pivotal moment, initiating NVIDIA’s direct role in accelerating deep learning (03:21–06:00).
Applying RL to Hardware Design:
Using reinforcement learning to outperform “textbook” adder designs in GPU circuits (23:39–25:56).
Education and the ‘API Era’:
Bill’s vision of a future where plugging together AI services supplants traditional programming skills (38:46–40:06).
This episode offers a sweeping yet grounded exploration of AI’s present and future from the perspective of Bill Dally—a foundational thinker driving NVIDIA’s relentless innovation. Whether discussing hardware breakthroughs, the unpredictable trajectory of AI models, or the changing landscape of education and work, Dally’s blend of technical insight, experience, and candid advice provides listeners with a vivid sense of how the AI revolution is being built from the inside—and where it might go next.