David Bau
Historically, as an engineer, as a computer scientist, it's really been our responsibility to understand the systems that we make, to make sure that they are doing what we want, that they operate correctly. And I feel that the new discipline of machine learning, because it's become so important to just accept these black boxes and use them even though we don't understand them, is leading to really an unhealthy turn in the practice of engineering in computer science. We're training a whole new generation of computer scientists to be comfortable with this idea that they shouldn't really look inside these complicated black boxes.
Jasia Monk
And now, The Good Fight with Jasia Monk.
Podcast Narrator
There is, at the moment, for good reason, a lot of debate about artificial intelligence. How capable are current models? Are we approaching something like artificial general intelligence? And how big are the impacts on the world going to be? Is this, for example, going to lead to a massive loss of jobs, not only for drivers and other kinds of blue-collar professions, but also perhaps for a lot of white-collar professionals? Well, one thing that strikes me about this debate is that it's often held without real knowledge of the underlying technology. And so I was really intrigued to learn more about the nuts and bolts of how AI actually works. And when I was recently at a conference about artificial intelligence at Harvard, I had the good fortune of running into David Bau. David was at Google for a long time, is now a computer scientist at Northeastern University, and he is really good at explaining what a neural network is, how it is that you train an AI model, and what that means for how to think about this technology. So I invited him on the podcast and got him to give me and you a 101 introduction to how AI models actually work.
Jasia Monk
David Bau, welcome to the podcast.
David Bau
Thank you for having me.
Jasia Monk
I really look forward to this conversation. We met recently at a workshop about artificial intelligence at Harvard, and I thought that in a conversation we had, you helped me understand the nature, the architecture, and the technology of artificial intelligence better than anybody had before. So I thought I would love to talk to you about this on the podcast. Fundamentally, how do current AI models work? When we say they are LLMs, large language models, what does that mean? And how does that distinguish this form of artificial intelligence from other forms that we've historically used?
David Bau
Sure. So they're a kind of generative model, which means that they are open-ended models of a kind of behavior, as opposed to being trained to make really narrow decisions. What was really popular in AI up until recently was to train classifiers to solve specific problems, to help you make a specific decision. But in recent years it's been more popular, or actually kind of more amazing, to create models that have more open-ended goals. So a large language model is pretty simple in concept. Its job is to imitate language, human language. But imitating human language is a lot richer than making a yes-or-no decision or answering a simple multiple-choice question or something like that, which is what we used to train AI to do.
Jasia Monk
So perhaps help us understand what these classifiers were, so we can see what the difference is with these large language models. It sounds like you're saying the point was to classify into yes or no, or into one of four or five different kinds of buckets.
David Bau
Yeah, I'll describe it the way that I describe the difference between these two types of models to students. Basically, if you train an AI to classify inputs into a bunch of different categories, what you're doing is asking it to tell you the difference between them. So let me give you a couple of examples. You could ask a model to tell you the difference between a picture of a cat and a picture of a dog. Or for a more realistic application, you might ask an AI to tell you the difference between a piece of writing that was written well and a piece of writing that was written badly, or between a movie review that was positive and a movie review that was negative. So a company like Yelp might do this to take a look at your reviews to see if you tend to write positive reviews or negative reviews, or if a specific review is positive or negative.
Jasia Monk
What is the startup in the TV show Silicon Valley which seems really stupid and then suddenly becomes important. He's trying to classify whether something's a hot dog or not, I think. Right. And when it supposedly has reported. So that was a classifier.
David Bau
That's right, that's right. So it was really one of the earliest problems that people put in front of AI. I think the very first neural network that was ever made was a thing called the Perceptron. And it was trained to classify the difference between pictures of boys and pictures of girls. They took a lot of pictures of students and they showed that this little neural network that had, I think, 64 neurons, if they properly configured it, could tell the difference between these types of pictures. And that was considered an amazing feat at the time. And that same class of problems has been with us for more than 50 years. It's the classical thing that we've been developing the science of artificial intelligence, neural networks and machine learning to solve. And it's a powerful framework, but it gives the machine learning models a lot of space for taking shortcuts. So, for example, if you needed to tell the difference between a cat and a dog, it might be sufficient to not look at every aspect of that picture. Right? You might be able to just look for the tips of the ears and recognize that cats have little pointy ears and dogs tend not to, without looking at the rest of the image. So one of the things that classifiers are really good at is identifying what the most salient difference is, focusing on that and making a decision based on that, which is great. It leads them to be very accurate, but it also means that they don't necessarily develop a complete understanding of the world. If you invented a picture of a pointy-eared dog and you gave it to one of these classifiers that was focusing on the tips of the ears, it would say, ah, that's clearly a cat. It might not recognize that there was something else wrong with the image, even
Jasia Monk
if the rest of the image very clearly looks like a dog. And it's not really a hard case in other ways. Right?
David Bau
It's the kind of thing a classifier would do. Yes.
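The shortcut behavior described above can be made concrete with a toy example. This is only an illustrative sketch: the feature names and the 0.5 threshold are invented here, and a real image classifier learns its shortcuts from pixels rather than from hand-labeled features.

```python
# Toy "shortcut" classifier: it decides using one salient feature
# (ear pointiness) and ignores everything else about the animal.
# Feature names and the 0.5 threshold are hypothetical.

def shortcut_classifier(animal: dict) -> str:
    """Classify using only ear pointiness, ignoring all other features."""
    return "cat" if animal["ear_pointiness"] > 0.5 else "dog"

typical_cat = {"ear_pointiness": 0.9, "snout_length": 0.2, "barks": False}
typical_dog = {"ear_pointiness": 0.2, "snout_length": 0.8, "barks": True}
# Everything about this animal says "dog" except the ears.
pointy_eared_dog = {"ear_pointiness": 0.9, "snout_length": 0.8, "barks": True}

for name, animal in [("typical cat", typical_cat),
                     ("typical dog", typical_dog),
                     ("pointy-eared dog", pointy_eared_dog)]:
    print(name, "->", shortcut_classifier(animal))
```

The classifier is accurate on the typical cases and still calls the pointy-eared dog a cat, because the features it ignores never enter the decision.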
Jasia Monk
So tell me a little bit about the technology behind it. We may be going too far back now, but is a classifier a similar kind of technology with fewer neurons, whatever that exactly means? Perhaps you can explain that to us. And with less processing power behind it, or is it a completely different kind of thing? When we move from classifiers to the kinds of large language models we have now, is it building on the same technology, or is it a completely different avenue towards creating this form of intelligence?
David Bau
Oh, it's basically the same kind of technology. I'd say that there are relatively few major innovations that separate classical classifiers as invented in the 1950s from modern large language models. There have been a lot of gradual, small, clever innovations, but in terms of major innovations, there have been relatively few. We're really going after the problem using the same techniques that we've used since, well, the 1980s, which is where a lot of the innovations that we're still using today were really established.
Jasia Monk
So tell us about what these techniques are. One idea that often comes up is neurons and neural networks. I can understand what a neural network might be in my brain. I am far from being a neuroscientist, but I understand that a neuron is, I believe, a cell, unless I'm getting that badly wrong. And my brain is a kind of neural network. All of these cells are connected in some complicated way. What does it mean for an AI to have neurons or be a neural network?
David Bau
So there's a popular term that you'll see sometimes called a deep neural network. And what makes a neural network interesting is its depth. A neural network is inspired by the architecture of the human brain, but all it does is compute a bunch of numbers. So the input comes into the neural network as words or as images. And the first thing that you do is you just convert it to a bunch of numbers and feed each number into a neuron, and then you connect the neurons so that they pass the numbers from one to the other. And if a neuron has a bunch of inputs, it adds the numbers together and does a small bit of computation and then creates another number that feeds on to the next layer. So neural networks are just this big mess of neurons that are connected to one another to produce some output that you hope is useful. And if you just created a random neural network, it probably wouldn't do anything useful, but it would do something with your data. And the trick for artificial intelligence, the trick for machine learning, is a procedure that you use to train the neural network, to strengthen and weaken all of the connections between the neurons, to transform a random machine, a random function, into something that does something useful. So it's the training process that makes a neural network sort of magical.
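As a rough sketch of the "random neural network" idea above, the following builds a small layered network with random connection strengths and pushes some input numbers through it. The layer sizes, the input values, and the choice of "keep the sum if positive, otherwise output zero" as the small bit of computation are all assumptions made for illustration.

```python
import random

def make_random_layer(n_inputs, n_neurons, rng):
    """One list of random connection strengths (weights) per neuron."""
    return [[rng.uniform(-1.0, 1.0) for _ in range(n_inputs)]
            for _ in range(n_neurons)]

def forward(layers, inputs):
    """Pass the numbers through each layer in turn."""
    values = inputs
    for layer in layers:
        next_values = []
        for weights in layer:
            total = sum(w * v for w, v in zip(weights, values))
            # Assumed nonlinearity: pass positive sums on, output zero otherwise.
            next_values.append(total if total > 0 else 0.0)
        values = next_values
    return values

rng = random.Random(0)  # fixed seed so the run is repeatable
layers = [
    make_random_layer(4, 8, rng),  # 4 input numbers feed 8 neurons
    make_random_layer(8, 8, rng),  # a second layer adds depth
    make_random_layer(8, 2, rng),  # 2 output numbers
]
pixels = [0.1, 0.7, 0.3, 0.9]      # an input already converted to numbers
outputs = forward(layers, pixels)
print(outputs)
```

The untrained network does produce numbers, but nothing ties them to any task. Training is what adjusts the weights so that the outputs become useful.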
Jasia Monk
Okay, so take me a step back. I want to understand the training process. But before that, I now get this image of a bunch of human cells, or I guess a bunch of digital cells, whatever exactly that means, transmitting this information. Right? You see something with your human eye; that's a visual stimulus in some kind of way. That stimulus gets translated into a bunch of signals that neurons fire at each other. And then there's some kind of way of using that information. Help me understand a little bit more how that works in a computer and what the point of these simple calculations that you mentioned is.
David Bau
Right, right. So the simple calculations are probably simpler than you even would imagine as a non-technical person. Every neuron is just a sum of all the inputs that come into it. It's a weighted sum, where if you have one neuron that's connected to 1,000 inputs, it just adds up all the inputs and then looks at whether the sum comes out to be positive or negative. And if it comes out to be positive, the neuron will do one thing in the output. Like, maybe it'll just transmit the sum out. And if it's negative, the neuron will do something different. Maybe it'll just output zero, and then that output just becomes another number that goes into other neurons. And so each neuron is an extremely simple step. It's just adding things and then looking at the answer and then producing an output. So you might think...
Jasia Monk
So now I can kind of imagine what it looks like, but now I'm having trouble understanding why it's useful. Right? So you have these very simple calculations, and they add up. We're jumping a number of steps, but I have trouble relating that to why I can have a conversation with ChatGPT using my voice, and it talks back at me. So I know there's going to be lots of steps to get there. But why is it that this neural network is such a powerful tool, such a useful idea?
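The single-neuron step described above (a weighted sum, passed on if positive and replaced by zero if negative) fits in a few lines. This is a minimal sketch; the input and weight values are made up for the example.

```python
def neuron(inputs, weights):
    """Weighted sum of the inputs; pass it on if positive, else output zero."""
    total = sum(w * x for w, x in zip(weights, inputs))
    return total if total > 0 else 0.0

inputs = [1.0, 2.0, 3.0]

# 0.5*1 - 1.0*2 + 1.0*3 = 1.5, which is positive, so the sum is passed on.
positive_case = neuron(inputs, [0.5, -1.0, 1.0])

# 0.5*1 - 1.0*2 - 1.0*3 = -4.5, which is negative, so the neuron outputs zero.
negative_case = neuron(inputs, [0.5, -1.0, -1.0])

print(positive_case, negative_case)
```

The weights are the connection strengths. Changing them changes which combinations of inputs make the neuron produce a nonzero output.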
David Bau
Right. Why is it so useful? You can answer this in kind of two ways, but I have to say, the question you're asking is actually one of the core puzzles behind neural networks. So let me give you a little bit of history. Neural networks are one of the oldest forms of programming ever devised. They were proposed in the 1940s, before digital computers were that widespread. And they've been with us ever since in different guises. And one of the reasons it's taken so long for neural networks to become so prominent in computing is that everybody had the question that you're asking now. You know, even if you can get these things to work, why would we expect them to work? In the 1950s and the 1960s, a scientist named Rosenblatt demonstrated that you could get neural networks to do some useful things. But it never really caught on because everybody had this question: well, you know, there are these neurons that are just passing numbers around. What do these numbers mean? How can we be sure that they do anything useful? Is there anything that explains why they do something useful, and why sometimes they don't work? And can we tell the difference? And so it really took a long time. There were a lot of different ways of doing machine learning where the numbers inside the AI are more understandable. And so for many years, the mainline AI scientists believed that you should use one of these other approaches, which were more transparent, which were more designed to be explainable. And it's really just in recent years that we've decided to use neural networks that have this key disadvantage, which is that we don't know what the numbers that come out of a neuron are for. We don't know if they have a particular purpose. We don't know if some of the neurons are more important than other ones. We don't know under what conditions they learn something good or learn something bad. It's very opaque.
But one of the things that's happened, though, is that neural networks work so well, and we have some understanding of some of the things that lead to them working so well. They work so well that the field has really become comfortable with this idea that we should just use these black boxes. They're so useful that maybe it doesn't matter that we don't really understand what the neurons are doing and what they're for or what's being computed inside. Now, one of the reasons I was at this workshop is that I'm really concerned about this philosophy that we've adopted in the AI industry and in machine learning. You know, historically, as an engineer, as a computer scientist, it's really been our responsibility to understand the systems that we make, to make sure that they're doing what we want, that they operate correctly. And I feel that the new discipline of machine learning, because it's become so important to just accept these black boxes and use them even though we don't understand them, is leading to really an unhealthy turn in the practice of engineering in computer science. We're training a whole new generation of computer scientists to be comfortable with this idea that they shouldn't really look inside these complicated black boxes, that it's not something that is understandable or their responsibility to understand. And I think this is a fundamental error. I think that one of the most important things we should be doing as AI scientists and practitioners is to try to resolve this fundamental problem with large-scale neural networks and really try to understand what it is that's making them work and what causes them to work well, work badly, or acquire certain types of behavior.
Jasia Monk
And that's what a lot of your work is about. Right? Your work is on interpretability, which I understand is basically an attempt to look under the hood, to get a better sense of what is in fact going on in that black box in various ways.
David Bau
That's right. So I work in an area that you could call post hoc interpretability. There are a couple of approaches to achieving interpretability in machine learning. One is to simplify the machine learning models to the point where a person can look at them and understand and explain what each of the steps is doing. But the area that I work on is post hoc interpretability, which is, okay, let's say we didn't do that. Let's say we decided to use a neural network with millions or billions of neurons, one that is just far too complicated for us to have a preconceived idea of how it works. Can we go in there and analyze the system the same way that a biologist might analyze a complicated emergent biological system and ask, can we understand the structure of these learned computations after the fact, after they're trained, even if we didn't try to restrict them ahead of time to make them understandable by people?
Jasia Monk
So I'd love to get into a little more detail in a while on what you think we can understand and what progress we've made in that field of interpretability. But to go back to the sort of bottom-up building of our understanding of these AI models. So we have these neural networks. They're a very simple process. Actually, we don't fully understand why it is that they've proven to be so phenomenally useful. We started out having limited resources, where we had a limited number of neurons in these networks and probably limited data to throw at them. And so we might be able to get them to distinguish between a dog and a cat and between a girl and a boy, and perhaps between something that is a hot dog and something that is not a hot dog. But then somebody says, let's up the scale of ambition here. Let's throw a bunch more data at it. Let's do a bunch of other things. You'll tell me what they are. And perhaps they're going to be able to be these general-purpose large language models that aren't just trained toward one very, very specific goal, but that can actually help us with a huge variety of tasks, from writing a poem, to summarizing a text, to generating an image, to all of these other kinds of things. So what technical progress or what changes have allowed us to get there? And how is it, again, that this neural network suddenly scales to be able to do these kinds of things?
David Bau
Oh yeah, that's great. Okay, so there are a couple of things to know, things that your listeners might have heard of. But let me first emphasize that these really large neural networks are not that different from the classifiers that we've played with for 50 years, other than the fact that they're bigger. So even though I've said a language model is a generative model and not a classifier, actually, from a technological point of view, from the way that it's built, a language model is a classifier. It solves a slightly more open-ended classification problem than we solved previously, but it fundamentally just solves repeated classification problems. And what is the classification problem? It's this big, big multiple-choice question, which is: what is the next word?
Jasia Monk
Is it cat or is it dog?
David Bau
Yeah, is the next word cat, is the next word dog? But it's not a two-way choice that the language model is facing. We typically give a language model a vocabulary of something like 50,000 words and syllables and letters. And we tell the language model, you have this 50,000-way choice: what is the next word that's the right answer in this context? And so as input, we give the language model all the previous words and we ask it, multiple choice, tell me what you think the right next word is. So it's just a classifier, just like the classifiers we train to tell the difference between cats and dogs. But the scale of modern parallel computing on GPUs has allowed us to make classifiers that can do a pretty good job at larger-scale classification problems. The outputs are bigger, you know, this 50,000-way choice instead of a two-way choice or a 10-way choice, which we might have traditionally done. And the inputs are bigger: instead of just one image or one sentence, we can feed these models entire books, you know, entire histories of the text for them to look at to decide on what the next word should be. And there have been a couple of key architectural innovations that allow these models to consume and learn to use such huge inputs and solve such open-ended output problems. But fundamentally they're just neural networks. They're just a bunch of neurons that are connected in the same way that Rosenblatt was connecting neurons in the 1950s.
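The "what is the next word?" multiple-choice question can be sketched as follows. This is a toy illustration, not how any particular model is implemented: the five-word vocabulary stands in for the roughly 50,000 entries mentioned above, and the scores are invented rather than produced by a trained network. The softmax step, which turns scores into probabilities, is the standard way classifiers express a choice among many options.

```python
import math

def softmax(scores):
    """Turn raw scores into probabilities that are positive and sum to 1."""
    top = max(scores)                       # subtract the max for numerical stability
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

# Stand-in for a vocabulary of ~50,000 words, syllables, and letters.
vocabulary = ["cat", "dog", "mat", "the", "sat"]

# Hypothetical scores for the context "the cat sat on the"; a real model
# would compute one score per vocabulary entry from all the previous words.
scores = [0.5, 0.3, 3.0, -1.0, -2.0]

probabilities = softmax(scores)
next_word = vocabulary[probabilities.index(max(probabilities))]
print(next_word)
```

Generating a whole sentence repeats this step: the chosen word is appended to the context and the same classification question is asked again.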
Jasia Monk
Okay, so you think basically it's the same as what we were doing 80 years ago or something, which is astonishing. But there's been a few technological innovations that allow these models to consume so much information and therefore go beyond the limits of earlier models. Can you give us an example or is there a particularly important innovation in that respect?
David Bau
Sure. I think that one of the things that's happened in recent years is that there's been this rise of one particular neural network architecture called the transformer, and it's really taken over the industry. There used to be a wide variety of different neural network architectures. But more and more we're converging on using transformers for everything. And the fundamental thing that transformers do is that they introduce a form of short-term memory that we call attention. And what this means is that instead of the models only consulting what they learned during training, the models develop this ability to learn from the inputs that they're provided. So let me see if I can distill a simple example here. If there is a particular person that you're asking the language model about, then with traditional training, you would expect that language model to only be able to answer questions about that person if information about them appeared in the training data. But in real life, you often have situations where somebody asks you a question about a person and you didn't meet this person in your childhood, you didn't read about them in school, you just met them today, you just had a conversation with them, and then somebody comes and asks you about the person and you have to answer now. And this short-term reasoning is something that traditional neural networks are not very good at. But what the transformer architecture does is it introduces a special way of connecting the neurons called an attention layer, which allows the network to look back at previous things that happened in the recent input and use them as a type of memory and manipulate those memories and reason about them. And so this turns out to be so powerful that you can really think of transformers as a different class of neural networks. It was a major innovation.
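The conversation does not spell out the arithmetic of an attention layer. As a hedged sketch, the code below uses scaled dot-product attention, the standard form in transformer models, for a single query: the current position scores each earlier position, turns the scores into weights, and reads back a weighted mix of what those positions stored. The key, value, and query vectors are invented for illustration; in a real model they are computed by learned layers.

```python
import math

def softmax(scores):
    """Turn raw scores into weights that are positive and sum to 1."""
    top = max(scores)
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def attend(query, keys, values):
    """Scaled dot-product attention for one query over earlier positions."""
    scale = math.sqrt(len(query))
    scores = [sum(q * k for q, k in zip(query, key)) / scale for key in keys]
    weights = softmax(scores)
    dim = len(values[0])
    summary = [sum(w * value[i] for w, value in zip(weights, values))
               for i in range(dim)]
    return weights, summary

# Three earlier positions in the input (all vectors are made up).
keys = [[4.0, 0.0], [0.0, 4.0], [0.0, 0.0]]    # what each position "offers"
values = [[1.0, 0.0], [0.0, 1.0], [0.5, 0.5]]  # what each position "remembers"
query = [0.0, 4.0]                             # what the current position is looking for

weights, summary = attend(query, keys, values)
print([round(w, 3) for w in weights])
print([round(s, 3) for s in summary])
```

Because the query lines up with the second key, almost all of the weight lands on the second position, and the summary is close to that position's stored value. That is the sense in which attention acts as a short-term memory lookup over the recent input.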
Jasia Monk
So if I'm trying to think in terms of my interaction with something like ChatGPT or Claude or Grok or any of the other AI models: if it didn't have a transformer, is the problem that basically every question I put to it, it would answer the way it would have done before I started the conversation? And so it essentially becomes difficult to have a back-and-forth conversation. And the transformer is what allows it to keep a conversational context in mind and to have a kind of ongoing, progressive conversation. Or did I misunderstand that, and what the transformer allows it to do is really a different thing?
David Bau
I think that's right. So the transformer really has made it possible for you to teach the neural network things by telling it something, rather than being in control of the whole training process. And so, yes, when you have a conversation with a person, you're constantly learning and teaching the other person your ideas. You're sharing your concepts and your understanding of the world with the person, and what they absorb allows the conversation to proceed. And transformers allow a neural network to develop the ability to have the same kind of conversation, to develop an understanding during the course of a conversation, to learn things in the short run that it uses to make immediate responses, to recall things that happened recently, as opposed to only relying on long-term memory. Now, transformers are not the first architecture to try to do this. There were architectures proposed even in the 1980s to do this, called recurrent neural networks. You might have heard of LSTMs, long short-term memory networks. So this idea that you should have a short-term memory, that you should be able to solve problems in this way, is not totally new. But what a transformer does is it makes it really efficient to train networks that can do this. It was an innovation that showed that this old idea could really be made practical and could be scaled up in a big way.
Jasia Monk
And technically, is it understandable for somebody without a computer science background how it is that a transformer does that? Or do we get into areas that are too complicated to understand?
David Bau
I think that the main thing to understand is that if you were to try to train something with a short-term memory, then since short-term memory seems such a sequential process, the natural ways that you would end up training a neural network to have short-term memory are very sequential. They're very one-at-a-time. Like, you might tell the neural network some things and then immediately turn around and ask it to make a prediction about the thing that you just told it. And then based on that, you might go on to the next step, because time proceeds in a stepwise fashion. So that can be done, but it tends to be very slow. And the big innovation around the AI industry has to do with parallel computing. The reason that training neural networks is so efficient is that we can process many, many inputs and learn many, many things all at the same time, in parallel, on these marvelous GPU devices. And the recurrent neural network architectures of the past didn't fit very well with this parallel computation model. There were a lot of things in the training of them that were inherently sequential, so training them was pretty slow. What a transformer does is it just changes some assumptions in how this memory works that allow it to be parallelized really well. And I'm not sure it's super interesting to go into the details of how it's parallelized. It does mean that transformers are a little bit more limited, theoretically, than the old RNNs. But the limitations are carefully chosen and carefully architected to allow parallelism and to try to cover up for the limitations. There's a concept that transformers have that the old RNNs don't have, which is called a context window. So sometimes if you go and buy an AI product, they'll tell you this product has a certain context window and this other product has a larger context window. You might have to pay more money for something that has a larger context window.
And the idea of a context window is something that transformers introduced to allow them to parallelize the training. What a context window is, is a fixed number of words in the past that the transformer can see when it's trying to consult its short-term memory. An RNN has an infinite context window. In principle, it could remember everything in the past since it was created, since it was first turned on. But a transformer will be trained with a fixed context window. If the transformer has a context window of 1,000, it means that after you say 1,001 words, that very first word that you said is no longer in the short-term memory of the transformer. It won't be able to remember that anymore. But that simple limitation ends up being an enabling factor for training. It allows the neurons to be hooked up in a way that the transformer can be trained in this massively parallel way, which is many times more efficient than training an RNN.
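The 1,000-word example above can be checked directly. This is a simplified sketch that counts whole words; the function name is made up for the example, and real systems count tokens (word pieces) rather than words.

```python
def visible_context(words, context_window):
    """Return the most recent words that still fit in the context window."""
    if context_window <= 0:
        return []
    return words[-context_window:]

# A conversation of 1,001 words: "word1", "word2", ..., "word1001".
conversation = [f"word{i}" for i in range(1, 1002)]

visible = visible_context(conversation, 1000)
print(len(visible))          # how many words the model can still see
print(visible[0])            # the oldest word still visible
print("word1" in visible)    # whether the very first word is still visible
```

With a window of 1,000, the model sees "word2" through "word1001", and "word1" has fallen out of view, which matches the description above.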
Jasia Monk
And the limits of these context windows are still somewhat relevant today, I believe, as technical limitations of some of the existing large language models, because you can go back and forth for a certain amount of time, and then at some point it sort of loses track of the beginning of your conversation. Or if you ask it to do much more complex tasks, it stays on task, it stays on track for a while, and then it sort of stops being able to retain the information it needs in order to carry it out in the right way. Is that roughly right?
David Bau
That's roughly right. So there are really two effects here. One is that there's a hard context window, where the transformer has no hope of understanding things that are beyond its context window. And then there's a soft decay of its memory. These neural networks are statistical machines; they're never perfect. And as the conversation gets longer, as the context gets longer, even for things that are theoretically inside the context window, the transformer will have more difficulty accurately recalling and processing things that are further in the past.
Jasia Monk
All right. So I want to go back to the overall architecture of the AI now. I'm going to create a hypothetical scenario here, David. What if I gave you a billion dollars, or somebody gave you a billion dollars, and said, build me an AI? What would you do? So you'd build a neural network, it would have a transformer, all of that. What actually are the steps here? You have to train it, and then once you've trained it, you have to kind of adjust it. What does that mean concretely?
David Bau
Okay, that's great. So modern machine learning really has two steps. And if you gave me a billion dollars to train a neural network, the task in front of me would be, first, I'd have to do something called pre-training the network. And then the second thing that I'd have to do is I'd have to fine-tune the network to have a certain personality, to achieve a certain goal that I want the network to help me with. So this split between pre-training and fine-tuning is one of the fundamental pieces of lore, one of the fundamental rules of thumb that we have learned, and it's quite profound in modern machine learning. And the idea is this: if you go straight to trying to train an AI to solve the problem that you care about, then you miss a lot of opportunities to get an AI that has profound smarts, that has profound understanding of the world, because there are a lot of other problems, unrelated to the problem that you are making the AI for, that it could learn from, that it could generalize from. And so what people are realizing is that the way to make an AI is to begin by trying to train it to understand as many things as possible in the world. And then once the AI is really good at modeling a wide variety of interesting problems, you fine-tune the model on solving the particular problem that you care about, and the AI will get a lot of benefit from that pre-training. So, yeah, the first step nowadays is to pre-train the model on a universal problem. And the universal problem that the entire industry has converged on is to pre-train the model on large-scale language modeling, basically to be able to imitate text. Which text? All the text, all the text that humanity has ever written, maybe broadly construed. If the text has images in it, those images can be encoded as little pieces of text, little patches of image words.
If there are videos, we could also similarly boil the videos down to a bunch of tokens, and we can train an AI to be able to imitate any content that's been put together by a human in the past.
Jasia Monk
And tell me what training means in a technical, perhaps semi-technical sense. Presumably, when a baby is being trained, what we mean is that the baby has eyes and ears and looks around the world, and the information that's flooding through its brain somehow gets encoded by some mechanistic form into these neurons. And so over time, the kind of stimuli that the baby sees and receives start to train the neural network that is a baby's brain. I'm going to try and do the analogy here for the AI system and say, presumably you have this neural network and you're just throwing all of this text at it, and somehow that shapes the neural network in a way that may or may not be analogous. What does it mean to train, exactly?
David Bau
Yeah. So to train a neural network is actually really simple. First you need to have a goal, and then once you have a goal, all you do is you expose the neural network to challenges. You give it inputs, and then you have it produce outputs. And then you ask, did the output achieve that goal or not? Sometimes the output will have achieved the goal, and sometimes it will not. If it did achieve the goal, then whatever computation the neural network happened to do in that instance, you strengthen all those neural network connections that led to this positive outcome. If the network didn't achieve the goal, then you go to that computation and you just slightly weaken all of those neural connections that led to this bad outcome. You weaken them, you reverse them, and each time you don't make a huge change in the network. You might change everything by 1% or by a tenth of a percent. But after you've done this thousands of times or millions or billions of times, eventually the network will converge to a pattern of computation that starts being correct more often, that starts being incorrect less often, and that gets increasingly sophisticated in solving harder and harder instances of the goal over time. And so this whole process is called backpropagation or gradient descent. And it's really the backbone of how machine learning works. It's a deceptively simple process. A really primitive version of it was invented in the 1950s, and then more sophisticated versions of it were developed in the 1980s. And it's such an important process that it remains a really active area of research today. But fundamentally, the technique of backpropagation is really the same as what we've been doing since the 1980s. We've just been making little tweaks on it over time.
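As a rough illustration of the loop described here (produce an output, compare it with the goal, nudge the connections slightly, repeat many times), the following Python sketch trains a single weight and bias by gradient descent. It is a toy under stated assumptions, not how a real network is built: the data, the rule `target = 2 * x + 1`, the learning rate, and the function names are all made up for illustration.

```python
# Toy gradient descent: one "connection" (weight w) plus a bias b.
# All names, data, and hyperparameters here are hypothetical.

def train(examples, learning_rate=0.01, steps=5000):
    w, b = 0.0, 0.0                      # start knowing nothing
    for step in range(steps):
        x, target = examples[step % len(examples)]
        output = w * x + b               # the model produces an output
        error = output - target          # how far the output is from the goal
        # Gradient of the squared error 0.5 * error**2 with respect to w and b.
        grad_w = error * x
        grad_b = error
        # Make only a small change each time, moving against the gradient.
        w -= learning_rate * grad_w
        b -= learning_rate * grad_b
    return w, b


def mean_squared_error(w, b, examples):
    return sum((w * x + b - t) ** 2 for x, t in examples) / len(examples)


# Hypothetical data generated by the rule target = 2 * x + 1.
examples = [(x, 2.0 * x + 1.0) for x in (-2.0, -1.0, 0.0, 1.0, 2.0)]

loss_before = mean_squared_error(0.0, 0.0, examples)
w, b = train(examples)
loss_after = mean_squared_error(w, b, examples)
print(f"w={w:.3f} b={b:.3f} loss {loss_before:.3f} -> {loss_after:.6f}")
```

Each step changes the parameters only slightly, but after thousands of steps the weight and bias settle close to the rule that generated the data. Real networks apply the same idea to billions of weights, with backpropagation computing the gradients.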
Jasia Monk
Very simple question. So to have this backpropagation mechanism, presumably you have to know when the system is doing something good or bad, right or wrong. But we're not talking here about the post training. I don't think we're talking here about how once you have a large language model that basically has been trained, you then go and give it feedback, depending on its output, et cetera. That's a kind of different thing. So how does the model know when it's doing something right or wrong? It's saying, hey, this is a cat, or something much more complicated. Here's a simple sentence. How does it know this is a good sentence or this is a bad sentence? Actually, this wasn't a cat, it was a dog.
David Bau
Wonderful. So the distinction that you're drawing is one of the big fundamental insights, which is the distinction between what's called supervised training and unsupervised training. In supervised training, you have a pretty clear idea of the problem that you want the network to solve. Say I need the network to tell the difference between good restaurant reviews and bad restaurant reviews. So let me just show the network what good ones are and what bad ones are, and then punish the network every time it makes the wrong choice. That's the way that we conceived of AI training for a long time. But the problem with it is it's expensive to collect that training data. You have to make all these human assessments. You have to make these judgments about the problem that you want to solve. And so there's only so far that you can go with it. The big innovation is to introduce a different type of goal, which is called an unsupervised training problem. An unsupervised training problem is a goal where you don't need to label the data. You don't need a person to tell you that this was the right thing or the wrong thing. You come up with a goal that the AI can pursue that is more natural in the world or more ubiquitous. And so language modeling is an unsupervised training goal. This multiple choice question of predicting the next word doesn't require a human expert to come in and label the data saying, this is the right word, that's the wrong word. All we need to do is gather a bunch of text. You know, long ago, Shakespeare said that this was the right next word. Well, we don't need another expert. We can trust Shakespeare. What Shakespeare said was the right next word at that moment is the right one. The New York Times might have said that in some other context, such and such was the right next word. Or some blogger on the Internet, some random person putting together a webpage, they already made a decision about what the right next word was. And we can have a language model judge itself based on all the text that was written without having to hire a separate new expert to train it on all these things.
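To make the "no labels needed" point concrete, here is a small Python sketch of how raw text can be turned into training examples for next-word prediction: the text itself supplies the right answer at every position. The function name `next_word_examples` and the `context_size` parameter are illustrative assumptions, and real systems work on tokens rather than whitespace-separated words.

```python
def next_word_examples(text, context_size=3):
    """Turn raw text into (context, next_word) pairs with no human labeling."""
    words = text.split()
    examples = []
    for i in range(1, len(words)):
        context = words[max(0, i - context_size):i]  # the words seen so far
        examples.append((context, words[i]))         # the author's actual next word
    return examples


text = "to be or not to be that is the question"
for context, target in next_word_examples(text)[:4]:
    print(context, "->", target)
```

Every sentence anyone has written yields many such quiz questions, which is why this objective scales to very large text collections.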
Jasia Monk
So in supervised learning, right, I have a database with 100 positive reviews and 100 negative reviews. Probably it's more like a million, but let's say 100, right?
David Bau
Sure.
Jasia Monk
And so you give it one of these reviews, and in the database I just have a data point that says something like positive or negative. And that's just generated by humans. At some point, some human looked over this and just classified these 200 reviews as positive or negative. And so the system judges itself against the quote, unquote objective human judgment that is encoded in the thing that it's checking itself against. Right?
David Bau
That's right.
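The supervised setup just described, where each review comes with a human-supplied label and the system is scored against it, can be sketched in a few lines of Python. This is only a stand-in: it uses simple word counting rather than a neural network, and the four reviews, their labels, and the function names are invented for illustration.

```python
from collections import Counter

# Hypothetical labeled data: each label was assigned by a person.
labeled_reviews = [
    ("the food was great and the service was friendly", "positive"),
    ("great pasta and a lovely patio", "positive"),
    ("the food was cold and the service was slow", "negative"),
    ("slow kitchen and a rude waiter", "negative"),
]


def train_word_scores(reviews):
    """Give each word +1 per positive review occurrence and -1 per negative one."""
    scores = Counter()
    for text, label in reviews:
        delta = 1 if label == "positive" else -1
        for word in text.lower().split():
            scores[word] += delta
    return scores


def predict(scores, text):
    total = sum(scores[word] for word in text.lower().split())
    return "positive" if total >= 0 else "negative"


def accuracy(scores, reviews):
    """Fraction of reviews where the prediction matches the human label."""
    correct = sum(predict(scores, text) == label for text, label in reviews)
    return correct / len(reviews)


scores = train_word_scores(labeled_reviews)
print(predict(scores, "the pasta was great"))
print(accuracy(scores, labeled_reviews))
```

The key ingredient is the second element of each pair: without a person having written "positive" or "negative", there would be nothing to check the prediction against.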
Jasia Monk
How does unsupervised learning check itself at the end? You're saying, oh, well, Shakespeare or some blogger or the New York Times journalist, whoever, has decided what the right next word is. But since this LLM is, in many contexts, creating a sentence that's never been there before in human language, how does it know whether it is similar to the kind of word that Shakespeare or the New York Times or the blogger would have written next?
David Bau
So, okay, I guess there's two things to clarify. One is that one of the fundamental insights that we came up with to make unsupervised learning work is to recognize that there is no one right answer. What is the next word here? Well, if you had different people confronting the same situation, then even if they're very intelligent and very human, they might choose different words. And you're more accurate thinking of the right next word as a distribution of possibilities: maybe 30% of the time you would have chosen this word, 10% of the time you would have chosen some other word, and maybe in the remaining 60% of the time, there's a wide variety of other choices you could have made for the thing to say at this moment. And what we discovered was that the right thing to train the AI to do was to understand and model this probability distribution as accurately as possible. Instead of just getting the next word right, it needs to get the probabilities right as much as possible. And there are mathematical ways of writing this down, but that's why these things are probabilistic machines. It's because they don't output single choices. They output an assessment of what they think the probabilities should be. Okay, so now how do you know if this is right or not? Well, what we do is we measure these things on what's called a holdout set. And it's really simple. All you do is you take a bunch of text that would have been part of your training data, and then you separate it out as a holdout, as a quiz, and you tell the AI system, well, you can train on all this data, but not these 10 pages. These 10 pages are different, and you never get a chance to see them during training. And then after all the training is done, we go to the model and we ask it, well, look at these 10 pages. Let me give you the first hundred words of the first page, and you tell me what you think the probabilities of the next word are.
Jasia Monk
And the closer it gets to actually predicting the rest of the passage, the better it's learned.
David Bau
That's correct. That's correct. And so this holdout test has for many years been sort of the gold standard for how you measure the success of a machine learning model. Can it correctly predict what the answers are on a piece of data that you held out from training?
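A holdout test of this kind can be sketched in Python with a very small stand-in model. The sketch below assumes a bigram counter (it only looks at the previous word) with add-one smoothing, which is far simpler than a neural network, and the training and holdout sentences are invented. What it shows is the procedure: fit on the training text only, then measure how much probability the model assigns to the true next words in text it never saw.

```python
import math
from collections import Counter, defaultdict

# Hypothetical corpus; the holdout sentence is never used for training.
training_text = ("the cat sat on the mat . the dog sat on the rug . "
                 "the cat saw the dog .")
holdout_text = "the dog sat on the mat ."

train_words = training_text.split()
holdout_words = holdout_text.split()
vocab = set(train_words) | set(holdout_words)   # assume a known vocabulary


def train_bigram(words):
    """Count how often each word follows each other word."""
    counts = defaultdict(Counter)
    for prev, nxt in zip(words, words[1:]):
        counts[prev][nxt] += 1
    return counts


def next_word_probability(counts, prev, nxt):
    # Add-one smoothing so an unseen next word never gets probability zero.
    seen = counts.get(prev, Counter())
    return (seen[nxt] + 1) / (sum(seen.values()) + len(vocab))


def average_surprise(counts, words):
    """Average negative log-probability of the true next word (lower is better)."""
    pairs = list(zip(words, words[1:]))
    total = -sum(math.log(next_word_probability(counts, p, n)) for p, n in pairs)
    return total / len(pairs)


counts = train_bigram(train_words)
holdout_surprise = average_surprise(counts, holdout_words)
uniform_surprise = math.log(len(vocab))   # a model that guesses every word equally
print(f"model: {holdout_surprise:.3f}  uniform guess: {uniform_surprise:.3f}")
```

The model is judged on its probabilities, not on a single guess: it scores well when the words that actually come next on the held-out pages were ones it considered likely.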
Jasia Monk
Okay, so you've done all of this. And this, in a sense, is only the first step of what a lot of models do at the moment. Right. Because then you also, if I understand it rightly, have a bunch of post training or whatever the right term for it is, in which you have a model do stuff, and you give it upwards or downwards, you give it positive or negative reinforcement depending on what it does. How is that different from what we've talked about so far? And how is it that that changes? I mean, you've had this trained thing, you have this huge neural network. And then how is it that this positive or negative reinforcement somehow changes the physical structure of this thing? Because surely it needs to do that for it to change and learn over time and become better adjusted.
David Bau
That's right. The problem with unsupervised training is that the model doesn't learn how to do any one useful thing in particular. So let me give you an example of what kind of thing comes out of unsupervised language modeling. If you go to an unsupervised language model and you try to have a conversation with it, if you say, oh, language model, please tell me what's the capital of Vermont? What do you want the language model to say? You want the language model to say, oh, what a great question. A lot of people don't know the capital of Vermont, but it's Montpelier. You know, maybe here's a way to remember that. Here's a little bit of information about Montpelier. But if you go to an unsupervised language model and you ask it, what is the capital of Vermont? It will answer you by predicting what it thinks the most likely next word is. And it will say, what is the capital of Colorado? What is the capital of Maine? What is the capital of Wyoming? What is the capital of New York? Right.
Jasia Monk
Because normally when you look at a text, the next word is not necessarily Montpelier. It might be in certain kinds of contexts, if it's a dialogue in a novel and the person has the right answer. But in other kinds of contexts, you may have a list of questions, you may have been using it as an example in a philosophical text, et cetera, et cetera.
David Bau
Indeed. And so if you really train the model on all the world's texts, then the most common situation, the most common context for asking a question would be a book of questions. And it will just continue writing that book of questions and it will keep on inventing more and more questions that are similar to the question that you asked. And it's a pretty dissatisfying experience.
Jasia Monk
It sounds like an annoying old philosopher. What is the capital of Vermont? What is the capital of New Mexico?
David Bau
Exactly. It can be pretty fun, but it's not that useful. Now what you can do, though, is you can go to one of these pre trained language models and you can say, you know what, it's great that you can imitate every book in the library, but let me give you a set of books that I would like you to be especially good at imitating. And what are these books? Well, I'll write these books. This is a set of books which I've just hired a bunch of professional authors to write. And they are a collection of 100,000 conversations, pieces of dialogue, which are examples of people asking questions and getting their questions answered in a really nice and helpful way. Right? And if you go back to this pre trained language model and you train it on dialogue text, just fine tuning it, which basically means taking these 100,000 pages and just making them the last thing that the network was trained on, the last thing that it learned, the last thing that it was rewarded and punished for, then the language model will acquire this bias. It'll tend to imitate the last thing that it saw, and it'll now change. If you ask it, what's the capital of Vermont? It will tend to give you a helpful, useful answer. It'll answer you in dialogue, which is remarkable. So this process is called instruction fine tuning. People collect together data sets of useful instruction following behavior. Please answer this question for me. Please do this thing for me. Please do that thing for me. And examples of an AI doing the thing in a smart, useful way. And if you went to a transformer and you just trained it from scratch on these thousands of conversations, it might understand the grammar of what you're doing, but it wouldn't be very helpful. It wouldn't know much about the world. But if you go to a large language model that has been trained to imitate every book that's ever been written, every blog post that's ever been posted on the Internet, and then, as a final fine tuning, you show it some dialogue and you say, you know, what I really want you to learn is to follow this format. While you're doing next word prediction, just do it in a way that answers questions. Then you get this profound thing that happens, which is that not only does it follow the form of the dialogue, but it seems to be able to exploit this vast array of knowledge that it acquired during pre training of how to imitate all the works of Shakespeare. Now if you go and you ask it a question about Shakespeare, the model tends to be able to answer it, even if the specific dialogue examples you have said nothing about Shakespeare. It will follow the dialogue form, but it will draw on the knowledge that it acquired earlier in pre training. And that's really the magic of modern machine learning, of modern language modeling, is this split. So that's what you do.
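The two-phase recipe described here can be mimicked with a deliberately crude Python toy. It is not instruction fine tuning as practiced on real models: it uses bigram counts instead of a neural network, and because counting does not care about order, the extra weight given to the dialogue phase (`weight=5`) is an assumed stand-in for "the last thing the network was trained on" having outsized influence. The texts and function names are invented for illustration.

```python
from collections import Counter, defaultdict

# Hypothetical "library" text: mostly a book of questions, plus one fact.
pretraining_text = ("what is the capital of vermont ? what is the capital of maine ? "
                    "what is the capital of wyoming ? "
                    "the capital of vermont is montpelier .")
# Hypothetical dialogue text written to demonstrate the desired format.
dialogue_text = ("what is the capital of maine ? answer : augusta . "
                 "what is the tallest mountain ? answer : everest .")


def train(phases):
    """Build bigram counts from (text, weight) phases, applied in order."""
    counts = defaultdict(Counter)
    for text, weight in phases:
        words = text.split()
        for prev, nxt in zip(words, words[1:]):
            counts[prev][nxt] += weight
    return counts


def most_likely_next(counts, prev):
    """Most frequent follower of a word that appeared in training."""
    followers = counts[prev]
    return max(followers, key=followers.get)


base = train([(pretraining_text, 1)])
tuned = train([(pretraining_text, 1), (dialogue_text, 5)])

print("base model after '?':", most_likely_next(base, "?"))
print("tuned model after '?':", most_likely_next(tuned, "?"))
print("fact still stored:", tuned["is"]["montpelier"] > 0)
```

In this toy, the base model's favorite continuation after a question mark is another question, while the tuned model prefers to start an answer, and the counts learned from the broad text are still present after tuning. Real fine tuning achieves the analogous effect with further gradient updates on the dialogue data.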
Jasia Monk
Thank you so much for listening to this episode of the Good Fight. In the rest of this episode, David and I go deeper into how it is that frontier labs try to make AI models compliant, how it is that they try to make sure that your AI doesn't try to take over the world or try to get inside your psyche and make you do terrible things, can't be used to create bioweapons, and so on. We also go back, with, I think, a firmer footing in the technology now, to the much debated question of whether the AI model you use is actually intelligent and what it would mean for it to be intelligent.
Podcast Summary: The Good Fight
Episode: David Bau on How Artificial Intelligence Works
Host: Yascha Mounk
Date: September 30, 2025
In this episode, Yascha Mounk sits down with David Bau, a former Google engineer and current computer science professor at Northeastern University, for a clear, nuanced “AI 101” for a public audience. Together, they dig into what exactly makes modern AI—especially large language models (LLMs)—tick, how these paradigms differ from earlier types of machine learning, and why interpretability and transparency in these systems are such pressing issues. Along the way, Bau breaks down neural networks from first principles, explains the innovation behind transformer architectures, and details how contemporary models are trained to achieve the impressively broad and flexible behaviors we now see.
David Bau expresses concern that modern engineering is moving away from understanding systems deeply, instead embracing the opacity (“black box”) of neural networks.
On Black-Box Models:
“We're training a whole new generation of computer scientists to be comfortable with this idea that they shouldn't really look inside these complicated black boxes.” (00:00, David Bau)
Historical Perspective:
“If you invented a picture of a pointy-eared dog and you gave it to one of these classifiers...it would say, 'ah, that's clearly a cat.'” (07:44, Bau)
Transformers Revolution:
“The transformer really has made it possible for you to teach the neural network things by telling it something, rather than being in control of the whole training process.” (27:05, Bau)
On Learning Processes:
“To train a neural network is actually really simple. So first you need to have a goal... then you expose the neural network to challenges... and you strengthen or weaken neural connections based on positive or negative outcomes.” (37:53, Bau)
Unsupervised Learning:
“Language modeling is an unsupervised training goal. This multiple choice question of predicting the next word doesn't require a human expert to come in and label the data...” (40:53, Bau)
Fine-Tuning and Instruction:
“This process is called instruction fine-tuning...and if you went to a large language model...and show it dialogue...then you get this profound thing that happens...” (50:32, Bau)
The conversation is clear, accessible, but frank about both the power and current drawbacks of modern AI. Bau's tone is thoughtful—appreciative of technological progress but concerned about the profession’s detachment from understanding the very tools it builds. The episode demystifies LLMs in plain language while urging the importance of interpretability—a call to arms for more responsible AI engineering and a better-informed public debate about the technology’s capabilities and risks.
For more on this discussion (including questions of AI safety and whether AIs are “truly intelligent”), listen to the full episode.