David Bau on How Artificial Intelligence Works

Podcast Summary: The Good Fight
Episode: David Bau on How Artificial Intelligence Works
Host: Yascha Mounk
Date: September 30, 2025

Overview

In this episode, Yascha Mounk sits down with David Bau, a former Google engineer and current computer science professor at Northeastern University, for a clear, nuanced “AI 101” for a public audience. Together, they dig into what exactly makes modern AI—especially large language models (LLMs)—tick, how these paradigms differ from earlier types of machine learning, and why interpretability and transparency in these systems are such pressing issues. Along the way, Bau breaks down neural networks from first principles, explains the innovation behind transformer architectures, and details how contemporary models are trained to achieve the impressively broad and flexible behaviors we now see.

Key Discussion Points & Insights

1. The Black Box Problem in AI Engineering

Opening Thought (00:00)
David Bau laments the new engineering culture in machine learning:

“We're training a whole new generation of computer scientists to be comfortable with this idea that they shouldn't really look inside these complicated black boxes.”
His concern is that modern engineering is moving away from understanding systems deeply and is instead embracing the opacity (“black box”) of neural networks.

2. Historical Shift: From Classifiers to Generative Models

Classifiers vs. Generative Models (03:06, 04:23)
- Classifiers: Early AI models focused on narrow yes/no or multi-category decisions (e.g., is this a cat or a dog? Is a review positive or negative?).
  - “You could tell as models to tell you the difference between a picture of a cat and a picture of a dog... That was considered an amazing feat at the time.” (05:31, David Bau)
- Limitations: Classifiers take shortcuts and may not develop full world understanding—they might judge a “cat” just by the shape of the ears.
  - "If you invented a picture of a pointy-eared dog...it would say, 'ah, that's clearly a cat.'" (07:44, Bau)
- Generative Models (LLMs): Shifted to imitation and generation of language, allowing for broader, more flexible problem-solving.

3. Neural Networks Explained

What are “Neurons” in AI? (09:32)
- Bau compares digital neurons to biological ones:
  - Inputs (converted to numbers) feed into digital “neurons”; each neuron performs a simple weighted sum, then passes results to the next layer.
  - “Each neuron is an extremely simple step. It's just adding things and then looking at the answer and then producing an output.” (11:57, Bau)
- Why Do Simple Calculations Work? (13:32)
  - Despite simplicity and opacity, assembling and tuning a massive number of these neurons allows powerful, emergent behaviors.
  - “Neural networks work so well...that the field has really become comfortable with this idea that we should just use these black boxes.” (13:32, Bau)
  - He emphasizes that the core technique—backpropagation—has been used since the 1980s.
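
The "weighted sum, then pass it on" step described above can be sketched in a few lines. This is a minimal illustration, not code from the episode: the sigmoid activation, the function names `neuron` and `layer`, and all of the weights are assumptions chosen for the example.

```python
import math

def neuron(inputs, weights, bias):
    """One digital neuron: a weighted sum of the inputs, squashed into (0, 1)."""
    total = sum(x * w for x, w in zip(inputs, weights)) + bias
    return 1.0 / (1.0 + math.exp(-total))  # sigmoid activation (an assumed choice)

def layer(inputs, weight_rows, biases):
    """A layer is many neurons that all read the same inputs."""
    return [neuron(inputs, w, b) for w, b in zip(weight_rows, biases)]

# Two inputs feed two hidden neurons, whose outputs feed one output neuron.
hidden = layer([1.0, 0.0], [[2.0, -1.0], [-2.0, 1.0]], [0.0, 0.0])
output = neuron(hidden, [1.0, -1.0], 0.0)
```

Each call does nothing more than multiply, add, and squash; any interesting behavior comes from stacking many such steps and tuning the weights.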

4. Why AI is a Black Box and Why That’s Dangerous

Interpretability and Responsibility (13:32–18:24)
- The field has accepted “black box” neural networks, but Bau is alarmed by the loss of clarity:
  - "We're training a whole new generation...that it's not something that is understandable or their responsibility to understand. And I think this is a fundamental error." (13:32, Bau)
- Interpretability Approaches
  - Simplification: Making models small or simple enough to interpret directly.
  - Post-hoc Interpretability (the focus of Bau’s research): analyzing complex, already-trained networks “like a biologist” studying emergent systems, to infer after the fact what their parts are doing (18:24).

5. What Made LLMs Possible? The Scale and the Transformer Innovation

From Small Classifiers to Large LLMs (20:47)
- Scale: An LLM is technically "just a big classifier" with many more inputs and outputs; instead of picking cat or dog, it chooses the next word from 50,000+ options given a vast input context.
  - "From a technological point of view...a language model is a classifier. It solves a slightly more open-ended classification problem." (21:45, Bau)
- Transformers
  - Key Innovation: The transformer architecture allows models to maintain and use short-term memory through “attention,” making long, coherent responses and contextual conversation possible (24:04).
    - “The fundamental thing that transformers do is that they introduce a form of short term memory that we call attention.” (24:04, Bau)
  - Before Transformers: Recurrent Neural Networks (RNNs) like LSTMs tried this but were slow and hard to scale.
  - Transformer’s Context Window:
    - Allows the model to “see” a fixed number of previous words (e.g., 1000), but memory decays with distance (29:05, 33:10).
    - “There’s a hard context window where the transformer has no hope of understanding things that are beyond its context window. And then there’s a soft decay of its memory.” (33:10, Bau)
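
As a rough sketch of the attention idea described above, the snippet below computes single-head scaled dot-product attention over only the most recent positions. It is an illustration under stated assumptions: the function names, the `window` parameter, and the toy vectors are hypothetical, there are no learned projections, and the hard context window is modeled simply by discarding older positions. The soft decay of memory that Bau mentions is not modeled here.

```python
import math

def softmax(scores):
    """Turn raw scores into positive weights that sum to 1."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def attend(query, keys, values, window):
    """Scaled dot-product attention over the last `window` positions only."""
    keys, values = keys[-window:], values[-window:]  # hard context window
    scale = math.sqrt(len(query))
    scores = [sum(q * k for q, k in zip(query, key)) / scale for key in keys]
    weights = softmax(scores)
    dim = len(values[0])
    mixed = [sum(w * v[i] for w, v in zip(weights, values)) for i in range(dim)]
    return mixed, weights

# Three earlier positions exist, but only the last two fit in the window.
keys = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
values = [[10.0, 0.0], [0.0, 10.0], [5.0, 5.0]]
mixed, weights = attend([1.0, 0.0], keys, values, window=2)
```

The first position is invisible to the model no matter how relevant it is, which is the sense in which anything beyond the window cannot be used.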

6. The Training Process, Step by Step

Two Main Phases: Pre-training and Fine-tuning (34:15)
- Pre-training:
  - First, expose the network to the broadest possible problem (e.g., predicting the next word in all the world’s text).
    - “The way to make an AI is to begin by trying to train it to understand as many things as possible in the world.” (34:24, Bau)
- Fine-tuning:
  - Adapt the model toward a specific personality or set of tasks after broad training (34:15–37:08).

7. What Does “Training” Actually Mean?

Mechanics of Learning (37:53)
- The system attempts an output and is rewarded or punished: the neural connections that led to a good outcome are strengthened, and those that led to a bad outcome are weakened.
  - “If the network didn't achieve the goal, then you go to that computation and you just slightly weaken all of those neural connections that led to this bad outcome...eventually, the network will converge to a pattern of computation that starts being correct more often.” (37:53, Bau)
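- This process combines “backpropagation” (working out how much each connection contributed to the outcome) with “gradient descent” (nudging each connection slightly in the direction that improves the outcome).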
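
The strengthen-or-weaken loop above can be sketched for the smallest possible case, a single neuron with one weight and one bias. This is a minimal sketch under assumptions not taken from the episode: a sigmoid neuron, a cross-entropy objective, and a made-up four-point dataset. With only one neuron the backward pass is a single chain-rule step; backpropagation repeats the same step layer by layer in deeper networks.

```python
import math

def predict(w, b, x):
    """A single sigmoid neuron with one input."""
    return 1.0 / (1.0 + math.exp(-(w * x + b)))

def train(data, lr=0.5, epochs=200):
    """Gradient descent: nudge w and b against the error after every example."""
    w, b = 0.0, 0.0
    for _ in range(epochs):
        for x, y in data:
            error = predict(w, b, x) - y  # gradient of cross-entropy w.r.t. the weighted sum
            w -= lr * error * x           # weaken or strengthen the connection
            b -= lr * error
    return w, b

# Toy goal (an assumption for illustration): output 1 for positive x, 0 for negative x.
data = [(-2.0, 0), (-1.0, 0), (1.0, 1), (2.0, 1)]
w, b = train(data)
```

After training, `predict(w, b, 2.0)` is close to 1 and `predict(w, b, -2.0)` is close to 0, which is the "correct more often" pattern the quote describes.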

Supervised vs Unsupervised Learning (40:53)
- Supervised: Known labels (e.g., “positive” or “negative” reviews) given by people.
- Unsupervised: Models learn patterns from vast, unlabeled data—such as predicting the next word in a sentence—by modeling probability distributions.
  - “All we need to do is gather a bunch of text. You know, long ago, Shakespeare said that this was the right next word… We can have a language model judge itself based on all the text that was written without having to hire a separate new expert…” (40:53, Bau)
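
To make the "text labels itself" point concrete, here is a small sketch that turns raw text into training examples without any human labeling. The function name `next_word_examples`, the `context_size` parameter, and splitting on whitespace are assumptions for illustration; real systems split text into tokens rather than whole words.

```python
def next_word_examples(text, context_size=3):
    """Turn raw text into (context, next_word) pairs; the text supplies its own labels."""
    words = text.split()
    examples = []
    for i in range(1, len(words)):
        context = words[max(0, i - context_size):i]
        examples.append((context, words[i]))
    return examples

pairs = next_word_examples("to be or not to be")
for context, answer in pairs:
    print(context, "->", answer)
```

Every position in the text yields one multiple-choice question whose correct answer is simply the word the author wrote next.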

8. How Do We Measure AI’s Success?

Holdout Sets (44:32)
- Set aside unseen text before training; after training, see if the AI can accurately predict what comes next in these new passages.
  - “Can it correctly predict what the answers are on a piece of data that you held out from training?” (47:09, Bau)
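
A holdout evaluation can be sketched with a deliberately tiny stand-in for a language model: a table counting which word most often follows each word. Everything here is an assumption for illustration (the toy corpus, the 75/25 split, and the helper names `train_bigram` and `holdout_accuracy`); the point is only that the score is computed on text the model never saw during training.

```python
from collections import Counter, defaultdict

def train_bigram(words):
    """Count which word follows which in the training text."""
    follows = defaultdict(Counter)
    for prev, nxt in zip(words, words[1:]):
        follows[prev][nxt] += 1
    return follows

def holdout_accuracy(model, words):
    """Fraction of held-out next words that the model predicts correctly."""
    pairs = list(zip(words, words[1:]))
    hits = 0
    for prev, actual in pairs:
        if prev in model and model[prev].most_common(1)[0][0] == actual:
            hits += 1
    return hits / len(pairs)

corpus = "the cat sat on the mat . the dog sat on the rug . the cat ran".split()
split = int(len(corpus) * 0.75)
train_words, held_out = corpus[:split], corpus[split:]  # held_out is never trained on

model = train_bigram(train_words)
accuracy = holdout_accuracy(model, held_out)
```

Scoring on `train_words` instead would reward memorization; scoring on `held_out` measures whether the learned patterns carry over to new text.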

9. Why Post-training (Fine-Tuning) Is Needed

Limitation of Pure Unsupervised Training (48:23)
- Without specific instruction, the model just predicts what’s most common in the training data—it doesn’t inherently answer questions or follow instructions.
  - “If you go to an unsupervised language model and you ask it, what is the capital of Vermont? It will answer you by predicting what it thinks the most likely next word is. And it will say, what is the capital of Colorado? What is the capital of Maine?...” (48:23, Bau)
- Instruction fine-tuning: Train on specifically curated examples ("dialogues") to bias the model to follow human instructions or answer in helpful ways.
  - “This process is called instruction fine-tuning...and if you went to a large language model that has been trained to imitate every book...and then you just as a final fine tuning...you show it some dialogue and you say, you know, what I really want you to learn is to follow this format...then you get this profound thing that happens...” (50:32, Bau)
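
One way to picture the curated "dialogues" is as ordinary text in a fixed format, on which the same next-word training simply continues. The sketch below is hypothetical: the `User:` / `Assistant:` template and the helper name `format_dialogue` are assumptions for illustration and are not described in the episode.

```python
def format_dialogue(instruction, response):
    """Render one curated example in a fixed, hypothetical chat template."""
    return f"User: {instruction}\nAssistant: {response}\n"

# A handful of hand-written question-and-answer pairs (illustrative only).
curated = [
    ("What is the capital of Vermont?", "The capital of Vermont is Montpelier."),
    ("What is the capital of Colorado?", "The capital of Colorado is Denver."),
]

fine_tuning_text = "".join(format_dialogue(q, a) for q, a in curated)
print(fine_tuning_text)
```

After seeing enough text in this shape, the most likely continuation of a question becomes an answer rather than another question, which is the shift in behavior Bau describes.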

Notable Quotes & Memorable Moments

On Black-Box Models:

“We're training a whole new generation of computer scientists to be comfortable with this idea that they shouldn't really look inside these complicated black boxes.” (00:00, David Bau)

Historical Perspective:

“If you invented a picture of a pointy-eared dog and you gave it to one of these classifiers...it would say, 'ah, that's clearly a cat.'” (07:44, Bau)

Transformers Revolution:

“The transformer really has made it possible for you to teach the neural network things by telling it something, rather than being in control of the whole training process.” (27:05, Bau)

On Learning Processes:

“To train a neural network is actually really simple. So first you need to have a goal... then you expose the neural network to challenges... and you strengthen or weaken neural connections based on positive or negative outcomes.” (37:53, Bau)

Unsupervised Learning:

“Language modeling is an unsupervised training goal. This multiple choice question of predicting the next word doesn't require a human expert to come in and label the data...” (40:53, Bau)

Fine-Tuning and Instruction:

“This process is called instruction fine-tuning...and if you went to a large language model...and show it dialogue...then you get this profound thing that happens...” (50:32, Bau)

Important Timestamps

00:00: David Bau opens on the dangers of AI black boxes.
03:06: Bau on the difference between traditional classifiers and generative LLMs.
09:32: Explaining digital neurons and neural network basics.
13:32: Why neural networks work & the “black box” problem.
18:24: Post-hoc interpretability and explainability in AI.
20:47–24:04: What makes LLMs possible: scale and the “transformer” architecture.
27:05–29:05: How transformers enable real conversational context.
34:15: AI model training steps: pre-training and fine-tuning.
37:53: The mechanics of neural network training ("backpropagation").
40:53–44:32: Supervised vs. unsupervised learning and holdout sets.
48:23–50:32: Why fine-tuning and explicit instruction are required for usefulness.

Overall Tone and Takeaway

The conversation is clear and accessible, yet frank about both the power and the current drawbacks of modern AI. Bau's tone is thoughtful: appreciative of technological progress but concerned about the profession’s detachment from understanding the very tools it builds. The episode demystifies LLMs in plain language while stressing the importance of interpretability, a call for more responsible AI engineering and a better-informed public debate about the technology’s capabilities and risks.

For more on this discussion (including questions of AI safety and whether AIs are “truly intelligent”), listen to the full episode.
