Podcast Summary: The System Behind Self-Driving – Waymo’s Dmitri Dolgov (The a16z Show, April 17, 2026)

Episode Overview

In this episode, Waymo co-CEO Dmitri Dolgov joins guest host John Collison to explore the systems and philosophy powering Waymo’s self-driving cars as they scale to hundreds of thousands of autonomous rides each week. The conversation covers Waymo’s sensor architecture, the centrality of AI and foundation models, the evolution from prototyping to global deployment, the intricacies of ride experience, operational challenges, product design, and the future of fully autonomous driving. Dolgov also reflects on leadership, Google’s innovation culture, and the “nines challenge” of making self-driving safe and real everywhere.

Key Discussion Points & Insights

1. From Research to Real-World Scale

Dolgov: Waymo has transitioned from deep tech R&D into rapid, global scaling and deployment, now providing nearly half a million fully autonomous rides per week in 11 US cities (25:04).
Quote (@00:00):
"We've clearly moved past the stage of scientific research and kind of deep core technology development to this new phase of accelerated global scaling and deployment." – Dmitri Dolgov

2. Waymo’s Technical Architecture: Sensors, Local Compute, and Models

Sensor Suite:
Three core sensing modalities—cameras, LiDAR (lasers), and radar—providing complementary, 360-degree coverage. Microphones are also used, but less central (04:34).
Local Inference:
All real-time driving inference happens on the car; the cloud is only used for non-driving tasks (e.g., after-ride cleaning checks, lost item detection) (06:50).
Quote (@04:34):
"We use three different sensing modalities... cameras, lidars or lasers, and there are radars. Those are the primary ones... They all have 360 degree coverage around the vehicle." – Dmitri Dolgov

3. AI & Foundation Model Ecosystem

Waymo’s Approach:
Centered on a large, offboard foundation model that “understands how the physical world works and the social aspects of driving” (08:13). This core is fine-tuned into three chief “teachers”: the driver (in-car AI), the simulator (training and evaluation), and the critic (evaluates driving and finds edge cases).
Downstream Students:
Teachers are then distilled into smaller, faster models that operate in real time in the car (11:03).
Quote (@09:46):
"The job of the critic is to find interesting events and then be opinionated about what's good behavior and what's bad behavior." – Dmitri Dolgov
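
The teacher-to-student step described above can be illustrated with a generic knowledge-distillation loss. This is a minimal sketch of the standard technique, not Waymo's training code: the maneuver names, the scores, and the temperature value are all hypothetical, and the episode does not say which loss Waymo uses.

```python
import math

def softmax(logits, temperature=1.0):
    """Convert raw scores into probabilities; a higher temperature flattens them."""
    scaled = [x / temperature for x in logits]
    peak = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(x - peak) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]

def distillation_loss(teacher_logits, student_logits, temperature=2.0):
    """KL(teacher || student) computed on temperature-softened distributions."""
    p = softmax(teacher_logits, temperature)
    q = softmax(student_logits, temperature)
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)

# Hypothetical scores over three candidate maneuvers: yield, proceed, nudge left.
teacher = [4.0, 1.0, 0.5]
close_student = [3.8, 1.1, 0.4]   # small model that nearly agrees with the teacher
far_student = [0.5, 4.0, 1.0]     # small model that prefers a different maneuver

loss_close = distillation_loss(teacher, close_student)
loss_far = distillation_loss(teacher, far_student)
print(f"close student: {loss_close:.4f}, far student: {loss_far:.4f}")
```

Training a student to minimize a loss of this kind pushes the small, fast model toward the large model's preferences, which is the general idea behind distilling an offboard teacher into an onboard driver.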

4. End-to-End Learning vs. Modular Systems

E2E vs. Intermediate Representations:
The "pixels in, trajectories out" debate is oversimplified. End-to-end models can handle nominal cases, but safety, scale, and rare edge cases demand intermediate representations (such as objects, roads, and signs) for validation and reward specification (14:29, 18:02).
Simulation:
Simulating “just pixels to trajectories” is inefficient; using richer world models is more scalable and enables validation and safety mechanisms (20:45).
Quote (@16:15):
"[Off-the-shelf] model which has nothing to do with driving... in the nominal case, drive pretty darn well. Which is mind blowingly impressive. ...It's orders of magnitude away from what you need." – Dmitri Dolgov
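
One way to see why intermediate representations help with validation: once the system exposes objects rather than only pixels, a safety property can be written as an explicit check. The sketch below is purely illustrative. The `TrackedObject` type, the circular footprints, and the 1.0 m clearance threshold are assumptions made for this example, not details from the episode.

```python
import math
from dataclasses import dataclass

@dataclass
class TrackedObject:
    """Hypothetical intermediate representation: a detected object and its footprint."""
    label: str
    x: float       # metres, vehicle frame
    y: float
    radius: float  # coarse circular footprint

def min_clearance(trajectory, objects):
    """Smallest gap (metres) between any waypoint and any object edge."""
    return min(
        (math.hypot(px - obj.x, py - obj.y) - obj.radius
         for px, py in trajectory
         for obj in objects),
        default=math.inf,  # no objects means nothing to collide with
    )

def validate(trajectory, objects, required_clearance=1.0):
    """Return (ok, clearance) so a failed check can be logged or turned into a penalty."""
    clearance = min_clearance(trajectory, objects)
    return clearance >= required_clearance, clearance

objects = [
    TrackedObject("pedestrian", 5.0, 1.5, 0.4),
    TrackedObject("cone", 10.0, -2.0, 0.3),
]
straight = [(float(x), 0.0) for x in range(0, 13)]
swerving = [(float(x), 1.2 if 4 <= x <= 6 else 0.0) for x in range(0, 13)]

print("straight:", validate(straight, objects))
print("swerving:", validate(swerving, objects))
```

A pure pixels-to-trajectory model offers no equivalent hook: there is no object list to measure against, so a check like this cannot be stated, tested, or used as a reward term.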

5. The Hardest Problems in Autonomous Driving

Social/Interactive Complexity:
Driving is like a “multi-agent conversation”—social cues, context, geometric and semantic interactions.
Safety and Reward Functions:
Not just about reaching destinations, but driving smoothly, safely, and predictably within social norms (21:37).
Quote (@21:37):
"Safety is the primary focus. But of course we also want to be a smooth driver... and we also want to be a predictable, well-behaved [driver] so that it can nicely fit into the whole social ecosystem of our roadway." – Dmitri Dolgov
Edge Cases:
Drop-offs, special urban quirks, heavy weather (snow, fog), and unpredictable objects (e.g., debris or flatbed trucks) remain nuanced engineering challenges (23:06, 26:39, 27:12).

6. Scaling and Expansion: Hardware and Global Growth

Gen 4 vs. Gen 5/6:
Gen 4 was specialized for Chandler, AZ; Gen 5 involved a big leap in data, hardware, and a move to central AI models, vastly improving generalizability (28:04, 30:12). Gen 6 is a purpose-built autonomous vehicle with a new, cheaper, more capable sensor stack, flat floors, sliding doors—designed entirely for the passenger (32:14, 36:03).
Hardware Philosophy:
Progress in perception outpaced custom car platforms: Waymo scaled successfully using retrofitted consumer vehicles, then moved to custom vehicles once the approach was de-risked and confidence was established (34:08, 35:08).
Quote (@36:03):
"The sensors, the hardware, the self-driving hardware [we're] putting on the Ojai vehicle is the sixth generation. It is very different from the fifth generation. It is simpler, it is more capable, it is much lower cost." – Dmitri Dolgov

7. Sensor Fusion: LiDAR vs. Radar vs. Camera

Each Sensor’s Role:
LiDAR offers high-res mapping but can struggle in fog; radar is lower-res but sees through adverse weather; cameras provide visual details. Fusing modalities enhances robustness (38:16, 39:05, 39:24).
No “Sensor Switching”:
All modalities are continuously fused in encoders for a holistic, robust view. There’s no explicit switch-over between sensors based on conditions (40:05).
Quote (@39:07):
"Radar has much lower resolution but, because of the physics, [it] degrades much better in adverse weather conditions. So fog, snow, heavy rain." – Dmitri Dolgov
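
The "no sensor switching" point can be illustrated with a classical stand-in for learned fusion: inverse-variance weighting. According to the episode, Waymo fuses modalities inside learned encoders, so this is explicitly not their method. It is a minimal sketch showing how continuous weighting shifts trust between sensors without any if/else on weather. All readings and noise variances below are made-up values.

```python
def fuse(estimates):
    """Inverse-variance weighted mean of (value, variance) pairs.

    Returns (fused_value, fused_variance). Noisier inputs get smaller
    weights automatically; no sensor is ever switched off.
    """
    weights = [1.0 / variance for _, variance in estimates]
    total = sum(weights)
    fused = sum(w * value for w, (value, _) in zip(weights, estimates)) / total
    return fused, 1.0 / total

# Hypothetical range-to-object readings in metres with assumed noise variances.
clear_day = {"lidar": (20.0, 0.01), "radar": (20.6, 0.25), "camera": (19.5, 1.0)}
heavy_fog = {"lidar": (18.0, 4.0), "radar": (20.6, 0.25), "camera": (23.0, 9.0)}

for name, readings in (("clear", clear_day), ("fog", heavy_fog)):
    value, variance = fuse(list(readings.values()))
    print(f"{name}: {value:.2f} m (variance {variance:.3f})")
```

In the clear-day case the fused estimate sits close to the LiDAR reading, and in fog it moves toward the radar reading, even though the same function runs in both cases.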

8. Emergent Behavior & Notable AI Moments

Emergent Intelligence:
Dolgov recounts when the system detected a pedestrian hidden behind a bus from LiDAR returns that bounced under the bus, something he did not expect and an illustration of powerful sensor fusion and representation (43:51).
Quote (@45:07):
"You saw the pedestrian on the other side of the bus... what actually turned out was happening is that our peripheral lidars bounced under the bus and there was just a little bit of very, very noisy reflection of the movement of the person's feet." – Dmitri Dolgov

9. Operational Infrastructure at Scale

Depot Automation:
Fleets autonomously report for charging/cleaning; most processes are automated except physical cleaning and plugging in chargers (52:01).
Customer Behavior:
Most customers treat Waymo vehicles well, though outliers exist (e.g., college towns at night) (54:30).

10. Product, Expansion, and the Future of Autonomy

Scaling Metrics:
As of recording: ~3,000 cars, ~500,000 rides/week, >4 million miles/week, 11 US cities (47:25). Rapid scaling—most recent expansion: 4 new cities in 1 day (47:55).
Personal Cars:
Waymo expects, in the future, that models will be available for personally owned vehicles, not just ride hailing in dense locations (55:44).
Second-Order Effects:
More autonomous vehicles will lead to smoother traffic, reduced need for parking lots, and transformation of urban space (56:25).
Quote (@58:22):
"Imagine what you can do with your favorite city in the world if you don't have to spend that money, that huge fraction of it, on just keeping these chunks of metal sitting around." – Dmitri Dolgov
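
As a back-of-envelope reading of the scaling metrics quoted in this section, the per-vehicle figures can be derived directly. These derived numbers are not stated in the episode. They assume every car is in service every day, and they treat the "more than 4 million miles" figure as exactly 4 million, so the mileage results are lower bounds.

```python
cars = 3_000
rides_per_week = 500_000
miles_per_week = 4_000_000  # quoted as "more than", so treat as a lower bound

rides_per_car_per_day = rides_per_week / cars / 7
fleet_miles_per_ride = miles_per_week / rides_per_week  # includes any miles driven between rides
miles_per_car_per_day = miles_per_week / cars / 7

print(f"rides per car per day: {rides_per_car_per_day:.1f}")
print(f"fleet miles per ride:  {fleet_miles_per_ride:.1f}")
print(f"miles per car per day: {miles_per_car_per_day:.0f}")
```

Note that fleet miles per ride is not the same as average trip length, since the weekly mileage presumably also covers driving to pickups and to depots.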

11. AI, Industry Evolution, and Google's Culture

The “Nines” Challenge:
The move from research prototypes to ultra-reliable real-world systems gets exponentially harder: each additional “nine” of reliability (one more decimal place) is roughly a 10x challenge (59:27).
Did Google Start Too Early?
Dolgov says no—iterative learning across “waves” (ImageNet, transformers, VLMs) has been essential. There’s “no silver bullet,” and early investment was necessary for the eventual technological curve (59:27).
Google’s Cultural Edge:
Google’s culture cultivates vision, technical depth, and the stamina to persist through the long arc of R&D experiments (62:10).
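
To make the “nines” framing from this section concrete, the sketch below shows how each added nine cuts the failure rate by a factor of ten. Measuring reliability per mile and using the roughly 4 million weekly miles from the scaling figures are assumptions for illustration only. The episode does not define the unit, and these are not Waymo safety statistics.

```python
def failure_rate(nines):
    """Failure probability per mile for a given number of nines (e.g. 3 -> 0.001)."""
    return 10.0 ** -nines

def miles_between_failures(nines):
    """Expected miles per failure, assuming reliability is measured per mile."""
    return 1.0 / failure_rate(nines)

def expected_events_per_week(nines, weekly_miles=4_000_000):
    """Expected failures per week at an assumed fleet mileage."""
    return weekly_miles * failure_rate(nines)

for n in range(2, 9):
    print(
        f"{n} nines: ~{miles_between_failures(n):,.0f} miles between failures, "
        f"~{expected_events_per_week(n):,.2f} expected events per week"
    )
```

Under these assumptions, a system that looks flawless in a demo of a few hundred miles has only demonstrated about two nines, which is why the gap between a prototype and a deployed fleet spans several orders of magnitude.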

Memorable Quotes & Moment Highlights

Building Foundation Models:
"Building a big model that understands how the physical world works and understands the important properties of what it means to drive, the social aspects of driving and what it means to be a good driver as opposed to a bad one."
— Dmitri Dolgov (@00:00, 08:13)
On Emergent AI Moments:
"It just kind of blew my mind."
— Dmitri Dolgov describing the pedestrian-behind-bus detection (@46:11)
Pragmatic Philosophy:
"[It was only with] the sixth generation [vehicle] where it made sense to go and spend all this effort into the custom [platform]."
— Dmitri Dolgov (@35:17)
Driver Assist vs. Full Autonomy:
"I see it just as fundamentally two different problems. There's driver assist systems and then there is full autonomy. And I think it's deceptive to think of them as kind of incremental on one spectrum of complexity."
— Dmitri Dolgov (@50:24)

Useful Timestamps for Reference

[04:34] – Sensor architecture and local inference explained
[08:13] – Foundation model and the driver, simulator, and critic roles
[14:29] – End-to-end vs modular debate
[18:02] – The limits of camera-only, imitative learning, need for richer world models
[23:06] – Nuance in drop-off and urban driving challenges
[25:04] – Transition from technology to deployment focus
[28:04] – Hardware platform evolution and scaling to new cities
[32:14, 36:03] – Gen 6 custom vehicle and new sensor stack
[39:07, 40:05] – Sensor fusion in adverse conditions
[43:51] – Emergent behavior: detecting the unseen pedestrian
[47:25] – Live scaling numbers: rides, miles, cities
[52:01] – Operational infrastructure at depots
[56:25] – City/traffic transformation from mass autonomy
[59:27] – The “nines” and why there’s no “magic moment” for autonomy
[62:10] – Google’s role; talent and cultural strengths

Tone & Language

The tone is direct, technical, but approachable—often peppered with genuine enthusiasm (especially from Dolgov) when discussing emergent AI moments and the operational realities of scale. The discussion remains candid about the challenges, trade-offs, and evolution from research prototypes to real-world deployments.

Conclusion

This episode provides an in-depth, highly insightful look at how Waymo’s autonomous driving system works—from hardware, sensors, and architecture to AI models, simulation, and operational scale. Dolgov emphasizes the complexity of not just perceiving the world but fitting into its social norms, and the exponential effort behind each incremental gain in safety and generalizability. The conversation paints a realistic, optimistic, and deeply knowledgeable picture of why, after decades of work, autonomy is poised to transform both technology and the physical layout of our cities.
