
Waymo is now delivering hundreds of thousands of fully autonomous rides each week — but getting there required more than better models. It meant building a complete system for training, evaluating, and deploying a driver in the real world. In this episode — originally aired on the Cheeky Pint podcast — Waymo Co-CEO Dmitri Dolgov joins John Collison to break down how self-driving actually works today: from sensor fusion across LiDAR, radar, and cameras, to simulation, “critic” models, and the role of AI in decision-making. They also explore why full autonomy is fundamentally different from driver-assist, what it takes to scale globally, and how recent advances in AI are reshaping the path forward.
Dmitry Dolgov
When you're driving around, or being driven around, we think about what we're building as a driver. You can imagine building a big model that understands how the physical world works and understands the important properties of what it means to drive: the social aspects of driving and what it means to be a good driver as opposed to a bad one. I would say that we've clearly moved past the stage of scientific research and deep core technology development to this new phase of accelerated global scaling and deployment.
Podcast Host
Waymo is now doing nearly half a million fully autonomous rides a week across multiple cities, a shift from long-term research to real-world scale. In this episode, originally aired on the Cheeky Pint podcast, Waymo co-CEO Dmitry Dolgov joins John Collison to break down how they built the system behind it, from the sensor stack and why LiDAR still matters to the role of simulation and critic models in training the AI. They also get into why driver assist won't naturally evolve into full autonomy, what it takes to scale globally, and how the product itself is changing, from custom-built vehicles to the entirely new economics of ride-hailing.
John Collison
Dmitry Dolgov is co-CEO of Waymo. He joined Google's self-driving car project in 2009 as one of its first engineers and was repeatedly promoted until he took it over in 2021. Waymo is Google's most successful moonshot and now provides over 500,000 fully autonomous rides each week. Cheers, by the way.
Dmitry Dolgov
Yeah, cheers.
John Collison
You grew up in Russia, right?
Dmitry Dolgov
Yes, I grew up in Russia, yeah. Back then it was actually the Soviet Union.
John Collison
Right, exactly.
Dmitry Dolgov
My dad is a physicist. When the Soviet Union started falling apart, he had a visiting position at Kyoto University for a year. We moved there as a family, and then he went to Berkeley and I kind of tagged along. Then I graduated from high school, I was thinking about the next thing I wanted to do, and I really liked that technical school in Russia.
John Collison
The Russians are serious about the physics.
Dmitry Dolgov
They are, they are. So I went back to Russia and I got my bachelor's and master's.
John Collison
What year was this that you went back to Russia?
Dmitry Dolgov
1994.
John Collison
Okay. So that was almost peak Russian optimism, in a sense, when it was opening up.
Dmitry Dolgov
It was. Yeah, yeah. No, I actually remember talking to my mom about it. Of course, my parents grew up in the Soviet Union. They were born right before the war, and they lived through some really tough times. In fact, I got my green card here in the US before I went back, because she insisted that I do it. At the time I wasn't thinking of coming back. I was pretty excited about where Russia was and the trajectory it was on. And, being young and naive, I was like, there's no turning back.
John Collison
And so why did you decide to come back? Was there more of a play-by-play than that?
Dmitry Dolgov
Grad school. Yeah, yeah, grad school. It was pretty clear to me that I wanted to continue studying math and computer science. My undergrad and master's, in physics and applied math, came from what I think was still an incredibly strong, foundational school of Russian math and science. But for graduate school, it was very clear to me that the best way to do it was in the US. So I came back.
John Collison
I'm struck that the founders of the two most valuable UK companies are Russian math nerds who both went to the same school: Nikolai at Revolut and Alex Gerko at XTX. It's a strong diaspora.
Dmitry Dolgov
There's a company not far from here where one of the founders also has a similar pedigree. A company that we're closely related to.
John Collison
Exactly. There's a classic engineering interview question: what happens when I type google.com and hit enter? Talk me through whatever you like, HTTP and DNS and BGP; you can go down to whatever level of the stack you want. Do you want to describe, when I take a ride in a Waymo today, what's happening at a technical level? What is the architecture?
Dmitry Dolgov
Let me answer your question of what's happening in real time. But this is going to be only part of the story, because we're going to be talking about the inference, the real-time inference part of it. If we want to have a deeper, richer technical conversation, I think it would be interesting to also zoom out and talk about the entire ecosystem of what goes into building, evaluating, and deploying the Waymo driver. But when you're driving around, or being driven around, we think about what we're building as a driver, not a car. It has a number of sensors positioned around the vehicle. We use three different sensing modalities: cameras, lidars (lasers), and radars. There are also microphones, directional microphone arrays, but those three are the primary ones for sensing the world. They have very nicely complementary physical properties, and they all have 360-degree coverage around the vehicle, so the Waymo driver sees 360 degrees all the time. All of that data goes into a computer, as you would expect, and then there's the software that processes it. Now it's all AI, specialized AI for the physical world. Nowadays we talk about it, using AI terminology, as encoders that take the sensor data in. Then there's the decoder, the action part, the generative part, if you will. The generative task there is to figure out how to drive. And that is, of course, connected through a specialized interface to the car, where we can actuate the vehicle. That's why you see the steering wheel turn, and it drives you around.
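A toy sketch can make the real-time data flow concrete: each sensing modality passes through an encoder, the embeddings are combined, and a decoder emits a trajectory for the actuation layer. All names here (`encode`, `fuse`, `decode`, `drive_step`) are hypothetical, the "encoders" are hand-written summaries rather than learned networks, and nothing reflects Waymo's actual interfaces.

```python
from typing import Dict, List, Tuple

Embedding = List[float]
Waypoint = Tuple[float, float]


def encode(readings: List[float]) -> Embedding:
    """Hypothetical per-modality encoder: summarize raw readings as [mean, max].
    A real encoder would be a learned neural network."""
    return [sum(readings) / len(readings), max(readings)]


def fuse(embeddings: Dict[str, Embedding]) -> Embedding:
    """Concatenate per-modality embeddings in a fixed (alphabetical) order."""
    fused: Embedding = []
    for modality in sorted(embeddings):
        fused.extend(embeddings[modality])
    return fused


def decode(scene: Embedding, horizon: int = 3, dt: float = 1.0) -> List[Waypoint]:
    """Hypothetical decoder: emit straight-ahead (x, y) waypoints, driving
    slower as the largest fused feature grows (an arbitrary toy rule)."""
    speed = max(0.0, 10.0 - max(scene))
    return [(speed * dt * (t + 1), 0.0) for t in range(horizon)]


def drive_step(sensors: Dict[str, List[float]]) -> List[Waypoint]:
    """One inference step: encode each modality, fuse, decode a trajectory."""
    embeddings = {name: encode(data) for name, data in sensors.items()}
    return decode(fuse(embeddings))


if __name__ == "__main__":
    frame = {"camera": [1.0, 2.0], "lidar": [3.0, 5.0], "radar": [0.5, 1.5]}
    print(drive_step(frame))
```

Running the file prints `[(5.0, 0.0), (10.0, 0.0), (15.0, 0.0)]`. The decoding rule is deliberately arbitrary; only the shape of the pipeline matters here.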
John Collison
Okay, so I get into my car. There are three main families of sensors: lidar, radar, and cameras. The car uses them to first build a model of what's going on in the world, where all the other cars are and things like that, and then, as you say, to make decisions and actuate them with the car. That's the system you're living in. Is all that inference done locally? Presumably yes. Nothing's in the cloud, nothing real-time?
Dmitry Dolgov
Nothing real-time is in the cloud. There are some things that can happen in the cloud, but they're not required.
John Collison
Got it. What's an example of a nice-to-have that happens in the cloud?
Dmitry Dolgov
You can imagine a situation, and some of it is not directly related to the task of driving, where, say, after you leave the car, we want to check that the car is not dirty and that you didn't leave anything there. If you left a mess, then I want to send the car to one of our depots to get it cleaned up. If you left an item there, like your phone, we want to detect that, send it to our lost and found, and let you know. We do that by asking a model that lives offboard, as opposed to having to put it on the car, because it's not a real-time task related to driving. So that's one example.
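The onboard/offboard split described above can be expressed as a simple placement rule plus a follow-up policy for the post-ride check. This is a hypothetical sketch: `place_task`, `post_ride_actions`, and the action strings are illustrative names, and the cabin-check model itself is assumed to run offboard and return the two inputs shown.

```python
from enum import Enum
from typing import List


class Placement(Enum):
    ONBOARD = "onboard"
    OFFBOARD = "offboard"


def place_task(real_time: bool) -> Placement:
    """Anything the driving loop depends on must run on the vehicle's computer;
    everything else may run offboard."""
    return Placement.ONBOARD if real_time else Placement.OFFBOARD


def post_ride_actions(cabin_dirty: bool, items_left: List[str]) -> List[str]:
    """Map a hypothetical offboard cabin-check result to follow-up actions."""
    actions: List[str] = []
    if cabin_dirty:
        actions.append("route_to_depot_for_cleaning")
    for item in items_left:
        actions.append(f"notify_rider_and_lost_and_found:{item}")
    if not actions:
        actions.append("return_to_service")
    return actions
```

Keeping the decision logic separate from the (assumed) detection model means the same rules apply whichever model produces the cabin-check result.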
John Collison
There are all these debates that go on on Twitter around self-driving. I can think of end-to-end versus the more modular approach, and cameras-only versus an array of sensors. I can't tell: are these debates actually interesting to an expert in the field, or are they settled matters that are just grist for the algorithm?
Dmitry Dolgov
I understand where the questions are coming from. But I do find that often the way they're posed and the way the debate happens loses a lot of the nuance and a lot of the detail that really matters, and to me the most interesting technical questions are at that level. The way we think about building the Waymo driver, it starts with a large offboard foundation model. You can imagine building a big model that understands how the physical world works and understands the important properties of what it means to drive: the social aspects of driving, and what it means to be a good driver as opposed to a bad one. So that's the foundation. Then we specialize it into, let me call them, three main offboard teachers. They are still large, high-capacity offboard models. There's the Waymo driver, there's the simulator, and there's the critic. Those then get distilled into smaller models that you can run inference on faster. So the Waymo driver becomes the backbone, the ML backbone, of what's in the car. The simulator, of course, is what powers our synthetic generative environment that runs in the cloud, for training and for evaluation in closed loop.
John Collison
And the critic. Does the simulator ever run locally?
Dmitry Dolgov
No, no, it doesn't. However, what I think is interesting is the way the decoder works, the way the model works. Think about the generative task in the simulator, creating those realistic worlds and how cars, pedestrians, cyclists, and others behave, and the task that you have to solve on the car in real time. There is a fundamental shared capability: understanding how these objects relate to each other and predicting what they might do in the future. That holds whether you are running on the car or sampling those probabilistic behaviors in the simulator. So they're different models, but this is why the shared foundation model is able to power both. Similarly, think about the critic. The job of the critic is to find interesting events and then be opinionated about what's good behavior and what's bad behavior. That takes a similar fundamental understanding. If you're running inference on the car, you still have to figure out which of the multiple hypotheses of these future worlds you want to take action to steer toward.
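The shared prediction capability can be illustrated with one multimodal prediction used two ways: the simulator samples a single future from it, while the onboard planner weighs every hypothesis by its probability. The maneuvers, probabilities, and costs below are made up, the function names are hypothetical, and the expected-cost rule is one simple choice rather than Waymo's planner.

```python
import random
from typing import Dict

# Hypothetical multimodal prediction for one other road user:
# each mode is a possible future maneuver with a probability.
Prediction = Dict[str, float]


def sample_behavior(pred: Prediction, rng: random.Random) -> str:
    """Simulator-style use: draw one future from the distribution."""
    modes = list(pred)
    return rng.choices(modes, weights=[pred[m] for m in modes], k=1)[0]


def expected_cost(pred: Prediction, cost_if: Dict[str, float]) -> float:
    """Onboard-style use: weigh each hypothesis by its probability."""
    return sum(p * cost_if[mode] for mode, p in pred.items())


def choose_action(pred: Prediction, costs: Dict[str, Dict[str, float]]) -> str:
    """Pick the ego action with the lowest expected cost across hypotheses."""
    return min(costs, key=lambda action: expected_cost(pred, costs[action]))


if __name__ == "__main__":
    pred = {"yield": 0.7, "cut_in": 0.3}
    costs = {
        "keep_speed": {"yield": 0.0, "cut_in": 10.0},
        "slow_down": {"yield": 1.0, "cut_in": 1.0},
    }
    print(sample_behavior(pred, random.Random(0)))  # one simulated future
    print(choose_action(pred, costs))               # slow_down
```

Keeping speed has expected cost 3.0 against 1.0 for slowing down, so the planner slows down even though the cut-in is the less likely future.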
John Collison
Okay. And these are all downstream of the same foundation model.
Dmitry Dolgov
That's right. So you start with the foundation model, then you specialize and fine-tune it into models that are still offboard. Those are the teachers. And then each one of the teachers distills, trains, its own student: the driver, the simulator, the critic.
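The teacher-to-student step is a form of knowledge distillation. A common textbook version, from Hinton et al. (2015), trains the student to match the teacher's temperature-softened output distribution by minimizing a KL divergence. The sketch shows only that generic loss over a small hypothetical set of discrete outputs; the conversation does not say which distillation recipe Waymo uses.

```python
import math
from typing import List


def softmax(logits: List[float], temperature: float = 1.0) -> List[float]:
    """Temperature-scaled softmax; subtracting the max keeps exp() stable."""
    scaled = [z / temperature for z in logits]
    m = max(scaled)
    exps = [math.exp(z - m) for z in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def distillation_loss(teacher_logits: List[float], student_logits: List[float],
                      temperature: float = 2.0) -> float:
    """KL(teacher || student) on softened distributions, scaled by T**2
    as in Hinton et al.'s knowledge distillation."""
    p = softmax(teacher_logits, temperature)
    q = softmax(student_logits, temperature)
    kl = sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)
    return (temperature ** 2) * kl


if __name__ == "__main__":
    teacher = [2.0, 1.0, 0.1]  # hypothetical teacher logits over 3 discrete options
    print(distillation_loss(teacher, [1.9, 1.0, 0.2]))  # small: student close to teacher
    print(distillation_loss(teacher, [0.1, 1.0, 2.0]))  # larger: student disagrees
```

A higher temperature spreads probability over the non-top options, so the student also learns how the teacher ranks the alternatives it did not pick.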
John Collison
Yes. You started working on self-driving 20 years ago. As you think about the tech evolution, is this just the scaling-laws story, where we had to be able to throw enough compute at it? Were there architectural approaches we needed to wait to have invented? Or was it a story of needing 20 years of going down the wrong cul-de-sacs before we eventually arrived at the right approach? Knowing what you know now, could you have had a successful Waymo in market in 2015, or was there some enabling technology?
Dmitry Dolgov
No. Technology breakthroughs that happened over the years were critically important, primarily in AI, but also in other areas like compute, the heavy compute that you need to run it. Now, I wouldn't characterize it as going down a thousand different dead ends, having to retrace, and then finding the one right path. I would characterize it as iterative learning and evolution. And then transformers came around. But transformers, for example, are a very general architecture. It powers LLMs, and it powers our models. But how you apply them to this space, I think this is where you
John Collison
It didn't just fall out of transformers.
Dmitry Dolgov
Exactly right. People like to talk about architectures, and architecture is important. But really a lot of it comes down primarily to your metrics, your evaluation mechanisms, all of the training recipes, and of course new data.
John Collison
Yes. LLMs are good at text, or tokens specifically, and obviously perform best in domains that have some kind of single corpus of text they can work on, like coding, where it's very helpful that everything was already textual. Part of the success has been creating textual representations for domains so that we can then put LLMs against them. Can you describe how you encode the world that you're seeing? I mean, are you just building a 3D model, like a 3D bitmap essentially, or...?
Dmitry Dolgov
So this is where I think we get a bit into the question of what the interface between the encoder and the decoder parts is. That also touches on the thing you flagged earlier, where people like to debate end-to-end or not end-to-end. So let's talk a little bit about end-to-end and then get back to what the interface between those two is. When you say end-to-end, what do we mean? We mean that it is some large ML model. Typically you don't build them monolithically; you have different parts and different subgraphs. But what's important is that you can backpropagate the gradient of the loss function through all the different layers. So at every layer you can learn the weights and the representations that matter for the final task. You don't force it through some narrow funnel between, let's say, the encoder and the decoder.
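A two-weight toy model shows what propagating the gradient through the layers means. The loss on the decoder's output is differentiated back through the decoder into the encoder, so the encoder's weight also learns; with `end_to_end=False` the encoder is frozen, standing in for a fixed, hand-designed interface. This is a scalar illustration with hand-derived gradients and hypothetical function names, not a model of a real driving stack.

```python
from typing import Tuple


def train_step(w_enc: float, w_dec: float, x: float, target: float,
               lr: float = 0.1, end_to_end: bool = True) -> Tuple[float, float, float]:
    """One gradient step on a two-stage scalar model with squared-error loss.
    Returns (new_w_enc, new_w_dec, loss_before_update)."""
    h = w_enc * x                    # "encoder": input -> representation
    y = w_dec * h                    # "decoder": representation -> output
    dloss_dy = 2.0 * (y - target)    # derivative of (y - target) ** 2
    grad_dec = dloss_dy * h
    grad_enc = dloss_dy * w_dec * x  # chain rule, back through the decoder
    new_dec = w_dec - lr * grad_dec
    # With end_to_end=False the gradient stops at the interface and the
    # encoder keeps its initial weight.
    new_enc = w_enc - lr * grad_enc if end_to_end else w_enc
    return new_enc, new_dec, (y - target) ** 2


def train(steps: int, end_to_end: bool) -> float:
    """Run several steps from a fixed start and return the last loss seen."""
    w_enc, w_dec, loss = 0.5, 0.5, float("inf")
    for _ in range(steps):
        w_enc, w_dec, loss = train_step(w_enc, w_dec, x=1.0, target=1.0,
                                        end_to_end=end_to_end)
    return loss
```

In this toy both variants eventually fit, but after 20 steps the end-to-end version has a much lower loss, because both stages adapt to the final task.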
John Collison
Yeah. I think I have a simple view of end-to-end being pixels go in and car actions come out, which may be a bit of an oversimplification, but...
Dmitry Dolgov
Yeah, yeah, that's exactly right. That's the basic vanilla version of it. Think about what it will take to build a driver that's capable of fully autonomous operations: you think about this entire ecosystem of the driver, the simulator, and the critic. If all you do is pixels in, trajectories out, it becomes very difficult to do all three of those and achieve the high level of safety and performance that we require, and it becomes very difficult to do it at scale. However, it's a very easy way to get started. You collect some data, kind of like in the LLM world. The easiest way to get started nowadays would be to just take a VLM. It already has a language-aligned camera encoder and a decoder that can generate text, and you can fine-tune it and say, hey, instead of text, generate trajectories. Very, very doable. In fact, a while ago we published a paper called EMMA that did exactly that. And in the nominal case it will actually drive pretty darn well, which is mind-blowingly impressive.
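Generating trajectories instead of text rests on a round trip between waypoints and a string a language-model decoder can produce. The format below is an assumption chosen for illustration; the exact output format used in the EMMA paper is not given in this conversation, and `trajectory_to_text` and `text_to_trajectory` are hypothetical helpers.

```python
import re
from typing import List, Tuple

Waypoint = Tuple[float, float]

# Matches "(x,y)" pairs with optional minus signs and decimals.
_PAIR = re.compile(r"\((-?\d+(?:\.\d+)?),(-?\d+(?:\.\d+)?)\)")


def trajectory_to_text(points: List[Waypoint], decimals: int = 2) -> str:
    """Render waypoints as plain text that a text decoder could be trained to emit."""
    return " ".join(f"({x:.{decimals}f},{y:.{decimals}f})" for x, y in points)


def text_to_trajectory(text: str) -> List[Waypoint]:
    """Parse generated text back into waypoints, skipping anything malformed."""
    return [(float(x), float(y)) for x, y in _PAIR.findall(text)]


if __name__ == "__main__":
    encoded = trajectory_to_text([(1.0, 0.0), (2.5, -0.25)])
    print(encoded)                      # (1.00,0.00) (2.50,-0.25)
    print(text_to_trajectory(encoded))  # [(1.0, 0.0), (2.5, -0.25)]
```

Parsing with a regular expression means malformed pieces of generated text are skipped rather than raising an error, which is one reasonable choice for model output.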
John Collison
That is very funny. Yeah.
Dmitry Dolgov
And I mean, there's something to it.
John Collison
You're saying you can take an off-the-shelf model, which has nothing to do with driving to start with, and get these good results.
Dmitry Dolgov
That's right.
John Collison
Yeah.
Dmitry Dolgov
You get them in the nominal case. Yeah. I just want to be clear: it's orders of magnitude away from what you need. Yeah.
John Collison
You should not try it on the streets, but it works. It's like a talking horse: it's impressive that it's talking at all.
Dmitry Dolgov
Exactly, exactly. And if the product you wanted to build was a driver-assist system, not a fully autonomous system, then maybe that's all you need to do. For that you don't need all this other machinery of the simulator and the critic, because the number of nines required is drastically lower. But this is interesting, because there is some intuition behind why it works. If you think about the hard parts of driving, it's not unlike having a conversation. In the LLM world, you're modeling language, or maybe modeling a dialogue, in the space of sentences and words. What makes driving hard is also this multi-agent, social, interactive part of it. If I do something, that's going to affect you, and it's going to affect somebody else. The driving history matters. It's not local and just geometric. Context matters, semantics matter. But it's not in the language of words; it's in the language of body language, if you will. And we see that empirically validated if you take this approach. Okay, so let's say we build this thing with just cameras: a camera encoder, pixels go in, trajectories come out. The quality is sufficient to drive in the normal case. It's not sufficient to deal with the long tail of all the edge cases and hit the high bar of superhuman safety that we require. So then you start asking the question: what else do you need?
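"Number of nines" is shorthand for reliability: each additional nine cuts the failure rate by a factor of ten, which is why a supervised driver-assist product and a fully autonomous one sit orders of magnitude apart. The event count below is hypothetical and the functions are illustrative only; no real failure rates are implied.

```python
def failure_rate(nines: int) -> float:
    """Per-event failure probability implied by `nines` nines of reliability,
    e.g. 3 nines = 99.9% success = a 0.001 failure rate."""
    return 10.0 ** (-nines)


def expected_failures(nines: int, events: int) -> float:
    """Expected number of failures over `events` independent events."""
    return failure_rate(nines) * events


if __name__ == "__main__":
    events = 1_000_000  # hypothetical count of driving situations
    for n in (3, 6, 9):
        print(f"{n} nines: ~{expected_failures(n, events):g} failures per {events:,} events")
```

Going from three nines to six is not twice as good but a thousand times fewer failures, which is the gap the extra machinery of the simulator and critic is meant to close.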
John Collison
Yes.
Dmitry Dolgov
And suppose all you did when you trained this system was observe how other people drive, maybe passively watching how people drive and how they interact, maybe also driving the car yourself, and then use imitation learning to train it. You find that that's not enough. You have to do something in closed loop. You have to do things like RLFT, which also parallels what we see in the LLM world.
John Collison
RLFT?
Dmitry Dolgov
RLFT. Reinforcement-learning-based fine-tuning.
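RLFT is described here only at a high level. One widely used way to keep an RL-fine-tuned policy close to its starting distribution, borrowed from RLHF for LLMs, is to subtract a penalty proportional to how far the policy's log-probabilities drift from a reference policy's. The sketch assumes a critic that returns a scalar score per step; `shaped_reward`, `rollout_return`, and `beta` are hypothetical, and whether Waymo uses this particular penalty is not stated.

```python
import math
from typing import List, Tuple


def shaped_reward(critic_score: float, logp_policy: float,
                  logp_reference: float, beta: float = 0.1) -> float:
    """Critic reward minus a penalty for drifting from a reference policy.
    beta * (log pi(a) - log pi_ref(a)) is a per-sample estimate of the KL
    term used in RLHF-style fine-tuning."""
    return critic_score - beta * (logp_policy - logp_reference)


def rollout_return(steps: List[Tuple[float, float, float]], beta: float = 0.1) -> float:
    """Sum shaped rewards over a closed-loop rollout of
    (critic_score, policy_prob, reference_prob) tuples."""
    return sum(shaped_reward(score, math.log(p), math.log(p_ref), beta)
               for score, p, p_ref in steps)
```

When the policy matches the reference the penalty vanishes; when it becomes much more confident than the reference on an action, the critic's reward is discounted, which discourages exploiting quirks of the critic or simulator.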
John Collison
Okay. Yeah. Yes.
Dmitry Dolgov
Similar to reinforcement learning from human feedback in the LLM world. You want to do proper closed-loop driving, where you explore all kinds of different situations, and then you give it a reward signal to keep it in distribution. For that you need a realistic simulator. And if you want to have a good RL system, you need to have an opinion for the reward function. This is where the critic comes in. Now, if you have a purely end-to-end system, look at the simulator. What do you do? You're constrained to go from pixels to trajectories; that's the only way you can run the system. It's a very high-dimensional space, so generating everything is a hard problem. But even if you solve that, it becomes incredibly inefficient to run it all the way from pixels to trajectories in simulation, for training or for evaluation. So this is where intermediate representations come in. There are some intermediate representations in this task, in the physical world, that we know are correct. They're not sufficient, but they're not generality-limiting: there's an object here, there's the concept of a road, there are signs, there are speed limits. So what we do is augment the learned representations, the learned embeddings from the encoder and decoder, with that more structured representation. We find that this gives us additional knobs: we can simulate in that space, not just pixels to trajectories. It allows us to have additional safety validation layers in real time. And it gives us additional mechanisms to specify the reward function, for evaluation with the critic or for training. So we've come full circle on the question of whether it's end-to-end. Yes, it is. But if you want to do it at scale for full autonomy, it's augmented with all of this other stuff.
John Collison
Very interesting. On the simulating point: is it just very hard to simulate for an end-to-end model, because it's easier to deal in intermediate representations than to come up with a pixel-perfect view of the world?
Dmitry Dolgov
You need both.
John Collison
Yeah.
Dmitry Dolgov
So having an end-to-end architecture that's augmented with that structure allows you to play in both of those worlds.
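The structured layer can also serve as a real-time check on whatever the learned model proposes. Below, a hypothetical `StructuredScene` holds two facts of the kind listed earlier (a speed limit and object positions), and `validate` flags a proposed trajectory that breaks them. The clearance threshold is an assumption, and a production validator would be far richer.

```python
import math
from dataclasses import dataclass, field
from typing import List, Tuple

Point = Tuple[float, float]


@dataclass
class StructuredScene:
    """Hypothetical structured facts kept alongside learned embeddings."""
    speed_limit_mps: float
    obstacles: List[Point] = field(default_factory=list)  # (x, y) in meters


def validate(trajectory: List[Point], dt: float, scene: StructuredScene,
             min_clearance_m: float = 1.5) -> List[str]:
    """Return human-readable violations; an empty list means the check passed."""
    problems: List[str] = []
    for i in range(1, len(trajectory)):
        (x0, y0), (x1, y1) = trajectory[i - 1], trajectory[i]
        speed = math.hypot(x1 - x0, y1 - y0) / dt
        if speed > scene.speed_limit_mps:
            problems.append(f"segment {i}: {speed:.1f} m/s exceeds the limit")
    for x, y in trajectory:
        for ox, oy in scene.obstacles:
            if math.hypot(x - ox, y - oy) < min_clearance_m:
                problems.append(f"point ({x}, {y}) is within {min_clearance_m} m of an obstacle")
    return problems
```

Because the check runs on explicit facts rather than embeddings, its verdicts are easy to inspect, which is the kind of knob the structured representation adds.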
John Collison
Yeah, yeah, yeah. What are you optimizing for as a self-driving car? It sounds funny, but I think people maybe don't realize that there are many different things you're solving for. You're looking to get the person to their destination, and to get them there reasonably promptly, but also to drive quite smoothly, have many nines of safety, and not annoy other drivers and get honked at. So what are some of the reward functions, or things you're optimizing for, that maybe are not obvious to people?
Dmitry Dolgov
So safety is the primary focus. But of course we also want to be a smooth driver, both for people in the car and for other actors. And we also want to be predictable and well-behaved, so that it fits nicely into the whole social ecosystem of our roadways.
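Smoothness is often quantified with jerk, the rate of change of acceleration. A minimal sketch of a weighted cost, assuming positions sampled at a fixed time step along one axis, is below. The weights and the `collision_risk` input are hypothetical; the large safety weight simply encodes "safety is the primary focus", and predictability is not modeled.

```python
from typing import List


def finite_diff(values: List[float], dt: float) -> List[float]:
    """First-order finite difference: an approximate time derivative."""
    return [(b - a) / dt for a, b in zip(values, values[1:])]


def max_abs_jerk(positions: List[float], dt: float) -> float:
    """Jerk is the third time-derivative of position; large values feel harsh."""
    velocity = finite_diff(positions, dt)
    accel = finite_diff(velocity, dt)
    jerk = finite_diff(accel, dt)
    return max((abs(j) for j in jerk), default=0.0)


def trajectory_cost(positions: List[float], dt: float, collision_risk: float,
                    w_safety: float = 100.0, w_comfort: float = 1.0) -> float:
    """Hypothetical weighted cost: safety dominates, comfort breaks ties."""
    return w_safety * collision_risk + w_comfort * max_abs_jerk(positions, dt)
```

With these weights, a perfectly smooth trajectory with a small collision risk still costs more than a slightly abrupt one with no risk, which is the ordering the spoken priorities suggest.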
John Collison
It seems like one of the issues that has quickly emerged with self-driving is that people can't have nice things, or not everyone is nice to the robots. Whether it's driving through a dodgy area, or getting blocked, or "maybe I'm not going to drop you off here, maybe I'm going to go around the block and drop you somewhere better," there are all of these, as you say, other human issues. How do you go about solving them?
Dmitry Dolgov
A lot of the ones you mentioned are just things that we need to work on. And honestly, if we're not dropping you off exactly where you want to be dropped off, or we don't give you a good interface to tell us, that's on us to make it better.
John Collison
It feels like the drop-off is actually a pretty nuanced part of the self-driving journey. The highway stuff and the 35-mile-an-hour roads are all nailed, but there's just a lot of nuance in the drop-off experience.
Dmitry Dolgov
I'd say they're all hard. You picked freeways and drop-offs for different reasons. For drop-offs, you're absolutely right, there are a few things that are maybe not obvious when you first think about this problem. It's understanding where you want to go and making it as convenient as possible for you, and pickups are different from drop-offs; it's not exactly symmetrical. Then there's also understanding the context of the situation: where do you stop? You don't want to block a driveway, and you don't want to double park, although in some cases, if it's a quick one, maybe it's okay. So there's a lot of nuance that goes into doing that well, so that it's a seamless, frictionless experience for the rider as well as for other folks. On freeways, most of the time, not much happens. They're very well structured, because we design them that way. But there is still that long tail of really complicated stuff, where the consequences of a bad event are much more severe. Speed is much higher, and everything is quadratic in speed. And we see a lot of stuff. Imagine grills falling off onto freeways. Imagine people getting into accidents and spinning out of control. You see one of those flatbed trucks
John Collison
with just, like, a bunch of stuff piled in it, and you're driving behind it. I don't know, I always find it very nerve-wracking. It looks big.
Dmitry Dolgov
I know. Yeah. And we've seen them leave a trail.
John Collison
Yes, yes.
Dmitry Dolgov
Yeah.
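"Everything is quadratic in speed" follows from basic kinematics: stopping distance at constant deceleration is v²/(2a) and kinetic energy is ½mv², so doubling speed quadruples both. The deceleration and vehicle mass below are assumed round numbers, not Waymo figures.

```python
def braking_distance_m(speed_mps: float, decel_mps2: float = 7.0) -> float:
    """Stopping distance at constant deceleration: v**2 / (2 * a).
    The 7 m/s^2 default is an assumed hard-braking value, not a measured one."""
    return speed_mps ** 2 / (2.0 * decel_mps2)


def kinetic_energy_j(mass_kg: float, speed_mps: float) -> float:
    """Kinetic energy 0.5 * m * v**2, which a crash has to dissipate."""
    return 0.5 * mass_kg * speed_mps ** 2


if __name__ == "__main__":
    for v in (15.0, 30.0):  # roughly city vs. freeway speeds, in m/s
        print(f"{v:>4} m/s: brake in {braking_distance_m(v):5.1f} m, "
              f"energy {kinetic_energy_j(2000.0, v) / 1000:6.1f} kJ (assumed 2,000 kg car)")
```

The ratio between the two rows is exactly 4 for both quantities, whatever deceleration and mass are assumed.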
John Collison
Okay. So there's a different set of problems. But I feel like the general sentiment with Waymo is that the driving has now mostly been solved by you guys, and it's a question of scaling up and maybe some super-long-tail stuff, like really snowy conditions. Is that your sense internally, or is there much more nuance to it than that?
Dmitry Dolgov
Yeah, it's not like we're done with engineering. But I would say that we've clearly moved past the stage of scientific research and deep core technology development to this new phase of accelerated global scaling and deployment. We still have work to do, but I don't see any limitations or gaps in the core technology today.
John Collison
The driving is good enough now.
Dmitry Dolgov
Well, the core technology, I think, is good enough that I can't think of any aspect of driving that is not supported by the fundamental technology. That said, there is a lot of work to do in specialization and validation before we can deploy responsibly.
John Collison
Right.
Dmitry Dolgov
We're not driving everywhere in the world. We are planning to start operating in London and in Tokyo this year. Do we have a driver that we're using today in San Francisco that we can just plop down in London and go? No. But what we're seeing is incredibly encouraging from the perspective of whether the core technology is there. So now it's a matter of collecting the data and doing some specialization and validation. Signs are different in both of those places. People drive on the other side of the road, but that's actually not that hard for computers. The core technology generalizes really well, but it's still work that you have to do.
John Collison
What generalizes least?
Dmitry Dolgov
Well, increasingly, especially now that we're able to hook the Waymo AI up to the AI of the digital world, the VLMs, and inherit the general world knowledge from them, we're seeing really strong results from zero-shot or few-shot learning because of that general knowledge. But there are a few things, like cold winter weather, that affect the entire stack. So it's not just the AI; you actually have to...
John Collison
Hardware.
Dmitry Dolgov
Yeah, you need the hardware. You need the proper cleaning solution and heating elements in it. And then there are things that are completely solvable but complicated, like motion control on slippery surfaces. That takes a bunch of work. You don't get that for free from just pulling in some VLM encoder.
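Part of why slippery surfaces are hard for motion control is that tire grip caps total acceleration. A common simplification is the friction circle: combined braking and cornering acceleration must stay below μg. The friction coefficients in the example (about 0.8 for dry pavement, about 0.1 for ice) are rough assumptions, and the functions are illustrative only.

```python
import math

G = 9.81  # gravitational acceleration, m/s^2


def within_friction_limit(ax: float, ay: float, mu: float) -> bool:
    """Friction-circle check: the combined longitudinal (ax) and lateral (ay)
    acceleration must not exceed mu * g, or the tires begin to slide."""
    return math.hypot(ax, ay) <= mu * G


def max_corner_speed(radius_m: float, mu: float) -> float:
    """Highest steady speed on a curve of radius r, from v**2 / r <= mu * g."""
    return math.sqrt(mu * G * radius_m)


if __name__ == "__main__":
    for surface, mu in (("dry", 0.8), ("ice", 0.1)):  # assumed coefficients
        print(surface, within_friction_limit(3.0, 4.0, mu),
              round(max_corner_speed(50.0, mu), 1))
```

A maneuver that is comfortably within limits on dry pavement can exceed them on ice, so the controller has to plan gentler braking and lower cornering speeds for the same geometry.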
John Collison
Was it the case... I mean, my impression, not knowing anything, is that in the early days there was maybe a lot of San Francisco-specific or Phoenix-specific work in the early markets, whether it be mapping or something else, and that you guys seem to have either solved that by generalizing or just scaled up your ability to do the city-specific work. What enabled the rapid city expansion?
Dmitry Dolgov
We usually think about the capability of the Waymo driver, as well as deployment, not primarily or directly in the space of cities or zip codes. We think about the operating domain: freeways, cold weather, snow, rain, fog, density, et cetera. That's what we are building, and that's where we're evaluating. Then a particular city maps to being within the operating domain or outside of it. But if we rewind history a little bit, our initial deployment, where we started offering a fully autonomous commercial service for the first time, was in 2020 in Chandler, Arizona. That was on what we called the fourth generation of the Waymo driver. If you remember, this was the Pacifica minivans, with different hardware and different software. There we were super focused on doing the whole thing end to end: learn how to build the driver, evaluate it, deploy it regularly, operate it end to end 24/7 with customers, and learn from the customers. And we were very focused on that operating domain of mostly Chandler, which is a medium-to-low-complexity one. Then, when we made the jump to the fifth generation of our system, which is what's on the I-PACE today, we really wanted to take a huge bite out of that operating domain. We collected data all over the United States, in different states and different cities, and we chose to deploy in the hardest parts of San Francisco and the hardest parts of Phoenix. We made a big jump on the hardware side and, most importantly, on the software, the AI side. I would say that was the big discontinuous jump. And after we scaled up and iterated on all of the aspects of building and deploying the driver, this is why you're now seeing us go in parallel and scale in the US and...
John Collison
So driver version 5 was just a much more generalizable stack than version 4. What was it about it? Was it just that it had been trained on a much wider dataset?
Dmitry Dolgov
It was when we made this big bet on AI. There were a lot more little AI models and ML models in the fourth generation. We made a much bigger bet and jump to AI as the backbone for the fifth generation.
John Collison
AI as the backbone, as the core engine. As in, you're saying that Gen 4 had lots of small AI subsystems, for example.
Dmitry Dolgov
Okay, yeah, yeah. So we made that jump, and we've been iterating on and improving the model since then.
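The operating-domain framing from the rollout discussion, where capabilities are defined over conditions and a city is then either inside or outside the domain, maps naturally onto a subset check. In this hypothetical sketch the condition labels and the `SUPPORTED` set are invented for illustration; a real operating-domain definition would be far more detailed.

```python
from typing import FrozenSet, List, Set

# Hypothetical operating-domain description: conditions the current driver has
# been built and validated for. The labels are illustrative only.
SUPPORTED: FrozenSet[str] = frozenset({"freeway", "rain", "fog", "dense_urban"})


def city_in_domain(city_conditions: Set[str],
                   supported: FrozenSet[str] = SUPPORTED) -> bool:
    """A city is in the domain if every condition it requires is supported."""
    return city_conditions <= supported


def missing_capabilities(city_conditions: Set[str],
                         supported: FrozenSet[str] = SUPPORTED) -> List[str]:
    """Conditions that would have to be built and validated first, sorted."""
    return sorted(city_conditions - supported)
```

Framed this way, expanding to a new city is mostly a question of which missing conditions to build and validate, rather than starting from scratch per city.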
John Collison
Can we talk about hardware for a second? Lots of hardware questions, but one is: everyone in this space has a very charismatic demo of a vehicle that is custom-made for self-driving. It's often the van with no steering wheel and seats facing in both directions. You guys have one. Tesla has the steering-wheel-less Cybercab. Cruise had the Cruise Origin. And yet we're still driving in Jaguars that have a steering wheel in the front and are pretty similar to consumer cars. It's interesting to me, because if we were talking about this 10 years ago, we might have said, well, developing a custom car is relatively straightforward; we know how to put a bunch of sensors on a new car, but the software will take a long time. And what's interesting is that we've made huge progress in the software, but the cars are still derivatives of cars that people drive. So I'm curious why you think the custom hardware has not happened as of 2026. It's obviously a small improvement compared to Waymo itself, which is the big improvement, but it's just interesting that it still hasn't happened.
Dmitry Dolgov
Well, I'd say our sixth generation of the vehicle and the driver is our version of that.
John Collison
Oh, no, I know. It is.
Dmitry Dolgov
The Ojai platform. Right. We can talk about whether you want to have the seats pointed backwards or not. I actually think it looks nice in a demo, but practically speaking, it's not the way to go. But it is a custom-designed vehicle, and we put a lot of thought into moving away from a car that's designed around the driver.
John Collison
Yes.
Dmitry Dolgov
To a car that's designed around the passenger. It's much more spacious. And it's happening: it's not open to the public yet, but I took a ride in it the other day, fully autonomously, and that's coming this year.
John Collison
Yes. How much better is it as a passenger experience?
Dmitry Dolgov
You'll tell me once you give it a try. I love it. It's all about the space, the convenience of ingress and egress, and the screens and the passenger interface. We put a lot of thought into every aspect of it. It has sliding doors, so it's very easy to get in. It has a flat floor. If you sit in the back, you can fully stretch out; there's so much space there. From the outside it looks fairly big, but its actual footprint is barely larger than the I-PACE. So it's kind of amazing: you walk in, and it feels like you're in a living room.
John Collison
Yes. I guess my question is just: Waymo does 25 million rides a year, run rate.
Dmitry Dolgov
Ish.
John Collison
With the Jaguar I-Pace. And it's interesting that so much of the scaling of self-driving has happened so far on the old retrofit. Maybe that's to be expected, I think.
Dmitry Dolgov
Well, it matches the high. I don't think it's a given. You're right. But if you think about the value proposition, of course there is the safety of it. You don't have to worry about it. There's also the privacy being in the car by yourself, maybe with other folks, but not having to share the space with another human. Right. Maybe.
John Collison
No, we haven't. Great products.
Dmitry Dolgov
Yeah. But I guess this is why we're seeing such consistency. Drives well, very predictable. And you can go beyond that. Right. And you specialize even more to make the experience even more magical around the rider. But I guess it would have been disappointing if without the specialized car. And I think I would have been surprised if we leveled off at some other much lower level of customer adoption. Because a car seems like more of an optimization improvement. But the core of the value proposition comes from those other factors.
John Collison
Yes, yes. I guess: take on risk one thing at a time. We'll start by doing the software layer, and then we'll build a specialized car, or something like that. That's right.
Dmitry Dolgov
Yeah, yeah, yeah. It's also, as you said, a big investment, so you have to de-risk the fundamentals. Throughout our history we were very focused on setting the biggest goal for the company to de-risk the most important questions. Right. We talked about the third generation, where we wanted to deploy something and go end to end. We talked about the goal with the fourth generation, sorry, the fifth generation. And then there's the sixth generation. So it was only with the sixth generation that it made sense to go and spend all this effort on the custom
John Collison
And the sixth generation is a custom vehicle. Is it also a new generation of the driving stack?
Dmitry Dolgov
Yeah, it is. The new hardware.
John Collison
Yeah, yep.
Dmitry Dolgov
The sensors, the self-driving hardware we're putting on the Ojai vehicle, is the sixth generation. It is very different from the fifth generation. It is simpler, it is more capable, and it is much lower cost, a fraction of the cost. It's comparable to what you would get in a fancy ADAS system, a driver assist system, nowadays. The software is pretty much the same. So when we talk about the generalizability of the Waymo driver, we talk about weather conditions and cities, but it also generalizes well to different vehicle platforms and different sensor configurations.
John Collison
Okay, so Gen 6 is a new vehicle and a new sensor stack, but similar software. It's almost a tick-tock cycle happening here.
Dmitry Dolgov
That's right, that's right. And then we're going to put the sixth generation Waymo driver on other vehicle platforms like the Hyundai Ioniq that's coming, you know, later in the year.
John Collison
What is different about the 6th generation hardware stack and how did you make it cheaper?
Dmitry Dolgov
It still has the same three sensing modalities, but we've made significant optimizations in all three: unification, simplification, and there's just kind of riding the... Yeah.
John Collison
Is it a classic case of manufacturing scale, where we're now doing a lot more?
Dmitry Dolgov
Well, scale hasn't fully come into play, but think about the supply chain in the industry. Cameras are pretty mature. Radars, many years ago, used to be bulky, complex, and very expensive, when we were putting them on planes. But then we started putting them on cars, and now you can get a decent automotive radar for tens of dollars. There is a variant of the automotive radar called imaging radar. It gives you a richer... That has also come down in cost drastically, but it's a little bit behind your standard automotive radars. Lidars are following the same very predictable, very well-known trend. So we're riding that, and we're also learning from the previous generation to make improvements, simplifications, and optimizations.
John Collison
Tyler, very silly question. What are lidars versus radars better at in a self driving company?
Dmitry Dolgov
Things lighter.
John Collison
Are they complementary?
Dmitry Dolgov
They're very complementary, yeah. Effectively, it's all blasting photons out there; they bounce off of something and come back, and you measure what comes back. The frequencies are very different. Laser gives you very, very high resolution. You can think of it as a laser beam that goes out and spins around, shooting out millions of these laser pulses per second. Each one comes back, and you're sampling the 3D structure of the world with very high resolution.
John Collison
Lidar for very fine-grained mapping.
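The pulsed time-of-flight principle described here (send a pulse, time its return, repeat millions of times per second) can be sketched as below. This is a generic illustration, not Waymo's implementation: it assumes a simple pulsed lidar, ignores beam divergence and noise, and the function names and sensor-frame axis convention are hypothetical.

```python
import math

SPEED_OF_LIGHT_M_PER_S = 299_792_458.0


def range_from_round_trip(round_trip_s: float) -> float:
    """Distance to the reflecting surface for one pulsed time-of-flight return.
    The pulse travels out and back, so the one-way range is half the path."""
    return SPEED_OF_LIGHT_M_PER_S * round_trip_s / 2.0


def return_to_point(round_trip_s: float, azimuth_rad: float, elevation_rad: float):
    """Turn one return (timing plus beam direction) into an (x, y, z) point.
    Hypothetical sensor frame: x forward, y left, z up."""
    r = range_from_round_trip(round_trip_s)
    horizontal = r * math.cos(elevation_rad)
    return (
        horizontal * math.cos(azimuth_rad),
        horizontal * math.sin(azimuth_rad),
        r * math.sin(elevation_rad),
    )


# A return arriving about 200 ns after emission came from roughly 30 m away.
print(round(range_from_round_trip(200e-9), 2))  # → 29.98
```

Sweeping the beam through many azimuth and elevation angles and converting every return this way is what produces the dense 3D point cloud.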
Dmitry Dolgov
That's right. Radar has much lower resolution, but because of the physics, it degrades much more gracefully in adverse weather conditions: fog, snow, heavy rain.
John Collison
So it's not gonna be occluded by particles between it and the target.
Dmitry Dolgov
So imagine driving in super dense fog.
John Collison
Yes.
Dmitry Dolgov
We're close to San Francisco, so you probably don't have to think that hard. It can be really hard to see. Cameras degrade. Lidar, depending on the size of the particulates, can degrade better or worse than cameras. Radar is not affected much. So imagine driving on a freeway: radar will give you really good returns for cars that are absolutely invisible in camera space.
John Collison
That's interesting. So does that mean there are some environments where you'll be relying significantly more on radar, but the performance is good enough?
Dmitry Dolgov
Well, it's a combination of the sensors, right. Each one is noisy, and the noise characteristics show up differently in different environments. But it's not like we switch from one to another. It's not like we estimate what's happening in the world separately through cameras, through radars, and through lidar, and then compare. No: there's an encoder for camera, there's an encoder for lidar, there's an encoder for radar, and they all go into the system that jointly gives you the best view of what's happening in the world. So if it's a nice bright sunny day, cameras are very valuable. If it's pitch dark, or you have the sun in your face, or you're blinded by the headlights of an oncoming car, then the camera will degrade. There's still some noisy signal, but it will degrade. And lidar is completely unaffected.
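A toy sketch of the feature-level fusion described above: each modality has its own encoder, and a single downstream model consumes all the embeddings together, rather than three separate world estimates being compared afterward. The encoders here are hypothetical hand-written stand-ins for learned networks, and the per-modality presence bit is an assumption (one common way to let the downstream model learn how much to trust a missing or degraded sensor), not something described in the conversation.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Optional

Vector = List[float]


@dataclass
class ModalityEncoder:
    name: str
    dim: int
    encode: Callable[[object], Vector]  # hypothetical stand-in for a learned encoder


def fuse(encoders: List[ModalityEncoder], raw: Dict[str, Optional[object]]) -> Vector:
    """Encode each modality separately, then concatenate the embeddings into one
    joint vector for a single downstream model. A missing reading becomes a zero
    block plus a 0.0 presence bit, so the downstream model sees which inputs exist."""
    joint: Vector = []
    for enc in encoders:
        reading = raw.get(enc.name)
        if reading is None:
            joint.extend([0.0] * enc.dim)
            joint.append(0.0)  # presence bit: modality absent
        else:
            features = enc.encode(reading)
            if len(features) != enc.dim:
                raise ValueError(f"{enc.name} encoder returned {len(features)} dims, expected {enc.dim}")
            joint.extend(features)
            joint.append(1.0)  # presence bit: modality present
    return joint


# Toy encoders (hypothetical): mean pixel brightness, lidar point count and nearest range,
# and the largest radar Doppler speed.
encoders = [
    ModalityEncoder("camera", 1, lambda pixels: [sum(pixels) / len(pixels)]),
    ModalityEncoder("lidar", 2, lambda ranges: [float(len(ranges)), min(ranges)]),
    ModalityEncoder("radar", 1, lambda doppler: [max(doppler)]),
]

# Pitch dark: no usable camera frame, but lidar and radar still report.
joint = fuse(encoders, {"camera": None, "lidar": [12.0, 8.5, 30.0], "radar": [0.4, 2.1]})
print(joint)  # → [0.0, 0.0, 3.0, 8.5, 1.0, 2.1, 1.0]
```

The design point the sketch tries to mirror is that no modality is switched off by a hand-written rule; the joint representation always carries everything, and the (learned) downstream model decides what to rely on.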
John Collison
Are there technical problems that you're still chasing, or that you're particularly interested in solving, even if they're kind of niche? Like, "we really want to have driving when it's actually snowing nailed," or steep hills in San Francisco? Or are there problems you've been very interested in historically, or still are?
Dmitry Dolgov
I'm super excited right now about the accelerating global expansion: more cities in the United States, and going internationally. I realize I'm not answering your question about the technology; I'll come back to that. But really, that's the thing I'm most excited about today. Just getting to a place where, in any major metropolitan area, you can fly into the airport, take a Waymo, and go anywhere you want to go. That is insanely exciting to me right now. Then technically, what I'm most excited about is all of the rapid progress in AI, the world models, the foundation model work. It is just such a massive boost to how much we can simplify the system, how much we can bring down the cost, and how we can scale globally. And there's some magic that happens that I don't think I would have anticipated a few years ago. So from the technical perspective, I find that just insanely thrilling.
John Collison
Yes. When you talk about kind of the progress in AI, what are the most fun parts of it for you these days?
Dmitry Dolgov
I think it's seeing the capability and the scaling laws from this approach of starting with the cornerstone of the foundation model, then specializing it into teachers, and then distilling. You get such big wins in performance across the board. You invest something into the architecture, or get better at data or the training recipe, at that early stage, and then it has massive amplification and ripple effects. So that, in some ways, is kind of magical. And then you see it on the car. I've had some moments where the car does something, and you look at a log, and I've been surprised. It does things that I didn't think it was capable of doing. So it's just when you
John Collison
see emergent behavior, that's kind of a proud moment.
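The "distill" step in the foundation model, teacher, student recipe mentioned above can be illustrated with textbook soft-target knowledge distillation (Hinton, Vinyals and Dean, 2015): the student is trained to match the teacher's temperature-softened output distribution. This is a generic sketch, not Waymo's training recipe, which the conversation does not detail; the three maneuver labels in the example are hypothetical.

```python
import math
from typing import List


def softmax(logits: List[float], temperature: float = 1.0) -> List[float]:
    """Temperature-scaled softmax; higher temperature gives a softer distribution."""
    scaled = [x / temperature for x in logits]
    m = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def distillation_loss(teacher_logits: List[float], student_logits: List[float],
                      temperature: float = 2.0) -> float:
    """KL(teacher || student) on softened distributions, scaled by T^2 as in
    standard knowledge distillation so gradient magnitudes stay comparable."""
    p = softmax(teacher_logits, temperature)
    q = softmax(student_logits, temperature)
    kl = sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)
    return (temperature ** 2) * kl


# Hypothetical teacher scores for three maneuvers: yield, nudge around, stop.
teacher = [2.0, 0.5, -1.0]
close_student = [1.8, 0.6, -0.9]
far_student = [-1.0, 0.5, 2.0]

print(distillation_loss(teacher, teacher))  # → 0.0
print(distillation_loss(teacher, close_student) < distillation_loss(teacher, far_student))  # → True
```

In practice this term is usually combined with a loss on ground-truth labels; the point here is only that the smaller student is pushed toward the larger teacher's full output distribution, not just its top choice.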
Dmitry Dolgov
One example. Yeah. It's when you build a system, and you think you understand how it works and fully understand the limits of its capability and performance, and then it does something almost magical. It's exhilarating. One example I can give you, and I think I've shared some videos of it publicly in some talks, was a fairly benign situation that happened in San Francisco. We're at an intersection, our light is red, and there's cross traffic. A bus goes by and stops, partially blocking us. Our light turns green, so we start to go, and we're nudging around the bus. And then you see a pedestrian being detected on the other side of the bus. Right. And the car responds appropriately: it slows down and goes a little bit wider. And then a pedestrian actually emerges from behind the bus, and we go on our way. So the first time I looked at that log, I thought, what's going on here? I know we have pretty darn good sensors and the software is very capable, but we don't see through stuff.
John Collison
Right.
Dmitry Dolgov
That's not how cameras or lidars and radars work. Right.
John Collison
I saw the pedestrian through the bus.
Dmitry Dolgov
You saw the pedestrian on the other side of the bus. And it's not like you can look through the windows. Radar is shooting at this massive metal box; look at the sensor data, and radar shouldn't be able to go through it. In the camera, there are reflections and there are people on the bus, so it's not like you can see through the windows. So what is going on? Maybe it's noise or some coincidence. The first time I saw it, I couldn't actually believe it. I was like, no, no, something doesn't smell right. What actually turned out to be happening is that our peripheral lidars bounced under the bus, and there was just a little bit of very, very noisy reflection of the movement of the person's feet. That was enough for the AI models to say, hey, likely there's a pedestrian there, and to detect it as such. And moreover, there was enough data there to predict what they were going to do. And it just kind of blew my mind.
John Collison
Is this the perfect example to explain what we were talking about earlier? First, the value of fusion across a sensor suite. But then secondly, and relatedly, building an intermediate representation of what's going on: if you're just dealing with pixels, the person behind the bus does not exist in pixel space. So you need some representation of the world to be able to reason about the person behind the bus.
Dmitry Dolgov
I think it's an example where using that intermediate representation boosts the level of performance of all parts of the model. Just imagine solving this problem with a black-box, purely open-loop imitative system. Is it impossible? No. But in practice, what would it take to achieve that level of performance? It would be very, very difficult.
John Collison
What metrics can you share on just where the business is at today in terms of rides, revenues, cars on the roads.
Dmitry Dolgov
We have about 3,000 cars on the roads. We're doing about half a million rides per week, which translates to over 4 million fully autonomous miles per week. We are operating in fully autonomous mode in 11 cities in the US, and in 10 of those we have riders.
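The figures quoted above support some quick back-of-the-envelope arithmetic. This treats the approximate numbers as exact, and it is only a rough sketch: 4 million miles is stated as a floor ("over"), and the conversation does not say whether the weekly mileage includes empty repositioning between trips, so miles per ride is not a precise average trip length.

```python
# Approximate figures quoted in the conversation, treated as exact here.
CARS = 3_000
RIDES_PER_WEEK = 500_000
MILES_PER_WEEK = 4_000_000  # "over 4 million", so this is a floor
DAYS_PER_WEEK = 7
WEEKS_PER_YEAR = 52

miles_per_ride = MILES_PER_WEEK / RIDES_PER_WEEK
rides_per_car_per_day = RIDES_PER_WEEK / CARS / DAYS_PER_WEEK
miles_per_car_per_day = MILES_PER_WEEK / CARS / DAYS_PER_WEEK
annual_rides = RIDES_PER_WEEK * WEEKS_PER_YEAR

print(f"~{miles_per_ride:.1f} miles per ride")                         # → ~8.0 miles per ride
print(f"~{rides_per_car_per_day:.1f} rides per car per day")           # → ~23.8 rides per car per day
print(f"~{miles_per_car_per_day:.0f} miles per car per day")           # → ~190 miles per car per day
print(f"~{annual_rides / 1e6:.0f}M rides per year at this weekly pace")  # → ~26M rides per year at this weekly pace
```

The annualized figure lines up with the run rate of roughly 25 million rides a year mentioned earlier in the conversation.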
John Collison
What's the ghost city?
Dmitry Dolgov
The ghost city is Nashville; we just started there. We just opened up to riders in four new cities in one day. That was one of those little but super exciting moments where I thought back on the history: how long did it take us from the first time we started fully autonomous, rider-only operation to the first time we had external riders in four cities? It was about eight years. And then just the other week, we launched four in one day.
John Collison
Yes, yes. It seems clear now that in 15 years most miles driven will be autonomous. Like, there'll be some burning Puritan. There are lots of old cars on the road, so I think it'll actually take a little while. Some of that will come from level 4 and level 5 systems expanding into new cities and that expansion continuing. Some of it will come from, as you referenced, existing driver assist systems getting up to level 2 and level 3, and existing systems across current car brands getting more and more capable. What do you think about working your way up from the lower levels versus expanding from existing products like Waymo? What will that convergence look like? Because we're going to eat it from both sides.
Dmitry Dolgov
I don't believe we will. And actually, that's a great question. Cars will get smarter. There are going to be advances in driver assist systems, and at the same time, from level four autonomy, there is simplification: the sensors of today are not going to be the sensors of tomorrow. They'll be much more integrated, simpler, and much lower cost. So from that perspective, there is a path of convergence. There's also a path of convergence in the product lines: what you can take a ride in through the Waymo app today will eventually be in your personal car. That I see. But when I think about the technology, I see it just as fundamentally two different problems. There's driver assist systems and then there is full autonomy. And I think it's deceptive to think of them as kind of incremental on one spectrum of complexity.
John Collison
Okay, but you think one cannot work one's way up from driver assist systems to full self driving. You think you have to start building a full self driving system.
Dmitry Dolgov
You have to drive tackle. If I think about the hardest parts of building a fully autonomous rider only system, they are very different from what you do for a driver assist system. And of course some work in the space helps you. I don't want to say you can't make the jump, but it is a qualitative jump.
John Collison
Yes. When can I buy Waymo so that, that I don't need to wait for it? When I want to go, I can just like when I'm ready I can walk out the door and it's there.
Dmitry Dolgov
I'm not going to give you a date today, but you're not the first person to bring this up as a.
John Collison
That's my product request.
Dmitry Dolgov
As a product request. Yeah, duly noted. Okay, I'll add it to the list.
John Collison
Just, you know that waiting for the car, it'd be nice just in the garage there and you keep your stuff in it and everything. It's not the first time you've heard that request.
Dmitry Dolgov
So
John Collison
It seems to me operationally very intensive and very hard; a self-driving car is actually not self-driving. It takes a village. You have all of the human operators ready to step in. And there was that Thundering Herd incident that you guys talked about in San Francisco that kind of highlighted that for people. And then there's just keeping the cars clean and keeping everything running. So can you describe what the operational infrastructure that sits behind Waymo looks like?
Dmitry Dolgov
Sure. I will say that in all of those areas we are on a path of increasing efficiency and automation. The number of manual steps one had to do five years ago to launch a Waymo versus where we are today is drastically different. Nowadays, if you look at one of our depots, it's a fully automatically orchestrated dance of autonomous vehicles. What it looks like today is that cars will automatically go out to pick up their riders and serve their trips. If for some reason they need to come back, maybe they're low on energy, maybe somebody left a mess in the car, they will automatically come to the depot. Cleaning today is a manual process, so it'll get flagged: our fleet management systems say, hey, car number, you know, 378 needs cleaning. And on the sensor dome we're able to display an icon, so we'll show you a little emoji. Yeah. And there are people whose job it is to clean the cars; they'll come and clean it up. If cleaning is not required and it's just charging, the car will automatically pull into a charging stall and say, hey, I need charging. We don't yet have automated charging. In the future you can imagine that being fully automated, right. But today a person will come in and plug in a cable, and when the car says, hey, now I'm ready to go, it will get unplugged, and the car will pull out of its parking stall and go on its merry way.
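The depot loop described above can be read as a small state machine. The sketch below is illustrative only: the state and event names are hypothetical, cleaning and charging are modeled as separate return trips for simplicity, and nothing here reflects Waymo's actual fleet management software.

```python
from enum import Enum, auto


class State(Enum):
    SERVING = auto()                 # out picking up riders and serving trips
    RETURNING_FOR_CLEANING = auto()  # flagged for a mess, driving to the depot
    RETURNING_FOR_CHARGING = auto()  # low on energy, driving to the depot
    AWAITING_CLEANING = auto()       # parked, waiting for a crew member (manual today)
    AWAITING_CHARGING = auto()       # in a charging stall, waiting to be plugged in (manual today)
    READY = auto()                   # serviced and available for dispatch


# (current state, event) -> next state. Any pair not listed is an invalid event.
TRANSITIONS = {
    (State.SERVING, "mess_flagged"): State.RETURNING_FOR_CLEANING,
    (State.SERVING, "low_energy"): State.RETURNING_FOR_CHARGING,
    (State.RETURNING_FOR_CLEANING, "arrived_at_depot"): State.AWAITING_CLEANING,
    (State.RETURNING_FOR_CHARGING, "arrived_at_depot"): State.AWAITING_CHARGING,
    (State.AWAITING_CLEANING, "cleaned"): State.READY,
    (State.AWAITING_CHARGING, "charged_and_unplugged"): State.READY,
    (State.READY, "dispatched"): State.SERVING,
}


def step(state: State, event: str) -> State:
    """Apply one event; reject events that make no sense in the current state."""
    try:
        return TRANSITIONS[(state, event)]
    except KeyError:
        raise ValueError(f"event {event!r} is not valid in state {state.name}") from None


state = State.SERVING
for event in ["low_energy", "arrived_at_depot", "charged_and_unplugged", "dispatched"]:
    state = step(state, event)
print(state.name)  # → SERVING
```

Keeping the transitions in a table means that automating a step later (for example, robotic charging) changes who emits an event, not the shape of the loop.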
John Collison
One of the new Porsches, I think it is, has inductive charging, just like your iPhone, where you just drive over the charging mat. I was amazed that that works at car scale. Presumably in the future the cars will just be able to drive onto the charging mat. Or do you think a robotic plug-in will be easier?
Dmitry Dolgov
We'll see. We'll see. I don't know. I think there's some questions about efficiency and how that plays into the overall cost and which will be, you know, most cost beneficial remains to be seen.
John Collison
How well behaved is the Waymo riding population in terms of not leaving a mess in the car?
Dmitry Dolgov
And we have wonderful riders, the most amazing customers in the world generally. I would say they are very good. I think, you know, there is something about, you know, I talked about not having, you know, a person in the car, it's not somebody else's car. In some ways you kind of like want to preserve the. I think generally people want to kind of preserve the nice aspects of it. And I kind of think of it
John Collison
as it's so clean to begin with.
Dmitry Dolgov
I know, yeah, it's kind of like, you know, I think that that's the general trend that we see. Right. And it's like because there's not somebody else's space, you know, you're in it, it feels like it's your own.
John Collison
Yes.
Dmitry Dolgov
Right. So you don't want to mess up your own space. I don't want to speculate too much on the psychology of the thing. However, I will say that it varies: you can imagine a college town on a Saturday night, and that's a different distribution.
John Collison
Yes, yes. Will I be able to get a Waymo at any address that has USPS service in the US, or will there be some long-tail dynamic where Ketchikan, Alaska, is just never worth it?
Dmitry Dolgov
Eventually it will. Absolutely. There's no doubt in my mind. I think it's just a matter of when, and which modality makes the most commercial sense: ride-hailing versus privately owned. It's not a technical problem; the technology is solved. But if you're in the middle of nowhere and there's just not enough density of trips, does it make sense for the ride-hailing service that Waymo is running to have cars on standby? Probably not. They can be deployed somewhere else, and you probably don't want a horribly bad ETA. This is where a personally owned vehicle equipped with the Waymo driver is maybe how you will see it materialize.
John Collison
Relatedly, what will the second-order effects of, say, majority autonomous traffic be? It feels like a lot of things will work better. As you say, when someone merges into a lane very poorly and everyone all the way back has to slam on the brakes, that's kind of antisocial. So it feels like higher-quality, more prosocial driving will basically reduce traffic a little bit, even with the same number of cars on the road. But presumably there'll be other second-order effects, like wanting higher-throughput traffic lights. How else will things change?
Dmitry Dolgov
The first thing you mentioned is, I think, a huge deal. Just think about traffic jams. Yep. And what's that Navy SEALs saying? Slow is smooth and smooth is fast.
John Collison
Right.
Dmitry Dolgov
That's what traffic jams are like. You accelerate abruptly, then you come to a stop, and sometimes you have a traffic jam and wonder what happened. Well, an old lady crossed the road three hours ago and we still have the standing wave. Right. So if everybody were a smooth, predictable, consistent driver, you would still have those traffic jams from time to time, but I think the time constant to clear them out would be very different. Longer term, there are things like parking lots. If you look at what some of our most interesting pieces of land are allocated to, it's parking lots and garages. And why is that? Because your car is just sitting there 90% of the time. If more cars become fully autonomous, then there's no need for that. Just imagine what you could do with your favorite city in the world if you didn't have to spend that huge fraction of it on keeping these chunks of metal sitting around.
John Collison
Yeah. I don't think people often realize how big a deal parking minimums are for the layout of the urban landscape. The coffee shop here where I am, would like to have outdoor seating, but can't because it would reclaim parking spots.
Dmitry Dolgov
Yeah, wouldn't it be wonderful?
John Collison
I have a few more questions, but I'm curious to talk about Google's relationship with self-driving. It feels like right now Waymo is, aside from everything else AI-related, kind of the most exciting thing happening at Google. But it was a very long journey to get here. You could almost say that Google started working on it too early, because you were saying there's been a bunch of recent enabling technologies. So did it require Google starting as early as it did? Or could one have spun up this project in 2015 or 2020? And then, how did Google keep the faith when it felt like it was perennially two years away?
Dmitry Dolgov
Yeah, no. On the latter part, I just have to give credit and huge kudos and gratitude to Larry and Sergey and Alphabet leadership across the company. It is part of the culture and the DNA of the company to have that vision and to have the stamina and conviction to go the distance. As for the other part of the question, was it too early? I don't know. Clearly, all of the breakthroughs we've seen over the years have changed how we're building the system. But the complexity of the problem is such that you need to go through these iterative cycles. And we've seen many waves of technology. There were breakthroughs in 2013 when ImageNet came around, and there was this narrative that this was the right time to start a self-driving company. Then transformers came around, and VLMs, and all of those are super powerful and have applications in other spaces in the digital world. They certainly have an impact on our AI in the physical world. But they're not silver bullets. They drastically reshape the early part of the curve. That has always been the nature of this problem: it's very easy to get started. It's deceptively easy to get started. But it is super hard to go the full distance. It's the number of nines you have to reach, and the standard engineering rule of thumb is that every next nine takes 10x more. So maybe there is a more optimal path, but I don't see some magical moment where the true complexity of the problem goes away and you can just take some off-the-shelf components and you're a business. If that were the case, I think the industry would look very different today. Yeah.
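The "every next nine takes 10x more" rule of thumb is easy to make concrete. The sketch treats it as an exact geometric law purely for illustration; it is a heuristic, not a measured property of Waymo's development, and the effort units are arbitrary.

```python
def failure_rate(nines: int) -> float:
    """n nines of reliability (0.9, 0.99, 0.999, ...) leaves a failure rate of 10**-n."""
    return 10.0 ** -nines


def effort_for_nine(k: int) -> int:
    """Rule of thumb from the conversation: each additional nine costs ~10x the
    previous one. Units are arbitrary; the first nine is normalized to 1."""
    return 10 ** (k - 1)


def cumulative_effort(nines: int) -> int:
    """Total effort to reach `nines` nines under the 10x-per-nine rule."""
    return sum(effort_for_nine(k) for k in range(1, nines + 1))


for n in range(1, 6):
    last_share = effort_for_nine(n) / cumulative_effort(n)
    print(f"{n} nines: failure rate {failure_rate(n):.0e}, "
          f"cumulative effort {cumulative_effort(n)}, latest nine = {last_share:.0%} of it")
```

Under this model the most recent nine always accounts for about 90% of all the effort spent so far, which is one way to read "deceptively easy to get started": the first nine costs almost nothing relative to the ones that follow.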
John Collison
Yeah. Last question I have. You've been promoted a lot at Google; it feels like Google really recognized your talents. Google is famously one of the very best in the world at technical talent, and the current AI wave, more broadly, is either happening at Google or generally driven by Google alumni. What have you observed firsthand about how Google does this so well?
Dmitry Dolgov
Yeah, I would say it's the culture of Google: not accepting the status quo, having a big vision, and investing in technical talent and the people who can go the distance and realize the vision. I think this is what you're seeing with the breakthroughs in AI in the digital world and all of the early investments in transformers and other fundamental technologies, quantum computing, and, you know, I guess we're not unlike those efforts as well. Thank you.
Podcast Host
Thanks for listening to this episode of the A16Z podcast. If you liked this episode, be sure to like, comment, subscribe, leave us a rating or review, and share it with your friends and family. For more episodes, go to YouTube, Apple Podcasts, and Spotify. Follow us on X at A16Z and subscribe to our Substack at a16z.substack.com. Thanks again for listening, and I'll see you in the next episode. This information is for educational purposes only and is not a recommendation to buy, hold, or sell any investment or financial product. This podcast has been produced by a third party and may include paid promotional advertisements, other company references, and individuals unaffiliated with A16Z. Such advertisements, companies, and individuals are not endorsed by AH Capital Management, LLC, A16Z, or any of its affiliates. Information is from sources deemed reliable on the date of publication, but A16Z does not guarantee its accuracy.
In this episode, Waymo co-CEO Dmitri Dolgov joins guest host John Collison to explore the systems and philosophy powering Waymo’s self-driving cars as they scale to hundreds of thousands of autonomous rides each week. The conversation covers Waymo’s sensor architecture, the centrality of AI and foundation models, the evolution from prototyping to global deployment, the intricacies of ride experience, operational challenges, product design, and the future of fully autonomous driving. Dolgov also reflects on leadership, Google’s innovation culture, and the “nines challenge” of making self-driving safe and real everywhere.
Social/Interactive Complexity:
Driving is like a “multi-agent conversation”—social cues, context, geometric and semantic interactions.
Safety and Reward Functions:
Not just about reaching destinations, but driving smoothly, safely, and predictably within social norms (21:37).
Quote (@21:37):
"Safety is the primary focus. But of course we also want to be a smooth driver... and we also want to be a predictable, well-behaved member so that it can nicely fit into the whole social ecosystem of our roadway." – Dmitri Dolgov
Edge Cases:
Drop-offs, special urban quirks, heavy weather (snow, fog), and unpredictable objects (e.g., debris or flatbed trucks) remain nuanced engineering challenges (23:06, 26:39, 27:12).
Building Foundation Models:
"Building a big model that understands how the physical world works and understands the important properties of what it means to drive, the social aspects of driving and what it means to be a good driver as opposed to a bad one."
— Dmitri Dolgov (@00:00, 08:13)
On Emergent AI Moments:
"It just kind of blew my mind."
— Dmitri Dolgov describing the pedestrian-behind-bus detection (@46:11)
Pragmatic Philosophy:
"As long as the sixth generation [vehicle] where it made sense to go and spend all this effort into the custom."
— Dmitri Dolgov (@35:17)
Driver Assist vs. Full Autonomy:
"I see it just as fundamentally two different problems. There's driver assist systems and then there is full autonomy. And I think it's deceptive to think of them as kind of incremental on one spectrum of complexity."
— Dmitri Dolgov (@50:24)
The tone is direct, technical, but approachable—often peppered with genuine enthusiasm (especially from Dolgov) when discussing emergent AI moments and the operational realities of scale. The discussion remains candid about the challenges, trade-offs, and evolution from research prototypes to real-world deployments.
This episode provides an in-depth, highly insightful look at how Waymo’s autonomous driving system works—from hardware, sensors, and architecture to AI models, simulation, and operational scale. Dolgov emphasizes the complexity of not just perceiving the world but fitting into its social norms, and the exponential effort behind each incremental gain in safety and generalizability. The conversation paints a realistic, optimistic, and deeply knowledgeable picture of why, after decades of work, autonomy is poised to transform both technology and the physical layout of our cities.