
A
Dmitry Dolgov is co-CEO of Waymo. He joined Google's self-driving car project in 2009 as one of its first engineers and was repeatedly promoted until he took it over in 2021. Waymo is Google's most successful moonshot and now provides over 500,000 fully autonomous rides each week. Cheers, by the way.
B
Yeah, cheers.
A
You grew up in Russia, right?
B
Yes, I grew up in Russia, yeah. Then it was actually the Soviet Union.
A
Right, right, exactly.
B
My dad is a physicist. So yeah, the Soviet Union started falling apart, and then he had a visiting position at Kyoto University for a year. We moved there as a family, and then he went to Berkeley and I kind of tagged along. I graduated from high school, I was thinking about the next thing I wanted to do, and I really liked that technical school in Russia.
A
The Russians are serious about their physics.
B
They are, they are. So I went back to Russia and I got my bachelor's and master's.
A
In what year was this that you went back to Russia?
B
1994.
A
Okay, so that was kind of almost peak Russian optimism, in a sense, where it was opening up.
B
It was. Yeah, yeah. No, I actually remember talking to my mom about it. And of course my parents grew up in the Soviet Union. They've seen it. They were born right before the war, and they lived through some really tough times. And I remember talking to my mom, and in fact, I got my green card here in the US before I went back, and she insisted that I do it. At the time I wasn't thinking of coming back, but, you know, I was pretty excited about where Russia was and the trajectory it was on. And, you know, being young and naive, I was like, there's no turning back.
A
And so why did you decide to come back? There's more of a play-by-play than...
B
Yeah, no, grad school. It was pretty clear to me that I wanted to continue studying math and computer science. And while the undergrad and master's that I got in physics and applied math came from what I think was still an incredibly strong foundational school of Russian math and science, for graduate school it was very clear to me that the best way to do it was in the US. So I came back.
A
I'm struck that the founders of the two most valuable UK companies are Russian math nerds who both went to the same school: Nikolai at Revolut and Alex Gerko at XTX. But yeah, it's a strong diaspora.
B
There's a company not far from here, where one of the founders also has, you know, similar pedigree. A company that we're closely related to.
A
Exactly. You know the classic engineering interview question of what happens when I type google.com and hit enter? You know, talk me through whatever you like: HTTP and DNS and BGP. You can go down to whatever level of the stack you want. Do you want to maybe just describe, when I take a ride in a Waymo today, what's happening at a technical level? Like, what is the architecture?
B
Let me answer your question of what's happening in real time. But this is going to be only a part of the story, because we're going to be talking about the real-time inference part of it. And if we want to have a deeper, richer technical conversation, I think it would be interesting also to zoom out and talk about the entire ecosystem of what goes into building, evaluating and deploying the Waymo driver. But when you're driving around, or being driven around, we think about what we're building as a driver. Obviously it's not a car. So it has a number of sensors that are positioned around the vehicle. We use three different sensing modalities: there's cameras, there's lidars, or lasers, and radars. Those are the primary ones. There are also microphones, directional microphone arrays, but those are the primary three for sensing the world. They all have very nicely complementary physical properties. They all have 360-degree coverage around the vehicle, so the Waymo driver sees 360 all the time. All of the data goes into a computer, as you would expect, and there's the software that processes it. Now it's all AI, specialized AI in the physical world. So it processes the sensor data. Nowadays we talk about it using AI terminology, as encoders that take this data in. And then there's the decoder, the action, the generative part, if you will, in the car. And the generative task there is to figure out how to drive. And that is of course connected through a specialized interface to the car, where we can actuate the vehicle. And that's why you see the steering wheel turn and it drives you around.
A
Okay, so I get into my car, there's three main families of sensors: lidar, radar and cameras. And then it is using that to first build a model of what's going on in the world, where all the other cars are and things like that. And then, as you say, make decisions and then actuate that with the car. That is the system that you're living in. And is all that inference done locally? Presumably yes. Nothing's in the cloud, nothing real time.
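Illustrative sketch (not part of the conversation): the on-board loop described above, sensors in, encoder, decoder, actuation, could be laid out roughly like this. Every name here (`SensorFrame`, `encode`, `decode`, `drive_step`) is hypothetical, and the "encoder" and "decoder" are trivial hand-written stand-ins, not Waymo's models.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class SensorFrame:
    """One time step of raw input from the three primary modalities (made-up layout)."""
    camera: List[float]  # e.g. pixel intensities
    lidar: List[float]   # ranges of lidar returns, metres
    radar: List[float]   # ranges of radar returns, metres


@dataclass
class Command:
    """What gets sent through the vehicle interface."""
    steering: float  # radians, positive = left
    accel: float     # m/s^2, negative = brake


def encode(frame: SensorFrame) -> List[float]:
    """Stand-in encoder: fuse all modalities into one small feature vector."""
    return [
        sum(frame.camera) / max(len(frame.camera), 1),  # mean brightness
        min(frame.lidar, default=float("inf")),         # nearest lidar return
        min(frame.radar, default=float("inf")),         # nearest radar return
    ]


def decode(features: List[float]) -> Command:
    """Stand-in decoder: brake if anything is closer than 10 m, else hold speed."""
    nearest = min(features[1], features[2])
    return Command(steering=0.0, accel=-3.0 if nearest < 10.0 else 0.0)


def drive_step(frame: SensorFrame) -> Command:
    """The whole real-time path runs locally; nothing here calls the cloud."""
    return decode(encode(frame))


if __name__ == "__main__":
    clear = SensorFrame(camera=[0.2, 0.4], lidar=[35.0, 50.0], radar=[40.0])
    blocked = SensorFrame(camera=[0.2, 0.4], lidar=[6.5, 50.0], radar=[7.0])
    print(drive_step(clear))
    print(drive_step(blocked))
```

The only point of the sketch is the shape of the data flow: raw sensor frames are turned into features, features into a driving command, and the command goes to the actuators.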
B
Nothing real time in the cloud. And there are some things that can happen in the cloud, but they're not required.
A
Got it. What's an example of a nice to have that happens in the cloud?
B
You can imagine a situation where we do some of it. It's not directly related to the task of driving, but say after you leave the car, we want to check that the car is not dirty and you didn't leave anything there. If you left it in a mess, then I want to send the car to one of our depots and get it cleaned up. If you left an item there, maybe your phone, we want to detect that and then send it to our lost and found and let you know. So that we do by asking a model that actually lives off board, as opposed to having to put it on the car, right? Because it's not a real-time task related to the driving. So that's one example of something that...
A
there are all these debates that go on on Twitter around self driving. So I can think of end to end versus the more kind of modular approach. There's cameras only versus array of sensors and I can't tell. Are these debates actually interesting to an expert in the field or do you think these are just settled matters and they're just grist for the algorithm?
B
I understand where the questions are coming from. I do find that often the way they're posed, and the way the debate happens, loses a lot of the nuance and a lot of the detail that really matters. To me, the most interesting technical questions are at that level. Because the way we think about building the Waymo driver, it starts with a large off-board foundation model. You can imagine building a big model that understands how the physical world works and understands the important properties of what it means to drive, the social aspects of driving, and what it means to be a good driver as opposed to a bad one. So that's the foundation. Then we specialize it into what we're calling three main off-board teachers. They are still large, high-capacity off-board models. There's the Waymo driver, there is the simulator, and then there's the critic. Right. And those then get distilled into smaller models that you can run inference on faster. So the Waymo driver becomes the backbone, the main backbone of what's in the car. The simulator, of course, is what powers our synthetic generative environment that can run in the cloud, for training and for evaluation in a closed-loop system. And the critic...
A
Does the simulator ever run locally? No?
B
No, it doesn't.
A
Yeah.
B
However, what I think is interesting is, in a way, the way the decoder works, the way the model works. If you think about the generative task in the simulator, of creating those realistic worlds and how other people behave, how cars, pedestrians and cyclists behave, and the task that you have to solve on the car in real time, there is this fundamental shared capability of understanding how these objects relate to each other and predicting what they might do in the future, whether you are running on the car or generating and sampling those probabilistic behaviors in the simulator. So it's a different model, but this is why the shared foundation model is able to power both.
A
Got it.
B
And similarly, if you think about the critic, like the job of the critic is to find interesting events and then be opinionated about what's good behavior and what's bad behavior. Similar fundamental understanding. Right. If you're running inference on the car, you still have to figure out which of the multiple hypotheses of these future worlds you want to take action to steer towards.
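Illustrative sketch (not part of the conversation): one simple way to picture "which of the multiple hypotheses of these future worlds you want to steer towards" is to score each candidate plan against a handful of sampled futures and keep the plan with the best expected score. The one-dimensional positions, the probabilities and the `critic` scoring rule below are all invented for the example.

```python
from typing import Dict, List, Tuple

# A hypothesised future: (probability, positions of another agent at three time steps).
Future = Tuple[float, List[float]]


def critic(ego_path: List[float], other_path: List[float]) -> float:
    """Toy critic: reward forward progress, heavily penalise coming within 2 m."""
    min_gap = min(abs(e - o) for e, o in zip(ego_path, other_path))
    progress = ego_path[-1] - ego_path[0]
    return progress - (100.0 if min_gap < 2.0 else 0.0)


def choose_plan(plans: Dict[str, List[float]], futures: List[Future]) -> str:
    """Pick the plan with the highest expected critic score over the sampled futures."""
    def expected(path: List[float]) -> float:
        return sum(p * critic(path, other) for p, other in futures)
    return max(plans, key=lambda name: expected(plans[name]))


if __name__ == "__main__":
    plans = {"proceed": [0.0, 5.0, 10.0], "yield": [0.0, 2.0, 4.0]}
    futures = [
        (0.7, [20.0, 20.0, 20.0]),  # pedestrian stays well ahead of the path
        (0.3, [10.0, 10.0, 10.0]),  # pedestrian steps into the path at 10 m
    ]
    print(choose_plan(plans, futures))                        # yield
    print(choose_plan(plans, [(1.0, [20.0, 20.0, 20.0])]))    # proceed
```

With a 30% chance of the conflicting future, the expected score of "proceed" is dragged far below "yield"; with that future removed, "proceed" wins on progress.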
A
Okay. And these are all downstream of the same foundation model.
B
That's right. So start with the foundation model.
A
Yep.
B
Then you specialize and fine-tune. Those are still off-board models; those are the teachers. And then you distill: each one of the teachers trains its own student.
A
Yes.
B
The driver, the simulator, the critic.
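Illustrative sketch (not part of the conversation): "distilling" a large teacher into a smaller student is often done by training the student to match the teacher's softened output distribution. The snippet below shows that common recipe (a temperature-scaled KL divergence) in plain Python. It is an assumption that anything like this specific loss is used here; the logits are made up.

```python
import math
from typing import List


def softmax(logits: List[float], temperature: float = 1.0) -> List[float]:
    """Numerically stable softmax with a temperature that flattens the distribution."""
    scaled = [x / temperature for x in logits]
    peak = max(scaled)
    exps = [math.exp(x - peak) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def distillation_loss(teacher_logits: List[float],
                      student_logits: List[float],
                      temperature: float = 2.0) -> float:
    """KL(teacher || student) between temperature-softened distributions."""
    p = softmax(teacher_logits, temperature)
    q = softmax(student_logits, temperature)
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0.0)


if __name__ == "__main__":
    teacher = [2.0, 0.5, -1.0]        # e.g. scores for three candidate manoeuvres
    untrained_student = [0.0, 0.0, 0.0]
    trained_student = [1.8, 0.6, -0.9]
    print(distillation_loss(teacher, untrained_student))
    print(distillation_loss(teacher, trained_student))
```

The loss is zero when the student reproduces the teacher exactly and shrinks as the student's outputs approach the teacher's, which is what training the student minimizes.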
A
Yes. You started working on self-driving 20 years ago. As you think about the tech evolution, is this just a scaling-laws story where we had to be able to throw enough compute at it? Were there architectural approaches we needed to wait to be invented? Was it just a story of needing 20 years of going down the wrong cul-de-sacs before we eventually arrived at the right approach? Knowing what you know now, could you have had a successful Waymo in market in 2015, or was there some enabling technology?
B
No, technology breakthroughs that happened over the years were critically important, primarily in AI, but also in other areas like compute, the heavy compute that you have now. I wouldn't characterize it as going down a thousand different dead ends and then having to retreat and then finding the one right path. I would characterize it as iterative learning and evolution. And then transformers came around. But transformers, for example, are a very general architecture. It powers LLMs, it powers our models. But how you apply them to that space, I think this is where it...
A
didn't just fall out of transformers.
B
Exactly. Right. And of course people like to talk about architectures, but architecture is important. But really a lot of it comes down primarily to your metrics, to your evaluation mechanisms, to all of the training recipes, and of course, new data.
A
Yes. LLMs are good at text, or tokens specifically, and obviously perform best in domains that have some kind of single corpus of text they can work on, like coding, where it's very helpful that everything was just kind of textual already. And part of the success has been creating textual representations for domains such that we can then put LLMs against them. Can you describe how you encode the world that you're seeing? I mean, are you just building a 3D bitmap, essentially?
B
So this is where I think we get a bit into this question of what is the interface between the encoder and the decoder parts. And I think that touches also on the thing you flagged earlier, where people like to debate end to end or not end to end. So let's talk a little bit about end to end and then get back to what the interface between those two is. So when we say end to end, what do we mean? We mean that it is some large ML model. Typically you don't build them monolithically. You have different parts and different subgraphs. But what's important is that you can backpropagate the gradients of the loss function all the way through the different layers. So in every layer you can learn the weights and the representations that matter for the final task. You don't force it through some narrow funnel between, let's say, the encoder and the decoder.
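Illustrative sketch (not part of the conversation): the definition of "end to end" given above, that the loss gradient flows back through every part, can be shown with the smallest possible model: a one-weight "encoder" feeding a one-weight "decoder". The weights, learning rate and target are made up; the point is only that the encoder's weight is updated by a gradient that passes through the decoder.

```python
from typing import Tuple


def forward(w_enc: float, w_dec: float, x: float) -> Tuple[float, float]:
    z = w_enc * x   # encoder: learned intermediate representation
    y = w_dec * z   # decoder: final output (say, a steering value)
    return z, y


def gradients(w_enc: float, w_dec: float, x: float, target: float) -> Tuple[float, float]:
    """Backpropagate the loss (y - target)^2 through the decoder and then the encoder."""
    z, y = forward(w_enc, w_dec, x)
    dloss_dy = 2.0 * (y - target)
    grad_dec = dloss_dy * z        # dL/dw_dec
    dloss_dz = dloss_dy * w_dec    # the gradient crosses the encoder/decoder interface
    grad_enc = dloss_dz * x        # dL/dw_enc
    return grad_enc, grad_dec


def train(x: float, target: float, steps: int = 200, lr: float = 0.05) -> Tuple[float, float]:
    """Plain gradient descent on both weights at once."""
    w_enc, w_dec = 0.5, 0.5
    for _ in range(steps):
        g_enc, g_dec = gradients(w_enc, w_dec, x, target)
        w_enc -= lr * g_enc
        w_dec -= lr * g_dec
    return w_enc, w_dec


if __name__ == "__main__":
    w_enc, w_dec = train(x=1.0, target=2.0)
    print(forward(w_enc, w_dec, 1.0)[1])  # close to the target of 2.0
```

A non-end-to-end design would instead train the encoder against its own fixed objective and never let `dloss_dz` reach it.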
A
Yeah, I think of a simple view of end to end being pixels go in and car actions come out, which may be a bit of an oversimplification, but.
B
Yeah, yeah, that's exactly right. And this is kind of the basic vanilla version of it. Right. If you think about what it will take to build a driver that's capable of fully autonomous operations, you think about this entire ecosystem of the driver, the simulator, the critic. If that's all you do, pixels in, trajectories out, it becomes very difficult to do all three of those and achieve the high level of safety and performance that we require. And it becomes very difficult to do it at scale. However, that's a very easy way to get started, right? You collect some data. It's kind of an analogy to the LLM world. The easiest thing you can do is pick a model. The easiest way to get started nowadays would be to just take a VLM. It already has a language-aligned camera encoder, and then it has a decoder that can generate text. You can fine-tune it and say, hey, instead of text, generate trajectories. Very, very doable. In fact, a little while ago we published a paper called EMMA that did exactly that. It will actually, in the nominal case, drive pretty darn well, which is mind-blowingly impressive.
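Illustrative sketch (not part of the conversation): one simple way to let a text-generating decoder "generate trajectories instead of text" is to write waypoints as plain text and parse them back. The format below is invented for this example and is not claimed to be the one used in the EMMA paper.

```python
from typing import List, Tuple

Waypoint = Tuple[float, float]  # (x, y) in metres, in the vehicle's own frame


def trajectory_to_text(waypoints: List[Waypoint]) -> str:
    """Serialise waypoints so they can serve as a fine-tuning target for a text decoder."""
    return " ".join(f"({x:.2f},{y:.2f})" for x, y in waypoints)


def text_to_trajectory(text: str) -> List[Waypoint]:
    """Parse the decoder's text output back into numeric waypoints."""
    points: List[Waypoint] = []
    for token in text.split():
        x_str, y_str = token.strip("()").split(",")
        points.append((float(x_str), float(y_str)))
    return points


if __name__ == "__main__":
    plan = [(0.0, 0.0), (1.5, 0.1), (3.0, 0.25)]
    encoded = trajectory_to_text(plan)
    print(encoded)                       # (0.00,0.00) (1.50,0.10) (3.00,0.25)
    print(text_to_trajectory(encoded))
```

Fine-tuning would then pair camera inputs with target strings like the one printed above, and the parser turns the model's output back into something a controller can follow.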
A
That is very funny. Yeah.
B
And I mean, there's something to it.
A
You're saying you can take an off the shelf model which has nothing to do with driving to start with, and you'll get these good results.
B
That's right. You get that in the normal case. Yeah. I just want to be clear: it's orders of magnitude away from what you need. Yeah.
A
You should not try it on the street, but it works. It's like a talking horse. It's impressive that it's talking, you know.
B
Exactly, exactly. And if the product that you wanted to build was maybe a driver-assist system, not a fully autonomous system, then maybe that's all you need to do. And then for that you don't need all this other machinery of the simulator and the critic, because the number of nines is drastically lower. But this is interesting because there is some intuition behind why that works. If you think about the hard parts of driving, it's not unlike having a conversation, except in the LLM world you're modeling language, or maybe modeling a dialogue, in the space of sentences and words. What makes driving hard is also this multi-agent, social, interactive part of it. If I do something, that's going to affect you, it's going to affect somebody else. And the history matters. It's not local and just geometric. Context matters, semantics matter. But it's not in the language of words, it's in a kind of body language, if you will. Right. And we see that empirically validated if you do this approach. Okay, so then let's say we build this thing with just cameras: camera encoder, pixels go in, trajectories come out. The quality is sufficient to drive in the normal case. It's not sufficient to deal with the long tail of all the edge cases and hit the high bar of superhuman safety that we require. So then you start asking the question, what else do you need?
A
Yes.
B
And if all you did when you trained the system was observe how other people drive, maybe observing just passively how people drive and how they interact, maybe also driving the car yourself and then using imitation learning to train it, keep in mind that that's not enough. You have to do something in closed loop. You have to do things like RLFT, which is also parallel to what we see...
A
RLFT?
B
RLFT. Reinforcement-learning-based fine-tuning.
A
Okay.
B
Yeah.
A
Yes.
B
Similar to reinforcement learning from human feedback in the LLM world. Right. You want to do proper closed-loop driving, where you explore all kinds of different situations and then you give it a reward signal to keep it in distribution. For that, you need a realistic simulator. Right. Also, if you want to have a good RL system, you need to have an opinion for the reward function. This is where the critic comes in. Right. If you have a purely end-to-end system, let's look at the simulator. Now what do you do? You're then constrained to just go from pixels to trajectories. Right. That's all you can run the system on. And it's a very high-dimensional space, so it's a hard problem to generate everything. But even if you solve that, it just becomes incredibly inefficient to run it in the full way, pixels to trajectories, in simulation for training or for evaluation. So this is where intermediate representations come in. There are some intermediate representations in this task, in the physical world, that we know are correct. They're not sufficient, but they're not generality-limiting.
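Illustrative sketch (not part of the conversation): the closed loop described above has three pieces, a policy that acts, a simulator that responds, and a critic that turns the outcome into a reward. The toy below wires those together for a car-following scenario. Picking the best of a few candidate gains is a deliberately crude stand-in for reinforcement-learning fine-tuning, and every number, name and rule here is made up.

```python
from typing import List, Tuple


def rollout(gain: float, steps: int = 100, dt: float = 0.1) -> Tuple[float, float, float]:
    """Simulator: the ego car follows a lead car that starts braking halfway through.

    Returns (minimum gap in metres, hardest braking in m/s^2, distance travelled).
    """
    ego_pos, ego_speed = 0.0, 15.0
    lead_pos, lead_speed = 30.0, 15.0
    min_gap, max_brake = float("inf"), 0.0
    for t in range(steps):
        lead_accel = -4.0 if (t >= steps // 2 and lead_speed > 0.0) else 0.0
        lead_speed = max(0.0, lead_speed + lead_accel * dt)
        lead_pos += lead_speed * dt
        gap = lead_pos - ego_pos
        # Policy: proportional control towards a 2-second following gap.
        accel = max(-8.0, min(2.0, gain * (gap - 2.0 * ego_speed)))
        ego_speed = max(0.0, ego_speed + accel * dt)
        ego_pos += ego_speed * dt
        min_gap = min(min_gap, lead_pos - ego_pos)
        max_brake = max(max_brake, -accel)
    return min_gap, max_brake, ego_pos


def critic_reward(min_gap: float, max_brake: float, progress: float) -> float:
    """Critic: a collision dominates everything; otherwise reward progress and comfort."""
    if min_gap <= 0.0:
        return -1000.0
    return progress - 5.0 * max(0.0, max_brake - 3.0)


def tune(candidates: List[float]) -> float:
    """Keep the policy parameter whose closed-loop rollout the critic likes best."""
    return max(candidates, key=lambda g: critic_reward(*rollout(g)))


if __name__ == "__main__":
    print(rollout(0.0))                 # a policy that never reacts ends in a collision
    print(tune([0.0, 0.2, 0.5, 1.0]))
```

A policy that only imitates logged driving never sees the consequences of its own mistakes; here the consequence (the shrinking gap) feeds straight back into the next action and into the reward.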
A
Right.
B
You know, there's an object here, there's a concept of a road, there's signs, there's speed limits. So augmenting that learned representation, those learned embeddings from the encoder and decoder, with that more structured representation is what we do. And we find that this gives us additional knobs to simulate in that space, not just pixels to trajectories. It allows us to have additional safety validation layers in real time. And it also gives us additional mechanisms to specify the reward function for evaluation, the critic, or for training. So again, we've gone kind of full circle: is it end to end? Yes, it is. But if you want to do it at scale for full autonomy, it's augmented with all of this other stuff.
A
That's very interesting. On the simulating point, it's just very hard to simulate for an end-to-end model, because it's easier to deal in intermediate representations rather than coming up with a pixel-perfect view of the world.
B
You need both.
A
Yeah.
B
So having end to end architecture that's augmented with that structure allows you to kind of play in both of those worlds.
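Illustrative sketch (not part of the conversation): the idea of augmenting learned embeddings with a structured representation (objects, speed limits) can be pictured as a scene record that carries both, plus an independent check that runs on the structured part. The classes, fields and thresholds below are hypothetical.

```python
from dataclasses import dataclass
from math import hypot
from typing import List, Tuple


@dataclass
class TrackedObject:
    kind: str       # e.g. "pedestrian", "vehicle"
    x: float        # metres
    y: float        # metres
    radius: float   # rough footprint, metres


@dataclass
class Scene:
    objects: List[TrackedObject]
    speed_limit: float       # m/s
    embedding: List[float]   # learned features travel alongside the structure


def validate(trajectory: List[Tuple[float, float]], speeds: List[float],
             scene: Scene, margin: float = 1.0) -> List[str]:
    """Safety validation layer: return a list of violations (empty means the plan passes)."""
    problems: List[str] = []
    if any(v > scene.speed_limit for v in speeds):
        problems.append("exceeds speed limit")
    for obj in scene.objects:
        if any(hypot(px - obj.x, py - obj.y) < obj.radius + margin
               for px, py in trajectory):
            problems.append(f"too close to {obj.kind}")
    return problems


if __name__ == "__main__":
    scene = Scene(objects=[TrackedObject("pedestrian", 10.0, 0.0, 0.5)],
                  speed_limit=11.0, embedding=[0.1, -0.3])
    print(validate([(0.0, 0.0), (5.0, 0.0), (10.0, 0.5)], [5.0, 5.0, 5.0], scene))
    print(validate([(0.0, 3.0), (5.0, 3.0), (10.0, 3.0)], [5.0, 5.0, 5.0], scene))
```

The same structured fields can also be perturbed directly in simulation (move the pedestrian, change the limit) without rendering new pixels, which is the efficiency argument made above.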
A
Yeah, yeah, yeah. What are you looking to do as a self driving car? I mean it sounds funny, but I think people maybe don't realize that there are many different things that you're looking to solve for where you're looking to get the person to their destination, you're looking to get them there reasonably promptly, but also drive quite smoothly and also have many lines of safety and also not annoy other drivers and get honked at. And, and, and so what are some of the reward functions or kind of things you're optimizing for that maybe are not obvious to people?
B
So safety is the primary focus. But of course we also want to be a smooth driver, both for people in the car and for other actors, and we also want to be a predictable, well-behaved one, so that it can nicely fit into the whole social ecosystem of our roadways.
A
It seems like one of the issues that has quickly emerged with self driving is the fact that people can't have nice things or not everyone is nice to the robots. And so whether you're driving through a dodgy area or getting blocked or maybe I'm not going to drop you off here, maybe I'm going to go around the block and drop you somewhere better. But all of these as you say, kind of other human issues, how do you go about solving those?
B
A lot of the ones that you mentioned are just things that we need to work on. And honestly, if we're not dropping you off exactly where you wanted to be dropped off, or we don't give you a good interface to tell us, that's on us. We've got to make it better.
A
It feels like the drop off is actually a pretty nuanced part of the self driving journey like the highway stuff and the 35 mile an hour roads that is all nailed, but there's just a lot of nuance in the drop off experience.
B
I'd say they're all hard. You picked freeways and you picked drop-offs for different reasons. For drop-offs, you're absolutely right. There are a few things that are maybe not obvious when you just think about this problem, but it's understanding where you want to go and making it as convenient as possible for you. And pickups versus drop-offs are not exactly symmetrical. But then also understanding the context of the situation. Where do you stop? You don't want to block a driveway, you don't want to double park, although in some cases, if it's a quick one, maybe it's okay. So there's a lot of nuance that goes into doing that well, so that it's a smooth, frictionless experience for the rider as well as other folks.
A
Yeah.
B
Freeways: for most of the time, not much happens. They're very well structured because we design them that way. But there is still that long tail of really complicated stuff that happens, where the consequences of a bad event are much more severe. Right. The speed is much higher. Everything is quadratic in speed. But we see a lot of stuff. Imagine grills falling off of freeways. Imagine people getting into accidents and spinning out of control. You see one of those flatbed trucks
A
with just a bunch of stuff piled in it, and you're driving behind it. I know. I always find it very nerve-wracking. Looks a bit...
B
I know, yeah. And we've seen them leave a trail.
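Illustrative note (not part of the conversation): the remark that "everything is quadratic in speed" holds at least for kinetic energy and for idealized braking distance. In the formulas below, the mass m, the friction coefficient μ and gravitational acceleration g are symbols introduced here, and the 30 mph and 65 mph speeds are chosen only as an example.

```latex
E_k = \tfrac{1}{2} m v^2,
\qquad
d_{\mathrm{brake}} = \frac{v^2}{2 \mu g},
\qquad
\frac{E_k(65\ \mathrm{mph})}{E_k(30\ \mathrm{mph})} = \left(\frac{65}{30}\right)^2 \approx 4.7
```

So roughly doubling the speed from city streets to the freeway more than quadruples both the energy that has to be dissipated in a crash and the distance needed to stop.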
A
Yes, yes. Yeah. Okay. So it's a different set of problems, but I feel like the general sentiment with Waymo is that the driving has mostly now been solved by you guys. And it's kind of a question of scaling up and maybe some super long tail stuff, really snowy conditions. Is that your sense internally or is there actually much more nuance to it than that?
B
I would say, yeah, it's not like we're done with engineering. I would say that we've clearly moved past the stage of scientific research and deep core technology development to this new phase of accelerated global scaling and deployment. We still have work to do, but I don't see today any limitations or any gaps in the core technology.
A
The driving is good enough now.
B
Well, the core technology I think is good enough that I can't think of any aspect of driving that is not supported by the fundamental technology. Now, that said, there is a lot of work to do in specialization and in validation before we can deploy responsibly. We're not driving everywhere in the world. We are planning to start operating in London and in Tokyo this year. Do we have a driver, the one that you are using today in San Francisco, that we can just plop down in London and go? No. But what we're seeing is incredibly encouraging from the perspective of whether the core technology is there. So now it's a matter of collecting the data and doing some specialization and validation. Signs are different in both of those places. People drive on the other side of the road. But that's actually not that hard for computers. Right. The core technology generalizes really well, but there's still work that you have to do.
A
What generalizes least?
B
Well, increasingly we're finding, especially now that we're able to hook the Waymo AI to the AI in the digital world, the VLMs, and inherit the general world knowledge from VLMs, that we're seeing really strong results from zero-shot or few-shot learning because of that general knowledge that we bring. But there are a few things, like, say, cold winter weather, where it affects the entire stack. Right. So it's not just the AI, but you actually have to... Hardware. Yeah, you need the hardware, you need to have the proper cleaning solution, heating elements in it. And then you think about things that are completely solvable by computers, like motion control on slippery surfaces. Right. So that takes a bunch of work. You don't get that for free from just pulling in some VLM decoder.
A
Was it the case? I mean my impression, not knowing anything, is that in the early days there was maybe a lot of San Francisco specific work or Phoenix specific work in the early markets, whether it be mapping or something else. And that you guys seem to either have solved that in generalizing it or just scaled up your ability to do the city specific work. What enabled the kind of the rapid city expansion?
B
We usually think about the capability of the Waymo driver, as well as deployment, not primarily and directly in the space of cities or zip codes. We think about the operating domain: freeways, cold weather, snow, rain, fog, density, et cetera. That's what we are building for, that's where we're evaluating. And then that maps to cities: a particular city will be within the operating domain or outside of it. Right. If we revisit history a little bit, our initial deployment, where we started offering a fully autonomous commercial service for the first time, was in 2020 in Chandler, Arizona. And that was on what we called the fourth generation of the Waymo driver. This was, if you remember, the Pacifica minivans, with different hardware and different software. There we were super focused on doing the whole thing end to end: learn how to build the driver, evaluate it, deploy regularly, operate it end to end, 24/7, with customers, learn from the customers. And we were very focused on that operating domain of mostly Chandler, which is a medium-to-low complexity one. Then when we made the jump to the fifth generation of our system, which is what's on the I-PACEs today, we really wanted to take a huge bite out of that operating domain. We collected data all over the United States, in different states and different cities. When we chose to deploy in the hardest parts of San Francisco and the hardest parts of Phoenix, we made a big jump on the hardware side and, most importantly, on the software, the AI side. And I would say that was the big discontinuous jump. And that's what you're seeing now. After we've scaled up and iterated on all of the aspects of building and deploying the driver, this is why you're now seeing us go in parallel and scale in the US.
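Illustrative sketch (not part of the conversation): thinking in terms of an operating domain rather than a list of cities amounts to a set-containment check. A city is deployable when every condition it requires is already in the validated set. The condition labels and the two placeholder cities below are invented for the example.

```python
from typing import Dict, FrozenSet

# Conditions the driver has been built and evaluated for (illustrative labels only).
SUPPORTED: FrozenSet[str] = frozenset(
    {"surface_streets", "freeways", "rain", "fog", "dense_traffic"}
)

# What each placeholder city would require. These sets are made up.
CITY_REQUIREMENTS: Dict[str, FrozenSet[str]] = {
    "city_a": frozenset({"surface_streets", "freeways", "dense_traffic"}),
    "city_b": frozenset({"surface_streets", "freeways", "snow", "cold_weather"}),
}


def missing_capabilities(city: str,
                         supported: FrozenSet[str] = SUPPORTED) -> FrozenSet[str]:
    """Conditions the city needs that fall outside the operating domain."""
    return CITY_REQUIREMENTS[city] - supported


def in_operating_domain(city: str, supported: FrozenSet[str] = SUPPORTED) -> bool:
    return not missing_capabilities(city, supported)


if __name__ == "__main__":
    for name in CITY_REQUIREMENTS:
        print(name, in_operating_domain(name), sorted(missing_capabilities(name)))
```

Under this framing, adding a capability such as snow expands the domain once and brings in every city whose only gap was that capability.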
A
So driver version 5 was just a much more generalizable stack than version 4. And what was it about it? Was it just that it had been trained on a much wider data set?
B
It was when we made this big bet on AI. There were a lot more little AI models and ML models in the fourth generation. We made a much bigger bet and jump to AI as the backbone for the fifth generation.
A
AI as the backbone, as the core engine. As in, you're saying that Gen 4 had lots of small little AI subsystems, for example.
B
And that's been. So we made that jump and we've been iterating and improving the model since then.
A
As we're seeing with Waymo rolling out widespread autonomy, it has second-order changes on the entire system, in this case traffic patterns or other drivers' behavior or eventually how cities are laid out. And autonomous systems are coming in many domains. In commerce, soon agents are going to be transacting without human intervention. We're basically getting driverless commerce. And Stripe is building the economic infrastructure for AI. As part of that, we're letting payments be initiated by humans or by agents. So if you want to sell to agents, or if you want to let your agent spend money all around the web, check out Stripe's Agentic Commerce Suite. Can we talk about hardware a second? So, lots of hardware questions, but one is that everyone in this space has a very charismatic demo of a vehicle that is custom-made for self-driving. It's often the van with no steering wheel and seats facing in both directions. You guys have one. Tesla has the steering-wheel-less Cybercab. Cruise had the Cruise Origin. And yet we're still driving in Jaguars that have a steering wheel in the front and are pretty similar to consumer cars. And it's interesting to me because if we were talking about this 10 years ago, we might have said, well, yeah, developing a custom car, that's relatively straightforward. We know how to put a bunch of sensors on a new car, but the software will take a long time. And what's interesting is we've made huge progress in the software, but the cars are still derivatives of cars that people are driving. So I'm curious why you think the custom hardware has not happened as of 2026. It's obviously a small improvement compared to Waymo itself, which is the big improvement, but it's just interesting that it still hasn't happened.
B
Well, let's say our sixth generation of the vehicle and the driver is our version of that.
A
Oh, no, I know. It is the Ojai platform.
B
Right. So that is, you know, it still has the, you know, we can talk about, you know, whether you want to have the seats pointed backwards or not. Actually, I think it looks nice in the demo, but practically speaking, maybe not the way to go. But that is, it is a custom designed vehicle and it is. We put a lot of thought into moving away from a car that's designed around the driver.
A
Yes.
B
To a car that's designed around the passenger. And it's much more spacious, and it's happening. It's not open to the public yet, but I took a ride in it the other day, fully autonomously. And that's coming this year.
A
Yes. How much better is it as a passenger experience?
B
You'll tell me once you give it a try. I love it. So it's all about the space and the convenience of ingress and egress and the screens and the interface for the passenger. We put a lot of thought into every aspect of it. It has sliding doors, so it's very easy to get in. It has a flat floor. If you sit in the back, you can fully stretch out and there's so much space there. From the outside it looks fairly big, but the actual footprint is barely larger than the I-PACE. So it's kind of amazing that you walk in and it feels like you're in a living room.
A
Yes. I guess my question is just, Waymo does 25 million rides a year, run-rate-ish, with the Jaguar I-PACE. And it's interesting that so much scaling has happened with self-driving so far on the old retrofit. Maybe that's to be expected.
B
Well, it matches the high. I don't think it's a given, you're right. But if you think about the value proposition, of course there is the safety of it. You don't have to worry about it. There's also the privacy of being in the car by yourself, maybe with other folks, but not having to share the space with another human. Maybe a great product. Yeah. I guess this is why we're seeing such consistency. The car drives well, it's very predictable. And you can go beyond that, right, and specialize even more to make the experience even more magical around the rider. But I guess it would have been disappointing, and I think I would have been surprised, if without the specialized car we had leveled off at some much lower level of customer adoption, because a car seems like more of an optimization, an improvement. The core of the value proposition comes from those other factors.
A
Yes, yes. I guess just take risk on one thing at a time. We'll start by doing the software layer and then we'll build the Specialized car or something like that. That's right, yeah, yeah.
B
It's also, I mean, as you said, it's a big investment. So you have to de-risk the fundamentals.
A
Yes.
B
And you know, throughout our history we were very focused on setting the most, you know, the biggest goal for the company to de risk the most important questions.
A
Right.
B
We talked about the fourth generation, where we wanted to deploy something and go end to end. We talked about what the goal was with the fifth generation. And then there's the sixth generation. So it was with the sixth generation that it made sense to go spend all this effort on the custom...
A
And the sixth generation is a custom vehicle. Is it also a new generation of the driving stack?
B
It is. The new hardware, the sensors, the self-driving hardware that we're putting on the Ojai vehicle, is the sixth generation. It is very different from the fifth generation. It is simpler, it is more capable, it is much lower cost. It's a fraction of the cost. It's comparable to what you would get in a fancy ADAS system nowadays, a driver-assist system. The software is pretty much the same. So when we talk about generalizability of the Waymo driver, we talk about weather conditions, we talk about cities, but it also generalizes well to different vehicle platforms and different sensor configurations.
A
Okay, so Gen 6 is a new vehicle and a new sensor stack, but similar software. It's almost a tick-tock cycle happening here.
B
That's right.
A
That's right.
B
And then we're going to put the sixth generation Waymo driver on other vehicle platforms like the Hyundai Ioniq that's coming, you know, later in the year.
A
What is different about the sixth-generation hardware stack, and how did you make it cheaper?
B
It still has the same three sensing modalities, but we've made significant optimizations in all three. So unification, simplification, and there's just the kind of... just riding the... yeah.
A
Is it a classic case of manufacturing scale where we're not.
B
Well, scale hasn't fully come into play, but if you think about the supply chains and industries for all of those: cameras are pretty mature. Radars, many years ago, used to be bulky, complex, very expensive when we were putting them on planes. But then we started putting them on cars. Now you can get a decent automotive radar for tens of dollars. There is a variant of the automotive radar called the imaging radar. It gives you a richer... so that also has come down in cost drastically, but it's a little bit behind your standard automotive radars. Lidars are following the same very predictable, very well-known trend. So we're riding that, and we're also learning from the previous generation to just make improvements and simplifications and optimizations.
A
Sorry, very silly question. What are lidars versus radars better at in a self-driving context? Like, are they complementary?
B
They're very complementary. Yeah, you know, it's all effectively, you know, blasting photons out there, and then they bounce off of something and come back. You measure what comes back. The frequencies are very different. So laser gives you very, very high resolution. You can think of it as a laser beam that goes out, spins around, and shoots out millions of these laser pulses per second, and then each one comes back, and you're kind of sampling the 3D structure of the world with very high resolution
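Editor's note: the "pulse goes out, bounces, comes back" description above corresponds to generic time-of-flight ranging. The sketch below is a minimal illustration of that arithmetic only. The function names are hypothetical, and nothing here is taken from Waymo's hardware or software.

```python
# Illustrative only: generic pulsed time-of-flight ranging.
# Function names are hypothetical; this is not Waymo's implementation.

SPEED_OF_LIGHT_M_PER_S = 299_792_458.0


def range_from_round_trip(round_trip_seconds: float) -> float:
    """Distance to the reflecting surface, given a pulse's round-trip time."""
    if round_trip_seconds < 0:
        raise ValueError("round-trip time must be non-negative")
    # The pulse travels out and back, so halve the total path length.
    return SPEED_OF_LIGHT_M_PER_S * round_trip_seconds / 2.0


def round_trip_for_range(distance_m: float) -> float:
    """Round-trip time for a pulse reflecting off a surface distance_m away."""
    if distance_m < 0:
        raise ValueError("distance must be non-negative")
    return 2.0 * distance_m / SPEED_OF_LIGHT_M_PER_S


if __name__ == "__main__":
    t = round_trip_for_range(50.0)
    # A target 50 m away returns the pulse after roughly a third of a microsecond.
    print(f"round trip for 50 m: {t * 1e9:.1f} ns")
```

Each returned pulse yields one range sample along the beam direction, which is how many pulses per second add up to a dense 3D sampling of the scene.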
A
lidar for very fine grained mapping.
B
That's right. Radar has much lower resolution, but because of the physics, it degrades much better in adverse weather conditions: fog, snow, heavy rain.
A
So it's not gonna be occluded by particles between it and the target.
B
So imagine driving in super dense fog.
A
Yes.
B
We're close to San Francisco, so you probably don't have to think that hard. It can be really hard to see. So cameras degrade. Laser, depending on the size of the particulates, can degrade better or worse than camera. Radar is, well, not affected. So you can imagine driving on a freeway: radar will give you really good returns for cars that are absolutely invisible in the camera space.
A
That's interesting. So does that mean there are some environments where you'll be relying significantly more on radar but the performance is good enough?
B
Well, it's a combination of the sensors. Right. So we rely on... each one is noisy. Right. How the noise characteristics show up in different environments is different. But, I mean, it's not like we switch from one to another. It's not like we estimate what's happening with the world through cameras and through radars and through lidar, and then we compare. No, there's an encoder for camera, there's an encoder for lidar, there's an encoder for radar. And they all go into the system that gives you jointly the best view of what's happening in the world. So if it's a nice bright, sunny day, cameras are very valuable. If it's pitch dark, or you have sun in your face, or you're blinded by the headlights from an oncoming car, then camera will degrade. There's still some noisy signal, but it will degrade.
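Editor's note: a toy sketch of the structure described above, where each sensor has its own encoder and all encoder outputs feed one joint model, rather than three separate estimates being compared afterward. Every name, size, and weight below is made up for illustration; the conversation does not describe the real architecture at this level of detail.

```python
# Toy sketch: per-sensor encoders feeding one joint model.
# All weights, sizes, and names are hypothetical and chosen for illustration.
from typing import Dict, List

Vector = List[float]
Matrix = List[Vector]


def linear(weights: Matrix, x: Vector) -> Vector:
    """Multiply a weight matrix (list of rows) by an input vector."""
    return [sum(w * xi for w, xi in zip(row, x)) for row in weights]


def relu(v: Vector) -> Vector:
    """Clamp negative activations to zero."""
    return [max(0.0, a) for a in v]


# One small encoder per modality; each maps raw readings to 2 features.
ENCODERS: Dict[str, Matrix] = {
    "camera": [[0.5, -0.2, 0.1], [0.3, 0.8, -0.5]],  # 3 inputs -> 2 features
    "lidar": [[1.0, 0.0], [0.0, 1.0]],               # 2 inputs -> 2 features
    "radar": [[0.7], [-0.4]],                        # 1 input  -> 2 features
}

# A single joint head consumes all 6 fused features at once.
JOINT_HEAD: Matrix = [[0.2, 0.1, 0.3, 0.3, 0.5, -0.1]]


def fuse(readings: Dict[str, Vector]) -> Vector:
    """Encode each modality and concatenate the features in a fixed order."""
    fused: Vector = []
    for name in ("camera", "lidar", "radar"):
        fused.extend(relu(linear(ENCODERS[name], readings[name])))
    return fused


def joint_estimate(readings: Dict[str, Vector]) -> float:
    """One score computed from all modalities jointly."""
    return linear(JOINT_HEAD, fuse(readings))[0]


if __name__ == "__main__":
    sample = {"camera": [1.0, 1.0, 1.0], "lidar": [2.0, 3.0], "radar": [1.0]}
    print(fuse(sample), joint_estimate(sample))
```

In a trained system the weights would be learned, so the joint head can lean on whichever features remain informative when one sensor's signal gets noisy. This toy version only shows the data flow.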
A
Yes.
B
And lidar is completely unaffected. Right.
A
Are there technical problems that you're still chasing, or that you are particularly interested in solving, even if they're kind of niche? You know, "we really want to have driving when it's actually snowing nailed," or steep hills in San Francisco. Are there problems you've been very interested in historically or still are?
B
I'm super excited right now about the accelerating global expansion, more cities in the United States and going internationally. So, I understand I'm not answering your question about the technology. I'll come back to that. But really, that's the thing that I'm most excited about today. Just getting to a place where, in any major metropolitan area, you can fly into the airport and then take a Waymo and go anywhere you want to go. That is insanely exciting to me right now. So then technically, what I'm most excited about is all of the rapid progress in AI and the world models, the foundational model work, and it is just such a massive boost to how much we can simplify the system, how much we can bring down the cost, and how we can scale globally. And there's some magic that happens that I don't think I would have anticipated a few years ago. So that I find, from the technical perspective, just insanely thrilling.
A
Yes. When you talk about the progress in AI, what are the most fun parts of it for you these days?
B
I think it's seeing the capability and the scaling laws from this approach of starting with that cornerstone of the foundational model and then specializing to teachers and then distilling. You get such big wins in performance across the board. You invest something into the architecture, or get better at data or the training recipe, and you invest at that early stage, and then it just has massive amplification and ripple effects. So that in some ways is kind of magical. And then I guess you see it on the car. And I've had some moments where the car does something and you look at a log, and I've been surprised. Like, it does things that I didn't think it was capable of doing. So it's that. It's that, when you see emergent behavior,
A
that's kind of a proud moment.
B
One example. Yeah. You know, it's, you know, when you build a system and then, you know, you think you understand, you know, how it works, and you understand fully, you know, the limits of its capability and performance. And then it does something, you know, kind of almost magical.
A
Yes.
B
It's exhilarating. Yes. One example I can give you, and I think I've shared some videos of that publicly in some talks, was this situation that happened in San Francisco. Fairly benign situation. We're at an intersection, our light is red. There's cross traffic. A bus goes by, and it stops, partially blocking our path. Our light turns green. So we start to go, we're nudging around the bus, and then you see a pedestrian being detected on the other side of the bus. Right. And then the car responds appropriately. It slows down, goes a little bit wider, and then a pedestrian actually emerges from behind the bus and we go on our way. So the first time I looked at that log, I was like, what's going on here? I know we have pretty darn good sensors, and the software is very capable. We don't see through stuff. Right. Okay. That's not how cameras or lidars and radars work. Right.
A
It saw the pedestrian through the bus?
B
It saw the pedestrian on the other side of the bus. And, you know, you look at the sensor data and you're like, okay, radars: this is a massive metal box, and radar shouldn't be able to go through it. Right. You know, camera: you can't see it in the camera because there's reflections and there's people on the bus. So it's not like you can see through the windows. So, like, what is going on? Maybe it's, you know, noise or some coincidence. And the first time I saw it, I couldn't actually believe it. I was like, no, no, there's something. Right. So what actually turned out to be happening is that our peripheral lidars bounced under the bus, and there was just a little bit of very, very noisy reflection of the movement of the person's feet. That was enough for the AI models to say, hey, likely there's a pedestrian there, and I'm going to detect it as such. And moreover, there's enough data there to predict what they're going to do.
A
Yes.
B
And it just kind of blew my mind.
A
Is this the perfect example to explain what we were talking about earlier? The value of, one, fusion across a sensor suite, but then secondly, I mean relatedly, building an intermediate representation of what's going on, where if you're just dealing with pixels, I mean, the person behind the bus does not exist in pixel space. And so you need to have some representation of the world that exists to be able to reason about the person behind the bus.
B
I think it's an example where giving it kind of using that intermediate representation to boost the level of performance of all parts of the model is what's happening here. Just imagine solving this problem with a black box purely open loop imitative system. Is it impossible? No. In practice, what would it take to achieve that level of performance? Yes, very, very difficult.
A
What metrics can you share on just where the business is at today in terms of rides, revenues, cars on the roads.
B
We have about 3,000 cars on the roads. We're doing about half a million rides per week. That translates to over 4 million fully autonomous miles per week. We are operating in a fully autonomous mode in 11 cities in the US, and in 10 of those we have riders, public riders.
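Editor's note: a quick back-of-the-envelope pass over the figures just quoted. The inputs are the rounded numbers from the conversation ("about" 3,000 cars, "about" half a million rides, "over" 4 million miles), so the outputs are rough. The miles figure may also include driving without a passenger, which would make miles per ride an upper bound on average trip length. That caveat is an assumption, not something stated in the interview.

```python
# Rough arithmetic on the rounded figures quoted in the conversation.
# Variable names are the editor's; treat every result as approximate.

cars = 3_000
rides_per_week = 500_000
miles_per_week = 4_000_000  # quoted as "over 4 million", so a lower bound

miles_per_ride = miles_per_week / rides_per_week
rides_per_car_per_day = rides_per_week / cars / 7
miles_per_car_per_day = miles_per_week / cars / 7

if __name__ == "__main__":
    print(f"miles per ride:        {miles_per_ride:.1f}")
    print(f"rides per car per day: {rides_per_car_per_day:.1f}")
    print(f"miles per car per day: {miles_per_car_per_day:.1f}")
```

On these inputs that works out to roughly 8 miles per ride, about 24 rides per car per day, and about 190 miles per car per day.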
A
What's the ghost city?
B
The ghost city is Nashville; we just started there. So we just opened it up to riders in four new cities in one day. That was one of those little but super exciting moments where I thought back to the history. How long did it take us from the first time we started fully autonomous rider-only operation to the first time we had external riders in four cities? That's about eight years. And then just the other week we launched four in one day.
A
Yes, yes. It seems clear now that in 15 years most miles that are driven will be autonomous. Like, there'll be some burn-in period and, you know, there's lots of old cars on the road. I think it'll actually take a little while. And some of that will be by level 4, level 5 systems expanding in new cities and that expansion continuing. Some of it will be, as you referenced, existing driver assist systems getting up to level two and level three, and existing systems across current car brands getting more and more capable. Working your way up from the lower levels versus expanding from existing products like Waymo: what does convergence look like? Because we're going to eat it from both sides.
B
I don't believe we will. And I actually think this, that's a great answer. Cars will get smarter. There's going to be advances in driver assist systems, and at the same time, from level four autonomy, there is simplification, and the sensors of today are not going to be the sensors of tomorrow. So they'll be much more integrated, they'll be simpler, they'll be much lower cost. So from that perspective, there is a path of convergence, and there's also a path of convergence from the product lines: ride hailing, what you can take a ride in through the Waymo app today, eventually will be on your personal car. So that I see. You talk about the technology, and I see it as just fundamentally two different problems. There's driver assist systems and then there is full autonomy. And I think it's deceptive to think of them as kind of incremental on one spectrum of complexity.
A
Okay, but you think one cannot work one's way up from driver assist systems to full self driving. You think you have to start building a full self driving system.
B
You have to tackle. If I think about the hardest parts of building a fully autonomous rider only system, they are very different from what you do for a driver assist system. And of course some work in the space helps you. I don't want to say you can't make the jump, but it is a qualitative jump.
A
Yes. When can I buy a Waymo so that I don't need to wait for it? When I want to go, I can just like when I'm ready I can walk out the door and it's there.
B
I'm not going to give you a date today, but you're not the first person to bring this up as a.
A
That's my product request.
B
As a product request. Yeah, duly noted. Okay, I'll add it to the list.
A
Just, you know that waiting for the car, it'd be nice just in the garage there and keep your stuff in it and everything. It's not the first time you've heard that request.
B
So
A
how it seems to me operationally very intensive and very hard. Like, a self-driving car is actually not self-driving. It takes a village. You have all of the human operators ready to step in. And you know, there was that thundering herd incident that you guys talked about in San Francisco that kind of highlighted that for people. And then there's just keeping the cars clean and keeping everything running in that regard. And so can you describe just what the operational infrastructure that sits behind Waymo looks like?
B
Sure. I will say that we are overall, in all of those areas, on a path of increasing efficiency and automation. So the number of manual steps that one had to do five years ago to launch a Waymo versus where we are today is drastically different. Nowadays, if you look at one of our depots, it's a fully automatically orchestrated dance of autonomous vehicles. So what it looks like today is cars will automatically go out there to pick up their riders and serve their trips. If for some reason they need to come back, maybe they're low on energy, maybe somebody left a mess in the car, they will automatically come to the depot. Cleaning today is a manual process, so it'll get flagged in the car. Our fleet management systems say, hey, car number 378 needs cleaning. And on the sensor dome, we're able to display icons, so we'll show you a little emoji. And there's people whose job it is to clean the car who still come and clean it up. If cleaning is not required and it's just charging, it'll automatically pull into a charging stall and say, hey, I need charging. We don't yet have automated charging. In the future, you can imagine that being fully automated. But a person will come in and plug in a cable. The car will charge and say, hey, now I'm ready to go. And it will get unplugged, and the car will pull out of its parking stall and go on its merry way.
A
One of the new Porsches, I think it is, has inductive charging, just like your iPhone, where you just drive over the charging mat. I was amazed that that works at car scale. But presumably in the future they'll just be able to drive onto the charging mat. Or do you think just robotic plug in will be easier?
B
We'll see. Yeah, we'll see. I don't know. I think there's some questions about efficiency and how that plays into the overall cost and which one will be most cost beneficial remains to be seen.
A
How well behaved is the Waymo riding population in terms of not leaving a mess in the car?
B
We have wonderful riders, the most amazing customers in the world. Generally, I would say they are very good. I think there is something about. I talked about not having a person in the car. It's not somebody else's car. In some ways you kind of want to preserve the. I think generally people want to kind of preserve the nice aspects of it and kind of think of it as
A
it's so clean to begin with.
B
I know, yeah. I think that's the general trend that we see. Right. And it's because it's not somebody else's space. You're in it. It feels like it's your own. So you don't want to mess up your own space. I don't want to speculate too much on the psychology of the thing. However, I will say that it varies, and you can imagine a college town on a Saturday night, and that's a different distribution.
A
Yes. Yes. Will I be able to get a Waymo at any address that has USPS service in the us or will there be some head tail dynamic where Ketchikan, Alaska is just never worth it?
B
Eventually it will. Absolutely. There's no doubt in my mind. I think it's just a matter of when and what modality would make the most commercial sense. Is it rideshare versus privately owned? It's not a technical problem. Technology is solved. But then if you're in the middle of nowhere and there's just not enough density of trips, does it make sense for the ride hailing service that Waymo is running to have cars on standby? Probably not. They can be deployed somewhere else, and you probably don't want a horribly bad ETA. And this is where a personally owned vehicle that is equipped with the Waymo driver is maybe how you will see it materialize.
A
Relatedly, what will the second order effects of say, majority autonomous traffic be? It feels like a lot of things will work better where as you say, when someone merges into a lane very poorly and everyone all the way back has to stand on the brakes, that's antisocial behavior. And so it feels like higher quality and more pro social driving will just, I mean, basically reduce traffic a little bit, even for the same number of cars on the road. But presumably there'll be other second order effects like we'll want higher throughput traffic lights and. Yeah, how else will things change?
B
So the first thing that you mentioned, that's a huge deal. You just need to think about traffic jams. Yep. And what's that saying, the Navy SEALs one? Slow is smooth and smooth is fast. That's what traffic jams are like. You accelerate abruptly, then you come to a stop, and sometimes you have a traffic jam. Like, what happened? Well, you know, an old lady crossed the road three hours ago and we still have the standing wave there. Right. So if everybody was a smooth, predictable, and consistent driver, you would still have those traffic jams at times, but then the time constant to clear them out, I think, would be very different. But longer term, things like parking lots. Right now, if you look at what our most interesting pieces of land are allocated to, it's parking lots, it's garages. And why is that? Well, because again, your car is just sitting there 90% of the time. Right. If more cars become fully autonomous, then there's no need for that. Right. And then just imagine what you can do with your favorite city in the world if you don't have to spend that money, that huge fraction of it, on just keeping these chunks of metal sitting around.
A
Yeah. I don't think people often realize how big a deal parking minimums are for the layout of the urban landscape. The coffee shop near where I am would like to have outdoor seating, but can't because it would reclaim parking spots.
B
Yeah, wouldn't it be wonderful?
A
I only have a few more questions, but I'm curious to talk about Google's relationship with self driving, where again, it feels like right now Waymo is, aside from everything else, AI related, kind of the most exciting thing happening at Google. But it was a very long journey to get here. I mean, I feel like you could say that Google almost started working on it too early because you were saying there's been a bunch of recent enabling technologies and so did it require Google starting when it did so early? Or could one have spun up this project in 2015, 2020 and then how did Google keep the faith when it almost felt like it was perennially two years away?
B
Yeah, no, on the latter part I just have to give credit and huge kudos and gratitude to Larry and Sergey and Alphabet leadership and the company. It is part of the culture and the DNA of the company to have that vision and have the stamina and conviction to go the distance. So to the other part of the question, was it too early? I don't know. Clearly, all of the breakthroughs that we've seen over the years have changed how we're building the system. But the complexity of the problem is such that you need to go through these iteration cycles. Right. It's not still. And we've seen many waves of technology. There were breakthroughs in 2013, ImageNet came around, and there was this narrative: okay, that is the right time to start a self-driving company. Then Transformers came around, and VLMs, and all of those are super powerful, and you have applications in other spaces, like in AI in the digital world. They certainly have an impact on our AI in the physical world. But there are no silver bullets. They drastically reshape that early part of the curve. It's always been the nature of this problem. It's very easy to get started. It's deceptively easy to get started. But it is super hard to go the full distance and get edge case domain. It's the number of nines that you have to... There's the standard engineering rule of thumb that every next nine takes 10x more. So maybe there is a more optimal path, but I don't see that there's some magical moment where the true complexity of the problem goes away and then you can just take some off-the-shelf components and you're a business. If that were the case, then I think the industry would look very different today. Yeah.
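Editor's note: the "every next nine takes 10x more" rule of thumb mentioned above can be written out as simple arithmetic. The sketch below assumes the 10x factor applies uniformly to each additional nine and measures effort relative to the first nine. Both are simplifying assumptions for illustration, and the function names are hypothetical.

```python
# Illustrative arithmetic for the "every next nine takes 10x more" rule of thumb.
# Assumes a uniform factor per nine; effort is relative to reaching the first nine.


def relative_effort(nines: int, factor: float = 10.0) -> float:
    """Effort to reach `nines` nines, relative to the effort for one nine."""
    if nines < 1:
        raise ValueError("nines must be at least 1")
    return factor ** (nines - 1)


def failure_rate(nines: int) -> float:
    """Residual failure fraction at a given number of nines (e.g. 3 -> 0.001)."""
    if nines < 1:
        raise ValueError("nines must be at least 1")
    return 10.0 ** (-nines)


if __name__ == "__main__":
    for n in range(1, 7):
        print(f"{n} nines: failure rate {failure_rate(n):.6f}, "
              f"relative effort {relative_effort(n):,.0f}x")
```

Under these assumptions, going from one nine to five nines costs about 10,000 times the initial effort, which is one way to read the remark that getting started is deceptively easy while going the full distance is not.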
A
Yeah. Last question.
B
I have.
A
You've been promoted a lot at Google. It feels like Google really recognized your talents. Just, what do you think Google does... like, Google is famously one of the very best in the world at technical talent, and, say, the current AI wave more broadly is either stuff happening at Google or generally Google alumni. But just what have you observed firsthand about how Google does this so well?
B
Yeah, I would say that culture of Google, of not accepting the status quo, having a big vision, and investing in technical talent, the people who can go the distance and realize the vision, that is part of the culture. I think this is what you're seeing with the breakthroughs in AI in the digital world and all of the early investments in Transformers and other fundamental technologies, quantum computing. And I guess we're not unlike those efforts as well.
A
Good measure. Thank you.
B
Thank you.
Date: March 24, 2026
Guest: Dmitri Dolgov, co-CEO of Waymo
Host: John Collison (Stripe)
In this engaging conversation, John Collison sits down with Dmitri Dolgov, co-CEO of Waymo, at a pivotal moment for autonomous vehicles. They cover Waymo’s evolution over nearly two decades from moonshot project to large-scale deployment, the technical breakthroughs that powered this journey, the nuanced reality of operating self-driving fleets in diverse cities, and the interplay between software, hardware, and human expectations in making autonomy a reality.
Dolgov brings deep technical and business perspective, peppered with war stories and the hard-learned nuances behind massive AI-driven systems. The pair discuss the technical backbone of Waymo’s “driver,” the arc of foundational models and sensor architecture, as well as operational challenges and societal impacts.
“What makes driving hard is also this multi-agent, social, interactive part of it...context matters, semantics matters...in the language of body language, if you will.”
— Dmitri Dolgov (15:05)
“We're no longer in the phase of core science research, it's about accelerated global scaling and deployment now.”
— Dmitri Dolgov (23:44)
“Our peripheral lidars bounced under the bus—very, very noisy reflection of…feet. That was enough for the AI models…blew my mind.”
— Dmitri Dolgov (44:31)
“You have to tackle…the hardest parts of building a…system…a qualitative jump.”
— Dmitri Dolgov (49:54)
“If more cars become fully autonomous, then there's no need [for parking lots]…imagine what you can do with your favorite city if you don't have to keep these chunks of metal sitting around.”
— Dmitri Dolgov (57:42)
“It is part of the culture and the DNA [at Google]…to have that vision and stamina and conviction to go the distance.”
— Dmitri Dolgov (58:47)
In the span of an hour, Dmitri Dolgov convincingly paints self-driving not as a solved "problem" for science, but as an ongoing, nuanced systems engineering and scaling challenge. With the foundational AI breakthroughs now generalized, Waymo shifts toward relentless expansion—urban by urban, edge case by edge case—on the road to making autonomy commonplace and reshaping our cities.
Waymo’s story is both a lesson in the patience of moonshots and proof that real-world AI progress requires vision, iteration, and relentless pursuit of “the next nine.”