A
Welcome to the Practical AI Podcast, where we break down the real-world applications of artificial intelligence and how it's shaping the way we live, work, and create. Our goal is to help make AI technology practical, productive, and accessible to everyone. Whether you're a developer, business leader, or just curious about the tech behind the buzz, you're in the right place. Be sure to connect with us on LinkedIn, X, or Bluesky to stay up to date with episode drops, behind-the-scenes content, and AI insights. You can learn more at practicalai.fm. Now, on to the show.
B
Well, friends, when you're building and shipping AI products at scale, there's one constant: complexity. Yes, you're wrangling models, data pipelines, deployment, infrastructure, and then someone says, let's turn this into a business. Cue the chaos. That's where Shopify steps in. Whether you're spinning up a storefront for your AI-powered app or launching a brand around the tools you built, Shopify is the commerce platform trusted by millions of businesses and 10% of all US e-commerce, from names like Mattel and Gymshark to founders just like you. With literally hundreds of ready-to-use templates, powerful built-in marketing tools, and AI that writes product descriptions for you, headlines, even polishes your product photography, Shopify doesn't just get you selling, it makes you look good doing it. And we love it. We use it here at Changelog. Check us out: merch.changelog.com. That's our storefront. And it handles the heavy lifting too: payments, inventory, returns, shipping, even global logistics. It's like having an ops team built into your stack to help you sell. So if you're ready to sell, you are ready for Shopify. Sign up now for your $1 per month trial and start selling today at shopify.com/practicalai. Again, that is shopify.com/practicalai.
C
Welcome to another episode of the Practical AI Podcast. This is Daniel Whitenack. I am CEO at Prediction Guard and I'm joined as always by my co host Chris Benson, who is a principal AI research engineer at Lockheed Martin. How you doing, Chris?
D
Doing great today, Daniel. Lots of, as always, lots of AI and autonomy to talk about. And you know what? We have Waymo to talk about as well.
C
We have Waymo to talk about. Yeah. Speaking of Waymo, we're very excited to welcome back Drago Anguelov, who is the vice president and head of the AI Foundations team at Waymo. Welcome, Drago.
E
Thank you guys. It's great to be back after five years or so, right?
C
After five years. Yeah. We were commenting before we started the recording that the last episode with Drago was on September 1st of 2020. So that was episode 103. So a few things have changed in the world generally, but certainly in relation to AI. I'm wondering if you could maybe just catch us up at a high level, Drago. In terms of driverless cars, autonomous vehicles, how do you see the world differently now than you did in 2020?
E
So one thing I would say is, in October of 2020, we opened our Waymo One service in Phoenix East Valley to everybody. So just one month after we talked. But since then we have launched and scaled quite dramatically in now five major metros: San Francisco, Los Angeles, Phoenix, Atlanta, and Austin. And we are also serving hundreds of thousands of rides a week to paying customers. We are expanding: we announced expansion to at least half a dozen or more cities that will be going on through next year, and we may announce yet more. In the cities we are in, we continue reporting the safety performance of our autonomous driver. And we are over 100 million autonomous miles driven on the road at this point. So it's fairly statistically significant. And in those miles, our safety study at close to 100 million miles showed that we are five times less likely to get into accidents with critical injuries and over 10 times, I think 12, potentially, less likely to get into collisions with injured pedestrians. So that has been happening, and we are on to doing more and more right now. I think we work on improving the driver further. We have a sixth-generation vehicle coming up. We have started partnering with different companies. For example, we're partnering with Uber in Austin and Atlanta, so our vehicles show up on their app in those cities. We have partnered actually with Lyft in Nashville, I believe, and we partnered with DoorDash to explore delivery. So we are exploring and expanding the scope and the partnerships that we are doing as well. But I think in '25, I would say a lot more people have had and continue having the opportunity to try Waymo. I'm quite a convert myself. To me, probably the big aha moment was in '22, when I got to ride in San Francisco by myself, fully autonomously. And so since then, it took some time for more people to get exposed. But now I think the phenomenon is out there.
And I think also the autonomous vehicle industry went through, like, cycles. There was certainly, around '22, '23, a time of pessimism in autonomous vehicles. But I think through our success, through generative AI, and I think there are other companies now, it's again a very lively space. There are others that are also trying to push what's possible with autonomous driving and robotics. So it's again a very, very happening place. And yeah, we are contributing probably, I would like to think, the most advanced version of an embodied physical AI today that you can do without.
D
That's fantastic. I got to say, as a native Atlantan, I'm so happy that you guys are in my city. And we're a very, very car-centric city as well. You know, you really have to have a vehicle to get around. And I noticed, as you were naming the cities that you guys are in, that tended to be the case in terms of Variado. Does that play into any of what you guys think about for testing? You know, Atlanta traffic, for its size, is notoriously bad. And I would love to see ever more Waymos and other autonomous vehicles here, because I am terrified of all the drivers around me, with our daily collage of traffic accidents and stuff like that. So I keep telling everyone, just wait, autonomous vehicles are coming. I'm kind of curious how you pick these different testing cities that you guys engage in, and what are some of the things that you're testing for that maybe those locations are particularly apt for helping out on.
E
So, I mean, it's a bit of a combination of both technical and business reasons. I think we are trying to do large metros where, you know, autonomous vehicles can be a big market and help a lot of people. So that's one. And also we've intentionally been growing our ODD, so to speak, our operational design domain. Our first service, Waymo One in Phoenix East Valley, Chandler, that's maybe a bit suburban, with up to 45-mile-per-hour arterial roads. And we learned to master it and then went to San Francisco. There it's dense urban, with fog and some rain and hills and windy roads and narrow roads and, you know, tons of pedestrians downtown. So we dealt with that, and then we started expanding. I think some of this is that Atlanta is a big city, and also a different state. There are some differences across the various states in both how people instrument the roadway and how people drive, right? So we're spreading geographically more and more. I think also we're spreading to other domains. One that is really top of mind is highways. We have been working on highways for a long time. We've gotten to a certain point with highways. Generally, to have a good taxi service, you need highways. And it turns out that's a very fascinating, interesting problem. They're difficult because whenever you move at high enough speeds, like 65 miles an hour or so, the consequences of any mistake are really high and many things can happen, and so it pushes your robustness and safety capability there. So we've been doing highways, and one thing I did do: now we can give highway rides to employees, and I rode one to Millbrae station to get to the airport, and it's fantastic. So I hope to be able to bring it in the future to more and more people. I think that will make the service a lot more useful. Also, we announced that we will drive in other cities that have snow, so potentially even in '26, right? So our sixth-generation platform is designed after the Jaguar, right?
It's a Zeekr vehicle, from Geely, and that Zeekr is designed with a hardware suite to be able to handle snow. And we are also heading out to other countries. So we announced that we intend to launch driverless capabilities in London next year. And London is a left-side-driving city, and so is Tokyo, where we currently have vehicles and we're testing, right? So you can see we're trying to cover, little by little, the operational design domain of most large metros with all of their properties. We're of course also in Texas, that's its own unique state, but we started with more southern states, large metros, so you don't have to worry about snow at least. You want to tackle these challenges in some order, not just try to do everything at the same time. It's very difficult to validate your ability to do well in everything all at the same time, right? So we're kind of mixing what makes business sense with actually expanding the capabilities to become a truly global driver.
C
And you mentioned the driver, the car. I'm wondering, for those out there, those listening, and this is maybe hard to do just from an audio standpoint, but if you imagine the driverless car as a system in 2025, how would you describe that architecture, that system? What are the main components? I imagine, you know, the sensors, the actual car, the compute. What does that system look like in 2025, just at a high level? And then of course, I'm sure we'll get into some of the modeling things and foundation models and all of those things.
E
But I mean, the car is, you know, ultimately it's a robot on wheels, right? The main distinguishing capabilities are that it has a set of sensors, in our case camera, lidar, radar, and microphones. Microphones are quite helpful for many things, including listening to sirens, right, and occasional instructions. Then you have compute on the car. It's a non-trivial amount of compute. It's more than you can put in a phone, right? And all our vehicles are electric. That was an explicit choice of the company. I'm personally quite proud of this choice. I think it's good for the environment to actually have such cars, and I think it can accelerate the transition to more electric vehicles, which I think is good, personally. And so you have this robot on wheels with compute and sensors, and then you have actuators, right? And then there is a lot of system design engineering to make sure, you know, steering and brakes and all these things have redundancy and robustness, to make sure that if any system goes wrong, or, we need to think also, if parts of the compute go down, you have contingencies. It needs to be designed with redundancy. You need to think of what happens if, you know, the steering column has an issue. There can also be issues with steering. What is the redundancy? So for an autonomous vehicle you need to think additionally and build these things into the hardware. So it's a robot designed for safe transportation from the ground up, even though we're just extending existing platforms, and we work with the various automakers to do this.
D
And you guys have progressed over these five years since we last talked. One of the challenges is probably that not every person out there is a Chris or a Daniel who's very invested in this kind of technology. You know, going forward you have a lot of people out there. Here in the South, we joke that every other driver thinks they're a NASCAR driver and stuff, and there's that notion of control and safety, and, you know, the general population may not have as much confidence in some of these technologies because they're not following it closely and living it the way you do all the time. How do you approach that, and how has that changed over these last five years since we talked to you last, in terms of getting buy-in from the public? You talk about the safety statistics, which are amazing, but how do you get them to really feel, deep down inside, that they can trust and believe in this mode, and that it is in fact much, much safer than what they are typically doing on a day-to-day basis?
E
So, you know, people do not feel statistics. It's hard, right? Because they're the product of many, many rides. Doing 10 or even 100 safely is not enough. I think what people feel is when they get into the vehicles, and this worked for me in my aha moment, and also for my wife and friends of mine: people get comfortable really, really fast. You need to pass a certain bar where they feel, okay, this thing actually is a really, really good driver. My mother-in-law sat in it just a few weeks ago for the first time, and she turns around and she's like, this car drives much better than me, right? And once she thinks this way, she's immediately at ease, I think. I think the first several minutes are very exciting, and then people relax and enjoy the experience and mind whatever they like to mind, either the environment or their phone or other things. People get really used to it. If you cross this threshold of "can I trust you?", I think your driving immediately shows this. Now, we in the industry also understand that, coming back to statistics, you need to back it up. With regards to backing it up, at Waymo we believe in transparency and we're quite open about the incidents that happen. We file the details, and we also track the statistics and do our best estimate. We have a great safety team; they publish these reports. In them we evaluate and try to estimate how we are doing compared to a fleet of human taxi drivers or human drivers driving in the area that we are handling. And this is done both by us and in studies by insurance companies, who of course want to quantify this very well. And so there's a Swiss Re study also supporting our numbers. They also believe we can significantly decrease claims of different kinds, for injuries, for accidents, and so on. So that's another external validation for the kind of thing we provide. So that's what I would say to people. Now, you know, it's a process.
You need to work with the local communities, you need to work with police, you need to work with, you know, the various city stewards and officers. We train a lot of people, we engage with them, we work over time. I think you can see that in the cities we have been in, over time, I believe the trust in us generally increases, and I think the satisfaction of users with Waymo shows it. If you look at the apps in the stores, I think on the App Store we had a five-star rating, right? So there are a lot of people that would almost just use Waymos now if they could. And that's a testament to the value that people see in the rides. But it comes back, of course, to safety and ultimately engaging these people, getting them comfortable. Often when people experience us, many of them become converts. So I encourage people to try it. You may be the next convert if you have not yet. I personally love it. I take it as much as I can, and it's always a pleasure working on a product you enjoy yourself. So I feel blessed that way.
B
It is time to let go of the old way of exploring your data. It's holding you back. But what exactly is the old way? Well, I'm here with Mark Dupuis, co-founder and CEO of Fabi, a collaborative analytics platform designed to help data explorers like yourself. So Mark, tell me about this old way.
F
So the old way, Adam, if you're a product manager or a founder and you're trying to get insights from your data, you're wrestling with your Postgres instance or Snowflake or your spreadsheets, and you maybe don't even have the support of a data analyst or data scientist to help you with that work. Or if you are, for example, a data scientist or engineer or analyst, you're wrestling with a bunch of different tools, local Jupyter notebooks, Google Colab, or even your legacy BI, to try to build these dashboards that someone may or may not go and look at. And in this new way that we're building at Fabi, we are creating this all-in-one environment where product managers and founders can very quickly go and explore data regardless of where it is, right? So it can be in a spreadsheet, it can be in Airtable, it can be in Postgres or Snowflake. It's really easy to do everything from an ad hoc analysis to much more advanced analysis if, again, you're more experienced. So with Python built in, right there in our AI assistant, you can move very quickly through advanced analysis. And a really cool part is that you can go from ad hoc analysis and data science to publishing these as interactive data apps and dashboards, or better yet, delivering insights as automated workflows to meet your stakeholders where they are, in, say, Slack or email or spreadsheets. So if this is something that you're experiencing, if you're a founder or product manager trying to get more from your data, or from your data team today, and you're just underwater and feel like you're wrestling with your legacy BI tools and notebooks, come check out the new way and come try out Fabi.
E
There you go.
B
Well, friends, if you're trying to get more insights from your data, stop wrestling with it and start exploring it the new way with Fabi. Learn more and get started for free at fabi.ai. That's F-A-B-I dot A-I. Again, fabi.ai.
C
Well, Drago, I understand that every driverless car company is going to have a different approach to modeling and all of those sorts of things. You've talked a little bit about the hardware and the car, but I think it would be good for people to understand. We talk about this driver, you mentioned the driver, and people might have something in their mind, because we do talk a lot about models now, after the generative AI boom, that there's this model that can reason and blah, blah, blah. And so people might have this view that there is a model that drives the car. Could you help us really break down, in 2025, is this a system of models, models that do different things, a kind of combination of different types of models and even non-AI pieces? Could you just help us generally understand how that works?
E
So when you think of the stack, right, let's talk first about what it needs to do. It needs to perceive the environment using the sensors. It needs to build some representation of this environment. It needs to use this representation of the environment to make a set of decisions. And autonomous vehicles have been around a long time. Waymo has been around over 15 years already, right? So it's a rapidly developing technology space. But historically people thought, okay, there are these modules: there's a perception module that builds a representation of the world that can be useful for certain things, and then there is some kind of behavior prediction and planning module that reasons about what we could do. And potentially some people like to also reason about what others could do, to cross-reference our behavior with the other folks, and then, based on all this information, eventually select promising decisions. So that's what the stack normally does. Now, there are different ways to implement it. Generally the trend has been to have a few, and in some cases people claim they have one, large AI models on the car. And you can say ML or AI. For a while it was called ML. When the models became big enough, people called it AI, right? So you have these large AI models on the car, a few or one depending on the company, and they're connected in certain ways. You can train them end to end or not. That's also an option different companies can choose. The two are orthogonal concepts: whether they have modules and whether they train them end to end are different concepts, right? So it can be structured and still end to end, essentially modules trained end to end. These are the two axes, and so different companies, on this very coarse taxonomy, fall somewhere in these buckets, right? And I think Waymo has always used AI or ML since I've been there, and it's been the backbone of our tech.
I think over time our models have streamlined and become fewer and fewer. I can say that offboard, what my team does is build these large foundation models for Waymo that are not limited by how much compute or what latency constraints you have. And they can be quite helpful to essentially curate data or teach the models that actually run on the car in the simulator. We can get to simulators later. So we have experience with most aspects of these options, whether it's end to end and whether they're structured or not, right? Offboard, I can definitely tell you we've explored a lot with large vision language models. That's one of the latest technologies that's relevant to us. In the field of robotics people also talk about vision language action models, because you can tie into one model both understanding vision and language inputs and potentially asking for certain actions as outputs, which is ultimately what the robot needs to generate. So that's an exciting area that has developed in '25. In our foundation model, the Waymo foundation model, we combine the benefits of these vision language models with some bespoke Waymo architecture innovations. One area is fusing modalities that vision language models typically are not trained on, like lidar and radar. Another one is modeling the potential future evolutions of the world. There is some interesting Waymo technology in how to do this well that we also use. But we fuse all of this with VLM technology and world knowledge from other bases, whether it's a world model or a vision language model, into something that is then able to do well on autonomous driving tasks. So that's offboard. Onboard, we don't typically talk about exactly what is there, but I think we're trying to get the state-of-the-art, best architectures that we believe solve the problem and put them together on the car.
I think it's a really, really high bar to have a model perform in all the conditions and all the situations we need it to, right? And as you know, VLMs also have this weakness of hallucination. So we have the safety harness around them to prevent hallucination, to double-check what they are predicting, right? So we also have that aspect in our stack, which we have worked on historically. So that's what I can say on a high level. I hope that's not too scattered. If you guys want anything specific, we can discuss that in a little bit more detail.
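Editor's illustration: the idea of a safety harness that double-checks a learned model's output can be sketched as a small independent checker wrapped around a proposed action. This is a toy under stated assumptions, not Waymo's onboard design, which the episode does not describe. The `Action` type, every numeric limit, and the fallback behavior are hypothetical names and values chosen only for the example.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class Action:
    steering_rad: float  # requested steering angle
    accel_mps2: float    # requested acceleration (negative means braking)


# Hypothetical limits, picked only for illustration.
MAX_STEER_RAD = 0.5
MAX_ACCEL_MPS2 = 3.0
MAX_BRAKE_MPS2 = -8.0
MIN_TIME_GAP_S = 1.0

# Conservative action used when the proposal fails a check.
FALLBACK = Action(steering_rad=0.0, accel_mps2=-3.0)


def is_plausible(action: Action, speed_mps: float, gap_to_lead_m: float) -> bool:
    """Independent checks that do not rely on the learned model."""
    if not (math.isfinite(action.steering_rad) and math.isfinite(action.accel_mps2)):
        return False
    if abs(action.steering_rad) > MAX_STEER_RAD:
        return False
    if not (MAX_BRAKE_MPS2 <= action.accel_mps2 <= MAX_ACCEL_MPS2):
        return False
    # Refuse to accelerate while already following the lead vehicle too closely.
    too_close = speed_mps > 0 and gap_to_lead_m / speed_mps < MIN_TIME_GAP_S
    if too_close and action.accel_mps2 > 0:
        return False
    return True


def harness(proposed: Action, speed_mps: float, gap_to_lead_m: float) -> Action:
    """Pass the model's proposal through only if it survives the checks."""
    if is_plausible(proposed, speed_mps, gap_to_lead_m):
        return proposed
    return FALLBACK
```

The design choice in this sketch is that the checker is deliberately simple and separate from the model it guards, so a wrong prediction cannot also corrupt the check.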
D
So I do have a follow-up to that. And recognizing that you're not able to get into the specifics of the architectural decisions and model decisions that Waymo is engaged in, if you could abstract it a little bit and maybe just talk about the space a little bit. You know, I'm curious, as you talk about world models and having a representation of the environment, that brings in not only AI but the notion of simulation as one of the tools in the tool chest, if you will. I suspect we have a lot of listeners that are hearing lots of different AI use cases in general but may not have as much expertise in autonomy. And so as you talk about that notion of representation of the environment, could you talk a little bit about what that problem looks like and what are different things that you might think of to solve it, without having to get into how you guys have done it? Just, what does that juxtaposition of simulation, AI, and representation of the world and the environment around you look like?
E
So maybe, I mean, simulation, if we're going to go there, maybe I can just juxtapose two things there. I like saying this. Historically, I've been doing this for a while. There are two main problems in autonomy. One is to build this onboard driver, and another one is to test and validate this onboard driver. And both are really, really hard problems. And people usually talk about the first one. But imagine there is some collection of models and you need to prove that it's safe enough to put them out in the real world. That's in itself a really challenging problem, arguably no simpler than putting the first model together. And that one, ultimately, because you need to be a bit more exhaustive, potentially takes even longer, to build the full recipe to validate things properly. These are the two problems. Now, in autonomy, what is different, maybe, from the standard AI models is a few things. One is that we ultimately output actions that are commands to a robot, which are a different type of data than, traditionally, say, text and images. I think that's one. Another one is that we operate under strict latency constraints. You need to react quickly. For us, what is also interesting in AV is that this is probably the first serious domain where we had to really learn how to interact with humans in the same environment. It's a highly interactive, multi-agent setup. Then, additionally, if you choose to add additional sensors and cameras, we have a lot more modalities coming in, and we have a ton of data. So essentially the way to think of it is, imagine you get maybe billions of sensor readings per second, or even tens of billions, a lot. And to make a decision, you need to have a context of many seconds of these sensor inputs, maybe a dozen cameras, half a dozen lidar and radar. And so you need to collect maybe five to 10 seconds, some can argue 20 or 30, of context to make a decision. And the decision is fairly low dimensional.
It's like, okay, steering or acceleration, but the inputs are incredibly bulky. And so you need to somehow learn the mapping from this extremely high dimensional representational space to decisions. That's very hard, right? Under latency constraints, under safety-critical constraints. That's what makes our domain interesting. Now, a lot of the things that work in machine learning in one domain transfer to the other. So yes, there are, for example, very similar scaling law findings, if you have cutting-edge architectures and you do proper studies and scaling and you have a lot more data and compute and you feed it to these architectures. And for every class of algorithms, there are slightly different scaling laws. But even with the simpler imitative algorithms that people also used in language: they predict the next token, we can predict the next action. There are these direct parallels. You can do reinforcement learning in language, we can do reinforcement learning in our simulator, right? These are the parallels. But how exactly things translate is interesting. The ideas translate. The implementation is a little more creative than the usual of just staying on the Internet, because there is a bit of a domain jump to the real world. So that's interesting. The other part is, compared to, say, language LLMs, we actually have a paper, MotionLM, from two or three years ago, where the idea was, hey, why don't we tokenize motions to make them like language? And it turns out it's a very effective idea. That architecture, which is very LLM-inspired, models future interactions of agents in the environment very well. You can think of it as agents talking to each other with these motions. They execute simultaneously in an environment. And now you can leverage the machinery. We have this paper, and it's quite effective. That's an example of this. Now, one other interesting point, though, is that text is its own simulator. Essentially, you know, you speak text to each other.
That's the full environment. You spit out text tokens, text tokens, and text tokens. In our case, well, we predict actions, we execute actions. But imagine now, you need the simulator, because based on these actions you need to envision what the whole environment looks like and what your, whatever, hundreds of millions to billions of sensor points look like. So now you need something that generates them as you act, so you can test how you behave over time as you make decisions at fairly high frequency. Then there is a known problem which is called covariate shift. Essentially, decisions can take you to places you may not have seen before in the data, and there you may have particular failure modes that you may not observe unless you push yourself and drive on-policy to those places. But to drive there, now you need to simulate it. The simulator needs to be realistic enough that you don't go somewhere else entirely, as opposed to the actual place you will end up with your decision making. So that's another very interesting point. Simulation is hard if you want robust testing. Simply having drivers on the road is not a particularly scalable solution if you want to keep iterating on your stack, because some of the events happen once in a million miles or more, and you would much rather test them in the simulator. But for the simulator, now you have to solve this problem, which is interesting and challenging. So that's unique in our domain.
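Editor's illustration: the covariate shift problem described above can be shown with a toy one-dimensional lane-keeping example. This is a minimal sketch, not anything from Waymo's stack. The dynamics, the gains, the drift value, and the nearest-neighbor "imitator" are all assumptions made for the example. The imitator copies the expert perfectly on logged states, so an open-loop check on the log reports zero error, yet when it is driven closed-loop from a state the log never covered, its errors compound.

```python
# Toy lateral-offset dynamics: x is the distance from lane center.
DRIFT = 0.05   # constant sideways push per step (toy value)
GROWTH = 1.1   # mildly unstable dynamics, so uncorrected error grows (toy value)


def step(x: float, u: float) -> float:
    """Advance the toy dynamics by one step under correction u."""
    return GROWTH * x + u + DRIFT


def expert(x: float) -> float:
    """The 'expert driver': a simple proportional correction."""
    return -0.6 * x


def rollout(policy, x0: float, n_steps: int) -> list:
    """Drive closed-loop: each action changes the next state the policy sees."""
    xs = [x0]
    for _ in range(n_steps):
        xs.append(step(xs[-1], policy(xs[-1])))
    return xs


# Log expert driving from the lane center; states stay between 0.0 and 0.1.
logged_states = rollout(expert, 0.0, 50)[:-1]
logged = [(x, expert(x)) for x in logged_states]


def imitator(x: float) -> float:
    """Pure imitation: replay the expert action from the nearest logged state."""
    return min(logged, key=lambda pair: abs(pair[0] - x))[1]


# Open-loop evaluation on the log: the imitator looks perfect.
open_loop_error = max(abs(imitator(x) - u) for x, u in logged)

# Closed-loop evaluation from a state that never appears in the log.
expert_final = rollout(expert, 0.5, 50)[-1]
imitator_final = rollout(imitator, 0.5, 50)[-1]

print(f"open-loop error on logged states: {open_loop_error}")
print(f"expert final offset:   {expert_final:.3f}")
print(f"imitator final offset: {imitator_final:.3f}")
```

The gap between the two evaluations is the reason a simulator has to let the policy drive on its own decisions rather than only replaying recorded data.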
B
What if AI agents could work together just like developers do? That's exactly what AGNTCY is making possible. Spelled A-G-N-T-C-Y, AGNTCY is now an open source collective under the Linux Foundation, building the Internet of Agents. This is a global collaboration layer where agents can discover each other, connect, and execute multi-agent workflows across any framework. Everything engineers need to build and deploy multi-agent software is now available to anyone building on AGNTCY, including trusted identity and access management, open standards for agent discovery, agent-to-agent communication protocols, and modular pieces you can remix for scalable systems. This is a true collaboration from Cisco, Dell, Google Cloud, Red Hat, Oracle, and more than 75 other companies, all contributing to the next-gen AI stack. The code, the specs, the services: they're dropping, no strings attached. Visit agntcy.org, that's A-G-N-T-C-Y dot org, to learn more and get involved. Again, that's agntcy.org.
C
Well, Drago, I'm really intrigued by how you helped me form a mental model for the types of problems that are part of the research in this area. I would definitely encourage our listeners to go check out waymo.com/research. There's a bunch of papers there that people can find and read, and there's even the Waymo Open Dataset, which supports research in autonomous driving. So that's really cool to see. It's amazing. I'm wondering, Drago, as you look at this, I see all sorts of things, from scene editing to forecasting and planning to...
E
Did I mention you need to embody the agents in the simulator too? They're not deterministic.
C
Oh yeah.
E
Doing different things. You need to, well, guide the agents to react to you in reasonable ways as well. Otherwise, you know, they'll be reacting to an empty spot where you no longer are. Even if you collected the situation with your sensors, as you start deviating from it in the simulator, you still need the agents to do reasonable things, right?
C
Yeah, yeah, that makes sense. And I guess that really gets to my question a little bit, which is, I assume over the last five years, since we last chatted, there's been a lot of progress in certain areas, and maybe certain challenges that are kind of holdouts, that remain very, very challenging, where not as much progress has been made. So in this autonomous driving research world, can you paint in broad strokes where there has been very rapid progress as things have advanced, and maybe some of the hardest problems to solve that still remain kind of at arm's length, if you will?
E
I mean, I would say one thing for folks, especially those closer to robotics: they will see that, just like the field of AI is going through some crazy inflection point in both the methods people develop and in popularity, the same is true in robotics and the same is true in AVs. I've been in the space over 10 years now just doing AVs, and I would say every couple years our capabilities with AI and machine learning dramatically expand due to innovations, and this innovation train has not stopped. So where we are five years later compared to five years before, in terms of modeling, I think there are still huge improvements possible. I think we're moving more and more to machine-learning-powered stacks, and I think ultimately it's about understanding how to elegantly and scalably handle this problem with data-driven solutions. So that's been generally an evolution, and I think we understand better how models behave. I think these latest architectures and the scaling that we mentioned are a really interesting domain. We started studying it a while back. So there's this paper we have, for example, on scaling laws of the MotionLM architecture. It's an LLM-like architecture. So you say, oh well, what are its scaling laws? How does it compare to LLMs? We have a tech report on this, for example. Similar kinds of learnings transfer from LLMs, but there are some bespoke, really interesting things. For example, for that architecture, improving what's called open-loop prediction performance seems to correlate with improving closed-loop performance. That's not always true. We see different scaling factors compared to language. Like, motion space is nowhere near as diverse as language tokens. So for a model with the same number of parameters, we actually need a lot more data, more examples of how the world behaves, to scale. These are interesting findings generally. 
Right, so that's one. I think now, as the architectures keep evolving, there are diffusion and autoregressive models, and how do they each compare, and how do they compare in open loop and closed loop? These are all very interesting areas people are studying. I think generally there's this question lately as well of how you build the best simulator with machine learning and what kinds of models are there. And most recently there's some groundbreaking work like the Genie model by Google. I don't know if you guys saw it. It's a controllable video, essentially: you can give motion controls and it dreams the video, close to real time, of what it should look like. So essentially you're controlling the world you're imagining a bit. And you can do this in real time, or you can of course do it off board, or offline, with potentially even larger models. And so now these models are pre-trained on large amounts of video and text, and so they capture a lot of knowledge of how the real world behaves. And it somewhat complements the knowledge that vision language models capture from the Internet corpuses. And so how do these two relate? How do you mix them? Which one is beneficial for which type of tasks? These are all interesting capabilities that people are doing. And maybe one other interesting topic: there's a lot of talk about architectures for robots that are some combination of system two and system one architecture. You guys may have heard of it. Now, we know that large models are more capable when trained on more data and more compute, but in latency-sensitive situations, if they're too big, you can't run them in real time. So now the question is, okay, well, what if you have a real-time model that handles most cases, but then you have a slower model that does better high-level reasoning, that runs at some slower hertz, that helps guide and understand additionally and provides this to the fast model when needed, while still keeping this reflexive capability? 
Someone jumps in front of you, you still respond, right? Like, these are interesting questions in our domain as well. So there's many, actually. It's a really, really fascinating time. And I think we're studying a lot of these questions, just as the whole field is, and we have some very interesting findings, some of them not published yet. Generally, I would encourage people to come join us. You can, well, you know, contribute to the premier embodiment of physical AI currently out there, and you can do interesting research. Right.
C
Sounds like fun.
E
But yes, these are all fascinating topics. And of course, how to control hallucinations in all these models. How do you determine when these models are out of domain and potentially making clear mistakes? Right. This can happen. We have research experience with VLMs, like many of the current ones; we have a paper called EMMA where we tried to fine-tune a VLM for driving tasks. We got a bunch of learnings. It can be quite good, but it has limitations too. So how you overcome these limitations with additional system design is very interesting.
D
I'm curious as we're talking about this, and I'm really enjoying the conversation. I work for another company in autonomy, but in a slightly different context. One of the things that is popular in the industry I'm in right now is solving for swarming behaviors. As you're talking about many autonomous vehicles that are having to collaborate in certain ways, I'm curious, from your take, that may or may not be an interesting problem for Waymo. I don't know what your thinking is on that, but I would love to know, when you look at that space, what are some of the things that you think about and that are interesting to you about the notion of many autonomous vehicles collaborating together?
E
That's been a very interesting area. Actually, there was earlier research that I was impressed with, where people proved that if you can control groups of vehicles, you can improve traffic flow. So to me, we are not exactly swarming yet. Autonomous vehicles are still a subset, a relatively small subset, of the whole traffic. So mostly when I think of swarming, I imagine, say, a crowd of 200 people on Halloween all around the car and stuff like this. That's swarming. Or you go downtown after a Giants game and they're exiting, and that is swarming. Right. The human agents, so to speak, are still more prone these days to swarming than AVs. Maybe we'll get more prominent. I think when you think of coordinating multiple AVs in our domain, already they do send each other valuable information. For example, if one of our vehicles encounters some very complex construction, it can help pass information about it to the others. If we encounter potential slowdowns or vehicles getting stuck, that kind of information can be passed. I think controlling jointly vehicles starts becoming interesting now that we're getting to some kind of scale. I think one of the domains where this is interesting is when you want to charge them. So imagine you need to charge hundreds of vehicles in one location. How do you control all these vehicles so that they all get to the right place and don't block each other, and it's all very efficient? That's one example of where you're fairly swarmed. It's your own warehouse, right, or a garage, where this comes up. And then down the line, potentially there are opportunities to improve traffic flow for everyone. But that's still maybe in the future.
C
Yeah, well, you took us right there, Drago, as we're getting close to an end here. I'd love to talk about that future. We were talking beforehand, and I was saying I'd love for you to share just what you're excited about. That could of course be related to driverless research in general. It could be something in the AI ecosystem generally that you're excited about as you look forward, or that you're thinking about a lot. Does anything stand out, so that we can ask you about it? Hopefully not in five years from now, but maybe the next time you're on, in less than five years, we can ask you about it.
E
Sounds good. Well, I'm around, so I could probably come faster than in five years' time, in a Waymo. Yeah, potentially, yes. I think maybe let's go in a couple areas. First maybe is to parallel this chat we had earlier. Maybe first about the product and then a bit about the AI. I think in terms of the product, I am excited in a way: with the safety studies, we've shown these are significant improvements over the baseline. And I think we've shown it already at scale, with what starts to become fairly good confidence, or some statistical significance, at this point. And maybe your listeners, I'm not sure they understand, but even just on the US roads alone, I'm not talking world roads, US roads, 40,000 people die every year from accidents. That's a lot. And I think these gains are starting to become somewhat meaningful. So it starts becoming thinking, hey, maybe we have a mandate to expand, we should be expanding. It will save people's lives. And you think about it, and then the question is, how can I contribute to expanding? I mean, ignore all the... Of course I believe it's a great service. A lot of people love it for a lot of good reasons. We could potentially go into some reasons people found why they love it. Right. But I think even just from the mandate, okay, you know, it's helping in a meaningful way. And I think being out there can make quite a dent against some of these numbers. And so, yes, I would love it to expand more. Now we're doing that. I think to me then the question is, what can I do to contribute to it? And I think one of the most scalable solutions to tackling dozens of new cities and conditions and countries is machine learning and AI. 
And so now for me, what I'm excited about is harnessing all the positive latest trends, I think, for me more directly first into the Waymo foundation model work we're doing, where we can directly experiment and deploy them, and then try to push more and more of them to contribute similar benefits to the main production systems, which are the onboard driver and the simulator. So that's what I think about. Now, more specifically, if you want to go into AI techniques, I think this question of, okay, how do I endow vision language models with more modalities is a fascinating one. We actually have some good results already. How do you expand to new modalities, say, lidar and radar? How do you connect it to actions, the model? What's an effective way to do this while preserving all the world knowledge that's present in the model that you're trying to build on top of? That is an interesting model and system design challenge. And then what I'm also excited about is building the simulator, I think, as realistic and as scalable as possible. I think the modern technologies like the Genie model that I mentioned, these world models, are still relatively few and far between, but I think there's a ton of labs that are working on them today. I think taking that kind of technology and building the most generalizable possible simulator with it is fascinating. Now, the interesting thing is you could do that, but they can still be very expensive to run. So it's not enough just to show that it can handle very realistic, interesting cases. You still need to show how you can run it without breaking the bank. The amount of simulation Waymo does today to ensure that we're safe, we run like millions of virtual miles every day. That's a lot of things to simulate, potentially with so many sensors on board and so on. So there needs to be... 
There's some very interesting question in that space: how do we get the maximum possible simulator realism, and how do we get the maximally scalable simulator? And there's a very interesting mix of technologies getting involved to do that.
C
It's awesome. Well, I'm certainly excited about that. Like I say, I encourage our listeners to check out Waymo's research page. Lots of amazing stuff to explore there.
E
And folks can see our history. Right? Like I think you can see the kind of work and papers people did from I think 2019 to now. And there's almost 100 papers there now. And maybe it's not 100 only because we may not have uploaded the most recent ones. I'll try to make sure we do soon if we're missing any. So if the readers go there, they can see the full set.
C
That sounds great. Well, thank you for joining us again, Drago. It was a real pleasure to have you on the show again. And let's not make it five years next time. We'll try to get you on and hear the update sooner than that for sure.
D
Don't be a stranger.
E
Thank you guys. Pleasure to be on the show.
A
Alright, that's our show for this week. If you haven't checked out our website, head to practicalai.fm and be sure to connect with us on LinkedIn, X, or Bluesky. You'll see us posting insights related to the latest AI developments, and we would love for you to join the conversation. Thanks to our partner Prediction Guard for providing operational support for the show. Check them out at predictionguard.com. Also, thanks to Breakmaster Cylinder for the beats and to you for listening. That's all for now, but you'll hear from us again next week.
In this episode, hosts Daniel Whitenack (CEO at Prediction Guard) and Chris Benson (Principal AI Research Engineer at Lockheed Martin) welcome Drago Anguelov, Vice President and Head of AI Foundations at Waymo, to discuss the current state and future direction of autonomous vehicle (AV) research at Waymo. Drago reflects on developments since his last appearance five years ago, delving deep into technical, operational, and societal aspects of AVs, safety, public trust, modeling advances, simulation, and the promise of machine learning for global-scale deployment.
Quote:
“I think in 25, I would say a lot more people have had and continue having the opportunity to try Waymo. …We are contributing probably, I would like to think, the most advanced version of an embodied physical AI today that you can do without.”
— Drago Anguelov ([06:14])
Quote:
“You want to tackle these challenges in some order, not just try to do everything at the same time.”
— Drago Anguelov ([10:57])
Quote:
“It’s a robot designed for safe transportation from the ground up…they need redundancy and robustness to make sure that if any system goes wrong…you have contingencies.”
— Drago Anguelov ([12:18])
Quote:
“People do not feel statistics…What people feel is when they get into the vehicles…My mother-in-law sat in it just a few weeks ago…‘This car drives much better than me.’”
— Drago Anguelov ([14:54])
Quote:
“Generally the trend has been to have few, and in some cases people claim they have one, large AI models on the car…We also have some notion of, as you know, VLMs also have this weakness of hallucination. So we have the safety harness around them…”
— Drago Anguelov ([22:39], [25:22])
Quote:
“Simply having drivers on the road is not a particularly scalable solution if you want to keep iterating on your stack because some events happen once in a million miles or more and you would much rather test them in the simulator.”
— Drago Anguelov ([32:49])
Quote:
“Controlling jointly vehicles starts becoming interesting now that we’re getting to some kind of scale…when you want to charge them…that’s one example of where you’re fairly swarmed…”
— Drago Anguelov ([44:03])
Quote:
“Even just on the US roads alone…40,000 people die every year from accidents. That’s a lot. And I think these gains are starting to become somewhat meaningful. So it starts becoming thinking, hey, maybe we have a mandate to expand…”
— Drago Anguelov ([46:11])
On Public Trust:
“People do not feel statistics…what people feel is when they get into the vehicles…”
— Drago Anguelov ([14:50])
On Simulation:
“Simply having drivers on the road is not a particularly scalable solution…some events happen once in a million miles or more and you would much rather test them in the simulator.”
— Drago Anguelov ([32:49])
On AV System Architecture:
“It’s a robot designed for safe transportation from the ground up…need redundancy and robustness…”
— Drago Anguelov ([12:18])
On Progress:
“Every couple years our capabilities with AI and machine learning dramatically expand…this innovation train has not stopped.”
— Drago Anguelov ([37:04])
This episode is a rich, deeply technical yet accessible exploration of AV research led by Waymo—a company at the leading edge of robotics and AI deployment in the real world. Drago Anguelov highlights the scope and depth of advances in AI and simulation, the operational and social realities of how Waymo selects and deploys in cities, and the multifaceted, ongoing challenge of realizing safe, trustworthy, and scalable autonomy for society.
“Let's not make it five years next time. We'll try to get you on and hear the update sooner than that for sure.”
— Daniel Whitenack ([51:06])