
Rev and Bilawal discuss how simulated “mirror worlds” can help robots learn faster.
Bilawal Sidhu
Hey, Bilawal here.
Before we start the show, I have a quick favor to ask. If you're enjoying The TED AI Show, please take a moment to rate and leave a comment in your podcast app. Which episodes have you loved and what topics do you want to hear more of? Your feedback helps us shape the show to satisfy your curiosity, bring in amazing guests, and give you the best experience possible.
The world of AI is advancing at an incredible pace, and it's no secret that in many areas, computers have long outperformed humans. But there's been one area that's been tough for robots to master: physical intelligence. We've talked a lot on this podcast about text and image generation, technologies that took years of research, immense computational power, and vast data sets to develop. But when compared to mapping 3D spaces and predicting the chaotic randomness of the real world, that's all child's play. So what gives humans the edge here, at least for now? It's simple. We've had a lot of practice. Imagine you're a pro baseball player in the outfield, watching a fly ball come your way. In an instant, your brain calculates the ball's speed, spin and trajectory to predict where it will land. To you, it feels automatic, but it's the result of years of practice and learned experiences. Not just from baseball, but from a lifetime of physical interactions from childhood. Moments of trial and error in the physical world have trained your brain to understand how objects move and react. And for humans, mastering these skills takes time because real-world practice can't be rushed. But fortunately for robots, it can be rushed. And Nvidia, the AI giant historically known for its graphics cards, has developed incredibly powerful simulated environments where robots can practice and learn at a supercharged pace. Tens of millions of repetitions, which might take humans years, can be compressed into minutes. We're already seeing this in self-driving cars, but the potential goes far beyond that.
By building AI that understands the physical world, Nvidia is setting the stage for machines that could revolutionize industries, assist in complex surgeries, and even help around the house. So what does it mean for robots to develop a kind of physical intuition? And what challenges and opportunities lie ahead as we continue to push the boundaries of robotics? I'm Bilawal Sidhu and this is The TED AI Show, where we figure out how to live and thrive in a world where AI is changing everything.
Bilawal Sidhu
Our guest today, Rev Lebaredian, began his career in Hollywood, where he worked on visual effects for films like Mighty Joe Young and Stuart Little. His experience in creating detailed, dynamic 3D worlds laid the foundation for his role today as VP of Omniverse and Simulation Technology at Nvidia. There, he's using that expertise to push the boundaries of robotics by applying simulation technology to teach robots physical intelligence. In other words, how to understand and interact with the real world. In our conversation, we explore how Nvidia, known for its role in gaming technology, became a key player in the development of generative AI, what a robot even is, and Rev's vision for a future where robots enhance our lives.
So Rev, welcome to the show.
Rev Lebaredian
Thank you for having me, Bilawal.
Bilawal Sidhu
So in the first part of your career, you worked in entertainment, helping audiences become immersed in fantasy worlds. And now your work involves helping robots become immersed in simulations of the real world. Can you explain to our listeners what your role is at Nvidia?
Rev Lebaredian
Technically, my title is Vice President of Omniverse and Simulation Technology. It's kind of a weird title. I don't think there's many others like it out there. And it's strange because it's a new concept, relatively speaking. I started my career, as you mentioned, in media and entertainment, doing visual effects and computer graphics for that purpose. I joined Nvidia 23 years ago with the hope of taking what I was doing in movies, creating this imagery of high-fidelity, high-quality fantasy worlds, and doing it in real time, doing it really fast, using our GPUs to power that computation so that what's a linear experience in movies could become an interactive one, like in a video game or in an immersive experience like XR. It took a while for us to get there though.
Bilawal Sidhu
Speaking of that, you've had a very unique vantage point over the years watching Nvidia almost evolve from basically a gaming hardware company to a leader in AI and simulation. Could you share a little bit about your journey at Nvidia and how Nvidia's mission has transformed over the years?
Rev Lebaredian
That's a really, really great question. And I think a lot of people don't really understand how Nvidia, this quote-unquote gaming company or this chip company that made chips for gaming PCs, is now the most valuable company in the world and at the center of all of this AI stuff. But if you go back to what the idea behind the creation of the company was all the way at the beginning, it actually makes a lot of sense. The founding principle of the company was this idea that general-purpose computers, ones built around CPUs, the same architecture that we've built all computers around since the 1960s, starting from the IBM System/360, are really great, but there are certain computing problems that they just aren't fast enough to solve. Now at the time we had this law called Moore's Law. It's not a law like a law of physics. It was more like an observation of how semiconductors were essentially providing double the compute for the same price or the same amount of power every year and a half or two. At its height, Moore's Law made it so that we could get 100 times speed increases for the same price or the same power over a 10-year period. But we looked at Moore's Law and said, well, if we wait for Moore's Law to give us enough computing power to do certain things like rendering for computer graphics for video games, we would have to wait decades or maybe even hundreds of years before the computers would be fast enough to do some of the things we wanted to do. So Nvidia set about creating this new form of computing that doesn't do everything, but it can do many things that would otherwise be impossible with this generic kind of computer. And we call that accelerated computing. We invented the idea of a GPU. And the first problem we chose to tackle was the problem of 3D rendering for producing these images in video games. At the time when Nvidia was formed in 1993, there was no market for this. There were actually no 3D video games. They were just starting.
There was Doom and Wolfenstein, like the first ones that just.
Bilawal Sidhu
And Duke Nukem.
Rev Lebaredian
Yeah, that came a little bit later, I think. It was not '93, maybe '95, I think. And so we imagined that if we could help solve this problem, a market would form around it, and then we could expand into other markets with the same accelerated computing architecture. And that's essentially what happened. Fast forward a few more years. In the early 2000s, we added a critical feature to our GPUs. It's called programmable shading, which is simulating how light interacts with the materials inside a 3D world. That's what makes plastic look like plastic, aluminum look like aluminum, wood look like wood. Up until that point in time, the kinds of shaders we could have, the kinds of materials, were very limited. And they made the video games look very simple or cartoony, not quite realistic. In the movie world, we weren't limited by how much time you have to render. We could spend hours and hours rendering. So there was this big disconnect between the quality of a computer-generated image in a movie and what you could see in a video game. We introduced programmable shading, and that feature of making it programmable unlocked the possibility of us using the same GPUs for more than computer graphics and rendering. And very quickly we saw researchers and other people who weren't doing computer graphics take advantage of all the computing capabilities that were in our GPUs by taking their problems, other sorts of physics problems like molecular dynamics and fluid dynamics, and phrasing them like they're a computer graphics problem. And when we realized that was happening, that people were willing to contort themselves into using graphics APIs to do this other stuff, we said, let's make it easier for them. And we introduced CUDA, which was a more natural way of programming general-purpose things that weren't graphics on our GPUs. And we essentially waited for six, seven years to see what the killer app would be.
We imagined some developer somewhere, probably a grad student, was going to go figure out something amazing to do with these computing capabilities. And it took a while. We introduced CUDA in 2006. At the end of 2012, almost seven years later, we finally had that moment. And what happened was two research students and their professor at the University of Toronto, Ilya Sutskever, Alex Krizhevsky and their professor, Geoff Hinton, who just won the Nobel Prize, beat all of the benchmarks in image classification with a deep learning neural network called AlexNet. When they published that at the end of 2012, it essentially changed everything.
Bilawal Sidhu
And this is insane because up until that point, basically every other approach for the ImageNet benchmark was not really winning because of this deep learning approach. This was the first time deep learning kind of blew everyone's mind in the realm of computer vision. And it's kind of wild to imagine it started off with programmable shaders and trying to make, like, cinematic visuals from Hollywood run in real time on your computer. But that same capability, like you said, as you made it easier for developers, unlocked this whole new world in computer vision and certainly caught the whole world's attention, particularly y'all's, probably sooner than everyone else, I assume.
Rev Lebaredian
That's exactly right. It seems counterintuitive that this thing built to create images is somehow the same thing that you need to build intelligence. But really, it all just comes down to computing. With the form of computing we had to build for computer graphics, we process a lot of pixels, a lot of triangles, a lot of light rays bouncing around in a scene. That same form of computation is the same thing you need to do all of the tensor math, all of the matrix math. The problem of image classification has been a long-standing one that we've all known would be great if we could solve. They've been trying to solve it since the 1950s. It's a really, really useful thing to do, to be able to automatically distinguish what's inside an image that you provide the computer. And up until that point, we would take a really smart person, a computer scientist. That person would imagine an algorithm that can do image classification and then transcode what's in their brain into the computer and produce a program. What changed here was for the first time we were able to create an algorithm to solve something that no human could actually imagine. The way we solved it was by taking a large computer, effectively a supercomputer. We gave it millions of examples of images and said, when you see an image that looks like this, that's a cat. And when you look at an image that looks like this, it's a dog. When you look at this image, it's an airplane. And so we did that enough times that it wrote the software, it wrote the algorithm that could do that image classification. And so it did it better than any algorithm that a human could imagine.
Bilawal Sidhu
And that's wild, right? You're talking about this era where humans have written software, now software is writing software.
Rev Lebaredian
That's right. There's two basic ingredients: a supercomputer, lots of computation, and you give it a whole bunch of data or examples of what you would like it to do, and it figures out the algorithm for you based on the examples you give it. The first one, building large computers, that's our happy place, right? That's what Nvidia knows how to do. We love building powerful computers and scaling them up. And so that's what we set about doing over a decade ago. And the recent explosive growth of Nvidia is essentially because of the bet we placed over a decade ago that these big computers were going to be useful. That's what everybody is clamoring for right now. They're setting up these AI supercomputers.
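To make the "examples in, algorithm out" idea concrete, here is a minimal sketch. It is not deep learning and not how AlexNet works: it swaps in a nearest-centroid classifier, and the two features, the four training examples, and the names `fit` and `predict` are all made up for illustration. The point is only that the decision rule comes from labeled examples rather than from hand-written logic.

```python
from statistics import mean

# Hypothetical toy features: (ear_pointiness, snout_length), each in [0, 1].
examples = [
    ((0.9, 0.2), "cat"), ((0.8, 0.3), "cat"),
    ((0.3, 0.8), "dog"), ((0.2, 0.9), "dog"),
]

def fit(data):
    """Derive the classifier from examples: one average point (centroid) per label."""
    centroids = {}
    for label in {lbl for _, lbl in data}:
        points = [x for x, lbl in data if lbl == label]
        centroids[label] = tuple(mean(dim) for dim in zip(*points))
    return centroids

def predict(centroids, x):
    """Classify a point by its closest centroid (squared Euclidean distance)."""
    def dist(lbl):
        return sum((a - b) ** 2 for a, b in zip(x, centroids[lbl]))
    return min(centroids, key=dist)

model = fit(examples)
print(predict(model, (0.85, 0.25)), predict(model, (0.1, 0.7)))
```

Nobody wrote a rule saying "pointy ears means cat"; adding more examples or more labels changes the behavior without touching `predict`.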
Bilawal Sidhu
Yeah. And every country and company wants more of your GPUs. And of course, the recent demand has really been driven by large language models and diffusion models, which we've talked about a bunch on the podcast. But it's interesting, like, as cool as ChatGPT is, and as cool as it is to be able to type a prompt and get an image out, this stuff isn't the holy grail. These systems have their limitations, right? Could you talk a little bit about that as we transition this conversation towards physical AI?
Rev Lebaredian
Yes, that's exactly right. So at that moment when we realized how profound this change was, that we could now produce algorithms that we never imagined we would have in our lifetimes through this new technique of deep learning and AI, the next question we asked ourselves was, now that we have this possibility of creating these amazing new things, which ones should we go create? What are going to be the most valuable and impactful ones? Now, if you just take a step back and think about the computing industry, the IT industry, it's somewhere between $2 and $5 trillion a year globally, which is a huge number, right? That's a really big industry. However, all of the rest of the industries out there, the industries that are about our physical world, the world of atoms, that's $100 trillion. That includes markets like transportation, transporting humans, transporting goods. It includes manufacturing, which is reassembling atoms into products. It includes drug discovery and design, reassembling atoms into medicines, and so on and so forth. All these things about our physical world, at least the way humans value them through markets, are of much greater value than information. Now, information is the easiest thing for us to digitize. So it kind of makes sense that the first algorithms we develop using this new machine learning, deep learning AI technique are going to use all the data that we have readily available to us, which is essentially what's on the Internet. But if we could somehow take this new superpower and apply it to the realm of atoms, we unlock that $100 trillion market, all of those markets. Take manufacturing, for example. We've applied IT and computing to markets like manufacturing. But if you go into a factory, it's not that different from a factory 50 years ago. They've been largely untouched by computing.
The reason why we haven't been able to do that is because we haven't really had a bridge between the physical world and the computing world.
Bilawal Sidhu
Connecting bits and atoms, baby. Let's go.
Rev Lebaredian
Yes. And if you think a little bit more about it, that bridge is essentially robotics.
Bilawal Sidhu
Totally.
Rev Lebaredian
And so we thought about this and we said, this is now maybe possible. Robotics, it's been a dream for a long time, but what we've been missing are the fundamental algorithms we need to build a truly useful robotic brain so that we could apply computing to the real world. And so what's a robot? A robot is essentially an agent out here in our real world that does three things and does these three things in a loop. A robot perceives the world around us, the physical world. It inputs the world through sensors. They can be cameras and lidars and radars, all kinds of sensors, whatever the sensing mechanism is. And it makes some kind of sense out of what's coming in. It understands what's coming in. Essentially, that first neural network, AlexNet, was doing that, right? It's getting some information from the real world, an image, a photograph, and making sense of what's inside it. The next thing a robot agent does inside the physical world is take this information, what it's perceived, and make a decision about how it should act. It plans and decides how it's going to affect the world. And the third thing is actuation. It actually does something inside the world. So once it's made the decision, it does something that actually moves or affects the physical world. And once that happens, then it's a loop. You perceive your changes to the world, update your decisions and your plan, and go actuate. By this definition, many things are robots, not just the things we normally think of as a robot, like a C-3PO or R2-D2. A self-driving car is definitely a robot. It has to perceive the world around it. Where are the other cars, the stop signs, pedestrians, bicyclists? How fast are they all moving? What's the state of the world around me, around the car? It makes some decisions on how it's going to get to the final destination and actuates: steers, brakes or accelerates. And this thing runs in a loop. Lots of things are robots if you define them this way.
The building I'm in right now, which is our Endeavor building, our headquarters: every day when I enter it, in the reception area, we have turnstiles. There are sensors there, there's some cameras. They know when I walk up to the turnstile. It senses that I've approached and then decides who I am based on an image classification algorithm not dissimilar from that original AlexNet. And once it determines that I'm Rev, it can look me up in a database: should I have access? And then it actuates in the world. It opens the turnstile so I can pass through and updates some count somewhere that now I'm in the main area. So this building is essentially a robot. And so if you think about robots in this way, and you think about robotic systems as essentially the bridge between computing and the $100 trillion worth of industries out there that deal with the physical world, you start to get pretty excited. Like, wow, we now potentially have the opportunity to go make a big impact in many of these other industries.
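The perceive, decide, actuate loop described above can be sketched in a few lines. This is a toy illustration, not Nvidia's turnstile system: the `Turnstile` class, its fields, and the use of a plain string in place of a camera frame and image classifier are all assumptions made for the example.

```python
from dataclasses import dataclass, field

@dataclass
class Turnstile:
    """Toy 'building as robot': perceive, decide, actuate, in a loop."""
    allowed: set                      # hypothetical access database
    people_inside: int = 0
    log: list = field(default_factory=list)

    def perceive(self, camera_frame: str) -> str:
        # Stand-in for an image classifier: the "frame" here is already a name.
        return camera_frame.strip().lower()

    def decide(self, person: str) -> bool:
        # Plan an action from what was perceived: open only for known people.
        return person in self.allowed

    def actuate(self, person: str, open_gate: bool) -> None:
        # Affect the world (open the gate) and update the occupancy count.
        if open_gate:
            self.people_inside += 1
        self.log.append((person, "open" if open_gate else "stay closed"))

    def step(self, camera_frame: str) -> None:
        person = self.perceive(camera_frame)
        self.actuate(person, self.decide(person))

gate = Turnstile(allowed={"rev"})
for frame in ["Rev", "stranger"]:     # each iteration is one pass through the loop
    gate.step(frame)
print(gate.log, gate.people_inside)
```

A self-driving car fits the same three methods; only the sensors, the decision logic, and the actuators change.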
Bilawal Sidhu
And so on that note, I mean, it's interesting, right? You are talking about how factories haven't changed in decades. And you're right, there's like enterprise resource planning software to keep track of the inventory of stuff and how it's moving around. But the world of atoms hasn't seen as much progress as the world of bits. And to unlock that massive opportunity in these physically based industries, what's the missing piece? What do we not have today? And what are y'all building to make that happen?
Rev Lebaredian
Yeah, and so this is where simulation comes in. If we go back to, you know, what were the key differences between how we used to write software and this new form of AI, one is supercomputing, the other is you need that data or the set of examples to give it, so it could go write the function. Well, where are we going to get that data to learn the physics of the world around us? How do you gather that data? It doesn't just exist on the Internet. The stuff we have on the Internet is largely the things that were easy to digitize, which is not stuff in the physical world. And so our thesis is that the only way we're going to get all the data that we need is by essentially taking the physical world and all the laws of the physical world and putting it in a computer, making a simulation of the physical world. Once you have that, you can produce all of the data you need, essentially the training grounds for these AIs to learn about the physical world. You're no longer constrained by all of the constraints that we have out here in the real world. We can train faster than real-world time out here by just adding more compute. For every real-world second, we can do millions of seconds in the simulated world.
Bilawal Sidhu
Wow.
Rev Lebaredian
Yeah. And collecting data from the real world is really expensive. Let's take one kind of robot: self-driving cars, autonomous vehicles. If you want to train a network to perceive a child running across the street in any condition, any lighting condition, any city.
Bilawal Sidhu
Different times of year. So different weather.
Rev Lebaredian
Yeah, different weather conditions. You're going to have to actually go out there in the real world and have a child run across the street as your car is barreling down the road and capture it. I mean, first of all, obviously this is unethical to do and we shouldn't do that, but then just the tediousness of that, of capturing it in every possible long-tail scenario, it's just untenable. You can't do that. It's too expensive and it's just impossible. There are some really rare weather conditions. You might want to have that same condition with volcanic ash falling, which might happen in Hawaii. How can you even construct that scenario, right? But in simulation we can create it all. In addition, when you grab data from the real world, you only have kind of half the data you need. We also need to know what's inside this unstructured information.
Bilawal Sidhu
What we call labels.
Rev Lebaredian
Labels, exactly. So with AlexNet, when they trained it, they had not only the image, but they had the label that said that image is a cat or a dog. When we simulate a world, we can produce the labels perfectly and automatically.
Bilawal Sidhu
You get it for free, pretty much.
Rev Lebaredian
But when you do it in the real world, you have to have an army of humans or some other mechanism of adding the labels, and they're going to be inaccurate. And before you deploy it out into the real world, you probably want to make sure it's going to work. You know, we don't want to put a robot brain in a self-driving car and just hope that it's going to work when that child runs across the street. And the best place to go test that is in a virtual world, in a simulation. And that was a really long-winded way to get to this: it's essentially what I've been working on in recent years here at Nvidia. We saw the need for this many years ago. So we started building what we call Omniverse. Omniverse is kind of a quote-unquote operating system that we collect all of our simulation and virtual world technologies into. And the goal of Omniverse is specifically about doing simulations that are as physically accurate as possible. That's the key thing. It has to match the real world because otherwise our robots would be learning about laws of physics from something that's just wrong. This is distinctly different from what I did before in my work in movies, doing simulations to produce the amazing imagery that we see in visual effects and CGI movies or in video games. That's all about creating really cool looking images that are fun, of fantasy worlds, of fake worlds. There's all kinds of stuff where we're cheating. We add extra lights and makeup and we're breaking the laws of physics in order to make the movie fun and cool or exciting.
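The point about labels coming for free in simulation can be sketched as follows. This is a toy stand-in, not Omniverse: `render_scene`, the weather list, the visibility numbers, and the one-number "sensor signal" are invented for illustration. What it shows is that the simulator decides whether a child is in the scene, so the label is known exactly, and rare conditions can be sampled on demand.

```python
import random

def render_scene(rng: random.Random) -> dict:
    """Sample scene parameters and produce a fake sensor reading plus its label."""
    weather = rng.choice(["clear", "rain", "fog", "volcanic ash"])
    child_crossing = rng.random() < 0.5
    # Hypothetical sensor model: visibility drops in bad weather.
    visibility = {"clear": 1.0, "rain": 0.7, "fog": 0.4, "volcanic ash": 0.2}[weather]
    signal = (1.0 if child_crossing else 0.0) * visibility + rng.gauss(0, 0.05)
    # The label is exact because the simulator itself placed the child in the scene.
    return {"weather": weather, "signal": signal, "label": child_crossing}

rng = random.Random(0)                # fixed seed so the dataset is reproducible
dataset = [render_scene(rng) for _ in range(1000)]
rare = [s for s in dataset if s["weather"] == "volcanic ash" and s["label"]]
print(len(dataset), "labeled samples;", len(rare), "rare ash-plus-child cases")
```

Collecting the same rare cases on a real road would mean waiting for the weather and staging the crossing, and a person would still have to label each recording afterward.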
Bilawal Sidhu
There is something really poetic about that though. It basically goes back to the start of your career. All this stuff, all these capabilities y'all built to emulate the laws of physics, let's say for light transport, and just get the material properties right, so the glint veneer, the reflections and refraction all look really good. That's exactly what you need, obviously tuned in a fashion that's physically accurate, as you said. So these robots have kind of a believable digital twin or copy or replica of the real world where they're free to make mistakes. But also the time dilation aspect that you mentioned, where you can scale up and have these models go do things in the digital realm that would take forever to do in the physical world. And it feels like there's another piece of this too: you create these digital replicas of the world that become the training data, because as you said, you don't have the Internet to go and pull all this text or image data from, but then you have the robots try things. And there's this, like, domain gap, this chasm that you need to cross between the simulation and the real world. What are some of the other capabilities that y'all are building to make that happen?
Rev Lebaredian
Yeah, I kind of oversimplified how we build these AIs to just: you feed data into the supercomputer, and out comes this amazing robot brain. That's some of how we do it. But there's many different forms of learning, and I think the one you're touching upon is what's called reinforcement learning. It turns out that one of the best ways for these robots to learn is sort of how humans and creatures learn. When a human baby is born into the world, it still doesn't understand the physics of the world around it. A baby can't see depth. They can't really see color yet; they have to learn how to see color. And over time, over weeks, they start learning those things. They start learning how to classify. They classify mom and dad and siblings and apple, all of those things around them. They learn it just through experience. They also learn about the laws of physics through a lot of experimentation. So when you first start giving your baby food and putting food in front of them, one of the first things they do is drop it or throw it. Breaking things, throwing things, making a mess: those are essentially science experiments. They're all little scientists that are trying things until they learn it. And once they understand how that physics works, they move on. Robots learn in the same way through this method called reinforcement learning, where we throw them into a virtual world. It could actually be in the real world, but it's too slow to do in the real world. Generally, we do it in the virtual world. We give this robot the ability to perceive and actuate inside that world, but it doesn't actually know anything. But we give it a goal. We'll say, stand up, and we have them try millions and millions of iterations of standing up.
And so what you were alluding to, this Isaac Sim, that's our robotics simulator that we've built on top of our Omniverse platform, on this quote-unquote operating system, that allows you to do many of the things you need in order to build robot brains. One of those things is reinforcement learning.
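The trial-and-error idea behind "give it a goal and let it try many times" can be sketched with a tiny example. The conversation does not say which algorithm is used in Isaac Sim, so this swaps in plain tabular Q-learning on a made-up five-level "posture" world; the states, actions, rewards, and hyperparameters are all assumptions chosen to keep the example small.

```python
import random

N_STATES, GOAL = 5, 4          # posture levels 0 (lying down) .. 4 (standing)
ACTIONS = (0, 1)               # 0 = relax (sink one level), 1 = push (rise one level)

def step(state: int, action: int):
    """Toy physics: return (next_state, reward, done)."""
    nxt = min(state + 1, GOAL) if action == 1 else max(state - 1, 0)
    done = nxt == GOAL
    return nxt, (1.0 if done else -0.01), done   # small cost per step, reward for standing

def train(episodes: int = 500, alpha: float = 0.5, gamma: float = 0.9,
          epsilon: float = 0.2, seed: int = 0):
    rng = random.Random(seed)
    q = [[0.0, 0.0] for _ in range(N_STATES)]    # the agent starts out knowing nothing
    for _ in range(episodes):
        state = 0
        for _ in range(50):                      # cap the length of each attempt
            if rng.random() < epsilon:
                action = rng.choice(ACTIONS)     # explore: try something random
            else:
                action = max(ACTIONS, key=lambda a: q[state][a])  # exploit what it knows
            nxt, reward, done = step(state, action)
            target = reward + (0.0 if done else gamma * max(q[nxt]))
            q[state][action] += alpha * (target - q[state][action])  # Q-learning update
            state = nxt
            if done:
                break
    return q

q = train()
policy = [max(ACTIONS, key=lambda a: q[s][a]) for s in range(GOAL)]
print(policy)                  # the learned action for each non-standing posture
```

Nothing in the code says "push to stand up"; that preference emerges from repeated attempts and the reward signal, which is the part a simulator can run many times faster than the real world.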
Bilawal Sidhu
It's almost like a training simulator built on top of Omniverse where it's free to make mistakes. And like you said, I love the notion of wall clock time and speeding that up. You're compressing all these, like, epochs of learning and evolution down into something that is manageable. And then you plop that into a real-world robot and it still works.
Rev Lebaredian
That's exactly right. Simulated time is not bound to wall clock time. If I double the amount of compute, double the size of my computer, that's twice the amount of simulation I can do. That's twice the number of simulation hours. And so the scaling laws apply here in a profound way.
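As a back-of-the-envelope version of that scaling claim, the sketch below assumes perfectly linear scaling across parallel simulated environments, which real systems only approximate. The function name and the environment counts are hypothetical.

```python
def simulated_hours(wall_clock_hours: float, parallel_envs: int,
                    speedup_per_env: float = 1.0) -> float:
    """Simulated experience gathered in a wall-clock budget, assuming linear scaling."""
    return wall_clock_hours * parallel_envs * speedup_per_env

base = simulated_hours(1.0, parallel_envs=1_000)
doubled = simulated_hours(1.0, parallel_envs=2_000)   # twice the compute
print(base, doubled, doubled / base)
```

Under this assumption, one wall-clock hour on twice the hardware yields twice the simulated hours.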
Bilawal Sidhu
That's pretty magical. Let's talk a little bit about the applications of physical AI. It obviously applies to so many different fields. We talked about autonomous vehicles. There's, like, robotic-assisted surgery. You alluded to automated warehousing. Could you share some examples of how physical AI is currently impacting these areas and what it's unlocking for these industries that have sort of been stuck in the past?
Rev Lebaredian
I think the very first place, the area it's impacting the most, is autonomous vehicles. Once we discovered this deep learning, machine learning thing, immediately you saw all of these efforts from different companies to go build autonomous vehicles, whether they're robotaxis or assistance inside commercial cars. And it's actually become a reality now. Like, I don't know if you've been to San Francisco or Phoenix or...
Bilawal Sidhu
We got Waymo in Austin here too.
Yeah, I didn't realize they were in Austin as well. It's pretty awesome. I was in Phoenix a month or so ago at the airport, and I was waiting for my Uber, and five Waymos picked up these people standing next to me, and it was super mundane. Just another day: they were staring at their phones and got into the car like it was nothing. This was unimaginable 10 years ago and now it's become mundane. And all of that is powered by these AI algorithms. Now, I don't know exactly what's inside Waymo or any of the other ones, but there's this trend that's happening where we're moving from the kind of earlier generations of AI that are more specific AI, like AlexNet, where we trained these models on very specific data sets, and then we kind of string these different models together to form a whole system, like, kind of like task specific.
Models that you kludge together.
Yeah, that you put together, to these more general purpose, unified models that are built on the transformer architecture, the same thing that powers LLMs. And so we're starting to see these robotics models that are more general purpose, and that's what we're talking about with physical AI being the next wave: essentially having these kind of foundation models with a general purpose understanding of the physical world around us that you use as the basis, as the foundation, to then fine-tune for your specific purpose. Just like we have Llama and GPT and the Anthropic models, and then from there you go fine-tune those for specific kinds of tasks. We're going to start seeing a lot of new physical AI models that just understand the general laws of physics. And then we'll go take those and fine-tune them to specialize for different kinds of robotic tasks.
And so there's robotic tasks. It's like the Roomba in your fricking house versus of course, a warehouse robot or even an autonomous vehicle.
That's right, yeah. They could be a pick-and-place robot in a warehouse, it could be an AMR. They're like basically little driving platforms that zip around in these warehouses and factories. They could be drones that are flying around inside factories, or outside.
That's what I want, by the way, is I want like a hot latte delivered like on my balcony by a drone, not having to navigate traffic. And it's like it's actually hot and gets to you.
Yeah, I'm not sure I'm with you on that one. I don't know if I want to have thousands of drones zipping around my neighborhood, just dropping off lattes everywhere. That's one of the few things that I do by hand and handcraft at home myself.
You like your latte art?
I make one every morning for my wife. That's the first thing I do every day. And it kind of grounds me into the world. So I don't need a drone doing that.
Fair enough. Fair enough. How do you think about where we are in terms of physical AI capabilities today? I don't know if the GPT 1, 2, 3, 4 nomenclature is the right way to think about it, but I'm curious as you think about where we are now and where we're headed. What stage are we at in terms of the maturity of physical AI capabilities, especially this more general approach to agents that understand and can take action in the physical world.
I think we're right at the beginning. I don't know how to relate it exactly to GPT 1, 2, 3, 4. I'm not sure if that works, but we're at the very beginning of this. That being said, we're also building on the GPT 1, 2, 3, 4, on the LLMs themselves. The information and data that's fed into these text-based or LLM models is actually still relevant to the physical AI models as well. Inside these descriptions in the text that was used to train them is information about the physical world. We talk about things like the color red and putting a book on a shelf and an object falling. Those abstract ideas are still relevant. It's just insufficient. If a human has never seen any of those things, never touched or experienced it, only had the words describing the color red, they're not really going to understand it.
It's not grounded in the physical world as you said previously.
Right. And so they're going to take all of these different modes of information and fuse them together to get a more complete understanding of the physical world around us.
Is a good analogy, like, different parts of our brains? It seems like these LLMs are really good at reasoning about sort of this, like, symbolic textual world. And there's all this debate over how far the video models can go and, like, reproduce the physics of the world. But it sounds like you just created another primitive that kind of works in concert with these other pieces, that is actually grounded in the real world and has seen examples of the physical world and all the edge cases that you talked about. And then that system as a whole is far more capable.
Exactly. I think there is debate over how far you can go with these video models because of the physics of the world. Now, even the current, more limited video models we have, they're not trained with just video. They're multimodal. There's lots of information coming from non-video sources. There's text and captions and other things that are in there. And so if we can bring in more modes of information like the state of the world that you have inside a simulator. Inside a simulator, we know the position of every object in 3D space. We know the distance of every pixel. We don't just see things in the world. We can touch it, we can smell it, we can taste it. We have multiple sensory experiences that fuse together to give us a more complete understanding of the world around us. Like right now I'm sitting in this chair. I can't see behind my head, but I'm pretty sure if I put my hand behind me here, I'm going to be able to touch the back of the chair. That's proprioception. I know that because I have a model of what the world is around me, because I've been able to synthesize that through all of my senses, and there's some memory there. We're essentially replicating the same kind of process, the same basic idea, with how we train AIs. The first missing piece was this transformer model, this idea that we could just throw all kinds of unstructured data at this thing, and it figures out, it creates this general purpose function that can do all kinds of different things through understanding of complex patterns. So we had that, and we need all of the right data to pump into it. And so our belief is that a lot, if not most, of this data is going to come from simulation, not from what happens to be on the Internet.
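The point that a simulator already knows where everything is can be made concrete with a small sketch. It assumes a pinhole camera at the origin looking down the z axis; the object names, positions, and the `ground_truth_labels` helper are all hypothetical and do not come from any real simulator API.

```python
import math

def project(point, focal_length=1.0):
    """Pinhole projection of a 3D point (x, y, z) in camera coordinates, with z > 0."""
    x, y, z = point
    return (focal_length * x / z, focal_length * y / z)

def ground_truth_labels(objects):
    """Turn exact simulator state into training labels: image position and true distance."""
    labels = []
    for name, position in objects.items():
        labels.append({
            "name": name,
            "image_xy": project(position),
            "distance": math.dist((0.0, 0.0, 0.0), position),
        })
    return labels

if __name__ == "__main__":
    # Hypothetical scene state, as a simulator would hold it internally.
    scene = {"cup": (0.0, 0.0, 2.0), "chair": (3.0, 0.0, 4.0)}
    for label in ground_truth_labels(scene):
        print(label)
```

Because the positions are known exactly, every rendered sample can carry labels such as distance for free, which is much harder to obtain from video scraped from the Internet.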
So interesting, your point about that. You have the state of the world, you have, to use nerdspeak, the 3D scene graph and, as you mentioned, yeah, like the vectors of all the various objects. All this stuff that you take for granted in video games could then be thrown into a transformer along with other image data, maybe decimated to look like a real sensor. And then suddenly it'll build an understanding, or build a, I've heard it described as like a universal function approximator, to figure out how to, yeah, emulate all these other senses like proprioception and all these other things. I think there's like 30 or 40. I was kind of surprised to hear that we have so many. And maybe robots could, I mean, they're not limited by ours. You alluded to lidar and lasers earlier, right? Or infrared. And so it's like at some point these robots will be, going back to the start of our conversation, superhuman.
Yeah. I mean we have animals that are superhuman in this way too. Right. Bats can see with sound.
Yeah. Eagles can like have got like varifocal like vision. They kind of zoom in.
Sure. Why won't they be superhuman in certain dimensions of sensing the world and acting within the world? Of course, they already are in many respects. We have image classifiers that can classify animals, every breed of dog and plants better than any human can. So, so true. So we'll certainly do that, at least in certain dimensions.
Bilaval Sidhu
So let's talk about looking towards the future a little bit here. You talked about how physical AI is transforming factories and warehouses. What's your take on the potential in our everyday lives? Right. Like, how do you see these technologies evolving to bring robots into our home or personal spaces in really meaningful ways? This is like as intimate as it possibly can get. Right. It's not really a controlled environment either.
If you've been watching any of Jensen's keynotes this past year, within the last 10, 12 months or so, there's been a lot of talk of humanoid robots.
Absolutely, yeah.
And that's kind of all the rage. You're seeing them everywhere. I imagine for many people when they see this, they could just kind of roll their eyes like, oh, yeah, yeah. Humanoid robots. We've been talking about these forever. Why does it have to look like a humanoid? Doesn't it make more sense to build specialized robots that are really good at specific tasks? And we've had robots in our most advanced factories for a long time, and they're not humanoids. They're like these large arms in automotive factories. And why are we talking about humanoid robots? The reason why this is coming up now is because if you take a step back and think about it, if you're going to build a general purpose robot that can do many different things, the most useful one today is going to be one that's roughly shaped like, and behaves and acts like, a human. Because we built all of these spaces around us for humans. So we built our factories, our warehouses, our hospitals, our kitchens, our retail spaces. There's stairs and ramps and shelves. And so if we can build a general purpose robot brain, then the most natural kind of physical robot to build, to put that brain in for it to be useful, would be something that's human-like, because we could then take that robot and plop it into many different environments where it could be productive and do productive things. And so many companies have realized this and they're going all in on that. We're bullish on it. I think even within this space, though, there are specializations. Not every humanoid robot is going to be perfect for every task that a human can do. Actually, not all humans are good at every task. Some humans are better at playing baseball and some are better at chopping onions. You know, there's.
Astronauts have a certain criteria, right?
That's right. So we're going to have many companies building more specialized kinds of humanoids, or different kinds of robots. The ones that we're immediately focused on are the ones in industry. We think this is where they're gonna be adopted the most, the quickest, and where it's gonna make the most impact. Everywhere we look, globally, including here in the US, there are labor shortages in factories, warehouses, transportation, retail. We don't have enough people to stock shelves. And the demographics are such that that's just going to get worse and worse. So there's a huge demand for humanoid robots that could go work in some of these spaces. I think as far as in our personal space, a robot that can work side by side with a human in a factory or a warehouse should also be able to work inside your kitchen, in your home. How quickly those kinds of humanoid robots are going to be accepted, whether there'll be a market for it, I think it's going to depend on which country we're talking about, because there's a very cultural element. Bringing a robot into your home, another entity, some other thing that's human-like, into your home, that's very personal.
And God forbid it makes your latte for you.
Exactly. I don't want to do that in my kitchen. I don't even want other humans in there in the morning. But there's cultural elements here. In the US and the West in general, we're probably a bit more cautious or careful about robots. In the East, especially countries like Japan. Totally.
That's where my answer is.
They love them. Right. And they want it, but industry everywhere needs it now, right?
Yeah.
And so for industrial applications, I think it makes sense to start there and then we can take those technologies into the consumer Space and the markets will explore where they fit the best at first, but eventually we'll have them everywhere.
It's so fascinating to think about how many technologies they're early adopters of, including virtual avatars and things like that, but sort of bridging the virtual and the physical. The technologies you all are building aren't just limited to robots, right? As this tech improves spatial understanding, it could enhance our personal devices, sort of virtual assistants. How close do you think we are to that sort of in-real-life JARVIS experience, a virtual assistant that can seamlessly understand and interact with our physical environment, even if it's not embodied as a robot?
So this gets back to what. What I was saying earlier about the definition of a robot. What is a robot?
Totally.
The way you just talked about that. Like, to me, JARVIS is actually a robot. It does those three things. It perceives the world around us.
Yep.
Through many different sensors, it makes some decisions and it can even act upon the world. Like JARVIS, inside the Avengers movies.
Yeah.
It can actually go activate the Iron Man suit.
Right. Yeah.
And do things there. Right. So what is the difference between that and a C-3PO?
Totally. Fundamentally, you're kind of inside a robot, sort of, as you alluded to with the Nvidia building, too. Yeah.
And if you think about some of these XR devices that immerse us into the world, they're half a robot. There's the perception part of it. There's the sensors, along with some intelligence to do the perception. But then it's fed into a human brain, and then the human makes some decisions and then it acts upon the world.
Right.
And when we act upon the world, there's maybe some more software, some even AI, doing things inside the simulation of that world or that combination. So it's not black or white, what's a robot and what's a human or human intelligence. There's kind of a spectrum between these things. We can augment humans with artificial intelligence. We're already doing it. Every time you use your phone to ask a question, you go to Google or Perplexity or something, you're adding AI. You're augmenting yourself with AI there by asking ChatGPT a question. It's that blend of AI with a JARVIS experience that's immersive with XR. It's just making it so that loop is faster with the augmentation.
You beautifully set up my last question, which is, as AI is becoming infused in not just the digital world, but the physical world, I have to ask you, what can go wrong and what can go right?
Well, with any powerful technology, there's always going to be ways things can go wrong. And this is the most powerful of technologies potentially that we have ever seen. So we have to be, I think, very careful and deliberate about how we deploy these technologies to ensure that they're safe. So in terms of deploying AIs into the physical world, I think one of the most important things we have to do is ensure that there's always some human in the loop somewhere in the process, that we have the ability to turn it off, that nothing happens without our explicit knowledge of it happening and without our permission. We have a system here, we have sensors all around our building. We can kind of see where people are, which areas they're trafficking the most. At night we have robotic cleaners. They're like huge Roombas that go clean our floors, and we direct them to the areas where people have actually been, and they don't bother with the areas that haven't been trafficked at all, to optimize them. We're going to have lots of systems like that. That's a robotic system that's essentially a robot controlling other robots. But we need to make sure that there's humans inside that loop somewhere, deploying that, watching it and ensuring that we can stop it and pause it and do whatever's necessary. And so the other part of the question was, you know, what are the good things that are going to come out of this? We touched on a bunch of those things there. But ultimately, being able to apply all of this computing technology and intelligence to things around us in the physical world, I can't even begin to imagine the potential for the increase in productivity. Just look at something like agriculture. If you have effectively unlimited workers who can do extremely tedious things like pull out one weed at a time across thousands of acres of fields, go through and just identify where there's a weed or a pest and take them out one by one,
then maybe we don't need to blanket these areas with pesticides, with all these other techniques that harm the environment around us, that harm humans. The primary driver for economic productivity anywhere is the number of people we have in a country. I mean, we measure productivity with GDP, gross domestic product, and we look at GDP per head. That's the measure of efficiency, right? But it always correlates with the number of people. Countries that have more people have more GDP. And so when we take physical AIs and apply them to the physical world around us, it's almost like we're adding more to the population and the productivity growth can increase. And it's even more so because the things that we can have them do are things that humans can't or won't do. They're just too tedious and boring and awful. So you find plenty of examples of this in manufacturing and warehouses, in agriculture and transportation. Look, we keep talking about transportation being this huge issue right now. Truck drivers, we don't have enough of them out there. This is essentially a bottleneck on productivity for a whole economy. Soon we're effectively going to have an unlimited number of workers who can do those things. And then we can deploy our humans to go do all the things that are fun for us, that we like doing.
I love that. It's like we're finally going to have technology that's fungible and general enough where we can reimagine all these industries and yet let humans do the things that are enriching and fulfilling and perhaps even have a world of radical abundance. I know that's a little trendy thing to say, but it feels like when you talk about that, it sounds like a world of radical abundance. You feel that way?
I do, I do. I mean, if you just think about everything I said from first principles, why won't that happen? If we can manufacture intelligence and this intelligence can go drive, be embodied in the physical world and do things inside the physical world for us, why won't we have radical abundance? I mean, that's basically it.
I love it. Thank you so much for joining us, Rev.
Thank you for having me. It's always fun talking to you.
Bilaval Sidhu
Okay, as I wrap up my conversation with Rev, there are a few things that come to mind. Oh, my God. Nvidia has been playing the long game all along. They found just the right wedge, computer gaming, to de-risk a bunch of this fundamental technology that has now come full circle. Companies and even governments all over the world are buying Nvidia GPUs so they can train their own AI models, creating bigger and bigger computing clusters, effectively turning the CEO, Jensen Huang, into a bit of a kingmaker. But what's particularly poetic is how all the technologies they've invested in are the means by which they're going to have robots roaming the world. We are creating a digital twin of reality, a mirror world, if you will. And it goes far beyond predicting an aspect of reality like the weather. It's really about creating a full fidelity approximation of reality where robots can be free to make mistakes and be free from the shackles of wall clock time. I'm also really excited about this because creating this type of synthetic training data has so many benefits for us as the consumer. For instance, training robots in the home. Do we really want a bunch of data being collected in our most intimate locations inside our houses? Synthetic data provides a very interesting route to train these AI models in a privacy-preserving fashion. Of course, I'm left wondering if that gap between simulation and reality can truly be overcome, but what it seems is that gap is going to continually close further. Who knew everyone was throwing shade on the Metaverse when it first hit public consciousness?
Like, who really wants this 3D successor to the Internet?
Now I'm thinking maybe the killer use case for the Metaverse isn't for humans at all, but really it's for robots. The TED AI Show is a part of the TED Audio Content Collective and is produced by TED with Cosmic Standard. Our producers are Dominic Gerard and Alex Higgins. Our editor is Banban Sheng, our showrunner is Ivana Tucker and our engineer is Asia Pilar Simpson. Our researcher and fact checker is Christian Aparta, our technical director is Jacob Winick and our executive producer is Eliza Smith. And I'm Bilaval Sidhu. Don't forget to rate and comment, and I'll see you in the next one.
Episode: How AI Robots Learn Just Like Babies — But a Million Times Faster
Guest: Rev Leboridian, Vice President of Omniverse and Simulation Technology at NVIDIA
Hosted by: Bilaval Sidhu
Release Date: December 3, 2024
The episode opens with host Bilaval Sidhu welcoming Rev Leboridian, whose unique career trajectory spans from Hollywood visual effects to leading NVIDIA's advancements in AI and simulation technology.
Key Discussion Points:
Rev’s Role at NVIDIA:
Rev explains his position as VP of Omniverse and Simulation Technology, highlighting the transition from creating fantasy worlds in entertainment to developing realistic simulations for robotics.
“I joined NVIDIA 23 years ago with the hope of taking what I was doing in movies… to become an interactive experience, like in a video game or in an immersive experience like XR.” [08:02]
NVIDIA's Transformation:
Rev delves into NVIDIA's shift from a gaming hardware company to a leader in AI, emphasizing the introduction of GPUs and programmable shading as pivotal advancements.
“We introduced programmable shading, and that feature... unlocked this whole new world in computer vision and certainly caught the whole world's attention.” [09:29]
Rev discusses the breakthrough moments that positioned NVIDIA at the forefront of AI development, particularly focusing on the advent of deep learning.
Key Highlights:
CUDA and Deep Learning:
The introduction of CUDA in 2006 facilitated general-purpose computing on GPUs, which became a catalyst for deep learning innovations. Rev recounts the pivotal moment when AlexNet revolutionized image classification in 2012.
“They beat all of the benchmarks in image classification with a deep learning neural network called AlexNet... This was insane because up until that point, basically every other approach for the ImageNet benchmark was not really winning.” [14:56]
Automated Algorithm Creation:
Rev marvels at how AI began writing its own algorithms, surpassing human-designed models through vast computational power and data.
“Now software is writing software. There's two basic ingredients, a supercomputer, lots of computation, and you give it a whole bunch of data or examples to figure out the algorithm for you.” [17:30]
The conversation shifts to the concept of Physical AI, exploring how robots are taught to understand and interact with the real world through simulations.
Core Concepts:
Definition of a Robot:
Rev expands the definition of robots beyond traditional humanoid forms to include any agents that perceive, decide, and act within their environments.
“A robot perceives the world around us, makes decisions, and acts upon the world. By this definition, many things are robots, not just the ones we normally think of.” [21:10]
Omniverse and Simulation Technology:
Omniverse serves as an operating system for creating highly accurate simulations that allow robots to learn physical interactions rapidly.
“Omniverse is about doing simulations that are as physically accurate as possible. That's the key thing; it has to match the real world because otherwise our robots would be learning about laws of physics from something that's just wrong.” [25:06]
Reinforcement Learning:
Drawing parallels to human learning, Rev explains how robots use reinforcement learning to experiment within simulations, accelerating the acquisition of physical intelligence.
“Robots learn in the same way through this method called reinforcement learning, where we throw them into a virtual world… and give them goals to achieve through millions of iterations.” [33:15]
Rev details the transformative impact of Physical AI across multiple sectors, highlighting current implementations and future potentials.
Notable Applications:
Autonomous Vehicles:
The deployment of self-driving cars by companies like Waymo demonstrates the practical applications of AI in transportation. Rev shares a personal anecdote about watching Waymo vehicles pick up passengers at the Phoenix airport.
“This was unimaginable 10 years ago and now it's become mundane.” [35:34]
Industrial Automation:
Humanoid robots are being developed to alleviate labor shortages in factories, warehouses, and other industrial settings.
“There’s a huge demand for humanoid robots that could go work in some of these spaces.” [49:24]
Agriculture and Manufacturing:
AI-driven robots can perform tedious tasks with precision, such as weed removal in agriculture, reducing the need for harmful pesticides.
“If we can manufacture intelligence and this intelligence can go drive, be embodied in the physical world and do things inside the physical world for us, why won't we have radical abundance?” [56:01]
The discussion broadens to envision the integration of robots into personal spaces and daily routines, drawing inspiration from popular culture.
Future Insights:
Humanoid Robots in Homes:
Rev addresses the skepticism around humanoid robots, advocating for their design to match human environments for broader applicability.
“If you can build a general-purpose robot brain, then the most natural kind of physical robot to build... would be something that's human-like.” [47:03]
Cultural Acceptance:
Acceptance of robots varies globally, with some cultures like Japan being more receptive to humanoid robots in personal settings.
“In the US and the West, we're probably more cautious, but in countries like Japan, they love them and want them.” [48:57]
Beyond Humanoids:
Rev hints at the potential for non-humanoid robots enhancing personal devices and virtual assistants, moving towards more integrated AI experiences.
“We're already augmenting ourselves with AI when we use our phones to ask questions. It's that blend of AI with a Jarvis experience that's immersive with XR.” [50:28]
Rev emphasizes the importance of ethical frameworks and safety protocols in the deployment of physical AI to prevent misuse and ensure societal benefits.
Ethical Points:
Human Oversight:
Ensuring that humans remain in the loop is crucial for safe AI operations. Rev underscores the necessity of having humans monitor and control AI actions.
“We have to ensure that there's always some human in the loop... the ability to turn it off, that nothing happens without our explicit knowledge and permission.” [52:09]
Productivity and Abundance:
Rev is optimistic about AI contributing to economic productivity and potentially creating a world of radical abundance by handling tasks humans find tedious.
“Countries that have more people have more GDP. When we take physical AIs and apply them to the physical world, it's almost like we're adding more to the population and the productivity growth can increase.” [55:01]
Bilaval Sidhu wraps up the conversation by reflecting on NVIDIA's strategic advancements and the promising future of Physical AI.
Final Reflections:
NVIDIA’s Strategic Vision:
Bilaval praises NVIDIA for their long-term strategy in leveraging gaming technology to advance AI, creating a digital twin of reality that benefits both industries and consumers.
“Who knew everyone was throwing shade on the Metaverse when it first hit public consciousness? Maybe the killer use case for the Metaverse isn't for humans at all, but really it's for robots.” [58:37]
Synthetic Data and Privacy:
The use of synthetic data in simulations offers privacy-preserving methods for training AI, alleviating concerns over data collection in personal spaces.
“Training robots in the home... synthetic data provides a very interesting route to train these AI models in a privacy-preserving fashion.” [58:33]
Vision of Radical Abundance:
Rev expresses confidence in AI’s potential to revolutionize industries and enhance human life, envisioning a future where technology fosters abundance and allows humans to focus on fulfilling pursuits.
“If we can manufacture intelligence… why won't we have radical abundance?” [56:23]
This episode of The TED AI Show offers an in-depth exploration of how AI-driven simulations and Physical AI are poised to transform industries and everyday life. With insights from NVIDIA’s Rev Leboridian, listeners gain a comprehensive understanding of the technological advancements, ethical considerations, and future possibilities that come with integrating intelligent robots into our physical world.
Note: This summary excludes advertisements, introductory messages, and concluding segments that do not pertain to the core content of the episode.