Summary

Invest Like the Best, EP.465: Sergey Levine – Building LLMs for the Physical World

Podcast Host: Patrick O’Shaughnessy
Guest: Sergey Levine, Co-founder & Researcher at Physical Intelligence
Date: March 31, 2026

Overview

In this episode, Patrick O’Shaughnessy speaks with Sergey Levine, a leading researcher in robotic foundation models and co-founder of Physical Intelligence. The conversation centers on building general-purpose AI models for physical robots—enabling "physical intelligence"—drawing analogies to the development of language models, and exploring the challenges, breakthroughs, and promises of creating robots that can perform a wide array of tasks in real-world environments. Key themes include the importance of generality, data, common sense, the interplay of form and function, and the potential impact on industries and society.

Defining Physical Intelligence and the Generalist Approach

What is Physical Intelligence?

Sergey Levine (03:42): “Fundamentally, the goal at Physical Intelligence is to develop robotic foundation models that can control basically any embodied system to do any task.”
The aim is to build models for robots analogous to foundation language models but for any task performable by a physical, actuated device.

Why General Models Over Specialization?

Language models could outcompete specialized solutions because they learn from broad, weakly-labeled data, building more robust world models.
In robotics, a general model trained across tasks/environments could learn physical interaction at a deeper, transferable level.
Sergey Levine (04:38): "If we can draw on data from many sources, many applications, many robots, then we can have a model that has a physical understanding, and then it'll be much, much easier to put new applications on top."

Key Discussion Points

The Trade-Offs and Challenges of Breadth

Exciting demos are usually narrow and highly controlled, but true generalization must work in open, often mundane, environments.
General models are harder to demonstrate effectively, but are more transformative in the long run.
Levine (06:26): "The way to have a really exciting demo is to pick a really cool task, control everything else... The point of generalization is that it does something relatively mundane that any human could do, but it does it in any situation."

Stake and Vision: Robotics as a Toolkit

Unlocking a "Cambrian explosion" of robotic applications—removing obstacles so others can build on top.
Levine (07:38): “Something like what happened with PCs and the internet could happen in the world of robotics, but it can’t today because you have to solve the intelligence problem first.”

Humanoid Robots: Pros, Cons, and Flexibility

Humanoid form captures imagination, but AI shouldn’t be limited to human-like bodies.
Optimal embodiments for tasks may be radically different—imagine swarms or ceiling-mounted robots.
Levine (09:10): "Ultimately they don't have to be constrained to look like humans at all. You can build the right tool for the job."

Going Beyond Human Imitation

Machines could be built to outstrip human abilities in size, dexterity, or miniaturization—especially in domains like surgery.

Timeline of Robotic Research: Milestones & Impact

1980s: First end-to-end learning systems for autonomy.
2010s: Deep reinforcement learning emerges, crucial for superhuman performance.
Present: Multimodal LLMs (language, images, actions) help import "common sense" to handle tail cases.
Levine (14:07): “The advent of multimodal LLMs that can be adapted to robotic control to bring in that common sense... is a really important advance.”

Approaches at Physical Intelligence

Sergey’s Personal Path

Switched from computer graphics to robotics post-PhD.
Sought AI systems that "get better the more they do things".
Pursued collective learning (multiple robots), but found combining web-scale knowledge with RL crucial for generalization.

Technical Methods

Building "vision language action models": pre-trained on text, adapted with image data, then real robot data.
Tasks utilize "chain of thought" reasoning: robot interprets instructions, breaks down tasks, and improves with RL.
Example: Espresso-making robot that improved efficiency and robustness through practice.
Levine (18:32): "So the way these things are trained is they're first trained on text data, then they're adapted with lots of image data from the web to understand images. And then they're adapted to robots with lots of very diverse robot data."
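The staged recipe described above (text first, then web images, then diverse robot data) can be sketched as an ordered curriculum. This is a toy illustration only, not Physical Intelligence's code: the `Stage`, `Model`, and `train` names are hypothetical, and the training step is a placeholder that just records which capability each stage is meant to add.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Stage:
    name: str         # hypothetical label for the stage
    data_source: str  # kind of data used, as described in the episode
    adds: str         # capability the stage is meant to contribute


# Order follows the episode: text, then web images, then robot data.
CURRICULUM: List[Stage] = [
    Stage("language pretraining", "text", "world knowledge"),
    Stage("vision adaptation", "web images", "image understanding"),
    Stage("robot adaptation", "diverse robot data", "action generation"),
]


@dataclass
class Model:
    capabilities: List[str] = field(default_factory=list)


def train(model: Model, curriculum: List[Stage]) -> Model:
    """Apply each stage in order; a real system would run training here."""
    for stage in curriculum:
        # Placeholder: record the capability instead of training on stage.data_source.
        model.capabilities.append(stage.adds)
    return model


model = train(Model(), CURRICULUM)
print(model.capabilities)
```

The point of the ordering is that each later stage builds on what the earlier ones already provide, so robot data is spent on learning actions rather than relearning language or vision.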

Sensor Minimalism

Surprisingly, effective learning can compensate for limited hardware sensors—barebones setups can still succeed.

Data Collection Philosophy

Major unknown: "How much robot data is really needed?"
The best systems will be those useful enough to gather data while deployed, mirroring Tesla’s approach in self-driving.

Surprises, Paradoxes, and Common Sense

Surprising Discoveries

More progress on dexterity and embodiment transfer than expected, with little bespoke engineering.
Levine (22:27): "We could also get these systems to perform very dexterous behaviors without really doing anything particularly special for that."

Moravec’s Paradox

The public consistently underestimates the difficulty of “easy” physical tasks for robots, and overestimates the difficulty of tasks that are hard for humans.

What is Common Sense in Robotics?

The ability to apply world knowledge contextually to new physical tasks—opposite of muscle memory.
Levine (24:58): “Common sense, in my mind... is when you know something because you saw it, and now you are in a situation where that fact is highly pertinent.”

Long-Horizon, Chain-of-Thought Reasoning

Chain-of-thought enables robots to perform long tasks by breaking them down semantically.
High-level, language-based coaching can help machines generalize and improve—“coaching” by humans labeling semantic commands rather than low-level actions.
Levine (27:26): “And make it better just by talking to it.”
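The split between high-level semantic steps and low-level control, and the idea of coaching at the semantic level, can be sketched with a toy example. Everything here is an assumption made for illustration: the `SUBTASKS` table stands in for a decomposition the model would learn, `coach` stands in for human-provided high-level labels (in the real system these would be training data, not a table update), and `fake_policy` stands in for a learned low-level controller.

```python
from typing import Callable, Dict, List

# Hypothetical lookup standing in for the model's learned task decomposition.
SUBTASKS: Dict[str, List[str]] = {
    "clean the kitchen": ["pick up plates", "put plates in sink", "wipe counter"],
}


def decompose(instruction: str) -> List[str]:
    """High-level step: break an instruction into semantic subtasks."""
    # Unknown instructions fall back to a single step.
    return list(SUBTASKS.get(instruction, [instruction]))


def coach(instruction: str, corrected_steps: List[str]) -> None:
    """A human supplies better high-level steps, not low-level motor commands."""
    SUBTASKS[instruction] = list(corrected_steps)


def run(instruction: str, low_level_policy: Callable[[str], str]) -> List[str]:
    """Execute each semantic subtask with the low-level policy."""
    return [low_level_policy(step) for step in decompose(instruction)]


def fake_policy(step: str) -> str:
    # Stand-in for a learned controller that turns a subtask into motion.
    return f"executed: {step}"


print(run("clean the kitchen", fake_policy))
coach("clean the kitchen", ["clear the table", "load the dishwasher", "wipe counter"])
print(run("clean the kitchen", fake_policy))
```

In this sketch the coach never touches `fake_policy`; only the semantic plan changes, which mirrors the idea of improving the robot by talking to it.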

Risks, Challenges, and Controversies

Why Might 2050 Kitchens Not Have Dishwashing Robots?

Sociotechnical challenges outlast technical ones—people must be comfortable with imperfect robots, especially in sensitive settings.
Technical risk: coping with the breadth and unpredictability of home environments.

Generality—The North Star

The principal design goal: systems that generalize and improve autonomously.
Levine (30:31): “The most important thing to get right is to get the system to be general… particularly with respect to how it can be improved.”

Simulation vs. Real-World Data

Debate over simulated vs. real data dominates research: locomotion often uses simulation; manipulation relies on real-world data and large models.
Which path wins remains open.

Cool vs. Useful

Technical teams at Physical Intelligence stress the technology against challenging (and “cool”) tasks, but usefulness trumps novelty.
Levine (33:04): “Subject to the constraint that it's useful, make it as cool as possible.”

Robot Olympics & Evaluating Capabilities

Everyday Tasks as Benchmarks

Real benchmarks for robotic generality aren’t athletic feats, but prosaic daily tasks—doors, dishes, shirts, oranges—where systems must generalize.
Most can be accomplished with the general foundation model, without special case engineering.
Levine (34:08): “We could solve almost all of them... If anybody watches those videos... we literally use this as a test of our task onboarding process.”

Superhuman Physical Abilities

Robots can surpass humans by optimizing for speed and efficiency, e.g., by removing human-centric pauses and hesitation.

Form Factors and Flexible Innovation

AI and foundation models should unshackle hardware experimentation.
Lowering the barrier for new form factors is key, as innovation comes from widespread, creative tinkering, much as in early computing.
Levine (37:19): "If you could just put together a robot in your garage, load up a robotic foundation model and tell it to do a bunch of stuff... that can be a really powerful engine."

The Science and Evolution of Research

Role of Common Sense & Physical Analogy

Physical analogies permeate human problem solving, from daily speech ("momentum") to advanced physics—robots' challenge is to internalize and reason with this.

The Research Community’s Size and Structure

Breakthroughs require wide experimentation—many failures, not just home runs, drive progress.
There is no singular personality type for great researchers; a passion for exploration, regardless of its source, is key.

Future Impact, Uncertainties, and Advice

Economic & Labor Impacts

Productivity likely increases analogously to LLMs in software—robots as human collaborators, not pure replacements.
Prepare for co-evolution, with specific domains requiring more or less autonomy and human interaction.

Biggest Uncertainties

Timeline: "Bootstrap" problem—robots must be useful enough to be deployed and collect their own data, and the activation energy for that remains uncertain.
Key question: Will robots learn mostly from demonstration, or autonomous RL in the wild? It changes deployment strategies profoundly.

For Entrepreneurs

Focus on understanding the economics of labor and where robots could augment it.
Avoid assumptions about data or hardware—needs vary by application and technological maturity.
Levine (52:48): “Coding tools are like a really nice example to look at for a template... More realistic template is not like the humanoid goes in and the people just leave... it’ll be this kind of dance...”

Notable Quotes & Memorable Moments

On Common Sense:
“Common sense... is when you know something to be true because you saw it, or you read about it, or you heard it, and now... you are able to make that connection, apply it to your situation grounded in the environment that you're in.” – Sergey Levine (24:58)
On Surprise Progress:
“We could also get these systems to perform very dexterous behaviors without really doing anything particularly special for that.” – Sergey Levine (22:27)
Chain-of-Thought in Robots:
“We found...that our models had gotten to the point where they could be improved just from supervising them with high level instructions... That actually improves its ability to generalize.” – Sergey Levine (26:00)
On Business Impact:
“A more realistic template is not like the humanoid goes in and the people just leave...it’ll be this kind of dance that we’ve seen with coding tools.” – Sergey Levine (52:48)
On Inspirational Environments:
“I was absolutely shocked when I started working at Google at the level of leverage that I felt I could have... That's very special.” – Sergey Levine (63:49)

Timestamps for Key Segments

| Segment | Timestamp |
|--------------------------------------------|------------|
| Defining physical intelligence | 03:42 |
| Why general models in robotics? | 04:38 |
| Challenges with demos and generalization | 06:26 |
| What success would unlock | 07:38 |
| Humanoid robots: value and limitations | 09:10 |
| Historical context of robotics milestones | 11:16 |
| Combining web-scale knowledge & RL | 16:06 |
| Vision-language-action models & methods | 18:32 |
| Data collection bootstrap challenge | 21:02 |
| Sensor minimalism in practice | 20:17 |
| Surprising dexterity & generalization | 22:27 |
| Moravec’s paradox and public perceptions | 23:31 |
| What is common sense for robots? | 24:58 |
| Enabling long-range, chain-of-thought | 26:00 |
| 2050: If robots aren’t everywhere, why? | 27:29 |
| The cool vs. useful debate in robotics | 33:04 |
| Robot Olympics: everyday task benchmarks | 33:44 |
| Overcoming human limitations | 35:34 |
| Lowering the barrier to form factor ideas | 37:19 |
| Researcher impact & failures as learning | 47:41 |
| Researchers’ personality types | 49:51 |
| Uncertainty and technical milestones | 58:11 |
| Immediate next challenges | 60:10 |
| Optimism relative to the field | 61:02 |

Closing Thoughts

Sergey Levine provides an optimistic but grounded look at robotics’ future, advocating for general-purpose models and a flexible, data-driven approach. He emphasizes the importance of community, experimentation, and cross-pollination with software breakthroughs like LLMs. While the timescale and precise pathways remain uncertain, Levine is confident that enabling broad experimentation will unlock transformative possibilities—much as the PC and Internet revolutions did for software.

For more episodes and deep dives into business and investment, visit Colossus.com.


Transcript

[00:00]
Patrick O'Shaughnessy
Most software companies try to maximize your time on their app to juice engagement. Ramp does the exact opposite. Ramp understands that no one wants to spend hours chasing receipts, reviewing expense reports and checking for policy violations. So they built their tools to give that time back, using AI to automate 85% of expense reviews with 99% accuracy. And since Ramp saves companies 5%, it's no wonder that Shopify runs on Ramp, Stripe runs on Ramp, and my business does too. To see what happens when you eliminate the busy work, check out ramp.com/invest. Every investor should know about Rogo, because Rogo AI's platform is not just another generic chatbot. Instead, it was designed to support how Wall Street bankers and investors actually work, from sourcing, diligence, and modeling to turning analysis into deliverables. For me, three key things differentiate Rogo. First, it connects directly to your system so it can work with your actual data. Second, it understands your workflows, how work really happens across a deal or an investment. And third, it runs end to end and produces real outputs the way the best people do: auditable spreadsheets, investment memos, diligence materials, and slide decks that match your standards. This all comes from the fact that Rogo is built by finance professionals for finance professionals, and it's already being adopted by some of the most demanding institutions in the world. To learn more, visit rogo.ai/invest. OpenAI, Cursor, Anthropic, Perplexity, and Vercel all have something in common. They all use WorkOS, and here's why. To achieve enterprise adoption at scale, you have to deliver on core capabilities like SSO, SCIM, RBAC, and audit logs. That's where WorkOS comes in. Instead of spending months building these mission-critical capabilities yourself, you can just use WorkOS APIs to gain all of them on day zero. That's why so many of the top AI teams you hear about already run on WorkOS. 
WorkOS is the fastest way to become enterprise ready and stay focused on what matters most, your product. Visit workos.com to get started. Hello and welcome everyone. I'm Patrick O'Shaughnessy and this is Invest Like the Best. This show is an open-ended exploration of markets, ideas, stories and strategies that will help you better invest both your time and your money. If you enjoy these conversations and want to go deeper, check out Colossus, our quarterly publication with in-depth profiles of the people shaping business and investing. You can find Colossus along with all of our podcasts at colossus.com.
[02:17]
Sergey Levine
Patrick O'Shaughnessy is the CEO of Positive Sum. All opinions expressed by Patrick and podcast guests are solely their own opinions and do not reflect the opinion of Positive Sum. This podcast is for informational purposes only and should not be relied upon as a basis for investment decisions. Clients of Positive Sum may maintain positions in the securities discussed in this podcast. To learn more, visit psum.vc.
[02:43]
Patrick O'Shaughnessy
My guest today is Sergey Levine, one of the co-founders and researchers at Physical Intelligence. As a disclaimer, I'm an investor in Physical Intelligence because I believe it's one of the most important companies tackling the problem of robotics. As you hear us discuss today, robotics has what I would call a scarecrow problem. All of these amazing physical devices are becoming ever more possible in all sorts of cool permutations. But what they all really need is an intelligence, a brain. And that is what they're developing at Physical Intelligence. They're trying to develop foundation models that can make any physical robot do any task in any environment. The nature of our conversation today is all of the problems facing robotics and all of the promise of solving these problems across the world. I hope you enjoy this great conversation with Sergey Levine. Sergey, this is going to be a real treat and a blast to learn about possibly the most exciting, impactful area of technology being developed. Just to set the stage, before we go back in time, maybe you could just define physical intelligence as you see it.
[03:43]
Sergey Levine
Fundamentally, the goal at Physical Intelligence is to develop robotic foundation models that can control basically any embodied system to do any task. Broadly speaking, you could imagine that in the same way that a language model is kind of rapidly evolving towards a system that can do any task that can be expressed in language. What we would like is to build a new class of models that can do any task that can be done by a physical actuated device. Part of the thesis of this company is that we believe that doing it at the full level of generality might actually in the long run be easier than trying to special case very specific narrow application domains. Again, in much the same way that for language models, it turned out to be easier in some ways to solve natural language tasks in their full generality than to narrowly target like machine translation or sentiment analysis or whatever.
[04:30]
Patrick O'Shaughnessy
That may not be obvious why you would make that bet versus a robot that just does your dishes or something. What are the key trade-offs to understand and why make the decision that you made?
[04:39]
Sergey Levine
In the world of natural language, we saw that there were a lot of efforts to develop domain-specific solutions that tackled specific problems. Somebody would spend a lot of time thinking about how English differs from French, and then build a machine translation system. The reason that language models took over for all of those different application domains is because they can leverage much broader sources of data. It's not even as simple as saying like, oh, we have this data for this application, this data for this application, we merge everything. It's actually more than that. When you can leverage weakly labeled data, in the case of language models data that you just mine from the web, you actually learn more about the world. So you establish a foundation of world understanding, and then on top of that foundation it turns out to be much more effective to build out different applications. To bring this into robotics, the calculus doesn't look quite the same because in robotics we don't have like an Internet-sized data set that we can just draw on. But this notion of understanding the world, if anything, is actually more important in robotics because if you have many different tasks, maybe even many different physical systems, then you can go from training individual dishwashing specialists or laundry folding specialists to instead training a model that actually understands physical interaction. People can master new skills very, very rapidly because we understand physical interaction. We can intuitively grasp what's going to happen in this new, unfamiliar situation. They'll just like bootstrap things really, really quickly. If we can draw on data from many sources, many applications, many robots, then we can have a model that has a physical understanding and then it'll be much, much easier to put new applications on top of that platform.
[06:09]
Patrick O'Shaughnessy
What is the hardest part about building in this way for you? When you see other approaches that are more maybe legible to the average person? Oh, there's a robot moving around doing this one specific thing, it looks a certain way. What's the hardest part about this approach as you're doing it?
[06:26]
Sergey Levine
I think this has actually been kind of an issue in my whole career, because when you work on robotic learning, and the more general it is the more this becomes important, effective robotic learning, effective generalization, isn't actually the optimal way to have like a really exciting demo. The way to have a really exciting demo is to pick a really cool task, control everything else in the environment, like set it up so that it's perfectly clean, perfectly pristine, and just make it work in that one setting. That's the way you make a robot demo. And generalization, you can't just show it in one spot. The point of generalization is that it does something relatively mundane that any human could do, but it does it in any situation. So we had some demos that we released last April where we showed our robot cleaning kitchens. It's cool, but if you watch an individual video out of context, it's just like, okay, it's picking up plates, like anybody can pick up plates, except that we just put it into that home just for that demo, and it never had training data from that setting. So obviously you kind of have to understand what's going on to appreciate why this is actually pushing the frontier.
[07:24]
Patrick O'Shaughnessy
What is your model for the stakes of what you're doing, if you are successful? I'm curious for you to define what that would mean, successful, other than we cross this chasm of general physical intelligence. But if you cross that line, then what?
[07:38]
Sergey Levine
One of the things that I think would be really, really exciting, that would be enabled by a general purpose embodied foundation model, is the ability to unlock people's imagination in how they build robots and other embodied systems. Personal computers were a really big deal in my mind because they made it possible for lots of people to hack together all sorts of really cool stuff. And there was this Cambrian explosion of amazing applications that started in the 90s and so on, and then was further accelerated by the Internet. And I think something like that might happen in the world of robotics, but it can't happen today, because if you want to put together some cool new robotics application, some cool new robotics idea, you kind of have to build this monstrous stack and you need to basically solve the intelligence problem. But if there is a solution that someone can build on top of, a foundation model that you can prompt that'll provide basic functionality, and then you can fine-tune it a little bit or adjust it in some way to your application, now it actually becomes a lot more tractable for lots of people, lots of companies, lots of individuals to try out all sorts of different things. Sometimes we think that robots are going to be one thing. There's people, and now we're going to make like metal people and that'll be robots. But I don't think that's how it's going to be because no technology has been like that. It's going to be more like kind of a toolkit where you can put together all sorts of really cool applications, get really creative with it. You know, maybe I'm going to make a robot with like five arms and this one is going to look like that. It's going to move. This one's going to hang from the ceiling and figure out kind of the right thing to tackle your domain, maybe also experiment with software, but you need the right platform on top of which to do that. And I think the foundation model can be that thing.
[09:05]
Patrick O'Shaughnessy
What are the, in your mind, the pros and cons of the humanoid approach to robotics?
[09:10]
Sergey Levine
There's a lot of value to that. There's a lot of value to capturing the imagination. And there's a lot of value in getting people to think about what the future might look like in a way that's understandable. In my mind, it's one of many possible kinds of robots that we're likely to have. The challenge of intelligence looks very similar for all these different robots. I don't think we should be tackling intelligence in the context of one specific body. I think we should handle it in a general way because otherwise it's just really hard to get a handle on this. We need lots of data. The cool thing about being able to build robots is that ultimately they don't have to be constrained to look like humans at all. You can build the right tool for the job. You could imagine that you're building a house with a robot that is a swarm of 1,000 quadcopters. And I think that in the future we'll have a robotic foundation model which can then be adapted to all sorts of applications. And it might really run the gamut from bulldozers or something to humanoids to robotic arms. Maybe it would need to be adapted to each one, maybe it would need to be fine-tuned. Maybe it would need something in context to understand how that body works. But the fundamentals of how you interact with objects, how things move in the world, how causality works, like that's all conserved for all of these different systems.
[10:19]
Patrick O'Shaughnessy
Do you have a favorite example of what might be possible with true general intelligence that might not be possible with a humanoid-only intelligence or something?
[10:27]
Sergey Levine
There are a few things that I think are worth thinking about. One is that we can make machines that are very big and machines that are very small. This is not by any means a short-term thing, but in the long run, I think there's lots of really exciting applications in medicine and surgery where we not only might in the long run not be limited to robots that look like humans, we might not be limited to robots that can even be controlled by humans. Currently, for example, robotic surgery is done entirely through teleoperation. So you need something that a person can control in real time with the right level of dexterity. And of course, that limitation holds for current learning-enabled systems too. But in the long run, we could imagine addressing that.
[11:02]
Patrick O'Shaughnessy
Think about the most important hash marks on the timeline of robotics research that have gotten us to here. I always think it's super helpful to set the historical context before we talk about what the state is today and where we're going. Could you walk us through that at some level?
[11:17]
Sergey Levine
Doing end-to-end control for robotic systems is a very, very old idea. The first, for example, autonomous driving systems that used end-to-end learning, they existed in the 1980s. ALVINN was 1986 or 87. And that was a driving system that was demonstrated to drive on highways controlled by a neural network and then from a camera. The neural network was tiny. There are some very venerable concepts, but historically what has been really difficult in robotic learning is that you need a system that handles the application you want to address, that is cost-effective to train for that application, meaning that you don't need like a huge amount of data for every single application you want to tackle, and that handles long-tail scenarios with common sense. So if something weird takes place in the world, it needs to have a reasonable response to it. And then also for the thing that it's actually supposed to do, it needs to be robust, fast and reliable. And getting all those things together is very, very hard. Because with machine learning it works best when there's a lot of data. So if you sort of naively approach a robotic problem and say like I want to do washing dishes, the obvious thing to do is to collect like an enormous amount of data of washing dishes. But that's not cost-effective because then you go on to the next application and you go through that process all over again. Being able to train general purpose models that can handle many tasks is essential to this because now you need a lot less data for each new task. But then even further, and this is the thing that has probably changed the most in the last few years, you also then need to handle the unusual scenarios. For the unusual scenarios, you are probably not going to have experience. What you need to rely on is knowledge that you've acquired from other sources that you can ground in that new situation. And people are extremely good at this. 
If you're driving a car and there is something going on in the middle of the road and someone put up a sign saying, don't go here, there's a gas leak or something, you've probably never experienced that before, but you can put these things together and figure out what you're supposed to do in that unusual situation because you have common sense. This has been a huge mystery in the robotic learning world. Where do you get that common sense? And this is what's changed in the last few years. Because it turns out that multimodal language models are really good at pulling in knowledge and trying to articulate that knowledge. They're not very good at grounding that knowledge in physical situations, but they know stuff. There is a path to get that common sense by essentially leveraging the knowledge that is contained in multimodal LLMs. But there's also a challenge because you have to somehow plug into that knowledge in the right way. You can't just show it a picture and say, what would you do here? Because it doesn't have the context, it doesn't know that you're a robot, this is what you look like, this is what's going on. That's a technological challenge. And we've made some headway in addressing that technological challenge. The research community in general has. But most importantly, it's kind of that light at the end of the tunnel. Now we have this way of pulling in lots of knowledge which can help us handle those long-tail scenarios.
[13:58]
Patrick O'Shaughnessy
Are there hash mark equivalents on the timeline of AlexNet or the Transformer? Are there big major events that you think everyone will point to when writing the history books about this?
[14:07]
Sergey Levine
I think it's very early on right now to answer that definitively. Probably the first end to end learning systems which were in the 80s, that's definitely a milestone. The first deep reinforcement learning systems which were in the early 2010s, those are probably a milestone because deep reinforcement learning gives us a way to go beyond human level performance, which I think will be essential for robotic systems. And then there's the more recent stuff, but I don't know how that's going to shake out as far as whether that's something that people will point to. But I do think that the advent of multimodal LLMs that can be adapted to robotic control to bring in that common sense, I do think that's a really important advance. I think we're probably going to see quite a few important advances in the next few years and maybe those will be the things people point to.
[14:49]
Patrick O'Shaughnessy
As your business scales up, everything gets more complex, especially your compliance and security needs. With so many tools offering band-aids and patches, it's unfortunately far too easy for something to slip through the cracks. Fortunately, Vanta is a powerful tool designed to simplify and automate your security work and deliver a single source of truth for compliance and risk. There's a reason that Ramp, Cursor and Snowflake all use Vanta. It frees them to focus on building amazing, differentiated products knowing that compliance and security are under control. Learn more at vanta.com/invest. I know firsthand how complex the tech stack is for asset management firms. And seemingly every new tool and data source makes the problem even worse, adding more complexity, more headcount and more risk. Ridgeline offers a better way forward, one unified platform that automates away the complexity across portfolio accounting, reconciliation, reporting, trading, compliance, and more, all at scale. Ridgeline is revolutionizing investment management, helping ambitious firms scale faster, operate smarter, and stay ahead of the curve. See what Ridgeline can unlock for your firm. Schedule a demo at ridgeline.ai. Can you tell us your own personal history of approaching the problem? Maybe the origin of when you first became interested and why, and then how you've decided what to spend your personal time and attention on ever since then?
[16:06]
Sergey Levine
So I started working in robotics in 2014, after I finished my graduate degree and started a postdoc with Professor Pieter Abbeel at UC Berkeley. I actually hadn't worked on robots before, but I figured I should get a little bit more education after finishing my degree and his lab worked on robots. So I tried to apply what I had learned previously to robotics. Before that I worked on computer graphics. The thing that I've always wanted to really figure out is how to get AI systems that get better and better the more they do things, because I think that's tremendously powerful. If you can have a system that gets better and better the more it does something and it just keeps getting better and there's no limit, then it can master all the skills you want it to do. Initially, I tried to approach it in a very blank slate way. You start with nothing, you practice a particular skill and you get better at that skill. You can do that in a limited setting and you get something that works. But it's very hard to turn that into a general system that can work in open world settings. Because if I practice something over here and then it goes over there, now something is different and it needs to practice all over again. When I worked at Google afterwards, I tried to see if we can do that, but now parallelize it across many robots. So collective learning. Can you put 20 robots in a room and have them all learn together? And that works and it generalizes, but it's very hard for that to handle these tail cases, these edge cases. Now it becomes this savant of this particular task and that's all it knows in the world. The next step, as I mentioned before, is combining this ability to practice skills with lots of prior knowledge. And that's actually a really, really hard problem. It's not just in robotics where it's a hard problem.
I think it's a hard problem in all of AI because arguably the two big impressive results in AI over the last few decades have been generative AI and deep reinforcement learning. If you want a single example to epitomize each: for generative AI, that's LLMs, and for deep reinforcement learning, AlphaGo. They're both very, very impressive, and they're very impressive for very different reasons. The generative AI is impressive because it can reproduce some of the things that humans can do. Like it can draw pictures that look like human pictures, write text. Deep RL is impressive for the opposite reason. It does things that humans hadn't thought of. The big challenge, and this is what I'm leading up to and what I hope to figure out here at Physical Intelligence, is how to combine those threads: how to bring in all of that knowledge that you get with generative AI, but also go beyond just human level performance with reinforcement learning.
[18:29]
Patrick O'Shaughnessy
What literally have you done and are you doing to make that happen?
[18:33]
Sergey Levine
In the past few years, we started off first by developing the basic foundations. The basic foundation is what's called a vision-language-action model. A vision-language-action model, you can think of as an LLM that has been adapted for robotic control. So the way these things are trained is they're first trained on text data, then they're adapted with lots of image data from the web to understand images. And then they're adapted to robots with lots of very diverse robot data. That's a starting point. That's a way to take all of that web knowledge, get it into a model that can control robots and get some interesting behaviors out of it. And then from there we studied two threads. How to get this thing to handle unusual situations with common sense and how to get it to improve with reinforcement learning. The way you get common sense is by essentially using chain of thought. The robot enters a scene and instead of directly starting to move, it thinks about what it was asked to do. So if it was told to clean up the kitchen, it looks at the scene and says, based on this, I should pick up the plate, and then it goes and does it. So that unlocks all this prior knowledge because those intermediate inferences benefit from the web-scale pre-training that handles edge cases. And then the reinforcement learning part comes in after you've practiced it a few times. And you can keep getting better and better at the task directly through your experience. For example, we had this demo on making espresso. That system practiced making those espressos many, many times and used that to improve robustness, improve speed, improve throughput. And we're not done with that. Like I think there's a lot more to do there. But we have the starting point.
[20:03]
Patrick O'Shaughnessy
The robot data itself is the right way to think about it. I'm looking at the Gen one of these things. I see a camera here. Maybe there's some sensors somewhere else. Is the data effectively being gathered by various sensors strategically placed on different parts of the robot? Yeah.
[20:17]
Sergey Levine
Something I'll say about sensors is that I think you can actually get away with less than one might think and still do quite a lot. This platform here has three cameras. One on each wrist and a base camera. It doesn't have touch sensing. It doesn't have force sensing. It's very bare bones and very low cost. I'm sure that more sensors could make it better. But a good learning method can actually compensate for deficient sensing fairly well. The wrist cameras are essentially a touch sensor in disguise because you can see local deformations when you touch something.
[20:43]
Patrick O'Shaughnessy
If I think about the analogy to the expert systems of the '80s and '90s in basic AI, to the lesson that scale is all you need and the sort of counterintuitive nature of that, that you're not teaching it any specific thing, just blasting it with data. And there's this reservoir of Internet data. Talk about how to create the reservoir of data needed for this.
[21:03]
Sergey Levine
So I don't think anybody really knows how much robot data is needed to have truly generalizable and powerful embodied AI. My sense is that we actually don't need to know. What we need to do is get to the point where these systems are useful enough that they can go out into the world and gather more data themselves. Tesla doesn't worry about how much data their cars can collect. If anything, it's the other way around: there's a little too much data. The key is not so much to quantify "here is exactly the price tag of getting the ultimate robot data set." The key is to get a system that can go into the world, that's useful enough, that does a wide variety of different things, and that can keep pulling in more data.
[21:38]
Patrick O'Shaughnessy
You brought up the example of Tesla. The beautiful system of a thing that's useful without the AI to begin with because the human drives it and it gathers data. Why then not start with your best guess at something that's useful as a single robot to have the same sort of flywheel thing happen?
[21:53]
Sergey Levine
I think it's a good idea.
[21:54]
Patrick O'Shaughnessy
And do you think that's an approach that you'll pursue?
[21:58]
Sergey Levine
I don't think that there's like one right answer, right? So I think there are some domains where deploying a system under human control makes a lot of sense. There's some domains where deploying a partially autonomous system is very reasonable. It's kind of domain dependent because robots aren't just one thing. Some people might not want a robot in their home that is constantly being controlled by a person off site, but maybe for some applications, that doesn't matter.
[22:18]
Patrick O'Shaughnessy
If you mark the start of Physical Intelligence through today, what has been the most surprising thing to you that you've discovered or the nature of how the research has gone?
[22:28]
Sergey Levine
One of the things that's been surprising to me is that I think we've made a lot more progress on dexterity than I thought we would. We had good reason to believe that if we just collect more and more data, that just steadily gets better. What was surprising is that we could also get these systems to perform very dexterous behaviors without really doing anything particularly special for that. The same, by the way, also applied to getting systems to work on different embodiments where we could get our models to work on all sorts of other robots, including robots with multi fingered hands, robots with different numbers of degrees of freedom. And obviously we needed to get data and we needed to fine tune the model, but the model itself didn't need to change. It didn't even need to be told through any kind of prompt what the robot was. And that was also surprising to me because I would have thought that we would need some fancy techniques to adapt the system to faster, more dexterous, more complex tasks, and also to different kinds of environments. But it actually seems to generalize pretty well.
[23:21]
Patrick O'Shaughnessy
I'm always interested in like the spectrum of capabilities and especially where the systems today are more advanced than you think people would probably expect, and where they're less advanced than people might expect.
[23:31]
Sergey Levine
This is something that's always been very tricky to understand in robotics. There's this idea that roboticists always talk about called Moravec's paradox. It's actually true in all areas of AI, but especially in robotics, this is a big deal. We kind of have a cognitive bias to think that things that are easy for us will be easy for the machine. Solving calculus problems is difficult for most people. Picking up a cup is easy for most people. So we think, oh, machines should be able to do this, but it's actually the other way around, that there are things that are easy for us because they have to be, otherwise we wouldn't survive. We're very good at spotting the tiger in the jungle because the people that weren't so good at it got eaten by the tiger and they're not around anymore. Because of that, we have this cognitive bias and we think that there are things that should be very easy, but they're actually very difficult engineering challenges. However, something that is changing is that machine learning slightly changes that equation. Programming something by hand to pick up any cup anywhere, that's difficult. Getting a machine learning system to do it, if you have data for it, it's actually not that difficult. And I think increasingly what we'll see is a shift where domains where collecting data is straightforward actually end up falling into the easy bucket over time, even if they are physically intricate. But there will be domains where collecting data is difficult, where you need to use more common sense, where you need to reason at multiple levels of abstraction and connect physical skills that you've learned in other areas to knowledge that you got from the web. And those will be tough. And that's where we'll need more technology advances.
[24:54]
Patrick O'Shaughnessy
What is the science of common sense? When we say that, what does that mean?
[24:59]
Sergey Levine
For the purpose of robotic learning, we can think of it as applying semantic inferences using knowledge learned from other domains to the current physical task at hand. You can think of common sense as the opposite of muscle memory. So muscle memory, like if you play a sport, you practice something a lot, you hardly think about it, you just kind of do it on autopilot. Common sense, in my mind, I don't know if this is the conventional definition, but I think it's a reasonable definition, is when you know something to be true because you saw it, or you read about it, or you heard it, and now you are in a situation where that fact is highly pertinent to what you need to do, and you are able to make that connection, apply it to your situation grounded in the environment that you're in, and make the right decision.
[25:42]
Patrick O'Shaughnessy
One of the other differences that's so interesting to me is people that have used, everyone's used a chatbot now, you query it, you get an answer. Query, you get an answer. We're now seeing what happens with Claude Code and other things where you give it something complicated and it's able to do a very long task without failing. What's the similar thing, long range, in robotics?
[26:00]
Sergey Levine
It's something that we're working on quite a bit right now, and the methodology is not that different. At some level, the way that our models work now, as I mentioned, is they use this kind of chain of thought process to reason about the task. When you have that, you can actually do very long horizon tasks. You can have a robot that goes and takes out all the dishes from the dishwasher, puts them in the correct cabinets, wipes down the counter, all that kind of stuff. The interesting thing here is that we found maybe about six months ago that our models had gotten to the point where they could be improved just from supervising them with high level instructions. You take a robot, you put it in a new kitchen, you ask it to clean the kitchen, it gets to work, and then it fails somewhere. So now, okay, what do you do? Well, you add more data. Traditionally, what we would do in that situation is add more teleoperation data to cover a wider range of kitchens. But what we tried kind of on a whim is to see, okay, well, what if we don't add more teleoperation data? What if we just add more data labeled with the semantic command? So basically, just take whatever the robot experienced and just label it with some semantic commands, but don't add any more low level actions. And that actually helps. That actually improves its ability to generalize. So what that means is that the bottleneck had actually shifted from the lowest level, meaning the robot's ability to physically do the task, to this like, middle level, where now the system is more bottlenecked by its ability to interpret the scene and select the correct next step, which can be supervised with language. And that's a big deal because now that means that someone can literally talk to the robot.
[27:26]
Patrick O'Shaughnessy
It's coaching, basically.
[27:27]
Sergey Levine
Yeah, exactly. And make it better just by talking to it.
[27:30]
Patrick O'Shaughnessy
We are in 2050 and there's no robot in my kitchen doing my dishes for me. What do you think the most likely explanation is for it not having gotten there by that point?
[27:40]
Sergey Levine
I can think of a few reasons. My suspicion is that there is a long tail of challenges that has to do with the interaction of technology and people. Autonomous cars aren't that different in this regard, where getting to a level of comfort with deploying autonomous vehicles on the road was a significant challenge that ran in parallel with getting the technology to that level. Early Tesla self driving was a bit controversial because it wasn't perfect. There was a question like, are people comfortable with this level of imperfection? Probably. There are some tasks for robots where people will be comfortable with something that's not perfect, something that needs to learn from its mistakes. There are some areas where people will not be comfortable. Are you comfortable with occasionally breaking your dishes? Maybe in a few years it will stop breaking those dishes, but maybe in the meantime it's not quite there. Are you comfortable with a robot like that in a home where there's, like, small children? Maybe not. That's okay. I think that figuring out how those factors interact and what that means for the timeline and for how these systems get better with experience, I think that's a tricky question. And I think it needs to be approached very carefully with a lot of sensitivity. And there may be some domains where it makes a lot more sense for these systems to be deployed and bootstrap and collect more data, and maybe other domains require more care.
[28:53]
Patrick O'Shaughnessy
Could you imagine a purely technical explanation for why something might not work?
[28:58]
Sergey Levine
I think the place where I would see the biggest technical risk is dealing with the breadth of different situations. If we were talking about a well defined but slightly chaotic environment, like cleaning hotel rooms or assisting human cooks in a restaurant, I have a very good sense for how to get that under control. If you're imagining a robot going into a home, one place where I can anticipate a challenge is that there are a lot of other unexpected things that can happen. And you need a system that's very good at inferring what's going on and adapting to it or reacting intelligently. And I think we have a lot of ideas for how we can approach it. But that is the hardest part of the problem. Because when you're in a situation where just about anything could happen and you're controlling a physical device that affects the world around it, then you really need to get things right, at least at some level, pretty much in every case. It doesn't mean that you always have to succeed, but it does mean that you always have to do something sensible that people are okay with. And I think there are a lot of really good ideas for how to do that. But that is probably the most challenging part of the equation.
[30:02]
Patrick O'Shaughnessy
If I go back to thinking about the right model, to think about the Physical Intelligence approach to doing this whole exercise, help me make it as simple as possible. So one might be we're going to build a whole variety of different kinds of form factors to do a whole variety of different kinds of things and mash all this data together and start to, you know, experiment with how we can, on evals, make it better. Is that just the simplest way of doing it? Is there an even simpler way? And I'm asking because I'd love to then contrast it with some other approaches that you're interested in that you're not doing that others are doing.
[30:31]
Sergey Levine
In my mind, the most important thing to get right is to get the system to be general. In particular, to get it to be general with respect to how it can be improved. For example, hand designed robotic controllers are not very general with respect to how they can be improved because it requires a human engineer to go in and improve it. A learning based perception system is more general because all it requires is human labelers to go in and label more data. A system that learns autonomously from data that it gathers through its own experience is even more general because you don't need the human labelers. The key is this generality, particularly with respect to improvement. And the decisions we make are to a very large extent centered around that. I don't know if the correct design for a robot is to have three cameras. I don't know if it needs like a touch sensor. I think we're very agnostic to that. I think we'll try a lot of those different choices. I'm not even sure if in the long run it's going to have a language model. Maybe it'll have some other kind of model that's trained on very diverse data. The key is this level of generality.
[31:27]
Patrick O'Shaughnessy
What other approaches are the most interesting to you?
[31:30]
Sergey Levine
One thing that's a very important question in this area and something that I think the research community and the tech community has not fully answered, is the dichotomy between different data sources, particularly with respect to real data and simulation. It's a very controversial topic. I have a very strong opinion about it. But I think that it's worth acknowledging that if we look for example at humanoids, you've seen videos of humanoids doing all these acrobatics. There's a particular pipeline that makes that work, which is very heavily reliant on simulation and very light on real world data. Often actually zero real world data. And then there are the approaches that work well for robotic manipulation that often are the opposite. They often use very little simulated data, often use large amounts of real world data and very large foundation models. And it is surprising that in these two robotic domains the dominant approaches look so different. It may be that one will win out and there is a particular approach that can handle everything in the long run. Or maybe there's some sort of synthesis of these ideas that's important. I don't know the answer to it. I have subjective opinions. I think the approach we're taking is a very good one. But I think that it's interesting to look at that and see why is it that these things are so different.
[32:44]
Patrick O'Shaughnessy
Can you talk about the contrast between cool and useful. The Boston Dynamics robot is very cool. The backflip is super cool. Inverting the body, it all looks really good. I don't know what I need that requires a robot to do a backflip. So I'm curious how you think about optimizing around cool versus useful.
[33:05]
Sergey Levine
The strategy we've taken is: subject to the constraint that it's useful, make it as cool as possible. We make decisions first and foremost based on our assessment of what will drive the tech forward towards this truly general, broadly applicable robotic foundation model. But in doing that, we try to stress test it against the toughest challenges we can throw at it. The toughest challenges are often ones that look cool. We didn't set out, for example, to build a robot that can make espresso or can fold laundry. But in the process of building these general systems, we figure, like, these would be particularly challenging, particularly exciting things to try with them, to see how far we can push them.
[33:43]
Patrick O'Shaughnessy
Can you talk about the robot Olympics?
[33:45]
Sergey Levine
There was a gentleman named Benjie Holson who used to work at Everyday Robots, part of Alphabet before it dissolved. He spends a lot of time thinking about tasks that robots could do. So he wrote a really interesting blog post a while back. There was this robot Olympics that was held in China where robots would run around on a track and jump and so on. But maybe these aren't the real challenges we should worry about. How about a robot Olympics centered around essentially everyday tasks that people do? That's kind of more of a Moravec's paradox thing, with tasks that people find really easy, but that robots struggle with. And he had things like opening a door, washing a frying pan with grease on it, using a plastic bag to pick up dog poop. Things that people don't find particularly challenging, but that no current robotic system can do. And he listed maybe a dozen of these things. This wasn't part of a concerted, like, research project. We had developed processes and systems for just ingesting new tasks that we wanted to use for all sorts of tasks. And we figured, okay, like, a good way to test this is to say, like, hey, here's like a big list of tasks. Let's just go through this process that we've developed and see if it works, basically. So it's almost like a test of, like, our internal operations and model training system. Then we tried these things and it actually turned out that we could solve almost all of them. There's one we couldn't do, which was turning a dress shirt inside out because the grippers on this thing wouldn't fit inside the sleeve. So we probably need to change the gripper. I think on a technicality, we didn't succeed at peeling an orange because he said, do it with the fingers. And our fingers weren't strong enough. So we had to use, like, a little tool, like a little knife. Basically everything else we could do. If anybody watches those videos, one thing that I think is important to keep in mind is we didn't, like, develop anything special for this.
We literally use this as a test of our task onboarding process. There's something interesting there because it suggests the power of generality, that when you have this general system, you can really just, like, onboard all these crazy tasks without really doing anything particularly sophisticated.
[35:35]
Patrick O'Shaughnessy
I was curious before when you said superhuman ability, dexterity, or something like that, where we're limited by what we can do or maybe by what we can control, even if it gets smaller. What are some of the other dimensions that we might surpass human ability on in terms of physical ability? What are the other trend lines?
[35:53]
Sergey Levine
So here's a fun one. We were working on a task where our robot had to plug in things like power cables or Ethernet cables or something like that. When a person does this, obviously if you practice it a lot, you'll get really good at it. But when a person does this without having practiced a lot, you pause frequently, right? Because it's not a physical thing. You just have to parse what's going on. You have to make sure that it's, like, all aligned and all that stuff. So you do it very slowly. And if you're teleoperating a robot, you do it even more slowly because there's this level of indirection. It turns out to be pretty straightforward to go in and find all those pauses and remove them. You can speed things up further so you can get to a task where a person demonstrates what it means to succeed. And then you can have the robot practice the task and succeed in the same way, but a lot more quickly, a lot more efficiently. The most general way to do this is with reinforcement learning. But there are also, like, some simple tricks you can do if you just want speed. So that's, like, one example of something where you can have a machine that does it a lot better. You know, at some level, you have, like, a processing bottleneck. Like, that's why the person does it slowly, because they have to process what's going on. But speeding up processing is something that people understand quite well in computer science.
[36:53]
Patrick O'Shaughnessy
There's this amazing Michael Crichton novel called Prey, where it seems like for a given problem, there may be an optimal shape, or set of optimal shapes, of the robot to perform the task, and that what you should do is analyze the problem, then have something that can almost like morph or transform into the right form factor. How do you think about that? The innovation on the form factor side rather than the data and model side?
[37:19]
Sergey Levine
I think that in general, in robotics, the ability to innovate on form factors has been very constrained because of the AI challenge. If you have a traditional AI pipeline, like you're doing some motion planning and stuff like that, it's hard to just go and cobble together some new robot, because when you do that, you have to, like, characterize the dynamics of the system. You have to do system ID. You have to build up all this stuff. If you could just put together a robot in your garage, load up a robotic foundation model and tell it to do a bunch of stuff, maybe it won't be perfect at it, maybe it needs more data to really perfect it, but you can at least get the thing moving. I think that can be a really powerful engine just to get everybody to experiment with this stuff. I don't think that I'm the right person to design the perfect robot. There are people here, of course, who are a lot better at that. But in general, I think that is just like with personal computers. So I think the key is to let people experiment and play around with it and just radically lower the barrier to entry for that. Then we'll see a lot of creativity. When we first started using computers, there was a limited number of form factors. Now you can have a computer in your phone, a computer in your car, embed a computer in your refrigerator. They're everywhere, and they're very different. Generality, good software, a good foundation on top of which you can build applications: those are key to enabling that.
[38:28]
Patrick O'Shaughnessy
Your co-founder Lachy once described to me the feeling of physical intelligence for a human as being like learning how to ride a bike. Like, there's that moment when you didn't know how to do it, and then you do know how to do it. And that feeling is physical intelligence, that snap of understanding.
[38:42]
Sergey Levine
There's actually a physiological explanation for this. There are studies that were done in monkeys using tools, and you can actually find where in the brain which neurons activate for the monkey to figure out where its hand is. It turns out that if it's using a tool, they activate based on the location of the tool tip, not based on the location of the hand. The tool being an extension of your body is a real physiological thing, like your brain literally does that.
[39:04]
Patrick O'Shaughnessy
Knowing that, what does that do to impact the approach to your research?
[39:09]
Sergey Levine
It says that physical intelligence should be at some level agnostic to embodiment, that a good foundation model should figure out how to manipulate whatever body it's controlling, whatever tools it has at hand. There's basically one problem, not many different problems. There isn't like a humanoid problem and a car problem and a bulldozer problem and a robot bolted to the table problem. There is one problem and if you solve it at its full level of generality, that's really, really powerful.
[39:36]
Patrick O'Shaughnessy
We're in the early stages of seeing some of the job and other sorts of transformation in businesses, in the economy, etc. That make possible. Certainly we've seen it in engineering. How do you think about what might happen or what you hope will happen when we're at a similar stage, whenever that happens to be for robotics, where all of a sudden we have this thing that's general, that's useful. The world's very efficient at deploying these things. People are creative. Where do you expect to see the world start to change most in the early days?
[40:03]
Sergey Levine
I really don't know. I don't think anybody would have been able to predict how the LLM stuff evolves and people would have guessed. But this is why I keep coming back to this idea that maybe the key is to let people try lots of things. One of the really amazing things about applications of LLMs is that they are really accessible and somebody could put together a really cool new prototype that under the hood is just prompting ChatGPT or something. But they can experiment with it, they can try it out, see what it does. And there's an amazing power to having lots of smart people rapidly iterating and prototyping lots of things. That's a lot of why Physical Intelligence has really put a premium on engagement. Like we've open sourced our models, we would like to engage with lots of other companies that are building robots because we all see a lot of power in this effect of having many people trying out lots of things.
[40:53]
Patrick O'Shaughnessy
What are the major controversies in the robotics community?
[40:57]
Sergey Levine
To me, a controversy is when someone gets in an argument with me at a conference. But I can tell you the kind of arguments that I found myself in, and it's kind of an interesting trajectory. In the early days, the main argument I would have with people is: does learning have a place in robotic AI? I think part of why that was often a controversial point is that in a traditional engineering pipeline, robots do look very different than software artifacts: they're physical, they can affect stuff around them. There are safety considerations. There are a lot of weird situations they can get into. And it took a really long time for the robotics research community to really internalize that you don't necessarily need to program in things like knowledge of physics. You don't necessarily need a physics simulator inside your robot when it's planning. We can actually have a learning system figure all that stuff out. That was a very controversial thing for a very long time. I think at this point there's a lot of acceptance that learning is a really important part of robotics. But I still don't think there's universal acceptance that end to end learning is the right way to go. Basically, I don't think there's universal acceptance of the bitter lesson. The bitter lesson says that you should not program the machine to think the way you think it should think, but you should let it learn from data. And that is not a universally accepted idea. I think there's good arguments against it, but I think that in the long run, if we want that generality, especially generality in the machine's ability to improve, then we need it to primarily be learning from data.
[42:22]
Patrick O'Shaughnessy
What is the good argument against?
[42:25]
Sergey Levine
My best attempt at steelmanning this is that if you want something reliable in a really complicated open world setting, then you can't afford not to use what you already know about the physical world. And we've got textbooks full of this stuff. So why don't we just plug in what we know from the textbooks?
[42:40]
Patrick O'Shaughnessy
What is compositional learning? Can you describe that?
[42:42]
Sergey Levine
One of my students, he had this idea where he asked a language model to provide a recipe for how to make a sandwich in the International Phonetic Alphabet. The International Phonetic Alphabet is these symbols that they use in a dictionary to explain how to pronounce a word. And it's very peculiar because it only ever appears for individual words in a dictionary. You never see free-form text written in the International Phonetic Alphabet. But if you ask a good language model, it will write paragraphs in IPA for you. And that is compositional generalization. That means that you have never seen this particular language, this particular alphabet used to write paragraphs, but you understand paragraphs. You understand that it's compositional with different alphabets. So you can solve the problem. You can imagine the same thing coming up in robotics, that you've learned a repertoire of skills, and now you can combine and mix those skills and apply them to solve new problems.
[43:31]
Patrick O'Shaughnessy
It makes me wonder what the last type of tasks you think will be possible for a robotic system to achieve.
[43:39]
Sergey Levine
I think changing a child's diaper will be really, really hard. This really is just Moravec's paradox all over again, that people are extremely good at certain things. We're very good at physical things. We're also very good at interacting with other people. And that makes sense. We have to be. That's a lot of our existence. So things that involve behaviors that interact with other people, where you have to help somebody, I think that's a lot harder than people appreciate. Elderly care, taking care of small children, I think those things are going to be hard and they're probably going to be harder than people think.
[44:10]
Patrick O'Shaughnessy
And the stakes are very high.
[44:12]
Sergey Levine
It's not just that the stakes are high in many places, it's just that it's probably the pinnacle of something that fools us into thinking that it's easier than it really is. We are so evolved for interacting with people and doing things physically. If you're helping somebody get up the stairs or get out of bed, you don't have to think very carefully about how you're going to do that. So I think it's really the pinnacle of Moravec's paradox.
[44:36]
Patrick O'Shaughnessy
If I think about an LLM as a brain and now it's effectively studied everything, I don't know how else to put it. And then I think about a robotics model's brain instead. What are the dark parts of the brain? What has it not been able to study? What are the areas that have just been really difficult, that matter, but have been hard for us to get into?
[44:56]
Sergey Levine
One of the things that people are remarkably good at is using physical analogies to understand other situations. I don't know whether this is something that LLMs can or can't do, but it is something that people use a lot. They use it in everyday life and they also use it for very sophisticated problems. So for example, you could say that company has a lot of momentum. That's a physical analogy. You know exactly what it means. I don't have to explain that statement to you, but if you actually think about that, it is quite a complex thing. There's like a lot riding on that word momentum. There is an interview with Richard Feynman where he talks about teaching, but he talks about analogies that he makes in regard to subatomic particles. And he says we use like the word spin. The thing is not really spinning. Like it's not like a spinning top. But all those kind of analogies help us make sense of it. And not just in a way that allows explaining concepts, but it actually leads to conclusions. It actually leads to inferences, and those inferences actually make sense. We are so primed to interact with the physical world, so primed to have physical intelligence, that you can use it in everyday speech by saying that company has a lot of momentum and you can use it when advancing fundamental theoretical physics. That's kind of remarkable. I don't know if LLMs can do that. Maybe they can, but I think that really understanding physical interactions, causal structures, all that kind of stuff, there is something special about that. And it's clearly something that people get a lot of mileage out of.
[46:14]
Patrick O'Shaughnessy
I'd love to talk about the role of researchers and the actual people doing the research. In the LLM world, it's fairly shocking how few people are at the global scale responsible for basically all the progress in LLMs. Someone like Ilya as an example, what is that like in robotics? How many people in the world are truly impacting this trajectory? And then I want to ask what good research means.
[47:41]
Sergey Levine
I think those questions are often very hard to answer about science because I think that we sometimes have a tendency, especially when we look at history, to underline particular milestones. And certainly in machine learning, this is the case. AlexNet was a big step forward, that's true, but I think it's also important to remember that these advances, they happen because lots of people are trying lots of things and even some of the failures are actually very instructive. I complained before a little bit in a low-key way about the controversy around end-to-end robotic learning. But I don't know if robotic learning would have advanced the same way if it were not for the controversy, so to speak. It is true that you can look through the list of successes and mark down that like, oh, like these folks have a history of repeatedly hitting home runs. But I think in reality, in the scientific community, it's not just the home runs that are responsible for progress. And even some of the failures and even some of the bad ideas are very instructive in pushing towards the good ideas.
[48:36]
Patrick O'Shaughnessy
That's fascinating to think about. The example you gave before is so interesting, where the research insight was like, just give it some coaching and it gets better. It seems like that sort of insight can be very powerful and high leverage, which makes me wonder, what have you learned about what makes for a great researcher?
[48:51]
Sergey Levine
Research is definitely different from engineering because in research the important thing is to get to an answer to a question, which often requires cutting some corners. One of the most delicate decisions in research is when do you try new things, or when do you stick with what you're already trying? That's very, very delicate. It's very, very hard to figure that out. And if you get it wrong, then you can miss something really remarkable. If you get it wrong and you don't stick with something for long enough, you might be right there, you might be about to get to the answer and then you stop just short of it. That's terrible. Or you could get stuck hammering for years against something that's never going to give way. Deciding when to turn a little bit and look this way and that to open yourself up to more opportunities, versus when you should keep hammering on the thing because you're about to get the solution, that's often the most important decision. And some people have an instinct for getting that right. That counts for a lot.
[49:42]
Patrick O'Shaughnessy
You've obviously been in and around great researchers, and are one yourself. What are these people like as people? How do they tend to be distinctive from the average person?
[49:51]
Sergey Levine
I think they're just the same. I have a very hard time thinking of a single set of personality traits. There is no constant. Basically, there might be a commonality in that to do effective science, you have to be very passionate about that. But even that passion can come from many different places. I've worked with people that were remarkably effective, that are just driven purely by the desire for novelty. They don't give a damn about what their technology does. They don't give a damn about whether it's useful. They just want, like, cool new ideas. I've also worked with other people that really want to solve a particular thing. And they're just as happy building stuff as they are testing out experiments, as they are hammering away at things, whatever it takes.
[50:29]
Patrick O'Shaughnessy
You mentioned the difference between research and engineering, which also makes me think of manufacturing. Elon would be fond of saying that the factory is the product. The hardest part of this whole equation is actually the scale up of whatever this thing ends up looking like. Making 100 million of those. How do you think about that part of the equation? Or is it too remote at this stage?
[50:48]
Sergey Levine
I think it's an important part of the equation. I'm not sure it's the part of the equation that we most need to figure out right now, but it's certainly part of it. A lot of how I prefer to think about this is to figure out the hard part and then enable a lot of experimentation on the other parts. Making a robot at scale is difficult. Making a robot at scale is even more difficult if you don't know what kind of software is going to run on it afterwards and you're not even sure whether it's the right kind of robot. One of the really valuable things we can get out of general purpose AI tools like robotic foundation models is the ability to get a lot of the other stuff figured out so that at least some of the uncertainty goes away. So that when you scale things up, you have some confidence that this is like really going to work.
[51:30]
Patrick O'Shaughnessy
A lot of people that listen to this are entrepreneurs, people that run companies. A very popular question has become, how should a traditional company begin to think about using LLMs or preparing itself for the ongoing improvement of these models? How would you answer the same question for robotics? How would you encourage companies to think about this?
[51:51]
Sergey Levine
The technology is changing so rapidly. I want to illustrate why this question is difficult with an example. Here is a particular uncertainty about the tech: will the robots rely more on demonstrations or on reinforcement learning from autonomous data? We're working on both of those things and they're clearly both important. But how somebody should prepare for the technology will be pretty different if they're expecting that they need lots of teleoperation to produce lots of demonstrations and a little bit of autonomous experience, versus the opposite, like a tiny number of demonstrations and huge amounts of autonomous experience. Like, is it 90/10 or 10/90? That's something we're hopefully going to learn about over the next few years. But it does change the correct approach pretty dramatically. That's kind of a case study of how changes in technology will dramatically alter this.
[52:35]
Patrick O'Shaughnessy
From a business standpoint, is the right way to think about it to get really clear on the economics of the labor in your business? I'm curious how you think about that, the way that this will change the nature of labor itself.
[52:48]
Sergey Levine
Coding tools are like a really nice example to look at for a template of how this might work. It's not like coding tools came on the scene and suddenly we don't need software engineers anymore. It's that the coding tools increase the productivity of individual software engineers. There's some amount of work that needs to be done to make sure that people are able to use them. There's some amount of technology development that needs to be done to make them useful for the appropriate use case. And these things are co-evolving and they're also still changing. Coding agents are different than code completion tools and so on. But I think it's like a nice template for us to look at to see how AI tools combine with people doing a job, increase their productivity and also raise new challenges. And I think we'll actually see something like that with robotics too, that a more realistic template is not like the humanoid goes in and the people just leave. There are some aspects of the job that can be done by a robot, some that can be done with a robot working together with a person, some where the person needs to do something special to make the robot more productive, some where it's the other way around, where the robot does something that makes the human more productive. And it'll be this kind of dance that we've seen with coding tools.
[53:51]
Patrick O'Shaughnessy
Do you have a favorite robot that's not part of what Physical Intelligence is doing, and if so, why? It could be anything, could be a factory robot, could be an Optimus, a Boston Dynamics.
[54:02]
Sergey Levine
I do really like the Boston Dynamics robot, especially the new version of the Atlas, because it is in some ways very human-like and in some ways very not human-like. They made some interesting decisions about how they want more range of motion on the joints. So it can do some pretty cool things. It's also a very agile robot, which is really cool. It makes for those awesome demos. So I'm a big fan of that. I'm generally a big fan of like everything that Boston Dynamics has done.
[54:24]
Patrick O'Shaughnessy
Should or could anything be read into the fact that Boston Dynamics has been doing very cool demos for a very long time and doesn't actually do anything useful for customers?
[54:32]
Sergey Levine
I think it's also a fair question for lots of robotics companies. To be fair, there is a lot of value in demos that serve to illustrate challenges on the road to something useful and productive. Obviously you can also do a demo without being on the road to something useful and productive. There is value in demos. I think that demos that are used correctly in service to a mission can provide people with an illustration of what to expect. And they also provide a challenge. You just have to be like, honest in setting up that challenge.
[55:02]
Patrick O'Shaughnessy
How much do you think about the business endpoints to this point? Roomba is like the best-selling robot of all time in the consumer category, which is surprising. And of course we might be on the edge of some sort of Cambrian explosion, but how much of your cycles do you spend thinking about the shape of a product that might result from this, one that maybe is the way we bootstrap our way to all this data?
[55:22]
Sergey Levine
It's just something that's very hard to reduce to like a very concrete answer right now. It's not too bad to like think about a space of possibilities. A lot of what we're doing when we develop our models, when we experiment with different tasks, when we do demos like the Robot Olympics, underneath we're kind of prototyping what it looks like when we try to do something real with this, to different degrees of real, and what goes wrong. It is something we think about a lot. It's not something that I have even close to a concrete answer to, but there's a space of possibilities, and a lot of what we actually are planning to do in 2026 is also to experiment with different things in that space.
[55:56]
Patrick O'Shaughnessy
When you study the history of general purpose technologies, which certainly this would be a major one if it comes to fruition, you often find this constellation of things happening around that thing that enable it. LLMs are a direct complement to what you're doing. Are there any other surprising technology areas or trends that help you do what you do but are different?
[56:18]
Sergey Levine
Robotics hardware has become dramatically more affordable over the last few years. When I started working in robotics about a decade ago, I worked with a robot called a PR2, which I believe had a cost of about $400,000. When I started my lab at UC Berkeley, I used a robot that was in the ballpark of $30,000. Now each arm on this thing is maybe a tenth of that. We think that can be even less. That's not due to like any one single technology. It involves both hardware and software. So the kind of low-cost arms that we have here, they wouldn't be useful in an industrial setting because traditional control methods that rely on a great deal of precision wouldn't be able to use them. And I think that does make it a lot more practical to think about general purpose robotics today.
[57:01]
Patrick O'Shaughnessy
For people that would want to be fairly technical about following major milestones that are happening in this field, where does that information show up? What do you read to stay informed about what's going on, or watch?
[57:13]
Sergey Levine
So a lot of it shows up in research papers. Research papers unfortunately are not a very accessible source of information because it takes a bit of care to like sort through everything and figure out what is the signal and what does something really mean. Research results are sort of intended for an audience that already understands the starting point from all the past research results. Robotics, and I think technology in general, is one of those things where the public-facing artifacts, the demos and the videos that somebody might post on social media, are often actually not very good for providing a sense of the true underlying state of things, because they're sort of meant more as a demonstration at the edge of capability. And grounding that, working out what the demo really means, requires digging deeper. Probably research papers are the way to go. Sometimes it's even worse than that: you have to actually go talk to the individual people and find out what the inside story really is. And maybe that's not a great situation to be in, but that's kind of how science works.
[58:06]
Patrick O'Shaughnessy
As we look forward to the future, in your mission, what feels the most uncertain?
[58:11]
Sergey Levine
I do think the timeline is uncertain. If anything, my sense of the timeline has gotten more optimistic since we started, but it's uncertain because of the nature of the technology. This is something where there's a bootstrap challenge: getting to a particular level of usefulness so that robots can be deployed, so they can do useful tasks, so they can start collecting data from open world settings at scale. Because that's such a sudden event, getting past the activation energy, I think there is a lot of uncertainty about the timing of that. That's exacerbated by the fact that the timeline looks different depending on what kind of technology is deployed. The example I gave before about whether it should be data collection through teleoperation or data collection with autonomous systems or something in between, maybe shared autonomy, maybe like this coaching kind of thing. Those all sort of change the picture in terms of how deployments work and how in-the-wild data collection works. So because of that, I do think there's quite a bit of uncertainty.
[59:03]
Patrick O'Shaughnessy
You're in such an interesting position because you're at the center of research. Lots of different kinds of people are talking to you, asking you questions. What are questions that you're surprised people don't ask you?
[59:14]
Sergey Levine
Well, I think the question you asked earlier, actually about how somebody should prepare, there's a variant of that question, which would be something like, if I want to start using autonomous robots for a thing, what should I start setting up? Should I set up operations? Should I modify my task in some way so it's more accessible? Should I design new hardware? Maybe I should design new hardware so I can plug your software into it. And I think people make a lot of assumptions about that. For example, one assumption is machine learning requires data. So let me just figure out something that will collect data. That's not often the best assumption because you need the right kind of data. Maybe some data is easy. It's easy to get videos of people doing something. But that doesn't mean that's the right kind of data. And it might be domain dependent. It might be dependent on the thesis about the technology that will succeed. So I think that people do make a lot of assumptions about that. Not that I necessarily have a better answer for them, even if they ask me, but it's something where there's a big space of possibilities.
[60:05]
Patrick O'Shaughnessy
We talked about these, like big uncertain long term timelines. What is the very next thing you are trying to solve?
[60:11]
Sergey Levine
A big focus for us right now is actually better understanding this mid-level reasoning part of the problem. Because we think that we have a pretty good sense for how to acquire low-level physical behaviors. But getting those low-level physical behaviors to generalize requires bringing to bear a lot of this common sense knowledge. The representation of that might be really important. So LLMs make certain kinds of representations very convenient. They make it very convenient to basically turn text into other text. But that's not necessarily the best representation for what an embodied system needs to do. Sometimes it needs to think about things more spatially, sometimes semantically, sometimes other representations. And trying to figure out exactly how to structure that internal thinking process might be a very important question. The answer to that question might be different in the world of embodied foundation models than it is in the world of LLMs. So that's like a concrete thing that we're working on now.
[61:02]
Patrick O'Shaughnessy
If I could somehow get the hundred most informed and active robotics researchers in the room at once and poll them on how certain they are that things will have unlimited capabilities and how soon that might happen, where do you fall in that distribution?
[61:19]
Sergey Levine
Probably I'm on the optimistic end when it comes to established robotics researchers and on the pessimistic end relative to robotics entrepreneurs.
[61:28]
Patrick O'Shaughnessy
I understand the entrepreneur part for sure. You're optimistic by nature. Why are you on the optimistic end of the researcher community?
[61:34]
Sergey Levine
Robotics has a very long history which has precious few successes, especially when it comes to robotic AI. So I think if we're being honest about it, most robots that are out there doing useful work are still running state-of-the-art technology from the 1980s. Because the robotics problem is hard. Not our fault, it's just a difficult problem. Because of that, I do think that there is good reason for caution, to say that, well, maybe we've made a lot of headway on this part of the problem, but there's many other problems that still remain. Part of why I'm optimistic about this is that I kind of have a sense of what has proven tough for me before. And I can see a lot of the puzzle pieces that I'm imagining could be slotted in to address many of those things. As my co-founder Karol likes to say, when you've climbed the mountain, only then do you see if there's another mountain after it. In robotics there's been a lot of experience of lots of mountains.
[62:22]
Patrick O'Shaughnessy
Some caution is justified given that endurance is required. Who or what most inspires you?
[62:28]
Sergey Levine
Boston Dynamics. I think there's like a lot of things that we can debate on the technology side, but there's a lot of value in repeatedly showing something that people wouldn't have thought possible, even if there's all sorts of caveats and assumptions and so on. And certainly in robotics, whatever we might say about demos and whatnot, I think it's very fair to say that people have revised their thoughts about what's possible from seeing some of that stuff. I think I'm also inspired by organizations that create an atmosphere for experimentation. There are some research labs that have done a very good job of this. OpenAI has historically done a great job of this, of creating an atmosphere where individual researchers can experiment with things and be empowered to see those things through. ChatGPT was basically John Schulman's pet experiment for a while. It wasn't a concerted corporate strategy with lots of spreadsheets and pie charts. It was a pet project. I think there's something pretty inspiring about organizations that empower people to have pet projects turn into world-changing successes. Certainly one of the aspirations that I and my co-founders have here at Physical Intelligence is to provide some of that to the best of our ability. It's hard to do.
[63:44]
Patrick O'Shaughnessy
I feel like Google used to have that one day you can do whatever you want thing. Is that the spirit of it?
[63:49]
Sergey Levine
I was absolutely shocked when I started working at Google at the level of leverage that I felt I could have. One of the projects that I did with many of my colleagues there in 2015 was colloquially referred to as the Arm Farm. So we took a couple dozen robots, put them in a lab, and had them collect data. I found out from somebody that they had a warehouse full of robots that nobody was using. I asked Jeff Dean and Vincent Vanhoucke if we could stick them in a lab. And I was just thinking, like, okay, they're not going to take me seriously. I was a Level 4 research scientist. Jeff was like, yeah, let's do it. What do you need? I just remember feeling like, wow. I had never in my life thought that I'd have that leverage. I mean, I was very young at the time. That's very special. And I think getting to a place where people can unlock their creativity and have that kind of agency can make for a very remarkable place.
[64:37]
Patrick O'Shaughnessy
My friend Jesse has this great question, which is, for companies that you're not involved with, which one do you most hope succeeds and why? People say Boom a lot because they want to fly places faster. Increasingly, as I've asked this question, people have said PI, because the sheer impact that it might have if you're successful is massive on such a global scale. And it's been really fun just to hear about all the ins and outs of how you're thinking about the problem and attacking it. When I do these interviews, I have the same traditional last question for everyone. What is the kindest thing that anyone's ever done for you?
[65:09]
Sergey Levine
It's a tough question to answer because I do think there are many moments in my career where I got a leg up on something. I think I have the kind of personality where I sometimes don't appreciate it in the moment and only reflect on it afterwards. Of the three moments in my career that stand out, one I'd actually already mentioned to you, which was the Arm Farm thing. I'm especially grateful to Jeff and to Vincent for being willing to take that bet on me and my colleagues. And there are a couple other moments. When I started my postdoc with Pieter Abbeel at Berkeley, I had zero robotics experience. I had done virtual character animation and computer graphics. I felt like that was a bet on my potential more so than my actual accomplishments. And there was another moment even earlier on. I got an internship at Nvidia that got me to, like, experience some cool stuff when I was just, like, a sophomore. And I think the hiring manager for that also took a bet on me. And I think that these kinds of things, they really matter in a person's career, and I think that in the moment I should have been more grateful. But certainly in hindsight it's something that made a big difference and hopefully I can make that difference in other people's careers as well.
[66:07]
Patrick O'Shaughnessy
Well, I've learned so much from you and your co-founders and so much today. Thank you so much for your time.
[66:12]
Sergey Levine
Thank you.
[66:13]
Patrick O'Shaughnessy
If you enjoyed this episode, visit colossus.com, where you'll find every episode of this podcast, complete with hand-edited transcripts. You can also subscribe to Colossus, our quarterly print, digital and private audio publication featuring in-depth profiles of the founders, investors and companies that we admire most. Learn more at colossus.com/subscribe.