
Podcast Summary

Podcast: Y Combinator Startup Podcast
Episode: The GPT Moment for Robotics Is Here
Date: April 16, 2026
Guest: Quan Vuong, Co-Founder of Physical Intelligence (PI)
Host: Y Combinator

Episode Overview

This episode centers on the transformative progress in robotics AI, specifically the dawn of the "GPT moment" for robotics, as described by Quan Vuong and the hosts. The discussion dives deep into PI’s mission to build a generalizable robotics model, recent breakthroughs, challenges in scaling robotic intelligence, and how these advances are poised to spark a Cambrian explosion of robotics startups. The episode is rich in technical insights, practical applications, and actionable advice for founders eager to enter the field.

Key Discussion Points & Insights

1. The New Equation for Robotics Startups

Barrier Reduction: The cost and complexity to start a robotics business have plummeted thanks to AI advancements.
- "The equation I think for starting a robotic business has changed and will continue to change at an accelerating pace because the upfront cost is not that high anymore." (00:00, 31:52, Quan Vuong)
Focus Shift: Robotics development is transitioning from specialized, vertically integrated stacks to modular, generalizable intelligence platforms.

2. The GPT-1 Moment: Foundation Models for Robotics

The "GPT moment" refers to the advent of a foundation model for robotics, able to generalize across tasks and embodiments, akin to how GPT-1 catalyzed progress in language models.
Evolving approach: Build a strong base model, deploy in real jobs, continually improve it with real-world feedback.
- "You start from a really strong base model...by actually exposing the system to the complexity and the edge case of the real world, that system get incrementally, even just slightly better over time every day." (01:01, Quan Vuong)
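The deploy-and-improve loop described above can be sketched as a small simulation. This is only an illustration under stated assumptions: `DeploymentLog`, `run_shift`, `toy_policy`, and `toy_operator` are hypothetical names, and the episode does not describe how PI actually implements human takeover or logging.

```python
from dataclasses import dataclass, field
from typing import Callable, Iterable, List, Optional, Tuple

Observation = int   # placeholder type; a real system would use camera frames etc.
Action = str


@dataclass
class DeploymentLog:
    """Counts autonomous steps and stores human corrections for later training."""
    autonomous_steps: int = 0
    corrections: List[Tuple[Observation, Action]] = field(default_factory=list)

    @property
    def intervention_rate(self) -> float:
        total = self.autonomous_steps + len(self.corrections)
        return len(self.corrections) / total if total else 0.0


def run_shift(
    observations: Iterable[Observation],
    policy: Callable[[Observation], Action],
    operator: Callable[[Observation, Action], Optional[Action]],
    log: DeploymentLog,
) -> DeploymentLog:
    """Run one shift: the policy proposes, the human operator may override."""
    for obs in observations:
        proposed = policy(obs)
        override = operator(obs, proposed)  # None means the operator accepts
        if override is None:
            log.autonomous_steps += 1
        else:
            # The corrected (observation, action) pair becomes new training data.
            log.corrections.append((obs, override))
    return log


def toy_policy(obs: Observation) -> Action:
    """Hypothetical base model: always proposes the same action."""
    return "fold"


def toy_operator(obs: Observation, proposed: Action) -> Optional[Action]:
    """Hypothetical operator: takes over on every fifth observation."""
    return "flatten" if obs % 5 == 0 else None
```

Running `run_shift(range(10), toy_policy, toy_operator, DeploymentLog())` records two corrections out of ten steps. Tracking the intervention rate per shift is one simple way to see whether the system is getting "slightly better over time every day".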

3. Pillars of Robotics (& Recent Breakthroughs) [03:05–08:29]

Three Pillars:
- Semantics: Massive leap via language models, enabling common-sense reasoning in robots.
- Planning: Leveraging foundation models' abstract reasoning for flexible task execution.
- Control: Still challenging; the core of real-time, robust interaction with the environment.
Key Papers & Models:
- SayCan: Brought language models into robotics for semantic planning.
- RT-2 (Robotic Transformer 2): Adapted vision-language models for low-level control, enabling tasks with unseen objects (e.g., "move Coke can to Taylor Swift picture").
  - "Even though the concept of Taylor Swift doesn't exist in the robot data at all. And that worked." (04:13, Quan Vuong)
- Open Cross Embodiment / RT-X: Pooled data from many robot platforms into a single high-capacity model; the resulting generalist performed about 50% better than specialists optimized for one platform each (06:42, Quan Vuong).

4. The Robotic Data Problem [09:12–13:34]

Data Scarcity & Collection
- Data for robotics is sparse and hard to collect, unlike language's internet-scale datasets.
  - "There is not an Internet of robotic data that you can use." (10:03, Quan Vuong)
- Emphasis on capturing data from multiple sources (“cross embodiment”), making models less reliant on a single hardware platform.
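- Challenge: No two robots are identical, and even a single platform drifts over time as hardware and software change, which makes old data harder to reuse. Cross-embodiment models generalize better across these variations (12:18, Quan Vuong).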
Emergent Properties
- Zero-shot task execution—models perform complex tasks without specific training data, seen as evidence of emergent general intelligence properties (13:34–14:24).
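One common way to make a single model consume data from robots with different action spaces is to pad every action vector to a shared width and carry a mask that marks the real dimensions. The sketch below illustrates that general idea only. The episode does not say how PI formats cross-embodiment data, and `MAX_ACTION_DIM`, `Sample`, and `to_shared_format` are hypothetical names.

```python
from dataclasses import dataclass
from typing import List, Sequence

MAX_ACTION_DIM = 8  # assumed upper bound on action dimensions across the fleet


@dataclass(frozen=True)
class Sample:
    embodiment: str      # which robot produced the data
    action: List[float]  # action padded with zeros up to the shared width
    mask: List[bool]     # True where the dimension is real, False where padded


def to_shared_format(
    embodiment: str, action: Sequence[float], max_dim: int = MAX_ACTION_DIM
) -> Sample:
    """Pad a robot-specific action vector so all robots share one layout."""
    if len(action) > max_dim:
        raise ValueError(f"{embodiment}: {len(action)} dims exceeds {max_dim}")
    pad = max_dim - len(action)
    return Sample(
        embodiment=embodiment,
        action=list(action) + [0.0] * pad,
        mask=[True] * len(action) + [False] * pad,
    )


# Two hypothetical robots with different action sizes end up in one dataset.
dataset = [
    to_shared_format("arm_6dof", [0.1, 0.2, 0.3, 0.4, 0.5, 0.6]),
    to_shared_format("mobile_base", [0.5, -0.5]),
]
```

With a layout like this, a training loss can ignore the padded dimensions by using `mask`, so adding a new robot type means adding rows rather than changing the model's input shape.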

5. Current State-of-the-Art: Real-World Deployments [15:05–22:18]

Weave (Laundry Folding)
- PI partnered with Weave (a YC company) to enable robots to fold laundry in real laundromats—task requires generalization to deformable, unseen objects.
  - "What you're seeing in this video is a system that we built together folding really diverse item of laundry in a real laundromat..." (15:05, 16:58, Quan Vuong)
- A model and system good enough for the task came together roughly two weeks after the goal was set.
Ultra (Warehousing & Logistics)
- Robots packing items into soft shipping pouches (the kind some Amazon orders arrive in) for real customer orders in a live e-commerce warehouse, operating for full days with minimal human intervention and adapting to object variety and changing lighting.
  - "You see this interesting example of the robot kind of nudging the item to go into the pouch and that's really hard. That requires very good understanding of the scene and very precise motion..." (20:16, Quan Vuong)
Key Insight: The work shifts from engineering a bespoke system for every task to an operations problem of identifying the use case and scaling data collection and deployment workflows.

6. Cloud Robotics: Decoupling Compute from Hardware [23:27–28:31]

Technical Breakthrough: Most model inference is done in the cloud, not locally on the robot (even for latency-sensitive tasks like laundry folding or making coffee).
- "Almost all of the robot evaluation that we run at PI today...the model is hosted in a data center somewhere...the robot is actually querying an API endpoint that hosts the model." (23:43, Quan Vuong)
- Efficient “real-time chunking” algorithm allows for smooth control despite network latency.
- Benefit: Dramatically simplifies robot hardware (“dumb video camera robots”) and lowers the BOM cost.
Scalability & Integration: PI intentionally avoids knowing details about collaborators' hardware to keep the model agnostic, enabling rapid adaptation to new platforms.
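To see why requesting actions in chunks can hide network latency, here is a toy tick-based simulation. It is not the "real-time chunking" algorithm mentioned in the episode, whose details are not given here. It only illustrates the general idea of buffering a chunk of actions and requesting the next chunk before the buffer runs out. `remote_policy`, `run_control_loop`, and all parameter values are assumptions.

```python
from collections import deque
from typing import Deque, Dict, List, Optional


def remote_policy(obs_tick: int, chunk_size: int) -> List[str]:
    """Stand-in for a model hosted behind an API endpoint (hypothetical)."""
    return [f"obs{obs_tick}/step{i}" for i in range(chunk_size)]


def run_control_loop(
    total_ticks: int,
    chunk_size: int = 8,         # actions returned per request (assumed)
    latency_ticks: int = 3,      # control ticks a round trip takes (assumed)
    request_threshold: int = 3,  # request a new chunk when this few actions remain
) -> Dict[str, int]:
    """Execute one action per tick; count ticks where no action was available."""
    buffer: Deque[str] = deque(remote_policy(0, chunk_size))  # fetched before start
    in_flight: Optional[List[str]] = None
    ready_at = 0
    executed = stalls = 0
    for tick in range(total_ticks):
        # Deliver the pending chunk once its round trip has elapsed.
        if in_flight is not None and tick >= ready_at:
            buffer.extend(in_flight)
            in_flight = None
        # Ask for the next chunk early enough to cover the latency.
        if in_flight is None and len(buffer) <= request_threshold:
            in_flight = remote_policy(tick, chunk_size)
            ready_at = tick + latency_ticks
        if buffer:
            buffer.popleft()  # the robot executes this action
            executed += 1
        else:
            stalls += 1       # the robot has nothing to do this tick
    return {"executed": executed, "stalls": stalls}
```

With `request_threshold` equal to `latency_ticks`, the loop never stalls. With `request_threshold=0`, so that the robot only asks once it is out of actions, it stalls for the full round trip each time. Note that the new chunk is computed from an observation that is a few ticks old by the time it arrives. This toy simply appends chunks and does not model how a real system would reconcile that staleness.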

7. Playbook for Aspiring Robotics Founders [29:12–33:59]

Faster, Cheaper, Easier
- Modern models compensate for low-end hardware inaccuracies. Startups can use off-the-shelf gear and focus on integration, data, and customer needs.
  - "You don't need a incredibly expensive robot that is capable of very precise motion today to be able to do this task." (30:45, Quan Vuong)
Vertical Integration Optional:
- Robotics startups no longer need to be fully vertically integrated; foundation models and modularity allow a focus on orchestration and data.
Cambrian Explosion Prediction:
- A thousand vertical robotics companies will emerge, each targeting specific workflows and menial jobs—a democratization of robotics innovation.
  - "I believe there’s going to be a Cambrian explosion of robotic company across the entire world..." (33:13, Quan Vuong)

8. Business Model & Community Approach [36:21–38:46]

Open Community:
- PI open-sources its models and publishes its research to accelerate industry-wide progress rather than keeping the technology to itself.
  - "We publish our research, we open source PI zero and PI zero five...it's the same model." (36:21, Quan Vuong)
Cross-Embodiment as Success:
- True impact is seeing PI's models used on robot platforms they’ve never even seen (38:04, Quan Vuong).

9. Team Structure, Startup Lessons & Tooling Gaps [38:59–46:54]

Team & Culture:
- Large founding team, ex-Google researchers, hardware specialists—critical for attacking such a multidimensional problem.
  - "Any one of us could have started a company and be successful. But the problem is just so incredibly hard..." (40:49, Quan Vuong)
From Big Tech to Startup:
- Surprise at lack of infrastructure: No turnkey systems for data collection, annotation, evaluation, or teleoperation in general-purpose robotics.
  - "We end up writing a lot of the software at PI ourselves." (41:46, Quan Vuong)
- Potential for new startups to fill these auxiliary gaps.
Research Automation Wish
- Need for robotic “research agents”—tools to debug, iterate and improve end-to-end pipelines (44:00–46:00).
  - "I would love it if there is a model that can ingest multimodal data such as this and analyze failure modes..." (44:43, Quan Vuong)

10. Notable Quotes & Memorable Moments

"You literally just gave people the playbook for how to build a vertical robotics company." — Host 2 (36:11)
"We want to enable the community, we want to accelerate progress..." — Quan Vuong (36:21)
"Success for us is not defined as only our model on our robot performing tasks that is useful. The surface area for success is actually much larger..." — Quan Vuong (38:04)
"I think the equation for starting a robotic business has changed and will continue to change at an accelerating pace because the upfront cost is not that high anymore." — Quan Vuong (31:52)

Important Timestamps

| Timestamp | Segment / Topic |
|-----------|-----------------|
| 00:00 | Lowering barriers for robotics startups |
| 01:01 | Mission: a generalist GPT-1 for robotics |
| 03:05 | The three pillars of robotics (semantics, planning, control) |
| 06:16 | Open Cross Embodiment and Robotic Transformer X |
| 07:26 | Key advantage: 50% better performance of generalist models |
| 09:55 | The robotic data problem |
| 12:18 | Hardware drift and generalization challenges |
| 14:50 | Emergent zero-shot abilities |
| 15:05 | Video and breakdown: Weave's laundry folding robot |
| 20:16 | Video and breakdown: Ultra's logistics robot |
| 23:27 | Cloud-based robotics model inference |
| 29:12 | Advice to aspiring robotics founders |
| 33:13 | The coming Cambrian explosion of robotics companies |
| 38:59 | Background, founding team, and unique challenges |
| 41:46 | Lack of infrastructure in general-purpose robotics |
| 44:43 | Research agents and automating the bottlenecks |
| 48:40 | Closing reflections and call for ambitious founders |

Conclusion & Takeaways

The episode highlights a seismic shift: robotics is entering its "GPT moment." PI and Quan Vuong embody the engineering audacity fueling this shift—from cross-embodiment foundation models and cloud inference to open research and an inclusive, accelerationist vision for the robotics ecosystem. The playbook for robotics startups is clear: focus on real customer needs, use foundation models, collect data smartly, and move quickly.

Memorable Final Word:
"The one takeaway that I want you to have is I think robotic has changed a lot and the cost of building in robotic has decreased and I think will continue to dramatically decrease. And it also requires a very different kind of scrappy skill set that young startup like needs. We hope to enable really an explosion of many, many, many different robotic use case." — Quan Vuong (48:40)


Transcript

[00:00]
Quan Vuong
The equation I think for starting a robotic business has changed and will continue to change at an accelerating pace because the upfront cost is not that high anymore.
[00:12]
Host 1
Everyone's sort of spending a lot of time in the digital world and it feels like now is the time to start thinking about the world of atoms.
[00:20]
Host 2
You literally just gave people the playbook for how to build a vertical robotics company.
[00:25]
Quan Vuong
This has really been our mission from the start, is to create that Cambrian explosion.
[00:30]
Host 2
It's still like blows my mind. I didn't know if this would exist even in my entire lifetime.
[00:42]
Host 1
Welcome back to another episode of the Light Cone. Today we have a very special guest, Quan Vuong. He's one of the co-founders of Physical Intelligence, which we think might be the robotics AI lab that brings about the GPT-1 moment for all of robotics. Quan, thank you for joining us.
[01:01]
Quan Vuong
Pleasure to be here. Has been a long time admirer of YC and our mission is to build a model that can control any robot, to do any task that it's physically capable of and to do so at such a high level of performance that's going to be useful to people in all walks of life. And so GPT-1 for robotics, what is it, is the ChatGPT moment for robotic real? Our perspective here is that we want to build a model that's really intelligent. We want to build a platform that allow us to externalize that intelligence to the rest of world and allow them to use it to build very interesting application in all sorts of vertical and robotics. And we think that it's going to be more like a peeling an onion analogy where you start from a really strong base model that have all sorts of common sense knowledge and already works to some extent on your robot. You have then a mixed autonomy system, very similar for example to a autonomous driving car today. And then you actually deploy that system to do a real job. That system might make mistake, it's okay. And then over time, by actually exposing the system to the complexity and the edge case of the real world, that system get incrementally, even just slightly better over time every day. And you know, one day you wake up and you suddenly have a system that is just fully autonomous and just provide tremendous value.
[02:24]
Host 3
Might be helpful to give the audience a bit of a mini history lesson on why robotics is so hard. And there's been a lot of breakthroughs in the last two years. And I mean just to simplify the robotics problem is three pillars semantics, which I think we got a lot of unlocks and with language models that somehow we ported into robotics. Then you have the planning and then the last thing is control, which needs to be done in real time and interact with a environment that changes. Walk us through the seminal papers that a lot of the team of PI Robotics published that gave you the inkling that the GPT1 moment is near. And that started in 2024.
[03:05]
Quan Vuong
Yeah. So the dream to build general purpose like robots has been a long time dream I think in humanity. We're not the first to say that our mission is to build a model that can work on any robot. And we're really fortunate to be in this moment in time in history where we feel that it's possible. To kind of walk back a little bit, a few years before there was, I think the first is SayCan, which to me was the first demonstration of language model and how you can bring all of the common sense knowledge in language model into robotics. And therefore that significantly kind of reduces the need to collect robot specific data. So for example, if you have a task of oh, I want to go to the YC office to record a podcast, what are the steps I need to take? You can ask a language model, just show me the step and show me the plan. And that worked incredibly well. And then the way kind of language model infiltrate, if you will, in robotic is it start at the planning level, at the semantic level, but there's still the control problem. At the end of the day, you still need a mechanism to convert the plan into low level action that can actually actuate the robot and that bring us to PaLM-E and that bring us to RT-2 which stands for Robotic Transformer 2. And what these two work really show is that if you start from a vision language model that is really powerful and you kind of use robotic data to adapt this model to speak robot language, if you will, then you see a lot of transfer from the kind of knowledge that exists in the vision language model down to the low level action. One of my favorite example when we did the RT-2 project was you can have picture of celebrity on the table. If you have a picture of Taylor Swift, you have a picture of the Queen of England and you can ask the robot, pick up the Coke can and move it to Taylor Swift. Even though the concept of Taylor Swift doesn't exist in the robot data at all. And that worked. You can do other examples such as kind of spatial reasoning that doesn't exist in the robot data at all, for example, move the dinosaurs next to the red car. And these are always just completely unseen object in robot data. And so that was RT-2 and that was PaLM-E. Now RT-2 and PaLM-E are single embodiment exercises.
[05:30]
Host 3
Just for the audience: single embodiment, meaning it worked for a very specific robot.
[05:34]
Quan Vuong
It worked for a very specific robot. In robotic, you can ask the question, how do you scale? Especially how do you scale data collections? And one of the insight that we had back then was maybe the data from one robot is not that different from another robot. Anyway, if you have enough robots in your training data, maybe what the model learned isn't to control one specific robot. What the model learned is something that's more abstract, which is how do I kind of learn a general notion of what it means to control any particular robotic platform and therefore I will be better at controlling any particular platform. And that bring us to what we call Open Cross Embodiment and Robotic Transformer X.
[06:16]
Host 3
That was a big paper because it was the first that showed potential scaling laws that apply to robotics. Because now you could start training all these models across multiple kinds of hardware, not just one, which has never been done in robotics ever before. Because from all the research labs they would all train with a very specific set of sensor actuators and motors. And it was all very finicky with that particular hardware. Right?
[06:42]
Quan Vuong
Yeah. One of the really interesting results from Open Cross Embodiment, and let me provide the context here, is that you can take, let's say, 10 different robot platform, collect data from them, train a policy, and really optimize the policy to work well on that platform. So let's say you have that, you have 10 different platform, 10 different policies. And now if you simply take the data and absorb it into a model that is high capacity enough to really absorb that data and you can compare, you have this generalist that learn how to control the 10 different robot, you can compare it to the specialist that has been optimized to work well on a particular embodiment. How does it compare? And the interesting result from OpenX is it was 50% better.
[07:27]
Host 3
Wow.
[07:28]
Quan Vuong
And that was really surprising because in robotic, it's hard enough to get your model to work on one particular robot platform. And one of the reasons why I say that we're really fortunate to be in this moment in time in robotic is because OpenX was really only possible because of the support that we received from the robotic community. It was a huge collaboration across the robotic community. And the reason why that's really important is there is this joke in robotic grad school that if you want to add two years to your PhD, just work on any robot platform. By that logic, if you want to have 10 robot platform, that's 20 years.
[08:07]
Host 1
Why is that it takes a year or two to just get the platform up and running to even collect the data.
[08:13]
Quan Vuong
Yeah.
[08:13]
Host 3
Is it fair to say that the data set that was created from Embodiment X is similar to the scale of an impact that ImageNet did for Vision? Because it was huge and it was the first large data set across multiple hardware. Huge collaboration.
[08:30]
Quan Vuong
I still think that ImageNet was more impactful in the Vision community. And the reason for that is a few. The first is that ImageNet also allowed for reproducible evaluation. Right. OpenX as an effort was more about making data available for people to use. And evaluation is a really difficult problem in robotic that OpenX did not solve. And the second is, I think OpenX is a drop in the bucket at this point in the robotic community. If you measure in the kind of the scale and the volume and the diversity of data that the community is collecting, I think OpenX at this point, it's a drop in the bucket.
[09:12]
Host 1
I mean, I guess we started talking about sort of GPT1, but even GPT1, you know, that was sort of this moment where you can prove, you know, Alec Radford figured out that there was a neuron based on a very specific input and output, and then that allowed the scaling laws to sort of take hold. The biggest problem in robotics I've heard is basically actually exactly what we've been talking about is like, it's the data problem, you know, language you could bootstrap off of, like, you know, the sum total of what you could get off the Internet, which is actually quite a lot. Can you give us like a sense for like scale? Is it like petabytes, like, you know, what do you think is necessary as an input to, you know, the true GPT1 of robotics?
[09:56]
Quan Vuong
Yeah. So the data scarcity problem in robotic, there's a few ways to look at it. The first way is that it's really two problem in disguise. There is the data generation problem and there's data capture problem. And the difference is that the data capture is that there might already be lots of robotic data that is being generated, but there's just never been really an incentive to capture it, to make it easy for digestion in training. And that's one of the goal that OpenX was trying to solve, which is if you have robotic data, it's a really good idea to capture it and make it possible to train on. The second way to look at it is that robotic is very different from language model. There is not an Internet of robotic data that you can use. And so you see this kind of very operationally heavy effort to collect data. And that's the question, is it going to scale? Well, the way that I look at it is let's take the US GDP, US$24 trillion, let's say if we actually solve robotics, a model that can control any robot to do any task, napkin math, maybe contribute 10% to US GDP. Well, that's already a massive number. And I think that promise is one of the reason that warrants the investment into data collections in robotics. And the third way to look at it is we're very focused on cross embodiment. And cross embodiment, there is the data collection aspect as well, which is to really make sure that your model and your organizations and infrastructure are set up to consume data from many different sources of robots. And that actually allows you to scale easier. For example, if I were to contrast our approach compared to, let's say, a company that have a particular hardware platform that they optimize for and they scale, it's not an approach that have really allowed people to scale because it's just much harder to figure out how do you manufacture a thousand units of something for now compared to making sure that you yourself are ready to absorb data from 1,000 different types of robots that are already out there in the community.
[12:04]
Host 1
I mean, it's a crazy problem, isn't it? I mean, the hardware itself, even within the same design of embodiment, if there's a hardware run that goes awry, or like one of the servos is slightly different. Like you see it in the data, Right. And then how do you control for that?
[12:19]
Quan Vuong
Yeah, so I think we were doing kind of like an inventory of robot in the company. We were so shocked to find that there are no robot, no two robot platforms that are the same. And if you ask people in the robotics community, sometimes there's debate about multi robot versus single robot. And the argument is that single robot is simpler to scale. And actually that's not how it plays out in practice. How it plays out in practice is even if you have a single robot that you're optimizing for, over time that platform is going to drift. Maybe you want to make hardware change or you have software change, you end up in a situation where it's much harder for you to reuse old data. Because in machine learning, if you want to generalize from a distribution, you would like many sample from that distribution. And if you just have one robot platform that have a major change every three months, maybe you have a few data points from that distribution. Whereas if you start from the hypothesis that if you have many robot platform in your fleet, your model is going to learn something more abstract, which is how do I control a robot? Not any particular robot. Then the model will be able to ingest data from a slightly different robot barrier. And actually we're starting to see emergent property in this kind of robot large foundation model.
[13:33]
Host 1
That's good news.
[13:34]
Quan Vuong
We're doing where you start to see interesting transfer between different data sources. For example, today it's possible to perform tasks zero shot. Zero shot, meaning you don't collect any data. And these are the tasks that last year might have required hundreds and hundreds of hours.
[13:51]
Host 4
What are some examples?
[13:52]
Host 1
Yeah, do we have any videos? We can see that.
[13:55]
Quan Vuong
So I might get some flack when I come back because this is not published result. Hopefully this will come out soon. So I want to reserve the excitement for that. And I'm building up the excitement a little bit. So hopefully this will come out soon.
[14:09]
Host 1
All right.
[14:10]
Quan Vuong
These are not simple tasks. These are actually difficult tasks that just last year require hundreds of hours of data collections.
[14:17]
Host 1
You heard it on Light Cone first that there's some emergent properties that are going to come out of PI.
[14:22]
Host 4
Can you give us a sense of the flavor of the tasks?
[14:25]
Quan Vuong
It's really easy to fool yourself. And so we wanted to test across few different tasks of different flavor. Tasks that require precision, tasks that require reasoning with multiple objects in the scene. It all seems to have this property that's really nice. So it does seem like that's something that's kind of a more general property that emerged rather than we just got lucky and suddenly the model started working on one particular task.
[14:50]
Host 2
Could you help us understand where we are now in terms of what's working and how well it's working? We're not quite at the chatgpt moment. Where are we? And I think you brought some videos that you were going to show us to help everybody visualize what the current state of the art actually looks like.
[15:05]
Quan Vuong
I think where we are is I think if you have a task where it's okay for the robot to make a mistake and it's possible for you to set up a mixed autonomy system where you have a person that takes over when the robot make a mistake and provide corrections, it is possible to get to a level of performance where it starts to make sense to think about scaling robot deployment. And the example that I specifically want to highlight here is this blog post that we did with Weave and Ultra. And you know, it's great that these are both YC company. I want to provide a little bit of context here first. The context is that PI is a primarily research organization. We want to focus on building the best model, but we also want to not be tunnel vision. We want to make sure that the model that we built actually going to be useful and actually perform tasks that people in society cares about. And one of the really good way for us to do so is to partner really closely with company that want to get robot out there today. And the way that these relationships work is that we treat each other like we're on the same team. Very free flow of information and we design a system that try to get the best possible performance for the tasks that these companies care about. So let me talk about Weave first. What you're seeing in this video is a system that we built together folding really diverse item of laundry in a real laundromat in the Mission. You can see people walking outside. And why this task is difficult is because there's just infinite possibility of observation space. Like, you know, clothings are deformable and no two item of clothing here are the same and these are also unseen. You know, these are not like clothing items that are seen in the training data.
[16:58]
Host 1
Yeah, I love this team. They are some of the most cracked people out of Apple I've ever met.
[17:03]
Host 2
Gary was the partner for Weave. Maybe want to like explain what Weave is and what their company is.
[17:09]
Host 1
Yeah, I mean they're actually, you know, shipping their first robots into the home. We sort of talked about it as, you know, being able to do household tasks like this and I think they were very inspired by Physical Intelligence's first demos with laundry folding. So it's actually a total trip to hear about it. You know, about a year ago we were talking about them doing it and then now to see them do it working hand in hand with you is really awesome. I think this is a great example of like, you know, you need the model smarts, you need the data collection and then the hardware and the sort of system integration all working together is just hard to nail. So.
[17:47]
Quan Vuong
Yeah, and to get back to your question about why robotic is hard, it is a really hard systems problem. Like, you need everything to work well and work well together to get this result. And like Weave is such an incredible team for us to work with to get this result. And it actually didn't even take us that long to get this result. It was roughly, we set a goal, then maybe it was like two weeks afterwards where we got a model and a system that was good enough at performing this task.
[18:18]
Host 2
It still blows my mind to see a robot actually folding laundry, because I remember until basically until ChatGPT, I didn't know if this would exist even in my entire lifetime. Because folding laundry, I mean, it's always been like the Turing test for robotics, because there's no way to deterministically program a system the way that you did pre AI to do this, because the space is so infinite and we've shown that it's possible for us to do. Basically, if everyone can do this, robots will be able to do everything. It's only a matter of improving it from here.
[18:47]
Quan Vuong
There was a funny story where when we first published PI zero, people thought of us as the laundry company, because the demo was just focused on laundry. And actually picking home tasks, especially tasks that has to do with deformable object, is a very intentional choice on our end. We're not just after the home. We really want to make it broadly applicable. But picking home tasks for us to start with has a few benefits. Like, one is relatable. You can see the laundry folding demo and you can kind of like grok how this is going to be useful and you can get a sense of why it's hard. And the second is that it's really easy to set up to test generalization.
[19:27]
Host 3
You can talk about Ultra, which is your company, Jared. A demo of it.
[19:30]
Quan Vuong
Yeah, this is Ultra. The thing that I love about this video is you see, you know, it's bright outside and you see this is 4x speed and it's 100 minutes. If I scroll to the end, the sun has set.
[19:43]
Host 3
Oh, wow. That was one of the big problems in robotics, where it would be so sensitive to the environment in lighting and mess up the vision system. The semantics and part of it, yeah.
[19:55]
Quan Vuong
And the interesting thing here is that it is possible to get to the level of autonomy that the robot is just performing the task. This is autonomy at scale. Like this is ready to be scaled.
[20:10]
Host 2
Quan, because this task is less familiar than laundry folding, do you want to explain what the robot is doing here and what Ultra is like doing as a company?
[20:16]
Quan Vuong
Ultra is a company that want to make it really easy to adapt robot to new tasks. And right now they're focusing on logistics space, which is really important because there's lots of labor shortage in logistics. And the task that we focus on together here is if you order an item from Amazon, you sometimes get this soft pouch that item gets shipped from. And the task here is you have a tray of these items here and the robot is supposed to pick one of them at a time and place it inside this pouch. The machine would enclose it and then pick up the pouch and put it on the left here to be ready for shipping. Now this task is hard because there are many different types of object that can be in this tray and the opening here is actually very narrow. So you see this interesting example of the robot kind of nudging the item to go into the pouch and that's really hard. That requires very good understanding of the scene and very precise motion to nudge the object into the pouch. The other thing that's hard about this task is the level of autonomy that's required. This is running for an entire day. There is still human intervention, I want to say, in this full day operation, but the level of intervention is actually quite minimal.
[21:39]
Host 2
This is not just some demo station. This is actually recorded in an actual e commerce warehouse where they're actually shipping real products to real customers. This isn't just a lab, this is
[21:49]
Quan Vuong
packaging real customer, real order for customer to be shipped out in a real warehouse. So this is real operations.
[21:56]
Host 2
So I think this is really cool because I think when people think about robots, they tend to think of the consumer use cases like Weave because that's what we're familiar with in our daily life. What I find really interesting is that there's a million applications like this Ultra thing that you wouldn't think of as obvious. Like, oh, who packs the soft pouch of things that you get from Amazon? Well, there's some person who does that. And this is a job that we can now build a robot to do.
[22:18]
Quan Vuong
The interesting thing about the approach is that you're converting it from a very difficult engineering problem into an operations problem of how do I identify the use case and how do I collect the right data, which is in some sense more scalable because you can build the system that allows you to collect data for many different tasks. So it's now a problem of how do I scale data collection rather than, for every new task, how do I design a really difficult engineering system to solve it.
[22:47]
Host 3
YC Startup School is back. We're hand-selecting the most promising builders in the world and flying them out to San Francisco for July 25th and 26th to discuss the cutting edge of tech. Apply now for a spot. Okay, back to the video. I think one thing that the audience may not know is that you have a very unique technical insight that in the past, robotics folks would have kind of gasped at and been shocked by, because robots need to run in real time. A lot of times all of the compute runs on device. But you guys have done something very different. Can you tell us more about that, so that this works in real time with large models and works really well?
[23:27]
Quan Vuong
So the context here is that we talk to many companies that would like to deploy robots, and one of the first questions we get is: what compute unit should we get on the robots? It's expensive, it's going to increase the BOM cost, and they're worried that it's going to go out of fashion very quickly because the models change, the models get bigger. How do I make sure that the hardware that I'm going to commit to today is going to be viable for a couple of years? It's a very difficult question. People are often really surprised when I tell them that for almost all of the robot evaluation that we run at PI today, including the really complicated demos that we have shown (making coffee, folding laundry, mobile robots navigating around), the model is actually hosted in the cloud. And this is not a cloud as in a server in the office, it's a real cloud. The model is hosted in a data center somewhere. And within this high-frequency control loop that is controlling the robot, the robot is actually querying an API endpoint that hosts the model, sending it images and language commands and getting back actions that then execute directly on the robot. And this is surprising because of precisely the reason that you mentioned. How do you actually make it work? This is why it's really important for PI to couple systems, hardware, and model development and research very tightly together, because it allows us to solve for this problem. So, for example, one of the insights that we have here is that you can actually bury the inference time within the robot control loop. Because if I'm a robot and I have enough actions to execute for the next 100 milliseconds, there's no reason for me to wait until I finish executing those actions to ask my model for the next ones. I can do it as fast as inference, essentially. And so maybe when I only have 50 milliseconds' worth of actions left, I can ask for the next set of actions.
And when the current 50 milliseconds is over, I have something that's ready for me to continue with for my next 100 milliseconds. So that's one of the insights. The other is an algorithmic improvement that we refer to as real-time chunking: design inference in such a way that you account for the fact that there's going to be a delay in how long it takes to query the model in the cloud. Basically the problem here, if I get a little bit more technical, is that an action chunk is a sequence of actions that I can execute on the robot. So, you know, it's not just one action. And say I have an action chunk that I can execute for 100 milliseconds, and 50 milliseconds in, I want to predict another action chunk. And I'm going to transition to that new action chunk after my current 50 milliseconds is over. How do I make sure the two are consistent? How do I make sure that if I'm moving this way, the next action chunk is going to allow me to continue to move smoothly this way?
[26:20]
Host 1
You can precompute?
[26:21]
Quan Vuong
Yeah, you can precompute. And that's one of the algorithmic improvements that we've made to make inference using a model hosted in the cloud possible.
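[Editor's sketch] The timing idea described above (request the next action chunk while the current one is still executing, and condition the new chunk on where the robot will be at hand-off) can be illustrated with a small simulation. This is a toy under stated assumptions, not PI's implementation: `fake_policy`, the 10 ms control step, the 100 ms chunk, and the 50 ms latency are all hypothetical stand-ins, and the "consistency" here is simply starting the new chunk from the last action of the old one rather than the real-time chunking algorithm itself.

```python
STEP_MS = 10        # assumed control period
CHUNK_STEPS = 10    # one chunk = 10 steps = 100 ms of actions
LATENCY_STEPS = 5   # assumed 50 ms round trip to the hosted model


def fake_policy(start_value: float) -> list[float]:
    """Hypothetical stand-in for the hosted model.

    Returns a chunk that continues smoothly from start_value.
    """
    return [start_value + 0.1 * (i + 1) for i in range(CHUNK_STEPS)]


def run(total_steps: int, request_when_remaining: int) -> tuple[list[float], int]:
    """Simulate the control loop; return (executed actions, stalled steps)."""
    executed: list[float] = []
    stalls = 0
    chunk = fake_policy(0.0)
    idx = 0
    pending = None  # (step at which the reply arrives, new chunk)

    for t in range(total_steps):
        # Switch chunks only once the reply has arrived and the old chunk is done.
        if pending is not None and t >= pending[0] and idx >= len(chunk):
            chunk, idx, pending = pending[1], 0, None

        if idx >= len(chunk):
            stalls += 1  # nothing to execute: the robot waits on inference
            continue

        executed.append(chunk[idx])
        idx += 1

        remaining = len(chunk) - idx
        if pending is None and remaining == request_when_remaining:
            # Condition the request on the last action that will have been
            # executed by the time the reply arrives, so the hand-off is smooth.
            pending = (t + 1 + LATENCY_STEPS, fake_policy(chunk[-1]))

    return executed, stalls


if __name__ == "__main__":
    _, overlapped_stalls = run(30, request_when_remaining=LATENCY_STEPS)
    _, sequential_stalls = run(30, request_when_remaining=0)
    print("stalled steps with overlapped requests:", overlapped_stalls)
    print("stalled steps when waiting for the chunk to finish:", sequential_stalls)
```

In this toy setup, requesting the next chunk when 50 ms of actions remain hides the whole round trip, while waiting until the chunk is exhausted leaves the robot idle for the full latency after every chunk.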
[26:30]
Host 1
I studied computer engineering, so I'm not really an algorithms person. But when it comes to systems like that, like pipelining, like, get me all over that. That sounds great. That's so interesting.
[26:40]
Host 3
I mean this simplifies, it's kind of, it's a brilliant choice because it simplifies so much of the system for the robots. You don't need all these clunky, I don't know, people sometimes have two operating systems for robots, an embedded RTOS and then the regular one, and all this complex giant compute and power. And this is what the initial versions of a Waymo used to run, basically a server in the trunk. And you can't afford to do that with general-purpose robotics, so it's brilliant that you figured out how to do it.
[27:08]
Host 1
Yeah, you don't have to. I mean you can do things. Some of it, there obviously has to be some compute there, but a lot of the compute can happen elsewhere. And then is there, there must be a video like this, this thing that we're looking at in the top left. Like how much of that is sort of like video feedback? How much of it is like locally processed?
[27:26]
Host 2
I mean, is there any compute locally on this robot or is it just like a dumb video camera that streams data to the cloud?
[27:31]
Quan Vuong
For this, I am not 100% sure, but I am inclined to believe that it's just a dumb computer. Like for this specific video, I don't remember, but I'm 100% confident that we can make this work with a dumb computer on the robot. And one other interesting thing about our collaboration with Weave and Ultra is, one, I've never seen that robot in person.
[27:54]
Host 3
Oh, wow.
[27:56]
Quan Vuong
Two is I have very little idea about how the robot actually works.
[28:01]
Host 2
Interesting.
[28:02]
Quan Vuong
And that's a very intentional choice. I want to stay away from that as far as possible. I also don't know how they collect data. I intentionally don't ask them this question to understand whether it's possible for an organization like PI to parachute into their existing system and to work really closely with them on the thing that actually matters, to get the system to work and not have to learn about how they've set up their system. Because in a way that's like a more scalable recipe.
[28:32]
Host 3
Yeah. You completely decouple a lot of the hardware control loop choices from the semantics and planning, which just works. Just brilliant.
[28:42]
Quan Vuong
Yeah, I mean, I'm really surprised it works. When we started the company, we thought that real deployment is only going to be in a conversation like five years into the life of the company. Because the problem is it's really hard and we're two years in and this is the result that we have. And real deployment and scaling the number of robots is a really serious consideration today. And so the pace of progress has just been very pleasantly, much faster than we expected originally.
[29:12]
Host 2
Often on this podcast we talk about what all this means for startup founders. I think that might be an interesting question for us to explore here. So if you imagine someone was listening to this podcast, maybe they're like a college student that's studying computer science and they think robots are really cool and they want to do something like this. How should they get started and what are the skills that they need? Do they need to be a mechanical engineer to be able to build a robot like this? Can they just buy an off the shelf like robot arm and camera system and like what.
[29:39]
Host 1
And load PI and.
[29:40]
Host 2
Yeah, yeah, yeah.
[29:43]
Quan Vuong
Before I actually answer your question, let me provide a bit more context. The first is that robotics is traditionally really hard because it's an extremely vertically integrated business. You need to have your own customer relationship, your own hardware, your own autonomy stack, your own safety certification, your own everything. And the barrier to entry is just really high because of that. And one of the things that we're trying to change is that we're trying to provide a foundation of physical intelligence that the community can build on top of, that allows them to onboard autonomy onto their robots and their tasks much quicker than before. So that's the first: we want to provide that kind of seat of intelligence that allows people to move much faster so that they can focus on other problems. The second thing is that I think the recipe for starting a vertical robotics business today is, one, have a really good understanding of the existing workflow, because the robotic system needs to fit into the existing workflow, and be very meticulous about identifying where the opportunity is. If there's a workflow that needs X number of workers today, where is it that the robot, when you insert it, is going to make the biggest difference? And two is to really be scrappy when it comes to hardware and data collection. You don't need an incredibly expensive robot that is capable of very precise motion today to be able to do this task. And the reason why is these models are really reactive and so they can compensate for some of the inaccuracy in the actual robot movement. And ensure that you have the ability to collect data and to run evaluation, especially evaluation in real deployment. The next step after that is to get a mixed autonomy system that allows you to get to the point where it's break-even.
[31:35]
Host 2
Like break even economically.
[31:37]
Quan Vuong
Break even economically. Because the reason why that's important is because it allows you to then scale the number of robots.
[31:43]
Host 2
Because if you lose money on every robot, it's very hard to scale.
[31:45]
Host 3
That has been historically one of the biggest challenges for robotics companies as they go into growth stage. It's just the payback period. It just doesn't make sense.
[31:53]
Quan Vuong
Yeah, so the equation, I think, for starting a robotics business has changed and will continue to change at an accelerating pace because the upfront cost is not that high anymore. And now what is the upfront cost? The upfront cost is much cheaper hardware, ability to collect data, ability to run evaluation, and ability to kind of understand the use case to see where they should insert the robot. It's not about having incredibly expensive hardware. It's not about having your own proprietary, I think, classical autonomy stack anymore to be able to do this task. And so it allows companies to focus on the components that will actually allow them to differentiate themselves from the rest of the space.
[32:41]
Host 2
Now that you've sort of unbundled it and you no longer need to build this fully vertically integrated company in order to build a robotics company, are we on the precipice of a Cambrian explosion of vertical robotics companies, where there's going to be like a thousand companies like Ultra going after, you know, every, like, menial job in the economy and getting a deep understanding of the customer, building a robot that can solve that problem, doing mixed human-machine deployment until it can run fully autonomously, and building a company in every sector? Is that the future that you see people building on top of PI?
[33:14]
Quan Vuong
It's funny that you mentioned Cambrian explosion because when we wrote this blog post, that was a term that was very kind of hotly debated. We are, I think, academics at heart and we want to be kind of very measured when we communicate. But myself personally, I believe there's going to be a Cambrian explosion of robotics companies across the entire world and across many, many different verticals, just because it's so much cheaper to build. And it doesn't require someone with 20 years of experience in robotics to start anymore. It requires someone that is really scrappy, that can move really quickly, can do the system integration, and can understand what the customer wants in order to start the deployment.
[33:59]
Host 1
I mean, what's coming up for me is obviously we work with a lot of robotics companies and meet a lot of founders, and it feels like there's this continuum. One is, to use an analogy to personal computing. You could argue that industrial robotics today is basically like mainframe or minicomputer level. Like, you know, if you look back in the 70s, huge public companies like Digital Equipment that, you know, just did like these sort of very, very expensive deployments, but like they were very, very specialized and it was all extreme enterprise. Like, you know, the idea of a personal computer was ridiculous, right? You know, it took the Altair and then Apple I and Apple II and then the IBM PC XT to like create personal computing. And then like the traditional advice for robotics for many years is like, go after dirty and dangerous. And then of course, those are sort of the industrial cases. Like, you know, you have these giant Tesla robots in the gigafactory and things like that. It feels like what you said around profitability is really, really big. So, you know, does that mean that the people who do the vertical robot Cambrian explosion sort of moment, the people who are sort of first in that, like, it sounds like they would be the first to be profitable and not dirty and dangerous?
[35:22]
Quan Vuong
I think this is already happening today. I think we have the fortune of having lots of visibility into the robotics community because, you know, people would like to talk to us, people would like to learn, you know, what it's like to build a foundation model for robotics. And people would like to know, how do I get the same level of autonomy? And there are so many companies and businesses that we talk to that would love to put the robot into a space where, you know, it's okay for the robot to make a mistake, and they just need it so much. I really believe that the recipe that I mentioned earlier (identify where the robot can fit in, focus on cheaper hardware, collect data, run evaluation, mixed autonomy, break even, scale robots) will work across many different verticals. And I'm seeing it play out today and it's just incredibly exciting to see.
[36:12]
Host 2
And this is pretty cool that you literally just gave people the playbook for how to build a vertical robotics company. Like this is a playbook that could possibly be followed successfully hundreds or thousands of times.
[36:22]
Quan Vuong
And the reason why I want to mention it is because I do want to see that Cambrian explosion and we want to help enable it. For PI, if we talk about why PI is going to fail, it's probably going to be because the problem is just way too hard. Maybe it takes 50 more years to solve the robotics problem and not a couple of years, 5, 10. And so we want to enable the community, we want to accelerate progress. And that's why we're very open. We publish our research, we open-sourced π0 and π0.5. And people are also shocked when they ask me, is there any difference between the π0 and π0.5 that you open-sourced versus the models that you use internally? And the answer is actually no. It's the same model. The pre-trained model weights that we open-sourced are also the pre-trained model weights that our researchers internally use for π0 and π0.5. And so we really want to help accelerate progress in the community and to create that Cambrian explosion.
[37:22]
Host 1
Yeah, that's very inspiring. I mean, I feel like everyone's sort of spending a lot of time in the digital world and it feels like, you know, now is the time to start thinking about, you know, the world of atoms. And this is sort of the perfect mix of actually like, you know, how do you take electrons and turn it into abundance in the, you know, world of atoms? And I think about Dario Amodei's essay, All Watched Over by Machines of Loving Grace. And when you really think about the perfect manifestation of that, it's not like, you know, perfect agents that look over you just like in the electronic world. It's actually something a little bit more akin to what we're seeing here.
[38:05]
Quan Vuong
Yeah. And this has really been our mission from the start, to create that Cambrian explosion. And this is why we choose to focus on the model, because we believe that is the bottleneck to really making robots useful across many different tasks in the world. And that's why we also focus on cross-embodiment. Success for us is not defined as only our model on our robot performing tasks that are useful. The surface area for success is actually much larger, which is our model performing really useful tasks on somebody else's robot out there, maybe one where we don't even know what that robot is like, in a way that's useful to the end consumer.
[38:46]
Host 4
Could we maybe talk a little bit about like the humans behind the robots here? How did the company get started? Like, who are the, who are your co founders, how do you get together and what skills do you each bring to such a complex problem?
[38:59]
Quan Vuong
Sometimes the joke I make here is that the humans behind the robots are also robots. Not really. Yeah. So PI is a very, I would say, untraditional company. We have a larger than average founding team and some of us worked really closely together when we were on the robotics team at Google, and the robotics team at Google was, I think, a really, really great environment for seeing the signs of life and creating the relationships and the community that allow the robotics community and these advances to flourish. There is Lachy, whom we met when we were thinking about starting the company and who has just been really instrumental in making sure that we're a good business. And there is Adnan, our hardware lead, who came over from Anduril. And Adnan has a really difficult job because if you want to work on cross-embodiment, you remember my joke about how if you want to add two years to your grad school, you bring on one more robot. The hardware problem and the operational problem for us is how do we build, improve and scale a fleet of heterogeneous robots? It's not just one robot platform. And because we built the organization from scratch in the beginning to support that, I think we're able to do it. But it's just a really hard problem because there are just, like, no two robots in the fleet that are the same. Like, how do you make sure that everything runs smoothly? We're really good at divide and conquer, if you ask, but.
[40:30]
Host 4
So how many co founders are there in total?
[40:32]
Quan Vuong
We have Brian, we have Chelsea, Sergey, myself, Lachy, and Adnan.
[40:37]
Host 4
Is it just necessary to have that many co founders to solve a problem as big as this? Or was it a case like you're already sort of like a unit together, you'd already work together and you just, whatever you started, you would all have
[40:49]
Quan Vuong
One common question that we get is, why band together? And the first is that we really enjoy each other's company. We spend a lot of time at work and it in some sense gives meaning to life. And so we really want to enjoy the relationships we have at work. And the second is that any one of us could have started a company and been successful. But the problem is just so incredibly hard, and the chance of success is just so much higher if we band together and divide and conquer the problems. And you know, that's, I think, one of the main reasons why the progress has been much faster than we expected.
[41:31]
Host 3
What were the differences of you working before in either academia or big industry, big company like Google and as opposed to now in a startup? Because this is the first time for a lot of you doing a startup, right?
[41:46]
Quan Vuong
Yeah, this is the first time for a lot of us. One of the really surprising things that we learned when we started the company is that the infrastructure for supporting large-scale general-purpose robots is not there, and, you know, this starts from the software itself. How do you collect data? What device do you use to collect data? How do you manage the data? How do you annotate the data? How do you get visibility into the data? How do you run evaluation? How do you build operational processes? There wasn't a company that offered this kind of service, which is very different from software. And we were really surprised to find that out. And so we ended up writing a lot of the software at PI ourselves. But I think this is another area of incredible opportunity, of kind of building services for robotics companies. Like if you can offer remote teleop, for example, if you can offer data collection, if you can offer annotation services, because these are functions that don't need to be repeated from one company to the next. So I think there's lots of opportunity to build kind of support for the growing robotics business. So that's one surprising thing that I learned. And the second is I think one of the reasons why we have managed to achieve such progress is that there is a really tight loop of collaboration across the entire life cycle of model development, going from what task do you collect data for? If you collect data for the task, how do you do it? What hardware do you use? After you collect the data, how do you get visibility? How do you ensure data quality? How do you then make sure that you can easily train on that data? After you train on that, how do you run evaluation? Evaluation is a really hard problem in robotics because it scales super-linearly with model capability. Like let's say you have a model that can perform a two-minute task. Running evaluation for that is very different from running evaluation for a task of 20 minutes.
It's not 10 times harder, it's more than 10 times harder. After you run evaluation, how do you distill the learning from that evaluation to know how to improve the model further? One of the side projects I would really love to take on is to build an automated robotics research scientist, which is really one of the bottlenecks we have today because this is a really difficult skill set that requires intuition about the entire stack. So, you know, I would love it if there were a model that can ingest multimodal data such as this and analyze failure modes. You know, understanding, oh, is the robot performing this way because of the data that was collected, or the way that it was annotated, or the way that we trained the model? And then, you know, suggest ideas and actually try them to figure out if those hypotheses are correct. So that's something that I would love to have and that would dramatically unlock us. Sometimes I make the joke in the company that we should record all of the meetings and then train a model to basically just make predictions about what the next set of experiments is.
[44:44]
Host 1
Oh, you could, you totally could. What if it's OpenClaw and Obsidian and markdown files and like, you know, a brain.md with like an ontology that's custom to your use case? And what if it's 100 OpenClaws in the background that you orchestrate?
[45:00]
Quan Vuong
I think there's two sides to this. The first is that we already see a little bit of a sign of life where, for simple failure modes during evaluation, if you can describe the way that the robot failed in text very precisely and very clearly, then you can ask a language model to make very reasonable recommendations about what the next step is. But the flip side is that this only works for simple cases today. And the reason why that's the case is, I think, a pretty fundamental limitation of the models that we have today, which is that they are not, at their core, models that take actions in the world and see the consequences of their own actions, especially actions that change the physical world. I think this kind of very fundamental understanding of how the physical world works is missing from the really large foundation models. And I think that's one of the ingredients that's missing to be able to build this automated robotics research scientist.
[46:00]
Host 1
What's interesting about OpenClaw, I don't know, I mean basically it can go and it can just do things, which is interesting. And then at that point it's on the research lab to provide CLI or MCP endpoints to the things that might control robots or reconfigure rooms or. I mean, I think Karpathy, it feels like he's starting to talk a bunch about this, where, you know, if you mix auto research plus what he's been talking about with markdown files, like it might just happen in the open. Like, you know, there's this sort of sense that you have to make something much, much more complicated to make it work. But what if that's just wrong? What if we just have markdown files and agents and, you know, you could make it yourself with, you know, literally Claude Code and MCP today. What if it's not an algorithm problem, it's just literally an integration challenge?
[46:54]
Quan Vuong
We have a version of this internally that I use a lot. There was a point when I was spending an embarrassingly large amount of money on API queries. Yeah, yeah. And you know, my team was like, Quan, what are you doing?
[47:09]
Host 1
Oh, I'm that guy at Y Combinator right now.
[47:13]
Quan Vuong
So to give you an example, we have a Claude skill that is essentially serving the role of a pre-training on-call today. So, you know, we have these pre-training runs that are really large. It's, I think, a very difficult exercise to keep them alive and keep them churning, just because there are so many things that can go wrong. And we have a prototype, a pre-training on-call that kind of babysits the run and has permission to take action to remedy errors that it sees. And one of the surprising outcomes of that exercise is that it leads to about 50% improvement in compute usage, like just overall compute utilization for that large pre-training run, which is huge for us. And this is just a small, simple prototype that I built, and I think there's a lot more to be done.
[48:07]
Host 1
Quan, this is incredible. Thank you so much for everything. Thank you for making Physical Intelligence. Thank you for showing us these incredible demos. And honestly, like, the thing that gives me the most hope is this idea that there's an entity, there's a research lab out there that is focused on giving this to the world to create this Cambrian explosion of robotics startups. So someone watching right now will be inspired by this and start playing with your models, and they might create a robot that touches billions of people's lives for the good.
[48:41]
Quan Vuong
Thank you for having me. It's been a pleasure. To the listener, the one takeaway that I want you to have is that I think robotics has changed a lot, and the cost of building in robotics has decreased and I think will continue to dramatically decrease. And it also requires a very different kind of scrappy skill set, the kind that young startups need. We hope to enable really an explosion of many, many, many different robotics use cases. And you know, always reach out to us if you want to collaborate.
[49:12]
Host 1
Thanks, man.
[49:13]
Host 4
Thanks so much.
[49:13]
Quan Vuong
Thank you.
[49:27]
Host 2
Thank you.