Podcast Summary: Fei-Fei Li – World Models and the Multiverse
Podcast: AI + a16z
Episode: Fei-Fei Li: World Models and the Multiverse
Date: December 23, 2025
Host: a16z (Eric, main host; joined by Martin Casado, General Partner at a16z)
Featured Guest: Fei-Fei Li, Co-founder & CEO of World Labs
Episode Overview
This episode explores why spatial intelligence and world modeling are critical next steps for artificial intelligence, moving AI beyond the current dominance of language-based models. Fei-Fei Li, a pioneer at the intersection of data and AI, and Martin Casado, a16z general partner, discuss the limitations of language models, the promise of AI that understands and acts in 3D space, and the implications for robotics, creativity, and the concept of a digital “multiverse.”
Key Discussion Points & Insights
1. The Case for World Models: Beyond Language in AI
- AI’s Current Focus is Language: Most current AI innovation centers around Large Language Models (LLMs), but this neglects a more fundamental component of intelligence: spatial understanding.
- Fei-Fei Li’s Perspective:
- "That space, the 3D space, the space out there, the space in your mind's eye, the spatial intelligence that enable people to do so many things, that's beyond language is a critical part of intelligence." (00:00, 11:33)
- Origins of World Labs:
- The idea came from repeated observations—both by Fei-Fei and Martin Casado—that AI needed a paradigm shift toward “world models” that capture and reason about the 3D physical world. (05:09)
2. Intellectual Partnership & Founding World Labs
- Why Martin Casado as First Investor:
- Fei-Fei wanted not just a financial backer but an “intellectual partner.” She sought a computer scientist who understood both technology and market dynamics, someone who could “be on the phone or in person with me every moment of the day as an intellectual partner.” (03:36)
- Early Conversations Around World Models:
- Fei-Fei found that most investors and technologists didn’t fully grasp the concept of a world model and often offered only "polite nods." Martin stood out as someone who truly understood the idea: "The way he defined it about an AI model that truly understand the 3D structure, shape and the compositionality of the world was exactly what I was talking about." (05:39)
3. Why LLMs Are Not Enough
- Human Intelligence Is Deeply Spatial:
- “Language is a lossy way to capture the world.” (07:07)
- The physical, perceptual, visual world exists independent of language; evolutionary intelligence is built on spatial and embodied experience.
- Limitations of Language Models:
- Language is great for abstract thought but insufficient for encoding spatial or physical reality—crucial for robotics, creativity, and interacting with the environment.
- "If I put you in a room and blindfolded you and I just described the room and then I asked you to do a task, the chances of you being able to do it are very little." (08:59, Martin Casado)
4. Spatial Intelligence: Evolutionary Perspective
- Language vs. Spatial Reasoning:
- Martin: "The part of our brain that actually deals with language is actually pretty recent ... but the part of the brain that actually does the navigation, you know, the spatial, has been around ... 500 million years." (10:24–11:07)
- Fei-Fei: "That double helix in 3D space. There's no way you can use language alone to reason that out." (11:33)
5. Applications and the Digital Multiverse
- Creativity and Design:
- "Creativity is very visual … From design to movie to architecture to industry design … that alone is a highly visual perceptual spatial area." (12:58)
- Robotics and Embodied AI:
- All robots, humanoid or otherwise, must understand and navigate 3D space, which requires new AI capabilities.
- The Multiverse Concept:
- Advanced AI world models will allow us to “create infinite universes”—spaces for robots, creativity, travel, storytelling, and socialization—in both physical and digital worlds. (13:23; repeated from the opening quote)
6. Horizontal Technology: Foundational Impact
- Horizontal Scope:
- Martin analogizes world models to LLMs: “The same LLM we use for an emotional conversation, we use it to write code… So with these [world] models, you can take a view of the world … and then you could actually create a 3D full representation…” (14:44)
- Generativity:
- World models enable full 3D reconstructions, manipulations, and creations in both the digital and physical realms. They reach into gaming, art, robotics, architecture, and more. (15:30)
7. Why 3D is Fundamental — Not Just 2D
- Limitations of 2D:
- "Physics happens in 3D, and interaction happens in 3D. Navigating behind the back of the table needs to happen in 3D, composing the world...needs to happen in 3D. So fundamentally, the problem is a 3D problem." (17:21, Fei-Fei Li)
- Martin: “If that’s 2D [for a robot], and then you ask the robot ... distance or to grab something, that information’s missing ... you need to provide that information ... so that you can actually navigate in 3D space. And so 2D video is great if it’s a human, because we already can turn it into 3D. But ... any computer program ... need[s] to be 3D.” (18:02)
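The point about missing distance information can be illustrated with a minimal sketch. This is not code from the episode: it assumes an idealized pinhole camera, and the `project` function and the sample points are hypothetical names and values chosen for illustration. It shows two 3D points at different depths landing on the same 2D image coordinates, so a single image alone cannot say how far away either one is.

```python
def project(point, focal_length=1.0):
    """Project a 3D camera-frame point (x, y, z) onto a 2D image plane.

    Assumes an ideal pinhole camera looking down the +z axis
    (an illustrative simplification, not a real camera model).
    """
    x, y, z = point
    if z <= 0:
        raise ValueError("point must be in front of the camera (z > 0)")
    # Perspective division: depth z is divided out and cannot be recovered.
    return (focal_length * x / z, focal_length * y / z)


# Two hypothetical points on the same viewing ray, one four times farther away.
near = (0.5, 0.25, 1.0)
far = (2.0, 1.0, 4.0)

pixel_near = project(near)
pixel_far = project(far)

# Both points map to the same image coordinates.
print(pixel_near == pixel_far)
```

A program that has to grasp or navigate therefore needs extra information, such as a second viewpoint or a learned 3D representation, to recover the distance that the projection discards.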
- A Personal Lens:
- Fei-Fei Li’s own loss of stereo vision after a cornea injury led to firsthand insight:
- "I was just driving in my own neighborhood, and I realized I don't have a good distance measure between my car and the parked car on a local small road ... That was exactly why we needed stereo vision." (19:12)
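The role of stereo vision in judging distance can be sketched with the standard rectified-stereo relation, depth = focal length × baseline / disparity. This is an illustration rather than anything stated in the episode: the function name `depth_from_disparity` and all numeric values (focal length in pixels, a baseline of about 6.5 cm, the disparities) are assumptions chosen for the example.

```python
def depth_from_disparity(focal_px, baseline_m, disparity_px):
    """Estimate depth in meters from the horizontal shift between two views.

    Assumes two identical, rectified pinhole cameras separated by
    `baseline_m`. `disparity_px` is how far a feature shifts between
    the left and right images.
    """
    if disparity_px <= 0:
        raise ValueError("disparity must be positive for a finite depth")
    return focal_px * baseline_m / disparity_px


# Illustrative values only: an 800-pixel focal length and a 6.5 cm baseline.
FOCAL_PX = 800.0
BASELINE_M = 0.065

# A large shift between the two views means the object is close;
# a small shift means it is far away.
close_car = depth_from_disparity(FOCAL_PX, BASELINE_M, disparity_px=26.0)
far_car = depth_from_disparity(FOCAL_PX, BASELINE_M, disparity_px=13.0)

print(f"close: {close_car:.2f} m, far: {far_car:.2f} m")
```

With only one view there is no disparity to measure, which is consistent with the difficulty in judging the gap to a parked car described in the anecdote.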
8. Technological State-of-the-Art in World Models
- Pioneering Techniques and Team:
- The field builds on innovations like Neural Radiance Fields (NeRF), Gaussian splat representations, GANs, and style transfer.
- “At World Labs we just have the conviction that we're going to be all in on this one singular big North Star problem, concentrating on the world's smartest people … All of them coming to this one team and try to make this work and to productize this.” (20:06)
- Multidisciplinary Team Necessary:
- Martin: "You need experts both in AI ... and graphics, which is like, how do you actually represent these things in memory ... It takes a very special team to actually crack this problem, which Fei-Fei has managed to put together." (22:03)
Notable Quotes & Memorable Moments
- On the need for spatial intelligence in AI:
- Fei-Fei Li, 00:00 & 11:33: “That space, the 3D space, the space out there, the space in your mind's eye, the spatial intelligence that enable people to do so many things, that's beyond language is a critical part of intelligence.”
- On the power of world models:
- Fei-Fei Li, 12:58 & 13:23: “We can actually create infinite universes. Some are for robots, some are for creativity, some are for socialization, some are for travel, some are for stories. It suddenly will enable us to live in a multiverse way. The imagination is boundless.”
- On the evolutionary basis for spatial intelligence:
- Martin Casado, 10:24–11:07: “The part of our brain that actually deals with language is actually pretty recent ... the part that does navigation ... has been around ... 500 million years.”
- On the real-world importance of depth perception:
- Fei-Fei Li, 19:12: "I realized I don't have a good distance measure ... That was exactly why we needed stereo vision."
- On the need for a multidisciplinary approach:
- Martin Casado, 22:03: "To solve this problem you need experts both in AI ... and graphics ... It takes a very special team to actually crack this problem, which Fei-Fei has managed to put together."
Timestamps for Key Segments
- 00:00–00:40: Opening theme – spatial intelligence and world models as the next frontier in AI.
- 02:06–04:06: Fei-Fei’s background and what she needed in an "intellectual partner" to launch World Labs.
- 05:09–06:10: The challenge of explaining the significance of world models to others; Martin's unique understanding.
- 07:07–08:59: Limitations of language for encoding and acting upon the physical world.
- 10:24–11:33: The evolutionary foundation and depth of spatial intelligence.
- 12:58–14:44: Real-world and horizontal applications for world models; the multiverse vision.
- 17:21–18:37: Why AI must operate in 3D; limitations of 2D approaches.
- 19:12–20:01: Personal anecdote on stereo vision and embodied intelligence.
- 20:06–22:03: State of the art in computer vision and the technical team at World Labs.
Conclusion
Fei-Fei Li and Martin Casado argue that building AI systems capable of perceiving, reasoning, and acting within 3D worlds—"world models"—is essential for general intelligence. This approach enables new horizons for robotics, creativity, and virtual environments, pushing AI beyond the boundaries of language into the boundless possibility of a digital multiverse. Their conversation underscores a pivotal shift in industry focus, powered by deep technical expertise and a vision that spatial intelligence is the core substrate of truly intelligent machines.
