Latent Space Podcast: The Shape of Compute – Chris Lattner (Modular)
Date: June 13, 2025
Hosts: Alessio (CTO, Decibel), swyx (founder, Smol AI)
Guest: Chris Lattner (Co-founder & CEO, Modular)
Episode Overview
In this episode of Latent Space, Chris Lattner, renowned software architect, co-founder of Modular, and creator of foundational technologies like LLVM and Swift, shares how Modular is shaping the future of AI infrastructure. The discussion covers Modular's past year: the transition from stealth, research-driven development to open-source releases, lessons from building cross-hardware AI frameworks, new approaches to programming-language design (Mojo), overcoming industry skepticism, and the philosophy behind democratizing and accelerating AI compute. Listeners get an insider's view of the technical challenges of AI serving, community engagement, and Lattner's builder mindset, both at work and at home.
Key Discussion Points & Insights
1. Modular’s Journey: From Stealth to Open Source
- Initial Phase (“Prove it’s Possible”):
Modular spent its first three years in a closed, research-heavy phase, tackling “impossible problems” like creating a viable, vertical AI stack that matches or beats state-of-the-art performance on Nvidia GPUs—without using CUDA.
“The challenge is prove that we can do something people think is impossible.” – Chris Lattner [03:13]
- Transition to Productization:
After achieving high-performance, narrow-scope model serving (Llama 3, A100s, etc.), the focus shifted to engineering: expanding hardware support (e.g., H100, AMD MI300), increasing model coverage (now 500+), and regular six-week release cycles.
“[Now] every six weeks we've been shipping a new release… And as you do that... now, suddenly, it's like, oh, okay, I get it.” – Chris Lattner [04:51]
- Open Sourcing and Community Engagement:
Having confidence in its technology, Modular opened up its codebase, enabling hackathons and rapid onboarding:
“The winning team at the hackathon… had not used Mojo before, they hadn't programmed GPUs… they built a training system, wrote an Adam optimizer… in a single day.” – Chris Lattner [05:42]
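For readers unfamiliar with what that hackathon team built, a minimal Adam optimizer is sketched below in plain Python. This is an illustration of the standard algorithm, not Modular or Mojo code; all names are our own.

```python
# Minimal Adam optimizer sketch (illustrative only; not Modular/Mojo code).
import math

def adam_step(param, grad, m, v, t, lr=0.1, b1=0.9, b2=0.999, eps=1e-8):
    """One Adam update for a single scalar parameter."""
    m = b1 * m + (1 - b1) * grad          # first-moment (mean) estimate
    v = b2 * v + (1 - b2) * grad * grad   # second-moment (variance) estimate
    m_hat = m / (1 - b1 ** t)             # bias correction for early steps
    v_hat = v / (1 - b2 ** t)
    param -= lr * m_hat / (math.sqrt(v_hat) + eps)
    return param, m, v

# Minimize f(x) = (x - 3)^2; the gradient is 2 * (x - 3).
x, m, v = 0.0, 0.0, 0.0
for t in range(1, 201):
    x, m, v = adam_step(x, 2 * (x - 3), m, v, t)
print(x)  # converges toward 3.0
```

The per-parameter state (`m`, `v`, step count `t`) is all Adam needs, which is part of why a small, fast implementation is a plausible one-day project once the language handles the GPU plumbing.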
2. The Evolution of Mojo and the Modular Stack
- Why a New Language:
The need to unify programming across CPUs, GPUs, TPUs, and future hardware, while retaining Python-like usability and Rust-level performance, led to Mojo.
“Let me be the first to tell you... C sucks.… AI people generally don't love C. What do they love? Python.” – Chris Lattner [12:52]
- Mojo’s Strengths:
- A binding-free drop-in for Python hot paths; positioned to become the best way to extend Python (in nightly builds as of recording).
- Performance (faster than Rust) and future extensibility to new chip architectures.
- Not a full Python replacement yet (“no classes, but supports Python objects”).
“Mojo doesn't have all the features that Python does… but as functions… you can get this binding free experience… move that to Mojo, and now you get performance wins.” – Chris Lattner [17:17]
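Mojo itself can't be shown runnably here, so the "binding-free hot path" idea is sketched below in plain Python. The point is the drop-in contract the quote describes: find the hot function, then replace it with a compiled implementation that keeps the same signature and results. The `dot_fast` stand-in and both function names are hypothetical, not Modular APIs.

```python
# Hypothetical sketch of the "move only the hot path" workflow.
# dot_fast stands in for a compiled (e.g. Mojo) drop-in; no bindings
# or wrapper glue are shown because the contract is "same signature".

def dot_slow(a, b):
    # Pure-Python hot path: one interpreted loop iteration per element.
    total = 0.0
    for x, y in zip(a, b):
        total += x * y
    return total

def dot_fast(a, b):
    # Stand-in for the compiled drop-in: same inputs, same result,
    # so call sites don't change -- only the import would.
    return sum(x * y for x, y in zip(a, b))

a = [float(i) for i in range(1000)]
b = [2.0] * 1000

# The drop-in contract: identical results from either implementation.
assert dot_slow(a, b) == dot_fast(a, b)
print(dot_fast(a, b))  # 999000.0
```

In practice you would profile first, port only the function that dominates runtime, and leave the rest of the Python program untouched.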
- Modular’s Stack Hierarchy:
- Mojo: For low-level/high-performance, cross-hardware code.
- MAX: an inference-focused GenAI framework for deploying and customizing LLMs with native, efficient code.
- Emphasizes modularity, composability, and cluster management.
- Auto kernel fusion for productivity and speed.
- 500+ model families now available.
- Cluster level: orchestrates workload-intelligent routing and disaggregated prefill, simplifying large-scale operations across heterogeneous hardware.
“What my number one goal right now is to drive out complexity both of our stack… and AI in general. And AI in general has way more tech debt than it deserves.” – Chris Lattner [22:51]
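The "auto kernel fusion" item above can be illustrated in plain Python (an illustrative sketch, not Modular's implementation): unfused execution runs each op as its own kernel and materializes an intermediate buffer per step, while a fused kernel does all the arithmetic in one pass over the data.

```python
# Illustrative sketch of kernel fusion (plain Python, not Modular code).

# Unfused: each op is its own "kernel" that reads and writes a full
# intermediate buffer -- three passes over memory for y = relu(x*2 + 1).
def unfused(x):
    t1 = [v * 2 for v in x]           # kernel 1: multiply, writes a buffer
    t2 = [v + 1 for v in t1]          # kernel 2: add, writes another buffer
    return [max(v, 0.0) for v in t2]  # kernel 3: relu

# Fused: one pass, no intermediate buffers -- roughly what an
# auto-fusing compiler generates from the same three-op graph.
def fused(x):
    return [max(v * 2 + 1, 0.0) for v in x]

x = [-2.0, -0.5, 0.0, 1.5]
assert unfused(x) == fused(x)
print(fused(x))  # [0.0, 0.0, 1.0, 4.0]
```

On a GPU the win is larger than it looks here: fusion removes kernel-launch overhead and, more importantly, the memory traffic of writing and re-reading each intermediate buffer.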
3. Competing Frameworks & Philosophical Approach
- Comparing with vLLM & SGLang:
- Lattner admires SGLang’s focused, “do a small number of things really well” approach.
- vLLM is seen as broad but inconsistent across hardware and features, sometimes falling short of its promised “works everywhere” abstraction.
“That approach... having a sparse matrix of things that actually work is very different than the conservative approach, which is saying, okay, well we do a small number of things really well and you can rely on it.” – Chris Lattner [25:08]
- On Modularity:
- Modularity enables resilience and adaptation in a fast-evolving hardware landscape.
- Clean, modular architectures support quicker integration of state-of-the-art research and features as hardware and techniques evolve.
“If we can accelerate things, we get more product value into our lives, we get more impact, we get all the good things that come with AI. I’m a maximalist, by the way.” – Chris Lattner [28:39]
- Remixing and Composability:
Lattner is explicit that Modular borrows good ideas from anywhere, but prioritizes making features composable, orthogonal, and expressive.
“I'm very shameless about using good ideas from wherever they come. Like everything's a remix, right?” – Chris Lattner [29:28]
4. Philosophy on Open Source, Business Models & Democratization
- Modular’s Business Model:
- Mojo & MAX: free on Nvidia GPUs and CPUs at any scale.
- Paid: cluster management, enterprise features, and support, priced per GPU.
- Focus on broad adoption and embedding MAX into other frameworks (“I’d love to see PyTorch adopt MAX” [42:44]).
“We're doing this on hard mode. We're not going to build a data center for you… I want those people to use MAX. I'm pretty good at this software thing.” – Chris Lattner [37:33]
- Why not Fully Hosted Endpoints?
Modular empowers teams to own and control their stack, upskill internally, and avoid complexity cliffs, instead of “taking AI off your plate.”
“We don't take AI away from the enterprise, we give power over it back to their team.” – Chris Lattner [39:20]
- On Democratizing AI (“Distribution, not just Research”):
- Points out that PyTorch democratized training; Modular’s mission is to democratize high-performance inference at scale.
- Enabling “the rest of the world” to benefit from techniques previously embedded only in trillion-dollar tech companies.
“Nobody democratized inference. Inference always remained a black art.” – Chris Lattner [35:24]
5. Technical Deep-Dives
- Reverse Engineering Proprietary Layers (DeepSeek & PTX):
DeepSeek’s work is a critical wake-up call for the industry: top performance means digging into the deepest layers of the stack (PTX, below CUDA), with each GPU generation demanding reimplementation.
“If you work at this level of the stack... you have power over compute, you can understand and solve problems, you can drive research forward in ways that... nobody else can do. And this is what the trillion dollar companies do.” – Chris Lattner [54:42]
- Why Inference Focus?
Inference, not training, maps to the real adoption curve: “Training scales with your research team, inference scales with your customer base.” [59:11]
- Mojo’s Value with AI Coding Agents:
By open sourcing Mojo, Modular enables coding agents (e.g., Cursor) to accelerate developer onboarding and productivity, reducing the barrier to new high-perf code.
“Learning a new language is actually easy when the AI is doing a lot of the mechanical stuff for you.… For new language adoption… AI is cool.” – Chris Lattner [69:36]
Notable Quotes & Moments
- On Being Told It’s Impossible:
“Across my career, like with LLVM, all the GCC people told me it was impossible… humans don’t like change. It takes time to diffuse into the ecosystem for people to process it.” – Chris Lattner [09:00]
- On Why Build a New Programming Language (Mojo):
“I want something that can expose the full power of the hardware—Not just one chip... but the full power of any chip it has to support... With a member of the Python family, make it much more accessible… but also performance.” – Chris Lattner [12:52]
- On the Cost of Hardware Lock-in:
“When Blackwell comes out, you have to throw [the kernel] away and write a new one… That's what vLLM is.” – Chris Lattner [56:57]
- On Product Traction and Technology S-curves:
“I’m in it for a long game… keep making things better and better—there’s like an S curve of technology adoption.” – Chris Lattner [43:34]
- On Open Source, Shipping, and Industry Impact:
“My values are aligned with people who ship stuff because that's what impacts the world.” – Chris Lattner [52:41]
- On Leadership and Startups:
“I've built large teams from scratch before, but they've all been at established companies… In a startup, you sometimes get in some hot messes… it’s very personal… but you just have to trust the process and keep pushing.” – Chris Lattner [62:57]
Timestamps for Important Segments
| Time | Segment/Topic |
|-------|-----------------------------------------------------------------------------|
| 01:21 | Modular’s origin, three-year R&D stealth phase, philosophy of extreme focus (Mojo, unlocking heterogeneous compute) |
| 02:50 | Achieving “state-of-the-art” Llama 3 serving without CUDA; the impossibility mindset |
| 04:51 | Transition to engineering/frequent releases, widening hardware/model support |
| 06:55 | From CPU optimization to generalizing for GPUs, and the internal milestones map |
| 12:52 | Mojo’s design philosophy, why C & Python aren't enough, binding-free embedding |
| 19:22 | MAX: GenAI inference framework, model-level vs kernel-level modularity, auto kernel fusion |
| 23:17 | Take on vLLM vs SGLang; focused quality vs spread-thin community frameworks |
| 26:43 | Value of modularity; lessons from LLVM and AI |
| 29:28 | Remixing research, integrating state-of-the-art innovations |
| 35:24 | On democratizing inference (vs prior focus on model training) |
| 37:07 | Modular's business model: where’s the value, what’s free, what’s paid |
| 43:00 | Industry adoption, vision for integration into other frameworks (PyTorch, SGLang, vLLM) |
| 45:10 | Reflecting on the Swift launch at Apple vs Mojo, lessons learned |
| 53:17 | DeepSeek and the realities of pushing below the CUDA/PTX stack, hardware cycles |
| 59:11 | Why inference scales with impact, and Modular’s bet there |
| 69:36 | Mojo, open source, and “zero to hero” onboarding with AI code agents |
| 73:31 | Keeping up with research: arXiv, Reddit, and personal techniques |
| 74:47 | Personal projects: modular LEGO robotics tables, woodworking with children |
Personal Insights and Storytelling
- Balancing Leadership and Engineering:
Lattner discusses the emotional impact of team management (“it hurts a little when people leave”), the difference in moving from established teams to building a startup's culture, and how pursuing “impossible” goals repeatedly is part of his DNA. [62:57]
- Daily Routine:
Balances family time, health (morning hill walks with dogs), long workdays, family dinners, late-night “second workdays,” and carves out uninterrupted time on weekends. [66:09]
- Learning and Collaboration:
Weekly walks with co-founder Tim for strategic alignment; executive team and family for grounding and “EQ support.” [67:33, 68:20]
- Hobby Building:
Continues to create modular tables and woodworking projects with his kids; finds joy in building and the process of discovery. [74:47]
Calls to Action & Closing
- Modular is hiring elite engineers across GPU programming, AI models, inference, and cloud-scale/Kubernetes.
- Open-source contributors are welcome; Mojo and MAX have evolved substantially, and the codebase is a trove for anyone wanting to learn modern GPU and AI programming.
- Enterprises struggling with AI scaling are encouraged to reach out.
“Lots more people should be programming GPUs. I think this is a huge opportunity for the industry.” – Chris Lattner [77:07]
Summary prepared for listeners seeking a comprehensive understanding of Modular’s journey and the evolving shape of AI compute from Chris Lattner’s unique and engaging perspective.
