Latent Space Podcast: The Shape of Compute – Chris Lattner (Modular)
Date: June 13, 2025
Hosts: Alessio (CTO, Decibel), swyx (founder, Smol AI)
Guest: Chris Lattner (Co-founder & CEO, Modular)
Episode Overview
In this episode of Latent Space, Chris Lattner, renowned software architect, co-founder of Modular, and creator of foundational technologies like LLVM and Swift, shares how Modular is shaping the future of AI infrastructure. The discussion covers Modular's past year: the transition from stealth, research-driven development to open-source releases, lessons from building cross-hardware AI frameworks, new approaches to programming-language design (Mojo), overcoming industry skepticism, and the philosophy behind democratizing and accelerating AI compute. Listeners get an insider's view of the technical challenges of AI serving, community engagement, and Lattner's builder mindset, both at work and at home.
Key Discussion Points & Insights
1. Modular’s Journey: From Stealth to Open Source
- Initial Phase (“Prove it’s Possible”):
Modular spent its first three years in a closed, research-heavy phase, tackling “impossible problems” like creating a viable, vertical AI stack that matches or beats state-of-the-art performance on Nvidia GPUs—without using CUDA.
“The challenge is prove that we can do something people think is impossible.” – Chris Lattner [03:13]
- Transition to Productization:
After achieving high-performance, narrow-scope model serving (Llama 3, A100s, etc.), the focus shifted to engineering: expanding hardware support (e.g., H100, AMD MI300), increasing model coverage (now 500+), and regular six-week release cycles.
“[Now] every six weeks we've been shipping a new release… And as you do that... now, suddenly, it's like, oh, okay, I get it.” – Chris Lattner [04:51]
- Open Sourcing and Community Engagement:
Having confidence in its technology, Modular opened up its codebase, enabling hackathons and rapid onboarding:
“The winning team at the hackathon… had not used Mojo before, they hadn't programmed GPUs… they built a training system, wrote an Adam optimizer… in a single day.” – Chris Lattner [05:42]
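For readers unfamiliar with what that hackathon team built, a minimal Adam optimizer is sketched below in plain Python. This is an illustration of the standard algorithm, not Modular or Mojo code; all names are our own.

```python
# Minimal Adam optimizer sketch (illustrative only; not Modular/Mojo code).
import math

def adam_step(param, grad, m, v, t, lr=0.1, b1=0.9, b2=0.999, eps=1e-8):
    """One Adam update for a single scalar parameter."""
    m = b1 * m + (1 - b1) * grad          # first-moment (mean) estimate
    v = b2 * v + (1 - b2) * grad * grad   # second-moment (variance) estimate
    m_hat = m / (1 - b1 ** t)             # bias correction for early steps
    v_hat = v / (1 - b2 ** t)
    param -= lr * m_hat / (math.sqrt(v_hat) + eps)
    return param, m, v

# Minimize f(x) = (x - 3)^2; the gradient is 2 * (x - 3).
x, m, v = 0.0, 0.0, 0.0
for t in range(1, 201):
    x, m, v = adam_step(x, 2 * (x - 3), m, v, t)
print(x)  # converges toward 3.0
```

The per-parameter state (`m`, `v`, step count `t`) is all Adam needs, which is part of why a small, fast implementation is a plausible one-day project once the language handles the GPU plumbing.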
2. The Evolution of Mojo and the Modular Stack
- Why a New Language:
The need to unify programming across CPUs, GPUs, TPUs, and future hardware, while retaining Python-like usability and Rust-level performance, led to Mojo.
“Let me be the first to tell you... C sucks.… AI people generally don't love C. What do they love? Python.” – Chris Lattner [12:52]
- Mojo’s Strengths:
- A binding-free drop-in for Python hot paths; positioned to become the best way to extend Python (in nightly builds as of recording).
- Performance (faster than Rust) and future extensibility to new chip architectures.
- Not a full Python replacement yet (“no classes, but supports Python objects”).
“Mojo doesn't have all the features that Python does… but as functions… you can get this binding free experience… move that to Mojo, and now you get performance wins.” – Chris Lattner [17:17]
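Mojo itself can't be shown runnably here, so the "binding-free hot path" idea is sketched below in plain Python. The point is the drop-in contract the quote describes: find the hot function, then replace it with a compiled implementation that keeps the same signature and results. The `dot_fast` stand-in and both function names are hypothetical, not Modular APIs.

```python
# Hypothetical sketch of the "move only the hot path" workflow.
# dot_fast stands in for a compiled (e.g. Mojo) drop-in; no bindings
# or wrapper glue are shown because the contract is "same signature".

def dot_slow(a, b):
    # Pure-Python hot path: one interpreted loop iteration per element.
    total = 0.0
    for x, y in zip(a, b):
        total += x * y
    return total

def dot_fast(a, b):
    # Stand-in for the compiled drop-in: same inputs, same result,
    # so call sites don't change -- only the import would.
    return sum(x * y for x, y in zip(a, b))

a = [float(i) for i in range(1000)]
b = [2.0] * 1000

# The drop-in contract: identical results from either implementation.
assert dot_slow(a, b) == dot_fast(a, b)
print(dot_fast(a, b))  # 999000.0
```

In practice you would profile first, port only the function that dominates runtime, and leave the rest of the Python program untouched.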
- Modular’s Stack Hierarchy:
- Mojo: For low-level/high-performance, cross-hardware code.
- MAX: an inference-focused GenAI framework for deploying and customizing LLMs with native, efficient code.
- Emphasizes modularity, composability, and cluster management.
- Auto kernel fusion for productivity and speed.
- 500+ model families now available.
- Cluster level: orchestrates workload-intelligent routing and disaggregated prefill, simplifying large-scale operations across heterogeneous hardware.
“What my number one goal right now is to drive out complexity both of our stack… and AI in general. And AI in general has way more tech debt than it deserves.” – Chris Lattner [22:51]
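The "auto kernel fusion" item above can be illustrated in plain Python (an illustrative sketch, not Modular's implementation): unfused execution runs each op as its own kernel and materializes an intermediate buffer per step, while a fused kernel does all the arithmetic in one pass over the data.

```python
# Illustrative sketch of kernel fusion (plain Python, not Modular code).

# Unfused: each op is its own "kernel" that reads and writes a full
# intermediate buffer -- three passes over memory for y = relu(x*2 + 1).
def unfused(x):
    t1 = [v * 2 for v in x]           # kernel 1: multiply, writes a buffer
    t2 = [v + 1 for v in t1]          # kernel 2: add, writes another buffer
    return [max(v, 0.0) for v in t2]  # kernel 3: relu

# Fused: one pass, no intermediate buffers -- roughly what an
# auto-fusing compiler generates from the same three-op graph.
def fused(x):
    return [max(v * 2 + 1, 0.0) for v in x]

x = [-2.0, -0.5, 0.0, 1.5]
assert unfused(x) == fused(x)
print(fused(x))  # [0.0, 0.0, 1.0, 4.0]
```

On a GPU the win is larger than it looks here: fusion removes kernel-launch overhead and, more importantly, the memory traffic of writing and re-reading each intermediate buffer.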
3. Competing Frameworks & Philosophical Approach
- Comparing with vLLM & SGLang:
- Lattner admires SGLang’s focused, “do a small number of things really well” approach.
- vLLM is seen as broad but inconsistent across hardware and features, sometimes falling short of its promised “works everywhere” abstraction.
“That approach... having a sparse matrix of things that actually work is very different than the conservative approach, which is saying, okay, well we do a small number of things really well and you can rely on it.” – Chris Lattner [25:08]
- On Modularity:
- Modularity enables resilience and adaptation in a fast-evolving hardware landscape.
- Clean, modular architectures support quicker integration of state-of-the-art research and features as hardware and techniques evolve.
“If we can accelerate things, we get more product value into our lives, we get more impact, we get all the good things that come with AI. I’m a maximalist, by the way.” – Chris Lattner [28:39]
- Remixing and Composability:
Lattner is explicit that Modular borrows good ideas from anywhere, but prioritizes making features composable, orthogonal, and expressive.
“I'm very shameless about using good ideas from wherever they come. Like everything's a remix, right?” – Chris Lattner [29:28]
4. Philosophy on Open Source, Business Models & Democratization
- Modular’s Business Model:
- Mojo & MAX: free on Nvidia GPUs and CPUs at any scale.
- Paid: cluster management, enterprise features, and support, priced per GPU.
- Focus on broad adoption and embedding MAX into other frameworks (“I’d love to see PyTorch adopt MAX” [42:44]).
“We're doing this on hard mode. We're not going to build a data center for you… I want those people to use MAX. I'm pretty good at this software thing.” – Chris Lattner [37:33]
- Why not Fully Hosted Endpoints?
Modular empowers teams to own and control their stack, upskill internally, and avoid complexity cliffs, instead of “taking AI off your plate.”
“We don't take AI away from the enterprise, we give power over it back to their team.” – Chris Lattner [39:20]
- On Democratizing AI (“Distribution, not just Research”):
- Points out that PyTorch democratized training; Modular’s mission is to democratize high-performance inference at scale.
- Enabling “the rest of the world” to benefit from techniques previously embedded only in trillion-dollar tech companies.
“Nobody democratized inference. Inference always remained a black art.” – Chris Lattner [35:24]
5. Technical Deep-Dives
- Reverse Engineering Proprietary Layers (DeepSeek & PTX):
DeepSeek’s work is a critical wake-up call for the industry: top performance means digging into the deepest layers of the stack (PTX, below CUDA), with each GPU generation demanding reimplementation.
“If you work at this level of the stack... you have power over compute, you can understand and solve problems, you can drive research forward in ways that... nobody else can do. And this is what the trillion dollar companies do.” – Chris Lattner [54:42]
- Why Inference Focus?
Inference, not training, maps to the real adoption curve: “Training scales with your research team, inference scales with your customer base.” [59:11]
- Mojo’s Value with AI Coding Agents:
By open sourcing Mojo, Modular enables coding agents (e.g., Cursor) to accelerate developer onboarding and productivity, reducing the barrier to new high-perf code.
“Learning a new language is actually easy when the AI is doing a lot of the mechanical stuff for you.… For new language adoption… AI is cool.” – Chris Lattner [69:36]
Notable Quotes & Moments
- On Being Told It’s Impossible:
“Across my career, like with LLVM, all the GCC people told me it was impossible… humans don’t like change. It takes time to diffuse into the ecosystem for people to process it.” – Chris Lattner [09:00]
- On Why Build a New Programming Language (Mojo):
“I want something that can expose the full power of the hardware—Not just one chip... but the full power of any chip it has to support... With a member of the Python family, make it much more accessible… but also performance.” – Chris Lattner [12:52]
- On the Cost of Hardware Lock-in:
“When Blackwell comes out, you have to throw [the kernel] away and write a new one… That's what vLLM is.” – Chris Lattner [56:57]
- On Product Traction and Technology S-curves:
“I’m in it for a long game… keep making things better and better—there’s like an S curve of technology adoption.” – Chris Lattner [43:34]
- On Open Source, Shipping, and Industry Impact:
“My values are aligned with people who ship stuff because that's what impacts the world.” – Chris Lattner [52:41]
- On Leadership and Startups:
“I've built large teams from scratch before, but they've all been at established companies… In a startup, you sometimes get in some hot messes… it’s very personal… but you just have to trust the process and keep pushing.” – Chris Lattner [62:57]
Timestamps for Important Segments
| Time | Segment/Topic |
|-------|-----------------------------------------------------------------------------|
| 01:21 | Modular’s origin, three-year R&D stealth phase, philosophy of extreme focus (Mojo, unlocking heterogeneous compute) |
| 02:50 | Achieving “state-of-the-art” Llama 3 serving without CUDA; the impossibility mindset |
| 04:51 | Transition to engineering/frequent releases, widening hardware/model support |
| 06:55 | From CPU optimization to generalizing for GPUs, and the internal milestones map |
| 12:52 | Mojo’s design philosophy, why C & Python aren't enough, binding-free embedding |
| 19:22 | MAX: GenAI inference framework, model-level vs kernel-level modularity, auto kernel fusion |
| 23:17 | Take on vLLM vs SGLang; focused quality vs spread-thin community frameworks |
| 26:43 | Value of modularity; lessons from LLVM and AI |
| 29:28 | Remixing research, integrating state-of-the-art innovations |
| 35:24 | On democratizing inference (vs prior focus on model training) |
| 37:07 | Modular's business model: where’s the value, what’s free, what’s paid |
| 43:00 | Industry adoption, vision for integration into other frameworks (PyTorch, SGLang, vLLM) |
| 45:10 | Reflecting on the Swift launch at Apple vs Mojo, lessons learned |
| 53:17 | DeepSeek and the realities of pushing below the CUDA/PTX stack, hardware cycles |
| 59:11 | Why inference scales with impact, and Modular’s bet there |
| 69:36 | Mojo, open source, and “zero to hero” onboarding with AI code agents |
| 73:31 | Keeping up with research: arXiv, Reddit, and personal techniques |
| 74:47 | Personal projects: modular LEGO robotics tables, woodworking with children |
Personal Insights and Storytelling
- Balancing Leadership and Engineering:
Lattner discusses the emotional impact of team management (“it hurts a little when people leave”), the difference in moving from established teams to building a startup's culture, and how pursuing “impossible” goals repeatedly is part of his DNA. [62:57]
- Daily Routine:
Balances family time, health (morning hill walks with dogs), long workdays, family dinners, late-night “second workdays,” and carves out uninterrupted time on weekends. [66:09]
- Learning and Collaboration:
Weekly walks with co-founder Tim for strategic alignment; executive team and family for grounding and “EQ support.” [67:33, 68:20]
- Hobby Building:
Continues to create modular tables and woodworking projects with his kids; finds joy in building and the process of discovery. [74:47]
Calls to Action & Closing
- Modular is hiring elite engineers across GPU programming, AI models, inference, and cloud-scale/Kubernetes.
- Open-source contributors are welcome; Mojo and MAX have evolved substantially, and the codebase is a trove for anyone wanting to learn modern GPU and AI programming.
- Enterprises struggling with AI scaling are encouraged to reach out.
“Lots more people should be programming GPUs. I think this is a huge opportunity for the industry.” – Chris Lattner [77:07]
Summary prepared for listeners seeking a comprehensive understanding of Modular’s journey and the evolving shape of AI compute from Chris Lattner’s unique and engaging perspective.
