Lex Fridman Podcast #459 Summary: DeepSeek, China, OpenAI, NVIDIA, xAI, TSMC, Stargate, and AI Megaclusters
Host: Lex Fridman
Guests: Dylan Patel (SemiAnalysis) and Nathan Lambert (Allen Institute for AI)
Release Date: February 3, 2025
1. Introduction
In episode #459 of the Lex Fridman Podcast, host Lex Fridman engages in an in-depth conversation with Dylan Patel and Nathan Lambert. Dylan, who leads SemiAnalysis, specializes in semiconductors, GPUs, CPUs, and AI hardware, while Nathan, a research scientist at the Allen Institute for AI, writes the respected AI blog "Interconnects." The discussion examines the "DeepSeek moment": the ramifications of DeepSeek's advances in AI, its open-weight models, and the broader geopolitical landscape involving China, the US, and key industry players like OpenAI and NVIDIA.
2. DeepSeek Models: V3 and R1
Overview and Training Methodologies
Dylan Patel introduces DeepSeek's latest models: DeepSeek V3 and DeepSeek R1. DeepSeek V3, released in late December 2024, is a mixture-of-experts Transformer language model. It is an open-weight, instruction-following model comparable to frontier offerings such as OpenAI's GPT-4o and Meta's Llama 3.
“DeepSeek V3 is a new mixture of experts Transformer language model from DeepSeek, who is based in China... it's an open weight model and it's an instruction model like what you would use in ChatGPT.” ([13:43])
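To make the "mixture of experts" idea concrete, here is a minimal toy sketch of expert routing: a gate scores every expert, but only the top-k experts actually run for a given token, which is what lets an MoE model have far more total parameters than it uses per token. All the numbers and the tiny "experts" here are illustrative assumptions, not DeepSeek's architecture.

```python
import math

def softmax(xs):
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    s = sum(exps)
    return [e / s for e in exps]

def moe_forward(x, gate_weights, experts, top_k=2):
    """Route a toy scalar 'token' through the top-k experts by gate score.

    gate_weights: one gate parameter per expert (hypothetical values)
    experts: list of callables standing in for small feed-forward networks
    """
    scores = softmax([w * x for w in gate_weights])
    # Only the top_k experts execute; the rest are skipped entirely,
    # which is the source of MoE's per-token compute savings.
    ranked = sorted(range(len(experts)), key=lambda i: scores[i], reverse=True)
    chosen = ranked[:top_k]
    norm = sum(scores[i] for i in chosen)
    return sum(scores[i] / norm * experts[i](x) for i in chosen)

experts = [lambda x: 2 * x, lambda x: x + 1, lambda x: -x, lambda x: x * x]
y = moe_forward(1.5, gate_weights=[0.1, 0.9, -0.5, 0.3], experts=experts)
```

With these made-up gate weights, the two highest-scoring experts are blended by their normalized gate probabilities, so the output lies between the two chosen experts' individual outputs.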
DeepSeek R1, released shortly after V3, is a reasoning model designed to improve performance on tasks requiring logical deduction and multi-step problem-solving. Unlike V3, R1 exposes its chain-of-thought reasoning, giving users visibility into how the model reaches its answers.
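That exposed chain of thought arrives as ordinary text, so a client has to separate the reasoning trace from the final answer. The sketch below assumes the R1-style convention of wrapping the trace in <think>...</think> tags; the tag name and sample output are assumptions for illustration.

```python
import re

def split_reasoning(raw: str):
    """Split a reasoning model's visible chain of thought from its answer.

    Assumes the trace is wrapped in <think>...</think> tags (R1-style
    convention) and the final answer follows the closing tag.
    """
    m = re.search(r"<think>(.*?)</think>", raw, flags=re.DOTALL)
    if m is None:
        # No trace found: treat the whole output as the answer.
        return "", raw.strip()
    thought = m.group(1).strip()
    answer = raw[m.end():].strip()
    return thought, answer

raw = "<think>7 * 6 = 42, so the answer is 42.</think>The answer is 42."
thought, answer = split_reasoning(raw)
```

Keeping the trace and the answer separate is what lets a UI show the reasoning on demand while presenting only the answer by default.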
Comparison with Other Models
Dylan contrasts DeepSeek R1 with OpenAI's o3-mini reasoning model, highlighting R1's cost-efficiency and openness.
“As we discuss in detail from many perspectives in this conversation... OpenAI’s o3-mini is not [open].” ([00:00])
3. Open Weights vs Open Source
Definition and Licensing
The conversation shifts to the concept of "open weights," where model weights are publicly accessible, differing from traditional open-source software. Dylan explains that open weights allow users to run models independently, offering greater control over data privacy.
“Open weights is the accepted term for when model weights of a language model are available on the Internet for people to download... what makes a model open weight.” ([15:09])
DeepSeek’s models employ the MIT license, granting permissive usage rights without commercial or use-case restrictions, unlike Meta’s Llama, which has more stringent licensing terms.
“The DeepSeek R1 model has a very permissive license. It's called the MIT license... Between the DeepSeek custom license and the Llama license we could get into this whole rabbit hole.” ([18:07])
Implications for the AI Ecosystem
Nathan Lambert emphasizes the importance of truly open-source AI, advocating for the release of training data, code, and weights to enable replication and innovation.
“...for us that means releasing the training data, releasing the training code, and then also having open weights like this.” ([17:32])
4. Geopolitical Implications: US vs China
Export Controls and Strategic Advantage
The discussion explores how US export controls on advanced semiconductors aim to curb China’s AI advancements. Dylan argues that restricting access to high-performance GPUs and manufacturing technologies places China at a disadvantage in training large AI models.
“The US government has effectively said... training will always be a portion of the total compute.” ([75:20])
Nathan Lambert expands on the potential for a new technological Cold War, pointing to massive AI infrastructure buildouts such as the US Stargate project and China's own substantial investments in AI infrastructure.
“DeepSeek is a hedge fund... they have a lot of compute.” ([66:03])
Potential for Conflict
Lex raises concerns about the ramifications of these restrictions, questioning whether they might escalate tensions leading to military confrontations over regions like Taiwan.
“We should lay out the importance. By the way, it's incredible how much you know about so much.” ([66:03])
5. Hardware and Infrastructure: GPUs and Data Centers
TSMC’s Dominance and US Manufacturing Challenges
The role of Taiwan Semiconductor Manufacturing Company (TSMC) is pivotal, as it manufactures the majority of the world’s advanced semiconductors. Dylan and Nathan discuss the challenges the US faces in replicating TSMC’s manufacturing prowess due to high costs and technical complexities.
“TSMC produces most of the world's chips, especially on the foundry side... the cost to build the next generation fab keeps growing.” ([101:11])
NVIDIA’s Strategic Position
NVIDIA remains the leader in AI hardware, with unmatched software ecosystems that facilitate efficient model training and inference. Despite competition from AMD and Intel, NVIDIA’s robust CUDA libraries and continuous innovation keep it at the forefront.
“The biggest thing is you have to see that an advantage goes up and down, right? It's the network-centric nature of AI inference.” ([258:18])
Data Center Mega-Clustering
The guests highlight the unprecedented scale of modern AI data centers, such as xAI's Memphis cluster housing 200,000 GPUs and OpenAI's planned 2.2-gigawatt facility in Texas.
“So Elon is building his own natural gas plant... his patron is like, hey, I'm going to build a factory with 200,000 GPUs in it.” ([296:55])
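A quick back-of-the-envelope calculation shows why clusters of this size get discussed in terms of power plants. The per-GPU wattage and overhead multiplier below are illustrative assumptions (roughly H100-class draw plus facility overhead), not figures from the episode.

```python
# Rough power budget for a 200,000-GPU AI cluster.
# Assumed figures: ~700 W per accelerator at load, and a ~1.4x
# multiplier for CPUs, networking, and cooling (a PUE-style overhead).
GPUS = 200_000
WATTS_PER_GPU = 700      # assumed accelerator draw at load
OVERHEAD = 1.4           # assumed facility overhead multiplier

gpu_megawatts = GPUS * WATTS_PER_GPU / 1e6        # accelerators alone
facility_megawatts = gpu_megawatts * OVERHEAD     # whole-site estimate
```

Under these assumptions the GPUs alone draw about 140 MW and the site roughly 200 MW, which makes clear why a 2.2-gigawatt campus is an order of magnitude beyond today's clusters and why builders are turning to dedicated generation.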
6. Model Inference and Reasoning
Reasoning Models and Cost Efficiency
DeepSeek R1's reasoning capabilities emerge from reinforcement learning that rewards the model for producing long chains of thought on verifiable problems. DeepSeek serves the resulting model at a fraction of the price OpenAI charges for its reasoning models.
“R1 is a reasoning model... o1 pro is spawning multiple and o3-mini...” ([34:26])
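The RL recipe hinges on rewards that can be checked automatically rather than judged by humans. Below is a toy sketch of such a rule-based reward with two hypothetical components, a format reward for emitting a reasoning trace and an accuracy reward for a verifiable final answer; the tag convention, weights, and exact-match check are illustrative assumptions, not DeepSeek's actual reward function.

```python
import re

def reasoning_reward(completion: str, gold_answer: str) -> float:
    """Toy rule-based reward of the kind used to RL-train reasoning models.

    Components (assumed weights):
      +0.1 format reward for producing a <think>...</think> trace
      +1.0 accuracy reward when the final answer matches a checkable gold answer
    """
    reward = 0.0
    if re.search(r"<think>.*?</think>", completion, flags=re.DOTALL):
        reward += 0.1
    # Strip the trace and compare only the final answer.
    answer = re.sub(r"<think>.*?</think>", "", completion, flags=re.DOTALL).strip()
    if answer == gold_answer:
        reward += 1.0
    return reward

good = "<think>12 + 30 = 42</think>42"
r = reasoning_reward(good, "42")
```

Because the reward is computed mechanically from the output, it scales to millions of rollouts without human labeling, which is what makes this style of RL training practical.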
Chain-of-Thought and Efficiency
The chain-of-thought mechanism, in which models explicitly display their reasoning process, improves transparency but lengthens outputs and therefore raises computational cost. DeepSeek's multi-head latent attention (MLA) compresses the key-value cache, cutting memory usage and making long reasoning traces cheaper to serve.
“Memory is important because... thinking out loud...” ([35:09])
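The memory savings can be sketched with simple arithmetic: standard multi-head attention caches full keys and values for every head at every layer, while a latent-attention scheme caches one small compressed vector per layer. All dimensions below are illustrative assumptions, not DeepSeek's published configuration.

```python
# Per-token KV-cache comparison: standard multi-head attention (MHA)
# vs. a compressed latent, in the spirit of multi-head latent attention.
# All sizes are assumed for illustration.
LAYERS = 60
HEADS = 64
HEAD_DIM = 128
LATENT_DIM = 512     # assumed compressed KV dimension per layer
BYTES = 2            # fp16/bf16 storage

# MHA stores both K and V for every head at every layer.
mha_bytes_per_token = LAYERS * 2 * HEADS * HEAD_DIM * BYTES
# A latent scheme stores one compressed vector per layer.
mla_bytes_per_token = LAYERS * LATENT_DIM * BYTES
ratio = mha_bytes_per_token / mla_bytes_per_token
```

Under these assumed sizes the cache shrinks by roughly 30x per token, which is exactly the kind of saving that matters when a reasoning model "thinks out loud" for thousands of tokens per query.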
7. Safety and Alignment
Training Techniques and Ethical Concerns
The conversation delves into how models like DeepSeek R1 and OpenAI's offerings incorporate safety and alignment through techniques such as Reinforcement Learning from Human Feedback (RLHF). Differing approaches, however, can produce very different degrees of censorship in model behavior.
“Chain of thought is something where it's able, it's one chain... more dirty operator...” ([155:54])
Risks of Backdoors and Influence
Concerns are raised about the potential for backdoors in open-weight models, where hidden prompts or alignments could manipulate model outputs to serve specific agendas.
“...deep down in the model is what is the overall outcome and we're just picking the top-k answers.” ([156:10])
8. Future of AI and Open Source
Open Source Progress and Challenges
DeepSeek R1 marks a shift toward openly licensed frontier models, releasing open weights under a permissive license and challenging closed models from major players. The guests discuss the difficulty of maintaining open standards amidst rapid AI advancements and proprietary innovations.
“This is a first time that we've had a really clear frontier model that is open weights and with a commercially friendly license with no restrictions.” ([294:16])
Community and Collaboration
Nathan and Dylan stress the importance of community-driven AI development, advocating for openness to democratize AI advancements and ensure widespread benefits.
“We want this whole open language models thing... it's a democratic way to power AI.” ([295:12])
9. AI’s Impact on Society
Transformation of Software Engineering
AI's integration into software development is highlighted as a major area of impact, with tools like GitHub Copilot dramatically enhancing productivity and reducing the cost of programming.
“Software engineering costs are going to plummet like crazy... AI is going to revolutionize software development.” ([292:28])
Automation and Robotics
The potential for AI-driven automation extends beyond coding, encompassing fields like robotics and industrial engineering. While challenges remain in physical world interactions, the prospects for AI-assisted tasks are promising.
“Robotics in the home... agent-based systems... software engineering and automation, AI is set to revolutionize these domains.” ([286:22])
Ethical and Societal Considerations
Lex and the guests contemplate the ethical implications of AI’s pervasive influence, emphasizing the need for responsible development to prevent misuse and ensure AI advancements enhance human well-being.
“How can this be avoided?... What does this mean for global stability and individual autonomy?” ([284:58])
10. Conclusion
Lex Fridman wraps up the episode with reflections on the transformative potential of AI, balanced by the inherent risks and ethical dilemmas. The conversation underscores the urgency of fostering open, collaborative AI development while navigating the complex geopolitical landscape shaped by technological supremacy.
“There are some structural things in a global, interconnected world that you have to accept... AI is coming.” ([303:44])
Final Thought from Richard Feynman:
“For a successful technology, reality must take precedence over public relations, for nature cannot be fooled.” ([Final Words])
This episode provides a comprehensive examination of the intersection between AI advancements, hardware infrastructure, and geopolitical strategies. Dylan Patel and Nathan Lambert offer expert insights into how DeepSeek’s open-weight models challenge the status quo, the critical role of semiconductors in AI development, and the broader implications for global power dynamics. As AI continues to evolve rapidly, the dialogue emphasizes the need for openness, ethical considerations, and strategic foresight to harness AI’s full potential for societal benefit.
