Transcript
Dwarkesh Patel (0:00)
Today we are interviewing Satya Nadella. "We" being me and Dylan Patel, founder of SemiAnalysis. Satya, welcome. Thank you.
Satya Nadella (0:08)
It's great. Thanks for coming over to Atlanta. Yeah.
Dwarkesh Patel (0:10)
Thank you for giving us a tour of the new facility. It's been really cool to see.
Satya Nadella (0:14)
Absolutely.
Dwarkesh Patel (0:15)
Satya and Scott Guthrie, Microsoft's EVP of Cloud and AI, gave us a tour of their brand-new Fairwater 2 data center, currently the most powerful in the world.
Scott Guthrie (0:25)
We try to 10x the training capacity every 18 to 24 months. So this would be effectively a 10x increase, 10x from what GPT-5 was trained with. To put it in perspective, the number of optics, the network optics, in this building is almost as many as all of Azure had across all our data centers two and a half years ago.
Satya Nadella (0:44)
It's kind of what, 5 million network connections.
Dwarkesh Patel (0:47)
You've got all this bandwidth between different sites in a region and between the two regions. So is this a big bet on scaling: do you anticipate that in the future there's going to be some huge model that requires two whole regions to train?
Satya Nadella (0:59)
The goal is to be able to kind of aggregate these flops for a large training job and then put these things together across sites.
Dwarkesh Patel (1:08)
Right.
Satya Nadella (1:08)
And the reality is you'll use it for training, and then you'll use it for data gen. You'll use it for inference in all sorts of ways. It's not like it's going to be used for only one workload forever.
Scott Guthrie (1:20)
Fairwater 4, which you're going to see under construction nearby, will also be on that one-petabit network, so that we can actually link the two at a very high rate. And then basically we do the AI WAN connecting to Milwaukee, where we have multiple other Fairwaters being built.
Satya Nadella (1:35)
Literally, you can see the model parallelism and the data parallelism. It's built for, essentially, the training jobs, the pods, the super pods across this campus. And then with the WAN, you can go to the Wisconsin data center and literally run a training job with all of them getting aggregated.
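The distinction Satya is drawing can be sketched in a few lines. This is a toy illustration of data parallelism only (it says nothing about Microsoft's actual training stack): each worker holds a full copy of the model, computes gradients on its own shard of the batch, and the shard gradients are averaged, which is mathematically the same as one device processing the whole batch. Model parallelism, by contrast, would split the model itself across workers.

```python
import numpy as np

def grad_mse(w, X, y):
    """Gradient of mean-squared error for a linear model y_hat = X @ w."""
    return 2 * X.T @ (X @ w - y) / len(y)

rng = np.random.default_rng(0)
X = rng.normal(size=(64, 4))   # one global batch of 64 samples
y = rng.normal(size=64)
w = rng.normal(size=4)         # model replicated on every worker

# Single-device gradient on the whole batch.
g_full = grad_mse(w, X, y)

# Data-parallel: 4 workers each process a 16-sample shard,
# then the gradients are averaged (the "all-reduce" step).
shards = zip(np.split(X, 4), np.split(y, 4))
g_avg = np.mean([grad_mse(w, Xs, ys) for Xs, ys in shards], axis=0)

print(np.allclose(g_full, g_avg))  # → True: the two are identical
```

Because the averaged shard gradients match the full-batch gradient exactly (for equal-sized shards), adding workers scales the batch you can process without changing the mathematics, which is why flops in two regions can in principle be aggregated into one training job.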
