Big Ideas Lab: Unveiling El Capitan – The World's Fastest Supercomputer
Podcast Information:
- Title: Big Ideas Lab
- Host/Author: Mission.org
- Description: Your weekly exploration inside Lawrence Livermore National Laboratory. Hear untold stories, meet boundary-pushing pioneers, and get unparalleled access to groundbreaking science and technology. From national security challenges to computing revolutions, discover the innovations that are shaping tomorrow, today.
- Release Date: November 19, 2024
Introduction: The Dawn of El Capitan
In the November 19, 2024 episode of Big Ideas Lab, Mission.org delves into Lawrence Livermore National Laboratory's monumental achievement: the launch of El Capitan, the world's fastest supercomputer. This exascale system, with a peak performance of more than two exaflops (2 × 10¹⁸ calculations per second), represents a major leap in computational capability, poised to address some of humanity's most pressing challenges.
Host (00:04): "For decades, the U.S. Department of Energy has been pursuing a bold vision. A system powerful enough to tackle the greatest challenges facing humanity."
The Genesis of Exascale Computing
El Capitan is not an overnight success but the culmination of decades of research, development, and strategic vision. Lawrence Livermore, alongside other NNSA laboratories like Los Alamos and Sandia, recognized the need for an unprecedented computational powerhouse to simulate nuclear reactions, discover new materials, and advance national security.
Expert 1 (00:33): "Exa is a Greek prefix meaning 10 to the 18th. An exascale system nominally is about how many calculations it can perform per second."
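To make that prefix concrete, a quick back-of-the-envelope calculation helps (the laptop and population figures below are illustrative assumptions, not from the episode):

```python
# Rough scale of a two-exaflop machine (illustrative assumptions).
PEAK_FLOPS = 2e18        # El Capitan's peak: two exaflops
LAPTOP_FLOPS = 1e11      # assumed ~100 gigaflops for a typical laptop
WORLD_POP = 8e9          # assumed world population

laptops_equivalent = PEAK_FLOPS / LAPTOP_FLOPS
# If every person on Earth did one calculation per second, how many
# years would it take to match one second of El Capitan?
years = (PEAK_FLOPS / WORLD_POP) / (3600 * 24 * 365)
print(f"{laptops_equivalent:.0e} laptops; {years:.1f} years")
```

By this sketch, one second of El Capitan corresponds to tens of millions of laptops, or roughly eight years of hand calculation by everyone on Earth.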
The journey to exascale began in earnest around 2008, when DARPA highlighted the impending challenges of reaching exascale targets, notably power consumption, memory, and system resiliency. That foresight led to the Exascale Computing Project, launched in 2015-2016 to develop the software and hardware such systems would require.
Collaboration and Co-Design: A Unified Effort
A pivotal strategy in developing El Capitan was the co-design approach, fostering deep collaboration between hardware manufacturers and software developers. This methodology ensured that hardware and software evolved in tandem to meet the demanding requirements of exascale computing.
Expert 1 (22:13): "Co-design was really the idea that we're going to have to take this from a standard customer-client relationship with these companies to something much more collaborative."
Partners like AMD, Intel, NVIDIA, and HPE played crucial roles by developing advanced processors and GPUs tailored for scientific computation. The introduction of AMD's Accelerated Processing Units (APUs) was a game-changer, integrating CPUs and GPUs into a single package, enhancing energy efficiency, and simplifying programming.
Expert 1 (17:12): "The APU was an innovation that AMD came up with... They were separate memory. And one of the complications... The APU now gets rid of one of those complications that makes the system more efficient."
Overcoming Technical Challenges: Power and Precision
Achieving exascale performance came with formidable challenges:
- Energy Efficiency: El Capitan's design prioritized energy efficiency to manage immense power demands without exorbitant operational costs. The shift from traditional CPU-based systems to GPUs, and later APUs, was instrumental in this regard.
Expert 1 (12:23): "One of the original challenges... really around the power requirements of these computers."
- Precision in Computations: Supercomputers operate with 64-bit double-precision calculations, essential for reliable scientific results. The fidelity of these calculations is paramount, especially in fields like nuclear weapons research, where even minor errors can have significant consequences.
Host (08:10): "64-bit precision allows supercomputers to handle numbers with up to 64 binary digits, enabling the highly accurate calculations needed for complex simulations... where the smallest margin of error can have critical consequences."
- Cooling Infrastructure: Managing the heat generated by El Capitan, which cycles through 5 to 8 million gallons of water daily for cooling, required an extensive overhaul of the laboratory's cooling systems. Liquid cooling keeps the supercomputer running efficiently without overheating.
Expert 1 (20:44): "El Capitan will cycle through 5 to 8 million gallons of water every 24 hours to keep its systems cool and running efficiently."
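The precision point above can be seen in a tiny sketch: the same arithmetic routed through 32-bit floats silently drops a term that 64-bit double precision keeps (a pure-Python illustration, not from the episode):

```python
import struct

def f32(x: float) -> float:
    """Round a 64-bit Python float to 32-bit precision and back."""
    return struct.unpack('f', struct.pack('f', x))[0]

# Add 1 to 100,000,000: float32 carries only ~7 decimal digits,
# so the 1 vanishes entirely when the sum is rounded to 32 bits.
lost = f32(f32(1e8) + 1.0) - f32(1e8)
kept = (1e8 + 1.0) - 1e8     # Python floats are 64-bit doubles
print(lost, kept)  # → 0.0 1.0
```

In a simulation that accumulates billions of such operations, errors of this kind compound, which is why 64-bit precision is the standard for high-consequence scientific computing.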
The Launch of El Capitan: A New Era in Computing
On November 18, 2024, at the annual Supercomputing Conference (SC24), El Capitan was officially unveiled as the world's fastest supercomputer. Its deployment is not only a testament to technological prowess but also a strategic asset for national security and scientific advancement.
Host (02:40): "At a peak speed of more than two exaflops, El Capitan is not just a technological marvel, but a machine that holds the future of national security, scientific research, and breakthrough innovations in its hands."
Impact on National Security and Scientific Research
El Capitan's primary mission is Science Based Stockpile Stewardship: ensuring the safety and reliability of the U.S. nuclear stockpile without physical testing, in keeping with the U.S. moratorium on underground nuclear testing in place since 1992.
Expert 1 (04:30): "To maintain confidence in our nuclear stockpile... we use supercomputing and modeling and simulation as one leg of a new program called Science Based Stockpile Stewardship."
Beyond national security, El Capitan's capabilities extend to diverse fields such as fusion energy, climate modeling, renewable energy research, drug discovery, and earthquake simulation, making it a versatile tool for scientific breakthroughs.
Innovations in Hardware and Software
The transition to heterogeneous computing with CPUs and GPUs was a milestone first achieved with systems like Sierra. However, adapting complex, multimillion-line codes to run efficiently on GPUs was a daunting task. Tools like RAJA and Umpire were instrumental in streamlining this process, reducing code implementation time and paving the way for El Capitan.
Expert 3 (15:46): "To make sure it has the payoff... tools, first used for Sierra, sped up the work for El Capitan, dramatically reducing code implementation time."
The introduction of APUs further simplified programming and enhanced energy efficiency, essential for managing the power consumption of such a vast system.
Building the Infrastructure: Power and Cooling Upgrades
Supporting El Capitan required significant upgrades to Lawrence Livermore National Laboratory's infrastructure. The Exascale Computing Facility Modernization Project increased electrical capacity from 45 megawatts to 85 megawatts, ensuring ample power supply for El Capitan and existing supercomputers.
Expert 2 (19:39): "We deployed a significant increase in the electrical infrastructure... taking us from 45 megawatts to 85 megawatts."
The sophisticated liquid cooling system ensures that the supercomputer operates within optimal temperature ranges, preventing overheating despite its immense computational power.
The Future: Beyond El Capitan
El Capitan's deployment marks the beginning of a new chapter in high-performance computing. Lawrence Livermore National Laboratory is already strategizing for the next exascale systems, anticipating increased power demands and continuing to push technological boundaries.
Expert 1 (27:05): "We're already starting to think about what the next system in the 2030 timeframe is going to be."
Challenges such as slowing semiconductor advances, rising costs, and an evolving hardware landscape call for innovative computational methods and numerical algorithms to maximize efficiency and performance.
Expert 2 (29:07): "We're thinking about how can we make the systems get more work done for the same compute capability."
Despite uncertainties in future hardware developments, the team remains committed to leveraging El Capitan's capabilities to tackle unprecedented problems and drive scientific discovery.
Conclusion: A Catalyst for Progress
El Capitan is more than just the fastest supercomputer; it symbolizes the United States' leadership in high-performance computing and its commitment to addressing critical national and global challenges through technological innovation.
Expert 1 (29:56): "A fast computer that doesn't actually solve any of humanity's problems, that's not terribly interesting. ... it has to be for a purpose."
As Lawrence Livermore National Laboratory continues to expand and enhance its computational prowess, El Capitan stands as a beacon of what collaborative vision, innovation, and relentless pursuit of excellence can achieve.
Host (30:41): "The pursuit of what comes next, anticipating future limits and pushing past them. That's the enduring mission."
Notable Quotes:
- Expert 1 (00:33): "Exa is a Greek prefix meaning 10 to the 18th. An exascale system nominally is about how many calculations it can perform per second."
- Expert 1 (04:30): "To maintain confidence in our nuclear stockpile... we use supercomputing and modeling and simulation as one leg of a new program called Science Based Stockpile Stewardship."
- Expert 1 (17:12): "The APU was an innovation that AMD came up with... They were separate memory. And one of the complications... The APU now gets rid of one of those complications that makes the system more efficient."
- Expert 3 (15:46): "To make sure it has the payoff... tools, first used for Sierra, sped up the work for El Capitan, dramatically reducing code implementation time."
- Expert 1 (29:56): "A fast computer that doesn't actually solve any of humanity's problems, that's not terribly interesting. ... it has to be for a purpose."
Final Thoughts
The Big Ideas Lab episode on El Capitan offers an in-depth look into the intricacies of building and deploying an exascale supercomputer. It highlights the convergence of visionary planning, cutting-edge technology, and collaborative effort necessary to achieve such a feat. El Capitan not only redefines computational boundaries but also sets the stage for future innovations that will continue to shape the landscape of science and national security.
