Podcast Summary: Liftoff with Keith Newman
Episode: Chris Sosa on Scaling AI at AMD: The GPU Efficiency Challenge That Will Shape the Future
Release Date: July 16, 2025
Host: Keith Newman
Guest: Chris Sosa, Director of Engineering at AMD
Introduction
In this engaging episode of Liftoff with Keith Newman, the host, a former journalist and Silicon Valley dealmaker, sits down with Chris Sosa, Director of Engineering at AMD. The conversation delves into the intricacies of scaling artificial intelligence (AI) at AMD, focusing on the challenges and innovations around GPU efficiency. Drawing on Chris's extensive experience, the discussion offers valuable insights into the current AI landscape, AMD's strategic approach, and the future of AI-driven technologies.
Current AI Workloads at AMD
Chris Sosa kicks off the conversation by addressing the evolving nature of AI workloads:
"They're looking faster and faster, really getting more and more out of these machines. It's really helping us move faster."
(00:10)
He emphasizes the increasing demand for high-speed processing and the necessity for AMD's infrastructure to keep pace with the burgeoning computational requirements of modern AI applications.
Challenges in Scaling AI
A significant portion of the discussion centers around the hurdles AMD faces in maximizing GPU utilization across large-scale deployments:
"One of the biggest set of challenges is really about really maximizing utilization... Anytime you have a workload that isn't consuming the GPU, you're actually wasting money because these things are expensive and you want to really optimize what you paid for."
(00:34 - 01:22)
Chris highlights the complexities of managing multiple machines at scale, where optimizing a single machine differs vastly from orchestrating fleets of GPUs. The focus is on ensuring that every GPU is efficiently utilized to justify the substantial investment in hardware.
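The episode stays at a high level, but the idea of catching idle hardware lends itself to a small illustration. The sketch below, a minimal example rather than AMD's internal tooling, polls per-GPU utilization on one host via the rocm-smi command-line tool and flags cards sitting below an assumed idle threshold; the exact JSON field names vary across ROCm versions, so the parsing here is illustrative.

```python
# Minimal sketch: flag idle GPUs on one host using rocm-smi.
# Assumes rocm-smi is installed and that `rocm-smi --showuse --json`
# emits a JSON object keyed by card name; field names differ across
# ROCm versions, so treat the parsing below as illustrative.
import json
import subprocess

IDLE_THRESHOLD_PCT = 5  # assumed cutoff: below this, the GPU is "wasted"

def idle_gpus():
    out = subprocess.run(
        ["rocm-smi", "--showuse", "--json"],
        capture_output=True, text=True, check=True,
    ).stdout
    stats = json.loads(out)
    idle = []
    for card, fields in stats.items():
        # Look for a utilization-like field; the exact key varies by version.
        for key, value in fields.items():
            if "use" in key.lower():
                if float(value) < IDLE_THRESHOLD_PCT:
                    idle.append(card)
                break
    return idle

if __name__ == "__main__":
    for card in idle_gpus():
        print(f"{card} is under {IDLE_THRESHOLD_PCT}% utilization: money on the table")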
Streamlining Processes for Efficiency
Addressing the need for streamlined operations, Chris discusses the balance between leveraging existing technologies and implementing custom solutions:
"We make it easy to leverage Kubernetes, leverage Slurm...but they're not enough...you have to build dashboards for tracking utilization."
(01:27 - 01:59)
He points out that while tools like Kubernetes and Slurm are foundational for workload orchestration, additional layers such as utilization dashboards are crucial for identifying and rectifying inefficiencies, especially when dealing with diverse teams and varying workloads.
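To ground what "leveraging Kubernetes" can look like in practice, here is a minimal sketch that schedules a single pod onto an AMD GPU using the amd.com/gpu resource advertised by AMD's Kubernetes device plugin. The container image, pod name, and namespace are placeholders rather than details from the episode, and the utilization dashboards Chris mentions would be built separately on top of scraped metrics.

```python
# Minimal sketch: request one AMD GPU for a pod through the Kubernetes API.
# Assumes a cluster running AMD's Kubernetes device plugin, which advertises
# GPUs under the resource name "amd.com/gpu". The image and namespace are
# placeholders, not anything referenced in the episode.
from kubernetes import client, config

pod_manifest = {
    "apiVersion": "v1",
    "kind": "Pod",
    "metadata": {"name": "rocm-smoke-test"},
    "spec": {
        "restartPolicy": "Never",
        "containers": [{
            "name": "rocm",
            "image": "rocm/pytorch:latest",  # placeholder image
            "command": ["rocm-smi"],
            "resources": {"limits": {"amd.com/gpu": 1}},  # one AMD GPU
        }],
    },
}

def main():
    config.load_kube_config()  # or load_incluster_config() inside the cluster
    client.CoreV1Api().create_namespaced_pod(namespace="default", body=pod_manifest)

if __name__ == "__main__":
    main()
```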
Balancing Power, Performance, and Efficiency at Scale
The conversation underscores the perpetual challenge of maintaining an equilibrium between power, performance, and efficiency:
"It's a constant balance, power, performance, efficiency at scale."
(02:40)
Chris succinctly captures the essence of AMD's mission to deliver high-performance AI solutions without compromising on efficiency, all within a competitive and rapidly evolving marketplace.
Distributed Inference and Training
Looking ahead, Chris delves into the complexities of distributed inference and training, particularly when handling heterogeneous workloads:
"How do you do smarter orchestration to actually optimize how those are placed...while trying to optimize."
(03:08 - 03:50)
He elaborates on the challenge of running many distributed inference tasks across different AI stacks, emphasizing the need for intelligent orchestration to keep GPUs fully utilized when varied workloads run simultaneously.
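Chris doesn't describe AMD's scheduler in detail, but a toy version of the placement problem makes the stakes concrete: the sketch below packs heterogeneous inference jobs onto as few GPUs as possible using a first-fit-decreasing heuristic on memory demand. The per-GPU capacity and job sizes are invented for illustration, and real orchestrators also weigh compute, interconnect topology, and latency targets.

```python
# Toy sketch of GPU placement: pack jobs onto the fewest GPUs by memory
# footprint (first-fit decreasing). A deliberately simple stand-in for the
# "smarter orchestration" discussed in the episode.
from dataclasses import dataclass, field

GPU_MEMORY_GB = 192.0  # assumed per-GPU capacity for the example

@dataclass
class Gpu:
    free_gb: float = GPU_MEMORY_GB
    jobs: list = field(default_factory=list)

def place(jobs_gb: dict[str, float]) -> list[Gpu]:
    gpus: list[Gpu] = []
    # Placing the largest jobs first tends to waste less capacity at the tail.
    for name, need in sorted(jobs_gb.items(), key=lambda kv: -kv[1]):
        target = next((g for g in gpus if g.free_gb >= need), None)
        if target is None:
            target = Gpu()
            gpus.append(target)
        target.free_gb -= need
        target.jobs.append(name)
    return gpus

if __name__ == "__main__":
    demo = {"llm-70b": 150.0, "llm-8b": 18.0, "embedder": 6.0, "reranker": 10.0}
    for i, gpu in enumerate(place(demo)):
        used = GPU_MEMORY_GB - gpu.free_gb
        print(f"GPU {i}: {gpu.jobs} ({used:.0f}/{GPU_MEMORY_GB:.0f} GB used)")
```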
ROCm and Instinct Development
A pivotal segment of the discussion focuses on AMD's ROCm (Radeon Open Compute) stack and Instinct GPUs:
"We're really trying to make it really easy for everyone to effectively access our AI stack...empower that."
(03:56 - 04:53)
Chris explains AMD's commitment to supporting developers by ensuring compatibility and ease of use across different hardware generations. By facilitating contributions to frameworks like PyTorch and enhancing support for older Radeon GPUs, AMD aims to foster a more inclusive and versatile AI development ecosystem.
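One concrete payoff of that PyTorch work is worth noting: ROCm builds of PyTorch expose AMD GPUs through the familiar torch.cuda interface, so much CUDA-targeted code runs unmodified. Below is a minimal sanity check, assuming a ROCm build of PyTorch is installed.

```python
# Minimal sanity check on a ROCm build of PyTorch. ROCm builds report the
# HIP version via torch.version.hip and reuse the torch.cuda namespace,
# so the "cuda" device below actually targets the AMD GPU.
import torch

def main():
    print("HIP version:", getattr(torch.version, "hip", None))  # None on CUDA builds
    if not torch.cuda.is_available():
        print("No ROCm-visible GPU found.")
        return
    device = torch.device("cuda")  # maps to the AMD GPU under ROCm
    x = torch.randn(1024, 1024, device=device)
    y = x @ x  # simple matmul to exercise the GPU
    print("Ran a 1024x1024 matmul on", torch.cuda.get_device_name(0), tuple(y.shape))

if __name__ == "__main__":
    main()
```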
Future Opportunities and Excitement for the Next 12 Months
When probed about upcoming opportunities, Chris expresses enthusiasm about advancements in workload orchestration and platform composition:
"I'm really excited by...being able to leverage these stacks together as part of one platform...optimizing on not two different platforms you have to build, but one platform that you're really optimizing for driving the overall utilization up."
(05:35 - 06:26)
He envisions a future where training and inference platforms converge, simplifying the development process and enhancing GPU utilization through unified orchestration frameworks.
Notable Quotes
- Maximizing GPU Utilization: "Anytime you have a workload that isn't consuming the GPU, you're actually wasting money because these things are expensive and you want to really optimize what you paid for." (00:34 - 01:22)
- Streamlining Workloads: "We make it easy to leverage Kubernetes, leverage Slurm...but they're not enough...you have to build dashboards for tracking utilization." (01:27 - 01:59)
- Balancing Act: "It's a constant balance, power, performance, efficiency at scale." (02:40)
- Future of Workload Orchestration: "I'm really excited by...being able to leverage these stacks together as part of one platform...optimizing on not two different platforms you have to build, but one platform that you're really optimizing for driving the overall utilization up." (05:35 - 06:26)
Conclusions
The episode provides a comprehensive look into AMD's strategic approach to scaling AI, particularly through the lens of GPU efficiency and workload orchestration. Chris Sosa articulates the multifaceted challenges of maximizing GPU utilization, managing distributed AI workloads, and fostering an inclusive developer ecosystem. Looking forward, AMD's focus on integrating orchestration platforms and enhancing the ROCm stack positions the company well to navigate the complexities of AI scalability. The conversation underscores the importance of continuous innovation and strategic optimization in maintaining AMD's competitive edge in the AI landscape.
For more insightful discussions and stories from tech leaders, explore the more than 80 episodes of Liftoff with Keith Newman.
