Podcast Summary: The Twenty Minute VC (20VC)
Episode: AI Chip Wars: How Cerebras Plans to Topple NVIDIA's Dominance | Why We Have Not Reached Scaling Laws in AI | What Happens to the Cost of Inference | How We Underestimate China and Shouldn't Sell To Them with Andrew Feldman
Release Date: March 24, 2025
Host: Harry Stebbings
Guest: Andrew Feldman, Co-founder and CEO of Cerebras
Introduction
In this episode of The Twenty Minute VC, host Harry Stebbings welcomes Andrew Feldman, co-founder and CEO of Cerebras, the company behind the world's fastest AI inference and training platform. Andrew shares insights into Cerebras' strategy for challenging NVIDIA's dominance in the AI chip market, the inefficiencies of current AI algorithms, the future of AI scaling laws, and the geopolitical implications of AI technology.
The Genesis of Cerebras and AI Architecture
Timestamp: [04:32]
Andrew Feldman discusses the inception of Cerebras in 2015, driven by the recognition of evolving AI workloads that traditional GPUs couldn't efficiently handle. He emphasizes the inefficiency in current AI algorithms, noting that GPUs are only about 5-7% utilized during inference tasks, resulting in significant wasted computational resources. This inefficiency stems from the fundamental architecture of GPUs, which rely heavily on off-chip memory, making them suboptimal for inference operations.
Notable Quote:
“We won’t be as dependent on transformers in three years or five years as we are now. 100%. The fundamental architecture of the GPU with off-chip memory is not great for inference.” — Andrew Feldman [00:00]
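Feldman's 5-7% utilization figure is consistent with a simple roofline-style argument: during token-by-token generation, every decode step must stream the full model weights from off-chip memory, so throughput is capped by memory bandwidth rather than by compute. A minimal sketch, using illustrative numbers chosen for this example (not figures from the episode):

```python
# Rough roofline-style sketch of why autoregressive LLM inference is
# memory-bandwidth-bound on a GPU. All numbers below are illustrative
# assumptions, not figures from the episode.

def inference_compute_utilization(params_e9, batch, peak_tflops, hbm_tb_per_s):
    """Estimate compute utilization for autoregressive decoding.

    Each decode step must stream every weight (assumed fp16, 2 bytes/param)
    from off-chip memory once, while performing ~2 FLOPs per parameter
    per sequence in the batch.
    """
    bytes_per_step = params_e9 * 1e9 * 2             # weights read once per step
    flops_per_step = batch * params_e9 * 1e9 * 2     # ~2 FLOPs/param per sequence
    steps_per_s = hbm_tb_per_s * 1e12 / bytes_per_step  # bandwidth-limited rate
    achieved_tflops = steps_per_s * flops_per_step / 1e12
    return achieved_tflops / peak_tflops

# Example: a hypothetical 70B-parameter model, batch size 16, on a GPU with
# ~1000 dense fp16 TFLOPs and ~3.35 TB/s of HBM bandwidth.
util = inference_compute_utilization(params_e9=70, batch=16,
                                     peak_tflops=1000, hbm_tb_per_s=3.35)
print(f"compute utilization ≈ {util:.1%}")  # → compute utilization ≈ 5.4%
```

With batch size 1 the same arithmetic gives well under 1% utilization, which is why batching matters so much for GPU inference economics, and why on-chip memory with far higher bandwidth changes the picture.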
Wafer-Scale Architecture vs. Traditional GPUs
Timestamp: [06:08]
Andrew elaborates on Cerebras' wafer-scale architecture, contrasting it with traditional GPU designs. While GPUs excel at large-scale training, inference is bottlenecked by memory bandwidth, and the GPU's reliance on off-chip memory becomes a limitation. Cerebras addresses this by using on-chip SRAM instead of HBM (High Bandwidth Memory), enabling faster data movement and higher efficiency during inference. This approach lets Cerebras sustain high performance without the administrative complexity of scaling across thousands of traditional GPUs.
Notable Quote:
“By going to wafer scale, we were able to put down a huge amount of SRAM and get the benefits of speed and enough capacity.” — Andrew Feldman [07:39]
Efficiency and Cost of Inference
Timestamp: [12:02]
The discussion shifts to the cost dynamics of AI inference. Andrew highlights that the primary components driving inference costs are power consumption and the physical space of data centers. Cerebras' wafer-scale chips consume less power by minimizing off-chip data movement, a significant drain on energy resources. Additionally, they tackle the traditional yield issues associated with large chip designs by implementing redundancy across thousands of tiles, ensuring higher yields and cost-effectiveness.
Notable Quote:
“We use less power because one of the most power-hungry things on a chip are the I/Os moving data off chip.” — Andrew Feldman [12:43]
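The redundancy point can be made concrete with a toy yield model. Assuming, purely for illustration, that each tile on the wafer fails independently with some small probability, a design requiring every tile to work almost never yields, while a modest pool of spare tiles makes yield near-certain:

```python
# Toy yield model for tile-level redundancy on a large chip. The independent
# per-tile defect model and all numbers are simplifying assumptions, not
# Cerebras data.
from math import comb

def yield_with_spares(n_tiles, p_defect, n_spares):
    """P(at most n_spares tiles are defective) under independent defects."""
    return sum(
        comb(n_tiles, k) * p_defect**k * (1 - p_defect)**(n_tiles - k)
        for k in range(n_spares + 1)
    )

n, p = 10_000, 0.001  # hypothetical: 10,000 tiles, 0.1% defect rate per tile
print(f"no spares: {yield_with_spares(n, p, 0):.2e}")   # every tile must work
print(f"20 spares: {yield_with_spares(n, p, 20):.3f}")  # tolerate 20 defects
```

Under these assumptions, yield jumps from roughly 0.005% to over 99% by setting aside just 0.2% of the tiles as spares, which is the essence of why a wafer-scale design can route around defective tiles instead of discarding the wafer.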
Scaling Laws and Algorithmic Improvements
Timestamp: [23:45]
Addressing the debate around AI scaling laws, Andrew asserts that there's ample room for algorithmic advancements. He challenges the notion that we've hit an asymptote with current scaling laws, emphasizing that many AI algorithms, especially those governing transformers, are far from optimal. Cerebras aims to enhance efficiency by developing more effective algorithms, thereby increasing chip utilization and reducing inference costs.
Notable Quote:
“I don't think there's a lot of debate among senior ML thinkers that we have tremendous room for algorithmic improvement.” — Andrew Feldman [24:06]
Market Dynamics and NVIDIA's Dominance
Timestamp: [35:24]
The conversation delves into NVIDIA's stronghold in the AI chip market, particularly through its CUDA platform. Andrew contends that NVIDIA's dominance, while formidable, isn't insurmountable. He argues that CUDA lock-in isn't a significant barrier for inference applications, as many AI frameworks like PyTorch are hardware-agnostic and can be compiled for different architectures. This perspective suggests that challengers like Cerebras can effectively compete by offering superior architecture tailored for AI inference.
Notable Quote:
“There's no CUDA locking in inference. None.” — Andrew Feldman [35:24]
Geopolitical Considerations and Export Controls
Timestamp: [46:20]
Andrew discusses the complexities of export controls, especially concerning sales to China. He underscores how difficult it is to halt another nation's technological progress, even under stringent policies. Cerebras' decision not to sell to China stems from ethical considerations, particularly the potential misuse of AI technology in areas such as facial recognition and military applications.
Notable Quote:
“The deal on the table wouldn't be used for good. And I wasn't comfortable with that.” — Andrew Feldman [48:54]
Future Predictions and Industry Outlook
Timestamp: [53:08]
Looking ahead, Andrew predicts that within a couple of years AI will be as ubiquitous in daily life as the telephone. He anticipates that AI will become seamlessly embedded in a wide range of applications, improving user experiences without users consciously noticing it. He also foresees significant growth in the AI chip market, with Cerebras aiming to drive transformative societal advances through its technology.
Notable Quote:
“Within a year or two, AI's penetration will be approximately the same as telephones.” — Andrew Feldman [53:27]
Overcoming Challenges and Building a Resilient Company
Timestamp: [55:10]
Andrew reflects on the lessons learned from past mistakes, emphasizing the importance of adaptability and humility in leadership. He shares his experience of initially resisting water cooling for Cerebras' systems, only to later adopt the approach successfully as industry standards shifted. This narrative highlights the necessity of being open to change and continuously iterating on strategies to align with evolving technological landscapes.
Notable Quote:
“If you're not prepared to be wrong a fair bit, you ought not to be making a lot of decisions because it comes with the territory.” — Andrew Feldman [55:07]
Conclusion and Final Thoughts
In the concluding segments, Andrew outlines Cerebras' vision for the next decade, aiming to solve significant societal challenges and integrate their AI technology into everyday applications. He underscores the company's commitment to ethical considerations, technological excellence, and strategic partnerships to navigate the competitive AI landscape.
Notable Quote:
“I would like our inference to be powering a collection of apps that don't exist today. And I would like that a meaningful portion of the population in the US and in Europe inadvertently uses our technology.” — Andrew Feldman [58:28]
Key Takeaways
- Innovation in AI Hardware: Cerebras is pioneering wafer-scale architecture to address inefficiencies in traditional GPU designs, particularly for AI inference tasks.
- Algorithmic Efficiency: There's significant potential for improving AI algorithms to enhance chip utilization and reduce inference costs.
- Market Competition: While NVIDIA holds a dominant position, Cerebras' specialized architecture offers a viable challenge, especially in the rapidly growing inference market.
- Ethical Considerations: Cerebras prioritizes ethical implications in its business decisions, particularly concerning sales to regions where AI technology might be misused.
- Future Outlook: AI technology is poised to become as ubiquitous as telephones, with immense growth opportunities in hardware and applications.
For More Information:
To explore more episodes and resources from The Twenty Minute VC, visit www.20vc.com.
