Catalyst with Shayle Kann – Episode Summary
Episode Title: Will inference move to the edge?
Date: December 18, 2025
Host: Shayle Kann
Guest: Dr. Ben Lee, Professor of Electrical Engineering and Computer Science at University of Pennsylvania, Visiting Researcher at Google
Overview:
This episode explores a provocative and timely question: As artificial intelligence (AI) workloads—especially inference tasks—grow, will they remain the domain of massive, centralized cloud data centers, or could they decentralize toward the “edge,” closer to users or even directly onto devices? Shayle Kann and Dr. Ben Lee discuss the technical, economic, and energy implications of this possible shift, examining what might push AI inference toward the edge, how that could fundamentally alter energy demand and grid management, and what obstacles stand in the way.
Key Discussion Points and Insights
1. Defining the Compute Landscape
- Three Layers of Compute:
- Cloud (Centralized Data Centers): Massive structures run by hyperscalers (Google, Amazon, Microsoft) handling most compute today.
- Edge Computing: Intermediate layer; smaller, still sophisticated data centers closer to users (same city/region), providing lower latency.
- On-device/Edge-Edge: Compute taking place directly on consumer hardware (phones, laptops, etc.)
([05:54])
- Current State:
- The vast majority of classical and AI inference compute still happens in large, centralized data centers due to their unmatched energy and cost efficiency, resource sharing, and economies of scale.
- Edge computing, though frequently discussed (e.g., for autonomous vehicles), has not taken off as initially speculated: few applications have proven latency-sensitive enough to make the buildout urgent.
([08:12])
2. AI Workloads: Training vs. Inference
- Training:
- Remains the sole preserve of centralized data centers, requiring massive coordination of GPUs, enormous datasets, and high energy efficiency.
- Privacy-based or specialized distributed training exists in research, but not in production.
([10:28])
- Inference:
- Historically a smaller portion of AI costs, but set to rapidly increase as model adoption expands.
- “Inference costs are large and potentially will grow very rapidly.”
(Dr. Ben Lee, [11:39])
- Training and inference workloads differ in technical needs, influencing where they can efficiently run. Inference requires less coordination among machines than training.
3. Why Move Inference to the Edge?
- Latency & Performance: If a model provider or application developer makes responsiveness a distinguishing feature of its offering, inference capacity must sit near users rather than in remote regions.
- Privacy: Inference at the edge or on the device keeps sensitive data local.
- Grid & Siting Pressure: Distributing inference could relieve the concentrated power and siting demands of hyperscale campuses.
([13:39–16:24], [22:24])
4. Obstacles to Edge and On-device Inference
- Infrastructure & Economics:
- Edge data centers often lack infrastructure (e.g., adequate power, optimized cooling) for dense GPU deployment.
- Presently, massive demand for centralized training leaves little incentive to invest in wide edge buildouts until inference workloads are more predictable and profitable.
([32:01])
- Technological and Energy Trade-Offs:
- Decentralizing may reduce efficiency: smaller facilities typically run at a higher PUE (Power Usage Effectiveness, the ratio of total facility energy to IT equipment energy), increasing total energy demand relative to hyperscale data centers.
- “Total energy costs may go up as a result [of moving inference to the edge].”
(Dr. Ben Lee, [45:34])
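The PUE trade-off above can be sketched with a back-of-the-envelope calculation. The specific PUE figures below are illustrative assumptions, not numbers from the episode:

```python
# Sketch (assumed figures): PUE = total facility energy / IT equipment energy,
# so the same compute workload draws PUE times its IT energy from the grid.
# Hyperscale facilities commonly report PUE near 1.1; smaller edge sites
# often run higher, e.g. 1.5-2.0.

def total_energy_kwh(it_energy_kwh: float, pue: float) -> float:
    """Grid energy drawn for a given IT load at a given PUE."""
    return it_energy_kwh * pue

it_load = 1000.0  # kWh of IT (compute) work, identical wherever it runs
hyperscale = total_energy_kwh(it_load, pue=1.1)
edge = total_energy_kwh(it_load, pue=1.6)
print(f"hyperscale: {hyperscale:.0f} kWh, edge: {edge:.0f} kWh")
print(f"extra energy at the edge: {(edge - hyperscale) / hyperscale:.0%}")
```

Even a modest PUE gap compounds across every kilowatt-hour of inference, which is why total energy demand could rise as workloads decentralize.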
- On-device Inference: Pros & Cons:
- Pros: Ultra-low latency, privacy (data remains local), tight hardware/software integration (e.g., Apple devices).
- Cons: Need to significantly shrink models; less capable chips; battery life and thermal management become limiting factors; only a subset of tasks will be feasible on-device.
([34:42–38:28])
- Custom AI chips and smarter resource management can help, but constraints will remain.
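The model-shrinking constraint can be made concrete with a rough memory estimate. The parameter count, precisions, and the simplifying assumption that weights dominate memory are all illustrative, not from the episode:

```python
# Sketch (assumed): on-device footprint is roughly weight memory, i.e.
# parameter_count x bytes_per_parameter, ignoring activations and runtime
# overhead.

def model_footprint_gb(params_billions: float, bits_per_param: int) -> float:
    """Approximate weight memory in GB for a model at a given precision."""
    bytes_total = params_billions * 1e9 * bits_per_param / 8
    return bytes_total / 1e9

# A hypothetical 7B-parameter model at different quantization levels:
for bits in (16, 8, 4):
    print(f"{bits}-bit weights: {model_footprint_gb(7, bits):.1f} GB")
```

At 16-bit precision the weights alone (~14 GB) exceed a typical phone's RAM, while aggressive 4-bit quantization (~3.5 GB) starts to fit, at some cost in capability, which is the trade-off described above.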
5. Future Scenarios – 2035 Outlook
- Dr. Lee sketches a plausible 2035 split in which roughly 80% of compute happens locally or at the edge, leaving the remaining 20% of heavy lifting to the centralized cloud.
([41:18–42:51])
6. Types of Inference Workloads: Human vs. Agent
- Increasingly, AI agents and software will generate the bulk of inference requests, not end-users—potentially raising compute demand even higher and influencing optimal siting (often still centralized).
([46:26])
Notable Quotes & Memorable Moments
- Edge Possibilities:
- “We could be getting 80% of our compute done locally and leaving 20% of the heavy lifting for the data center cloud.”
— Dr. Ben Lee ([41:18])
- Technical Trade-offs:
- “As you shrink the system down, you will lose in efficiency… So, yes, I think total energy costs may go up as a result.”
— Dr. Ben Lee ([45:34])
- On-device Limitations:
- “That smaller model will be less capable. It will give you less capable answers. It will be capable of doing fewer tasks. But maybe that’s okay because you’ve identified only a handful of tasks that you really care about on your personal device.”
— Dr. Ben Lee ([36:02])
- On the Wildness of Data Center Power Management:
- “They create dummy workloads so they keep the power profile basically flat. But you are literally just wasting energy on absolutely nothing...”
— Shayle Kann ([18:46])
- Why smaller edge data centers might take off:
- “If one of these model providers or one of these application developers makes performance a distinguishing feature of their offering... then we’re going to see, well, I may have a thousand GPUs in the middle of Nebraska that are already deployed, but if I really want to break into the San Francisco market, I’ve got to build my GPUs right there and have them available.”
— Dr. Ben Lee ([32:44])
Timestamps for Key Segments
- Defining Compute Layers: [05:54]
- Data Center Energy Efficiency: [08:12]
- AI Training vs. Inference Workloads: [10:28–11:39]
- Edge Computing, Latency & AI Applications: [13:39–16:24]
- Training Requires Centralization: [17:08]
- Inference Technical Feasibility at the Edge: [22:24]
- Data Center Siting and Power Issues: [24:45]
- Obstacles and Market Dynamics: [32:01]
- On-device Inference Pros and Cons: [34:42–38:28]
- 2035 Scenario & 80/20 Rule: [41:18–42:51]
- Energy Implications of Edge Inference: [45:34]
- Inference Workload Diversity: [46:26]
Tone and Final Thoughts
The conversation balances deep technical insight with a practical, market-oriented perspective. Shayle’s energetic curiosity complements Dr. Lee’s clarity and expertise. Both are cautiously optimistic about edge inference’s potential but realistic about the economic bottlenecks and energy trade-offs.
Bottom line:
While AI inference currently clusters in mega data centers, technical and market signals suggest a future with much more decentralized compute—at the edge, if not (yet) on-device. This shift will fundamentally reshape where energy for AI is consumed, how efficiently it's used, and what investments get made in infrastructure across the grid and technology stack.