Kubernetes Podcast from Google
Episode: Kubernetes AI Conformance, with Janet Kuo
Date: December 17, 2025
Hosts: Kaslin Fields & Abdel Sghiouar
Guest: Janet Kuo, Staff Software Engineer, Google Cloud
Episode Overview
This episode focuses on the launch and impact of the new Kubernetes AI Conformance program, which was announced at KubeCon North America 2025 in Atlanta. Host Kaslin Fields interviews Janet Kuo, a leading contributor to the program, to unpack its purpose, how it builds on existing Kubernetes conformance, and how it addresses the unique challenges of running AI workloads in Kubernetes environments.
Key Discussion Points and Insights
1. Kubernetes Conformance Basics
(06:15 - 06:53)
- Standardization: The primary aim of the original Kubernetes conformance program is to ensure consistency across platforms, so users experience predictable workload behavior no matter the environment.
- Quote:
"Kubernetes conformance is a program that a platform must pass a set of tests to say it's Kubernetes conformant. It makes sure that every platform provides similar experience for running workloads on Kubernetes."
— Janet Kuo, (06:36)
2. Why AI Conformance?
(06:53 - 08:33)
- Superset of Existing Conformance: AI conformance is layered on top of regular Kubernetes conformance; a platform must already be Kubernetes conformant before it can achieve AI conformance.
- Addressing New Requirements: AI workloads have specific networking and hardware (accelerator) needs that differ from those of typical stateless workloads.
- Goal: To harmonize the experience of running AI workloads everywhere, just as Kubernetes has done for general workloads.
- Quote:
"We are starting to see AI workloads running on Kubernetes... they start having different requirements. For example, different networking requirements or accelerator how things run... we see an opportunity for us to again bring the conformance to the AI space."
— Janet Kuo, (07:36)
3. Platform Capabilities and AI Workloads
(08:33 - 09:34)
- Cluster-Level Guarantees: AI conformance is about what the platform guarantees (e.g., hardware reservations, low-latency networking), rather than how individual users choose to run their workloads.
- Encouraging Standards: One aim is the adoption of industry-wide standards for exposing accelerator resources and metrics.
4. Dynamic Resource Allocation (DRA) API
(09:34 - 12:04)
- Key AI Feature: DRA allows for precise, granular specification of required accelerator resources, such as GPUs and TPUs.
- DRA in AI Conformance: Platforms must support this API to be AI conformant.
- Quotes:
"DRA is dynamic resource allocation. And that's one of the... big highlighted features that Kubernetes has created... to serve AI workloads."
— Kaslin Fields, (10:40)
"DRA is really useful for you when you want to specify really sophisticated or fine grained requirements for asking for accelerator."
— Janet Kuo, (11:21)
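To make the "fine grained requirements" point concrete, here is a minimal Python sketch that builds a DRA ResourceClaim manifest as a plain dictionary. It was not shown in the episode. The field layout assumes the `resource.k8s.io/v1` API (earlier beta versions arrange the request fields differently, so check your cluster's version), and the claim name, the device class `gpu.example.com`, and the `model` attribute are hypothetical placeholders rather than anything a real driver is known to publish.

```python
import json


def build_resource_claim(name, device_class, cel_selector, count=1):
    """Return a ResourceClaim manifest as a plain dict.

    Assumes the resource.k8s.io/v1 field layout; all values passed in
    below are illustrative placeholders.
    """
    return {
        "apiVersion": "resource.k8s.io/v1",
        "kind": "ResourceClaim",
        "metadata": {"name": name},
        "spec": {
            "devices": {
                "requests": [
                    {
                        "name": "accelerator",
                        "exactly": {
                            # Which class of device to draw from (hypothetical name).
                            "deviceClassName": device_class,
                            "allocationMode": "ExactCount",
                            "count": count,
                            # CEL expression that narrows the match to specific devices.
                            "selectors": [{"cel": {"expression": cel_selector}}],
                        },
                    }
                ]
            }
        },
    }


claim = build_resource_claim(
    name="inference-gpu",  # hypothetical claim name
    device_class="gpu.example.com",  # hypothetical device class
    cel_selector='device.attributes["gpu.example.com"].model == "example-model"',
)
print(json.dumps(claim, indent=2))
```

The selector is where the granularity comes from: instead of asking for "one GPU", the claim asks for a device whose driver-published attributes satisfy an expression.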
5. Demo Highlights from KubeCon
(12:04 - 14:06)
- Demo Recap: Janet demonstrated running an AI inference workload on a conformant platform, leveraging DRA, auto-scaling based on custom metrics, and extracting accelerator performance metrics.
- Key Requirements Showcased:
  - Support for DRA API
  - Platform-integrated monitoring and metrics
  - Auto-scaling based on custom metrics, e.g., request rates
- Quote:
"I showed how to auto scale based on the number of requests returned from my inference workload. And eventually I also showed that I can also get the performance metrics out of the accelerator in the platform."
— Janet Kuo, (12:38)
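The scaling step in the demo can be illustrated with the replica calculation documented for the Kubernetes Horizontal Pod Autoscaler: desired replicas = ceil(current replicas × current metric / target metric), with no change when the ratio is within a tolerance of 1.0 (0.1 by default). The Python sketch below is an illustration of that formula only, not the code from the demo; the function name, the request-rate figures, and the replica bounds are hypothetical.

```python
import math


def desired_replicas(current_replicas, current_metric, target_metric,
                     min_replicas=1, max_replicas=10, tolerance=0.1):
    """Apply the documented HPA formula to a single custom metric.

    current_metric and target_metric are per-pod averages, for example
    requests per second per replica (hypothetical numbers in the usage below).
    """
    if target_metric <= 0:
        raise ValueError("target_metric must be positive")
    ratio = current_metric / target_metric
    # Skip scaling when the metric is already close enough to the target.
    if abs(ratio - 1.0) <= tolerance:
        return current_replicas
    wanted = math.ceil(current_replicas * ratio)
    # Clamp to the configured replica bounds.
    return max(min_replicas, min(max_replicas, wanted))


# 4 replicas each seeing 150 req/s against a 100 req/s target: scale out.
print(desired_replicas(4, 150, 100))  # 6
# 105 req/s is within the 10% tolerance: stay at 4.
print(desired_replicas(4, 105, 100))  # 4
```

Because the metric here is the request rate reported by the inference workload rather than CPU or memory, the platform has to expose that metric to the autoscaler, which is why custom-metric support appears among the requirements above.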
6. Open Source Collaboration and Future Goals
(14:06 - 17:17)
- Community-Led Standards: The program is being built through an open working group, with ongoing efforts to standardize metrics and DRA attributes for improved interoperability.
- Ownership and Evolution:
  - Sponsored by SIG Architecture, with plans for automated testing via SIG Testing.
  - As the program matures, it could merge with general Kubernetes conformance or have its components absorbed by long-term SIGs.
  - Aspirational vision: one day AI conformance is simply a part of regular Kubernetes conformance.
- Call for Community Input: Openness to feedback and contributions from users and operators to guide the direction and set priorities.
- Quote:
"We want to provide a consistent experience and make it easy and portable for everyone."
— Janet Kuo, (14:23)
Notable Quotes
- On AI workloads in Kubernetes:
"Kubernetes is really well known for running stateless workloads really well. And AI workloads are often stateful. They often have these really strict hardware requirements."
— Kaslin Fields, (08:33)
- On the conformance test philosophy:
"We are not enforcing how people want to run AI workloads, but it's more about walking capability or guarantees that the platform should provide so that... they know what to expect on the platform."
— Janet Kuo, (09:34)
- On community-driven standards:
"We want to help the industry and help the ecosystem to come up with new standards out of the conformance so that we have a common way to execute."
— Janet Kuo, (09:34)
- On the program's future:
"Maybe one day Kubernetes AI conformance is just Kubernetes Conformance, but that's a very bold goal and for now we just want to make sure that we are doing everything that covers the community need."
— Janet Kuo, (16:19)
Key Timestamps
| Timestamp | Segment Description |
|-----------|---------------------|
| 06:36 | Janet explains Kubernetes conformance basics |
| 07:36 | Introduction to AI Conformance and rationale |
| 09:34 | Platform guarantees vs workload specifics; industry standards |
| 10:40 | DRA (Dynamic Resource Allocation) overview |
| 11:21 | Use cases for DRA in AI workloads |
| 12:38 | Janet describes AI Conformance demo at KubeCon |
| 14:23 | Community involvement and standardization goals |
| 16:19 | How the working group operates and future handoff to SIGs |
Memorable Moments
- Janet's recap of her KubeCon demo (12:38–14:06): She walked through the capabilities required for AI conformance (DRA, custom-metric autoscaling, accelerator metrics), making the standards tangible for the community.
- Vision for Future Conformance (16:19): The hope that AI conformance will one day fold into regular Kubernetes conformance reflects the ambition to make AI workloads first-class citizens in the ecosystem.
Conclusion
This episode provided a comprehensive introduction to the Kubernetes AI Conformance program, its motivations, key features like DRA, and the importance of platform-level guarantees for complex AI workloads. Janet Kuo emphasized the collaborative nature of the initiative, its foundation in community working groups, and the roadmap toward greater standardization and simplicity for AI on Kubernetes. The episode is essential listening for anyone interested in the intersection of Kubernetes, AI, and cloud-native infrastructure evolution.
