Transcript
Kaslin Fields (0:01)
Hello and welcome to the Kubernetes Podcast from Google. I'm your host, Kaslin Fields. And I'm Mofi Rahman. This is our KubeCon EU 2025 episode. We usually like to get the event episodes published closer to the actual event date, but we were also participating in Google Cloud Next, which ended up being right after KubeCon. But as we like to say, better late than never. Yes, you and Abdel recently had an amazing time at KubeCon EU 2025 in London. I wish I could have been there. And you ran a series of live streamed interviews straight from the conference floor. I know that the energy was high and you chatted with some really interesting folks across the community. So in this episode we're bringing a curated selection of those conversations, diving into the rise of platform engineering, exploring some cutting edge technologies, getting updates on core Kubernetes components, and hearing some truly unique user stories, like running Kubernetes on a dairy farm. But first, let's get to the news. The Cloud Native Computing Foundation has announced the release of the Automated Governance Maturity Model, developed by CNCF's Technical Advisory Group, or TAG, Security. This model aims to help organizations evaluate and enhance their governance policies, particularly focusing on automation in an era of rapid development and increasing AI system usage. The goal is to ensure systems operate according to organizational expectations, comply with regulations, and meet strategic objectives by embedding automation into traditional governance tasks. Kubernetes 1.33 release feature blogs are continuing to come out, so if you are interested in learning more about the features in 1.33, make sure you check out the blogs on kubernetes.io. Some cool features with new blogs include dynamic resource allocation, or DRA, image volumes, and horizontal pod autoscaling. The CNCF recently featured a new blog called Understanding Kubernetes Gateway API: A Modern Approach to Traffic Management. As we've talked about before on this show, the Kubernetes Gateway API is a really cool update that enables several improvements to the way you can manage ingress for your workloads on Kubernetes, so check out the blog. Open ObservabilityCon has been renamed to Open Observability Summit to avoid potential confusion with other similarly named events. While the name is changing, the purpose remains the same: to bring together developers, operators, and business leaders to explore and enhance open source observability projects and practices. The event takes place in Denver, Colorado on June 26, 2025. And that's the news. One of the most talked about topics at KubeCon this year was undoubtedly platform engineering. It's all about enabling developers and streamlining operations. Abdel kicked off his KubeCon livestreams by speaking with several experts who are deeply involved in building and defining these platforms, and they had some fantastic insights to share. Note that these were recorded live on the KubeCon show floor, so there is a little bit of noise and some audio issues. We hope you'll be able to bear with us through those. First up, Abdel was joined by Hans and Audun from NAV, the Norwegian Labour and Welfare Administration. They've developed an impressive internal platform called NAIS, pronounced "nice", which we did an episode on in January. There's a link to that in the show notes. And they've been instrumental in fostering a significant platform engineering community in Norway.
Let's hear about their journey, how they're using OpenTelemetry, and their plans for the platform. You have a talk tomorrow about platform engineering. Can you tell us a little bit more about that? Yes. Well, it's a talk about two things, basically. It's a talk about the platform we have at NAV called NAIS, and also a community we built around platforms in Norway, where we bring people together from the whole public sector in Norway. I think we have a few thousand members and 60, 70 companies. So that's where they can meet: Slack, basically, and meetups where they can meet and share experiences. Awesome. Yeah, cool. So we actually covered NAIS once on the podcast, so there is an episode. Make sure to link it somewhere. I don't remember who did it. No, it was with Johnny. Yes, yes, I remember now. That was a while ago. And so Hans, can you explain to people quickly what NAIS is? Yeah. So NAIS is a Kubernetes based application platform. It started on premise, running our own Kubernetes the hard way, and then transitioned over to Google Cloud, running it on top of GKE. So that's the high-level gist of it. And NAIS is an abstraction layer on top. Yes. So we have built our own Kubernetes operator, our own manifest. This was way back when this was very brand new, way before Knative. Yes, yes. And it's serving us so well, because back then it was only on premise, which allowed us to just transition the applications over to the cloud based environment, add on more features, and really not change the way that our developers were working and configuring their applications. Awesome. And so last year I was at a meetup in Bergen, because these guys invited me. Now I remember, you had a talk about integrating OpenTelemetry into the NAIS platform in such a way that it becomes abstracted, in a way, from the developers. Right. Can you talk more about that? So the manifest allows us to have additional add-ons as well. So you can have databases on Cloud SQL and storage. And one of the things that we added recently was allowing auto-instrumentation with OpenTelemetry. So that's been one of my projects, coming in and taking a little bit more charge of the observability part of the platform, and then very quickly figuring out that we needed to standardize on OpenTelemetry. I really believe that this was a standard worth implementing for NAV. We have had different observability tools in the past and constantly had to change out the libraries, change out the agent. So this allows us to instrument once and then run anywhere, or display the data anywhere. So we are using the OpenTelemetry Operator for Kubernetes, and then this is just adding on the metadata and the configuration which injects the agent depending on which runtime you have. So that would be Node, Java, .NET, Python. Yeah, so that's actually pretty cool, because last time I looked into OpenTelemetry and talked to a lot of people, one of the main complaints was that it's very complex. So auto-instrumentation essentially means people can just add some annotations, labels or whatever, and then your runtime will figure out what kind of programming language it is. Well, the developers specify that, so they have a very, very simple setup. They just say: observability, enable auto-instrumentation, and by the way, my runtime is Java. And then we set up the rest, so they don't need to know how the agent functions and all the different parameters, setting the resource name and resource namespace and all the extra attributes, et cetera. That's all set up for them, and it just starts producing OpenTelemetry metrics, and then developers can add additional metrics or instrument their application with the business logic. Because what comes out of the box is only, oh, you have this number of requests, and the traces for that don't really understand the business. That's where the developers need to add on an extra span, add some extra attributes to say, oh, by the way, this is the core functionality that we're doing, this is the customer, or this is the process that's happening right here, and just add that on to the data that's being produced. Awesome. Cool.
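To make the annotation flow concrete, here is a minimal sketch of the mechanism the OpenTelemetry Operator provides: a single pod annotation that triggers agent injection for a given runtime. This is not NAIS's actual implementation; the deployment name and namespace are hypothetical placeholders, and it assumes the operator and an Instrumentation resource are already installed in the cluster.

```python
# A minimal sketch, assuming the OpenTelemetry Operator and an Instrumentation
# resource are already installed in the cluster. The deployment name and
# namespace are hypothetical placeholders.
from kubernetes import client, config

config.load_kube_config()
apps = client.AppsV1Api()

# The operator watches for annotations like this one on pod templates and
# injects the matching language agent (Java here) at admission time, so a
# platform only has to flip this flag on behalf of its developers.
patch = {
    "spec": {
        "template": {
            "metadata": {
                "annotations": {
                    "instrumentation.opentelemetry.io/inject-java": "true"
                }
            }
        }
    }
}

apps.patch_namespaced_deployment(name="my-app", namespace="team-a", body=patch)
```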
And so can you tell us, then, what's next for NAIS? Pushing it through the government, pushing it through other agencies, how are you approaching that? Well, that's what's next now: trying to make it a platform for more than just NAV. Yeah. And that means, first of all, we have to solve some technical problems. We have to scale out and make it a proper multi-tenant platform where they have different projects in GCP, so there's proper isolation for the data. And then we need to figure out how to get through all the red tape. There's loads of red tape sharing stuff in the government, it turns out. And the most difficult thing, which you are probably better at than us: we have to learn how to sell things. I don't know how to do that, but other people know. We have no idea. We just came there and said, well, this is good and this is bad, and please use our product. Kubernetes 1.27 introduced this thing called in-place pod updates, where you can update the resources on a pod, the resource requests specifically, on the fly, without having to restart the pod. And that's very useful for Java applications. Yes, yes. Have you been looking at that? We have definitely been looking at that. Which version of GKE is it in? So the capability is in Kubernetes 1.27, so it should be available in GKE. I think right now it's beta. Yeah, but then, what's available in Kubernetes is being able to update the resources. Yeah, but you need an operator to do that. Yes, yes. And so what we're doing is we're actually proposing that upstream to the VPA autoscaler. That's really interesting. So there is a proposal. Yeah. And that's coming, but it's still a proposal, so it has to be merged. Yeah. So currently, we have built this developer portal where the developers can see their usage and sort of adjust it. But that's not quite the same. Yeah, it's completely different. Yes. Because we have the same challenge there: we need a lot of resources starting up, and then it would be really beneficial to scale that down, or reduce it, once the application is up and running. So yeah, we have been looking at it, but we haven't been able to adopt it yet. All right, awesome. Cool. Well, thank you very much for your time, guys.
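As a rough illustration of the in-place resize capability they discussed, here is a minimal sketch of shrinking a running pod's CPU request with the Kubernetes Python client. It assumes a cluster with the InPlacePodVerticalScaling feature gate enabled (alpha in 1.27); the pod and container names are hypothetical, and newer releases route the change through a dedicated resize subresource rather than a plain pod patch.

```python
# A minimal sketch, assuming the InPlacePodVerticalScaling feature gate is
# enabled (alpha in Kubernetes 1.27). Pod and container names are hypothetical.
from kubernetes import client, config

config.load_kube_config()
core = client.CoreV1Api()

# Lower the CPU request of a running container without restarting it, the
# pattern that helps JVM apps that need lots of CPU only during startup.
patch = {
    "spec": {
        "containers": [
            {
                "name": "app",  # must match the container name in the pod spec
                "resources": {"requests": {"cpu": "250m"}},
            }
        ]
    }
}

core.patch_namespaced_pod(name="my-app-7d4f9", namespace="team-a", body=patch)
```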
Continuing the exploration of platform engineering, Abdel had a great chat with Andy and Max, the authors behind the book Platform Engineering for Architects. And they offered Abdel some insights into the philosophy of platform engineering, why it's considered an evolution of DevOps, and how to approach building these platforms as a product by focusing on developer pain points. You guys wrote a book about platform engineering, right? So can you tell us a little bit about it? It's called Platform Engineering for Architects, and the tagline is crafting modern platforms as a product. I think the key point is that you want to build a product and also treat it as a product. That means before we build any type of product, we first want to understand who is our end user, what is their pain point, what problem do we solve for them? Then build something that solves their problem. And then ideally, if you really solve it, they want your platform, and you're not forcing it on them. Awesome. So maybe, Max: I've been hearing about this thing called platform engineering for a while, and it seems like, if you believe the Internet, DevOps is dead and platform engineering is alive and kicking. But what's platform engineering? Can you give us a definition? Well, I mean, it's like the evolution of it, right? So it uses the same methods, the same procedures and approaches, but it tries to get a little bit away from this "you build it, you run it". That's maybe true for the platform itself, but it's more like being an open environment, an integration layer for any kind of specialty. You often have developers as the target users. Yeah, but we also wrote in the book that it doesn't need to be the developer itself. Right. With all the focus on AI and so on, it can also be data scientists, or people around security. But it's also a question of how we can optimize the environments for operations. Yeah, that all comes together in platform engineering. Break up the silos, put the silo into some horizontal approach, but open it up. This is the key success factor of it. Yeah, because it seems to me, again, excuse my ignorance because I'm not an expert on the topic, it seems to me like platform engineering is just putting back the barrier the way it used to be back in the days of system administration and application development. Because the whole point is that as a developer you have a platform that you deploy on, well, a self-service platform, all that stuff. Right. But it's kind of like going back into that kind of upper layer and bottom layer. Is that true? Is that a fair way of describing it? I cannot say something different about it, but I would say the perspective is more like: you open it up for any kind of people who want to adopt it. Right. It's not an extremely technical area. In some of the development platforms, you have a lot of documentation. You have product owners going into it. How many product owners in the past went to the server? Zero. Nowadays they can at least have a direct interaction and see what's going on between development and operations, maybe what the DevOps people or SREs are doing. But they can also see a business perspective if they want to have it. All right. And so Andreas, you've been talking about the book being about building, or considering, or treating platforms as a product. Right. So I assume this means product managers, developers, ops people, and sort of creating a company within a company. Sort of. Or like a group within a group? Yeah, not a company within a company, but building a product that has internal users. Right. Okay.
And I think if you look at it here, I think they said 13,000 people are at KubeCon, and I think the majority are here for the first time. Which means these organizations that they represent, some of them, in pockets, have started with Kubernetes years ago. Some of them are experts now. More and more of the same organizations try to also adopt Kubernetes. Now the question is, do they all need to learn about service meshes, about networking, about Argo? So the question is, do they all need to reinvent the wheel, or can the experts that have already built up the knowledge build an internal platform to make it easy for everyone in the organization to build, deploy, operate, observe, and secure? So abstract away the complexity of Kubernetes behind a platform. That's kind of the idea. Yeah. So my question to you is, and I think I have another kind of stupid question, probably: does a platform mean you have to have an IDP? Not necessarily. I think what a platform needs to have is, first of all, a pain that you solve. Right. And I think we have it in the book as well: we described a fictional company with teams and problems. And the first problem we described is engineers having a hard time getting access to their logs in a production environment. It's a very tedious and long process, with a lot of tickets involved and manual work. So how do I get to my logs? Do I need an IDP for that? Maybe, maybe not. Maybe I just build something where I can use Slack, MS Teams, whatever, and I can just talk with a chatbot and say, I am Andy, give me the logs that I need for this particular problem. So how you solve it depends on which problems you really want to solve and which tools and processes you currently have in place, and then you try to find a smooth way of solving this problem in a self-service way. And whether it's an IDP or not, that depends. Got it. I like that. Start simple, start with the pain points, and then go from there. Cool. So I'm going to ask you one last question, and this is a question for Max: Cloud Native Summit Munich. Yes. When is that going to happen, and am I invited? I'm just kidding. You're always welcome. Thank you. It's happening on the 21st and 22nd of July, so a perfect time to come to Munich. Enjoy two days of open source conference, a very, I would say, cozy environment to network with everyone there. Yes. And at the same time, it's a perfect time of the year for enjoying the beer gardens as well. Well, you said that last year on the train, so I wouldn't count on it. Yeah, the beer was really good, though. But in Germany, there's no bad weather, there's just the wrong clothes. I live in Sweden, so I know. That's all right. Awesome. We'll make sure you have the link to the conference in the show notes. Thank you very much, guys, for being with us.
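The "chatbot that fetches my logs" idea from this conversation boils down to wrapping one or two API calls in whatever interface developers already use. Here is a minimal sketch of that core call with the Kubernetes Python client; the app label, namespace, and function name are hypothetical, and a real platform would add authentication and scoping around it.

```python
# A minimal sketch of the self-service idea: a bot (Slack, Teams, whatever)
# only needs to wrap a call like this. App label and namespace are hypothetical.
from kubernetes import client, config

config.load_kube_config()
core = client.CoreV1Api()

def tail_logs(app: str, namespace: str, lines: int = 50) -> str:
    """Return the last `lines` log lines for every pod of an app."""
    pods = core.list_namespaced_pod(namespace, label_selector=f"app={app}")
    chunks = []
    for pod in pods.items:
        log = core.read_namespaced_pod_log(
            pod.metadata.name, namespace, tail_lines=lines
        )
        chunks.append(f"--- {pod.metadata.name} ---\n{log}")
    return "\n".join(chunks)

print(tail_logs("checkout", "production"))
```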
Building platforms at scale brings its own unique set of challenges. Amit and Ronak from LinkedIn, who listeners might remember from their recent deep dive episode with us, joined Abdel at KubeCon following their talk on LinkedIn's scalable compute platform. They discussed their experiences with operators and CRDs at massive scale, and the critical importance of node lifecycle management for demanding AI and machine learning workloads. Hi guys. Hello. How has KubeCon been for you? Very energizing. I always come thinking I'll go back exhausted, but I always go back very energized. Awesome. So you folks were on the show, we talked about LinkedIn, you were our first end user guests, and we got a lot of very good feedback. The episode was published like two weeks ago, so it's still too early to figure out the numbers. Usually we look at the numbers a month later to see how the downloads have been progressing. And we talked about a bunch of things in the episode, so people should go check it out. But you had a talk at KubeCon? We did, yes, earlier today. So what was that about? We talked about how we are building a scalable compute platform at LinkedIn, managing bare metal servers all the way up to how we deploy apps on top of it. Awesome. All using Kubernetes, of course. Awesome. Did you share anything in the talk you didn't say on the podcast? I think we went into a little more depth in the talk itself, and we encourage people to check it out. But we talked about how we are trying to improve user experience, specifically around failure categorization, so that we have less support load. So less work for you, less work for us. That's the goal. Awesome. And so Amit, you have a reputation for having a thing for operators and CRDs. I mean, your blog is all about that. I'm not putting you on the spot. So can you tell us a little bit about the learnings, what did you learn from building operators? I will say a lot of things that we do at LinkedIn have to be fully automated, because our scale is so big. Right. So we have to basically rely on operators to do the job for us, so a lot of us end up building a lot of operators at the end of the day. As a result, scalability of the operators and correctness of the operators become top of mind. And KubeCon is a great place to figure that stuff out, because all the maintainers of controller-runtime and API machinery are here, so we got to exchange ideas and, you know, learn from them. Awesome. Did you folks go to the maintainer summit, by any chance? I did. Okay. So how was that? It's the first time in that format, right? Yeah, it was pretty cool. Everyone was there as usual. A lot of interesting breakout sessions, especially around node lifecycle, which is also very close to our heart. We deeply care about how the nodes get, you know, lifecycle managed, for evictions, drains, and stuff like that. So yeah, a lot of fun there. Yeah, it's actually interesting for me how, if you just come to KubeCon and you look at the show floor and the exhibition hall and the lightning talks and the booth talks, everything is AI, AI, AI and LLMs, and you guys are still doing node lifecycle stuff, the actual thing that matters. Right. So do you handle any LLM workloads in your day to day? We do. We handle LLMs and what we call LPMs, large personalization models. Yeah, both of them. So again, the node lifecycle becomes pretty crucial even when it comes to GPUs, for example. Of course. Yeah, people want to launch large workloads with lots of training, and when their GPU goes down, they're pretty sad about it. So node lifecycle kind of makes its way there. So that's where the interest comes from. Of course, of course.
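Node lifecycle work like the evictions and drains mentioned here ultimately reduces to a couple of API calls: cordon the node, then evict its pods. A bare-bones sketch follows; the node name is hypothetical, and a production controller would add the retries, DaemonSet handling, and PodDisruptionBudget backoff that this leaves out.

```python
# A bare-bones sketch of a node drain: cordon, then evict. The node name is
# hypothetical; real controllers retry PodDisruptionBudget rejections, skip
# DaemonSet pods, and wait for evictions to finish.
from kubernetes import client, config

config.load_kube_config()
core = client.CoreV1Api()

node = "gpu-node-42"

# Cordon: mark the node unschedulable so no new pods land on it.
core.patch_node(node, {"spec": {"unschedulable": True}})

# Evict each pod through the Eviction API, which respects
# PodDisruptionBudgets (unlike a plain delete).
pods = core.list_pod_for_all_namespaces(field_selector=f"spec.nodeName={node}")
for pod in pods.items:
    eviction = client.V1Eviction(
        metadata=client.V1ObjectMeta(
            name=pod.metadata.name, namespace=pod.metadata.namespace
        )
    )
    core.create_namespaced_pod_eviction(
        pod.metadata.name, pod.metadata.namespace, eviction
    )
```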
And so I have one more question. I don't know if you saw the announcement we made about MCO, Multi-Cluster Orchestrator. We heard about the announcement, yes. All right. Did you have the chance to look into it, or not yet? All right, so I'm excited about that one. It's essentially a multi-cluster orchestrator, as the name indicates. Right. It's an open source tool for managing multiple clusters that we open sourced. So I'm excited to see how that progresses. Pretty cool. It looks pretty cool on the surface. We would love to check it out, and definitely we'll be in touch. Yeah. So maybe in a year from now, when you actually get time to use it, we'll bring you back on the show. Definitely. We can share more feedback. Awesome. Thank you very much, guys. Thank you for being on the show. Thank you for having us. And yeah, thank you for coming back. Amidst the KubeCon whirlwind, Abdel and Mofi took a moment to discuss the significant interest in running large language models, or LLMs, on Kubernetes, including the why and the how for autoscaling these workloads. They also touched upon some exciting new developments with the Gateway API, particularly an inference extension that could reshape our interactions with AI models. So many things coming together at the very last minute. I feel like, again, yesterday there were a bunch of co-located events. So a lot of people are, A, jet lagged, and B, tired from the co-located events, and then running into this big operation. If you have been to KubeCons before, by the way, this is the biggest KubeCon so far. 12,500 people. So about 13,000 people? Roughly, I think. So yeah, first day. I don't know if you can hear this, but right behind me there is a lot of trumpet playing, so I don't know how that is translating over there. Yeah, I don't think we can hear it. There are people dressed up like British soldiers, with those very funny long hats, and they are playing trumpets, which is definitely very loud. So we did our talk today. We did. We talked about running LLMs on Kubernetes. Yes. We posted about it, and we got a lot of questions about, why would you want to do that? Which one, run LLMs, or run them on Kubernetes? Okay, right. So can you tell people why you would want to do that? Yeah. So for most people, if you're trying to build an application, taking an off-the-shelf LLM like Gemini, GPT, or Claude works fairly well. You could just run your application using those APIs and scale up, scale down, pay per token. But for certain use cases, for example if you're a financial company or government or healthcare, you can't really send your API requests over to some third party company like Google, Anthropic, or OpenAI. In those situations, you want to keep the model located in your own data center and have all the control over running those models. Another use case is that when you pay per token, as your usage goes up, your cost also goes up. But if you get to the point where you can pre-provision all these resources and run it yourself, your cost is fixed. Then you can increase your usage without increasing your cost. So these are probably the biggest reasons you would want to run your models yourself. Awesome. There were quite a lot of questions coming in during the session. We talked about running LLMs on Kubernetes. We showed this awesome demo that you built, which was running like 10 LLMs with a single UI and asking them a knock knock joke, which hopefully did not say anything bad. So I don't think it's going to look bad on the recording. And there were questions about resource optimization, and there were questions about autoscaling. We didn't really get to cover that in detail. Can you briefly touch on autoscaling, specifically what optimizations Kubernetes has for autoscaling LLM workloads? Yeah. So usually when you are autoscaling a Kubernetes workload, you are looking at CPU and RAM usage. You could also look at custom metrics, like the number of requests. But when you are running an LLM workload, the number of requests does not necessarily tell you how much resource usage you currently have, because a single request can ask for anywhere up to 32,000 tokens, or even a million tokens. So you could have that wide a range of how many tokens are being generated. That is kind of a hint that you can use a custom metric, like tokens per second, as the metric for how you should scale up your workload, so that you are getting the most tokens generated for your users. So instead of looking at CPU usage or number of requests per second, you kind of flip the script and start scaling based on tokens or GPU usage, because that is the bottleneck. And again, GPU is one example; TPU, same idea. You want to scale on GPU and TPU usage, because that is your bottleneck in terms of scaling.
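To make the "flip the script" idea concrete, here is a sketch of an autoscaling/v2 HorizontalPodAutoscaler keyed to token throughput instead of CPU. It assumes a custom metrics adapter (Prometheus Adapter or similar) already exposes a per-pod metric, here given the hypothetical name tokens_per_second; the deployment name, namespace, and target value are placeholders too.

```python
# A minimal sketch, assuming a custom-metrics adapter already exposes a
# per-pod metric named "tokens_per_second". All names and the target value
# are hypothetical.
from kubernetes import client, config

config.load_kube_config()
autoscaling = client.AutoscalingV2Api()

hpa = {
    "apiVersion": "autoscaling/v2",
    "kind": "HorizontalPodAutoscaler",
    "metadata": {"name": "llm-server", "namespace": "inference"},
    "spec": {
        "scaleTargetRef": {
            "apiVersion": "apps/v1",
            "kind": "Deployment",
            "name": "llm-server",
        },
        "minReplicas": 1,
        "maxReplicas": 8,
        "metrics": [
            {
                # Scale on token throughput rather than CPU or request count,
                # since tokens are what actually saturate the accelerator.
                "type": "Pods",
                "pods": {
                    "metric": {"name": "tokens_per_second"},
                    "target": {"type": "AverageValue", "averageValue": "1000"},
                },
            }
        ],
    },
}

autoscaling.create_namespaced_horizontal_pod_autoscaler(
    namespace="inference", body=hpa
)
```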
All right, so good. There is one more topic where I'm going to pretend to ask you a question, but then you ask me the big question so I can talk about it: the Gateway API inference extension. I mean, passing it back to you: you tell me, what is the inference extension supposed to be? Thank you for asking me that question. Yeah, so I did a talk about the Gateway API inference extension, which is an extension of the Gateway API to be able to do inference. And specifically, it has capabilities around multi-modality. For example, you can set up a single load balancer that routes traffic to models based on what the user is trying to do. Are they trying to do text summarization, or are they trying to do video or pictures or Stable Diffusion or whatever? Then the second thing is model based routing. So you send a request where you say, I want to talk to this specific model, and then the gateway knows exactly which backend to send your request to. And then there is all the stuff we talked about, which is getting custom metrics out of those inference servers and using those to make intelligent routing decisions. Right. So this is early access, early work. There is a spec, there is a website, you can go check it out. The talk will be recorded, so you can check it out later on YouTube. And yeah, I'm excited. Actually, there will be a lot of interesting things coming out of that.
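For a feel of the model based routing described here, below is an illustrative sketch of an InferenceModel object from the Gateway API inference extension project. The extension is early work, so the group, version, and field names follow its early alpha spec and may well have changed by the time you read this; the pool and model names are hypothetical.

```python
# An illustrative sketch only: the Gateway API inference extension is early
# work, and the group, version, and spec fields below follow its early alpha
# spec, which may change. Pool and model names are hypothetical.
from kubernetes import client, config

config.load_kube_config()
custom = client.CustomObjectsApi()

inference_model = {
    "apiVersion": "inference.networking.x-k8s.io/v1alpha2",
    "kind": "InferenceModel",
    "metadata": {"name": "chat-summarizer", "namespace": "inference"},
    "spec": {
        # The model name clients put in their requests; the gateway uses it
        # to pick the right backend pool (the model based routing idea).
        "modelName": "summarizer-v1",
        "criticality": "Standard",
        "poolRef": {"name": "llm-pool"},
    },
}

custom.create_namespaced_custom_object(
    group="inference.networking.x-k8s.io",
    version="v1alpha2",
    namespace="inference",
    plural="inferencemodels",
    body=inference_model,
)
```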
KubeCon is always a fantastic opportunity to get updates on the foundational technologies that underpin everything in the cloud native world, and to hear about the future direction of the Kubernetes project. Abdel had the chance to connect with some key figures who are shaping exactly that. First in this block, Abdel spoke with Ivan Valdes, who shared the news that he has just become a co-chair of SIG etcd. Ivan gave Abdel an update on the much anticipated etcd 3.6 release and the brand new etcd operator, which is set to simplify running standalone etcd clusters within Kubernetes. Can you introduce yourself really quick? So hi, I'm Ivan Valdes, co-chair of SIG etcd as of, like, last week. I think everybody should know this by now, but if you're not aware, etcd became a SIG about a year and a half ago. Yeah, a year and a half, two years. It used to be a standalone project at the CNCF, and now it's a SIG, a special interest group. So what's new in etcd? What's going on? So the big news, and hopefully everybody can join our talk on Thursday in the maintainer track, is that we want to finally release etcd 3.6. Yeah. The last version of etcd was 3.5, and that was released four years ago. So it's not even a major version, it's a minor version, but it has been a while. One of the issues with the etcd team is that it's a small team, and there has historically been a lot of rotation among the maintainers, so it's very difficult to do a minor release. So that's basically what we have been working on. And also, we just released on Monday etcd-operator version 0.1. So finally etcd is going to have its own official operator. At this point it's very alpha, it's kind of like a toy project. Okay. But we are actively looking for contributors, because we definitely need more contributors on the etcd operator. So those are the big news. One more question. The etcd operator, is that an operator to run etcd inside of Kubernetes? Yes, it's an operator to run etcd inside Kubernetes. But of course, it's not the datastore for Kubernetes. It's for the use case where you want to run etcd standalone inside your own Kubernetes cluster. All right. Because last time, when we did the interview about SIG etcd, one of the use cases I remember is Cilium. I think for some of its requirements, or for some of its functionality, Cilium requires etcd. So you run etcd standalone inside of Kubernetes if you need it for something else. Right. So it's not the storage layer for Kubernetes itself, right? Yes. Okay, cool. So, all right, that's pretty cool. And my last question: how has KubeCon been for you so far? So far, so good. On Monday we had the maintainer summit. It's cool to put faces to the names that you know from the Kubernetes team. And now it's also open to other CNCF projects. So it's a nice experience to just meet, say hi, and talk about your own issues. Because we all think we're solving our own issues, but in the end, everybody is doing something similar. Yeah. In my field, I specialize in CI/CD tooling. And then there were the co-located events, some interesting talks, and then today, of course, is the main event, and so far it's been great. All right, well, there are two more days to go, and we're going to be hanging out here. So first of all, thank you very much, Ivan, for jumping in last minute. Thank you for being on the show.
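For flavor, here is roughly what asking the new operator for a standalone etcd cluster might look like. Since the operator is at version 0.1 and very alpha, the API group, version, kind, and spec fields below are assumptions rather than a confirmed CRD shape, and the cluster name and sizing are placeholders.

```python
# A hypothetical sketch: the etcd-operator is at v0.1 and very alpha, so the
# API group, version, kind, and spec fields are assumptions, not a confirmed
# CRD shape. Names and sizing are placeholders.
from kubernetes import client, config

config.load_kube_config()
custom = client.CustomObjectsApi()

etcd_cluster = {
    "apiVersion": "operator.etcd.io/v1alpha1",
    "kind": "EtcdCluster",
    "metadata": {"name": "cilium-kvstore", "namespace": "kube-system"},
    "spec": {
        "size": 3,             # three members for quorum
        "version": "v3.5.18",  # standalone etcd, not the cluster's own datastore
    },
}

custom.create_namespaced_custom_object(
    group="operator.etcd.io",
    version="v1alpha1",
    namespace="kube-system",
    plural="etcdclusters",
    body=etcd_cluster,
)
```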
For a broader perspective on the Kubernetes project, Abdel sat down with Jago Macleod, who leads open source Kubernetes engineering at Google. Let's hear Jago's insights on the overall health of the project, especially how Kubernetes is evolving to support the new wave of AI/ML workloads, and his thoughts on how AI agents might simplify Kubernetes interactions in the future. Can you introduce yourself? Let's start there. Hi, I'm Jago Macleod. I'm an engineering director, and I work on open source Kubernetes and GKE. I lead open source Kubernetes at Google. Yeah, he makes Kubernetes happen. He's very humble. My team makes Kubernetes happen. You just tell them what to do. Awesome. So I think it has been a while since we had somebody on the show talking about the overall open source side of Kubernetes, kind of an overview of the project, because we tend to talk to people who are working on very specific parts. So how is Kubernetes? I know this is a big question, but how is Kubernetes going? It's going great. A year ago I was a little nervous. I wasn't sure, in the face of this new tide of AI and ML workloads, if Kubernetes was going to retain its dominance. And we've been working really closely with the SchedMD folks who support Slurm, the Ray community, and the Run:ai community. So there are all these other schedulers that run these higher level frameworks. A year ago, they were succeeding in spite of Kubernetes, and now we've evolved Kubernetes to be more aware, downward facing, of the hardware. The new evolution in hardware, accelerators, is really different from CPUs. Yep. And upward facing, of these frameworks that are pretty special purpose. So it's a super exciting time. I think the momentum is really showing that Kubernetes runs all of these big AI workloads, and it seems to be gaining even more momentum. So it's exciting. It's super fun. Awesome. So my follow up question: that is an amazing way of describing it, the downward facing and the upward facing. I think Kelsey Hightower is quoted for this phrase; he said at some point Kubernetes is going to become the platform. Do you think that's what's happening? Where Kubernetes is just an API that people assume exists, and we build on top of it? The way I like to think about it is: there was this hourglass model of the Internet, an idea that emerged about 40 years ago, where IP was kind of the center of the Internet, and you have different protocols running on top and different physical layers below. And I think Kubernetes has emerged as the narrow waist of the hourglass model of infrastructure management. So there are all these frameworks that run on top. So I think there are maybe multiple platforms within that platform we used to describe. Got it. We kind of embraced the idea that Slurm is really great at HPC, and there are users who have used and will continue to use Slurm no matter how good Kubernetes gets at batch and HPC workloads. So I think we need to embrace the idea that there are APIs and user experiences that people are attached to and should continue to use. Right. Awesome. We wanted to let them use the underlying hardware in the same way as others do. Got it. So you spoke of Slurm. I spent a bunch of time with Daniel Martini from Google writing a Slurm guide for GKE, and that was definitely an eye opening experience, because I've never had to deal with HPC type orchestrators, which is what Slurm is. Right. And so my question to you is: we're doing a lot of work with tools like XPK to abstract away Kubernetes for people who don't want to deal with it. Right. Do you see more tools like this, where either we're making a tool that uses Kubernetes under the hood without exposing the complexity of Kubernetes to data engineers specifically, because we tried that with Knative for apps, kind of the same idea, or where we are integrating tools that people are familiar with, like Slurm, into Kubernetes?
Slurm is just an example; Ray is another one. Do you see us doing more of those going forward? I think we will. And I think that the intermediate layer is more often going to be AI agents. Like, if you use Cursor at this point, you never have to go to the GCP UI, or necessarily even use kubectl, to manage the Kubernetes cluster itself. It's kind of a byproduct of what you're trying to accomplish. And so some of those sharp edges start to fade away if it's abstracted at that layer as well. So I think we'll see a few different models that converge over time. Yeah. But it's really exciting. Awesome. Thank you very much for coming on the show. Thank you, Jago. One of the real highlights of any KubeCon is discovering the incredibly diverse and sometimes surprising ways people are using Kubernetes. And of course, it's the vibrant community and the individuals within it that make these events so special. Abdel's final set of conversations from the show floor really captures this spirit. In this interview, Abdel chatted with Clement, who by day works on the Kubernetes platform for PostFinance, a Swiss bank. But in his other life, Clement lives on a farm and has ingeniously used Kubernetes and Prometheus to automate his family's milk dispenser and monitor their cows. He also shared some insights from his banking role about migrating from kubeadm to Cluster API. We have Clement on the show. Can you introduce yourself, Clement? Yeah, sure. So my name is Clement. I'm a Swiss software engineer working at PostFinance on the Kubernetes platform, and I live on a farm with my wife. She's a farmer. I live amongst the cows, which is quite fun. And yeah, the last time I met Clement was in Switzerland, for KCD Zurich, and you've been telling me about this. My first question is totally off topic. So you live on a farm, you produce milk, and you had to automate your milk dispenser. True. Using Prometheus, right? Can you talk about that? So the basic idea is that we have a self-service shop, and we want customers to be able to get some milk from the farm. Raw milk. And so we built the machine from scratch, with some mechanical devices, electrical devices, and everything. And then the question is, when there is no milk anymore, you want to get an alert for that. So I installed a small lidar, a small laser meter, so I know exactly how many centimeters of milk there are remaining in the tank. Yeah. Then my wife and the family receive a Telegram alert, through Grafana and VictoriaMetrics, about the level status of the machine. So people always have milk. Awesome. Is there any Kubernetes behind that? Of course, there has to be. What is the Kubernetes cluster running at home? I run a four node cluster at home, which is super reliable actually. I run Talos Linux on it, and it works quite well. And so, yeah, that's how I do it. So Kubernetes operators, then a few agents to scrape the metrics, and then I can gather some data and produce something that's useful out of it. All right. So it's almost like an edge case, actually. Like, it's a Kubernetes on the edge kind of situation. Yeah, true. Yeah. That's pretty much what you're running: a Kubernetes cluster on the edge, connected just locally to the local network of the farm. And I gather data. Well, this time it was from the milk dispensing machine, but I'm also gathering data from the cows' production. Oh, all the cows? Well, we have 65 milking cows, and they produce milk. Yeah. I want to know how much milk the cows produce, and you can get alerts if a cow's production is going low. Oh wow. Okay. I didn't know. Okay. Connected cows, and also with Grafana dashboards, so we can check the production live. Awesome.
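The milk-level setup maps onto a classic Prometheus exporter pattern: expose the lidar reading as a gauge and let the monitoring stack handle alerting. Here is a minimal sketch, with the lidar read stubbed out and the metric name invented for illustration; the real wiring to VictoriaMetrics, Grafana, and Telegram lives in the monitoring stack's configuration.

```python
# A minimal sketch of the exporter side: publish the lidar reading as a
# Prometheus gauge and let the scraper (e.g. VictoriaMetrics) plus Grafana
# alerting handle the Telegram notification. The lidar read is stubbed and
# the metric name is invented for illustration.
import random
import time

from prometheus_client import Gauge, start_http_server

milk_level_cm = Gauge(
    "milk_tank_level_centimeters",
    "Centimeters of raw milk remaining in the self-service tank",
)

def read_lidar() -> float:
    # Placeholder for the real laser distance measurement.
    return random.uniform(0.0, 80.0)

if __name__ == "__main__":
    start_http_server(8000)  # metrics exposed on :8000/metrics for scraping
    while True:
        milk_level_cm.set(read_lidar())
        time.sleep(30)
```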
And so in your daily job, which is not the farm job, you are running the Kubernetes platform for PostFinance, which is a bank, right? A Swiss bank, yeah. So can you tell us a little bit about what kind of work you are doing there? Yeah, sure. So we basically provide an open source, vanilla Kubernetes platform for all the banking applications that are then deployed on the clusters. So the work is really about provisioning it from the get go. We start with nothing: we provision and configure load balancers with Terraform, we provision VMs, we install kubeadm, and then we start to configure the clusters. That's what I do during the day. Yeah. And you told me you're moving towards Cluster API, right? Right, yeah. That was actually the topic of my talk yesterday: migration from kubeadm to Cluster API, and a live migration, because otherwise it wouldn't be fun. Which didn't break, which is also good. Awesome. So the idea is that we have these old Kubernetes clusters that we want to get rid of, or rather, we want to extend those clusters with new Cluster API nodes, so that after some months of stable testing we can remove the old kubeadm nodes, and then we have migrated to a much simpler, streamlined solution to manage our clusters. That's the idea. All right. Are you planning to open source this work? Yeah, like a guide for how to do a migration from kubeadm. There was the talk yesterday, and I think I will write something on my blog as well, because there was quite some demand, many questions after the talk yesterday. So I think it would be quite helpful. Yeah, I would assume there will be a lot of interest from people moving from a CLI based tool to an API based tool. Right. So showing people how they can do that would probably be useful. And without downtime, because that was also the key driver. We have a lot of applications running on our clusters. We don't want to migrate 500 applications; we want to update one cluster in place. That's the idea. Awesome. Well, thank you very much, Clement, for coming. Thanks for talking to us. Thank you.
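The "extend the cluster with new Cluster API nodes" step corresponds roughly to adding a Cluster API MachineDeployment pointed at the existing cluster. The sketch below shows the general shape of such an object; the template names and infrastructure provider are hypothetical placeholders, and the actual migration choreography (adopting the control plane, draining the kubeadm nodes) is far more involved than this.

```python
# A rough sketch of adding Cluster-API-managed workers to an existing cluster.
# Template names and the infrastructure provider are hypothetical, and this
# is only one small step of the kubeadm-to-Cluster-API migration described.
from kubernetes import client, config

config.load_kube_config()
custom = client.CustomObjectsApi()

machine_deployment = {
    "apiVersion": "cluster.x-k8s.io/v1beta1",
    "kind": "MachineDeployment",
    "metadata": {"name": "banking-workers-capi", "namespace": "clusters"},
    "spec": {
        "clusterName": "banking-prod",
        "replicas": 3,
        "selector": {"matchLabels": {}},  # defaulted by Cluster API's webhook
        "template": {
            "spec": {
                "clusterName": "banking-prod",
                "version": "v1.32.0",
                "bootstrap": {
                    "configRef": {
                        "apiVersion": "bootstrap.cluster.x-k8s.io/v1beta1",
                        "kind": "KubeadmConfigTemplate",
                        "name": "banking-workers",
                    }
                },
                "infrastructureRef": {
                    "apiVersion": "infrastructure.cluster.x-k8s.io/v1beta1",
                    "kind": "VSphereMachineTemplate",  # placeholder provider
                    "name": "banking-workers",
                },
            }
        },
    },
}

custom.create_namespaced_custom_object(
    group="cluster.x-k8s.io",
    version="v1beta1",
    namespace="clusters",
    plural="machinedeployments",
    body=machine_deployment,
)
```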
Finally, KubeCon is a gateway into the cloud native world for so many. Abdel caught up with Nick Taylor, who was attending his very first KubeCon. Nick shared his journey of pivoting into infrastructure and Kubernetes, his learning process, and his initial impressions of the conference and the community, a perspective that I'm sure will resonate with many of you. I am here with Nick Taylor. Hello, Nick. Hey, Abdel. Thanks for having me on. How are you doing today? I'm good, how are you? Pretty good. I'm starting to get a little tired, as we talked about briefly. It's my first KubeCon, and we're at a sponsor booth as well, so I've been working the booth a lot this week. Busy, and it's fun, but it does take it out of you. Okay, so yeah, you told me yesterday that this is your first KubeCon. Every year when there is a KubeCon, especially the big ones like the European one and the US one, there is a report that comes out, and every year it's always the same thing: more than 50% of the attendees are there for the first time. Okay, so this year you are in that 50%. Yeah. So can you tell us a little bit about how the experience has been for you? It's been great. For some context, I've typically been an app developer, and I've pivoted into infra and security. So it's literally my first KubeCon. I am brand new to Kubernetes, like, green. My past experience with Kubernetes, at a startup, was as a front end dev restarting pods sometimes. That's the extent of my Kubernetes experience. So I'm just really excited about it, and it's a lot to take in. So you switched to the dark side of the world. Can you tell us a little bit about your learning process? This is something we get all the time: people who are new just ask questions like, how do I get started, and where do I find things, and how do I learn? Yeah, well, I'm fortunate enough to work with a lot of people that have some really good Kubernetes experience. But the way I've been approaching it is, you know, I know there's Kelsey Hightower's Kubernetes the Hard Way, and I've started to look at that. On my local machine I've started to install K3s, just to get a cluster up and stuff. But it's still literally early days for me. So, you know, I'm starting to look into the ingress controller and stuff. There's stuff that I'm sure everybody here is pretty familiar with, but it's really green for me. So, I don't know, it's fun kind of being in the unknown. I'm always comfortable getting uncomfortable. So it's just been exciting to dig into these things. Awesome. So, yeah. K3s, Kubernetes the Hard Way, learning the ingress controller, these are all good resources. If I may add, there is Kubernetes: Up and Running, which is a really cool book, and I think the documentation and also the certifications are pretty good resources for learning. Yeah. All right. Did you get to any talks or any sessions? I've mainly been working our booth, but in my time off I've actually been focusing a lot on meeting up with people, because I'm new to this community. But I was able to catch some keynotes this morning, and it's kind of funny, I never thought of places where Kubernetes would be. So I saw the Oracle talk this morning from their SVP, and it hadn't occurred to me that F1 racing would have Kubernetes in it. It makes sense, but I think I always associate it with tech and software or dev tooling. I haven't really associated it with real world stuff yet. So it's kind of cool to hear that Kubernetes is powering, like, a pit in F1 racing or something. Yeah, I mean, there are a bunch of use cases we have seen in the past, like the F-16 fighter jets. Not on the jet itself, but the software behind it running Kubernetes. Yeah, and we've seen it on boats, on vessels. Some of these cruise companies are actually using Kubernetes on their vessels. So yeah, I think Kubernetes sneaks into places you wouldn't expect. Right? No, totally. We actually had somebody come to our booth yesterday, and they have a completely air gapped Kubernetes running in a submarine. Oh yeah. And I was like, it just hadn't occurred to me, you know. Yeah. So to your point about boats and ships, it's pretty wild where it can be, I guess. Yeah, exactly. Awesome. Well, thank you very much, Nick. Thanks for coming on the show. Thanks for having me. Thank you to Mofi, and to Abdel, who's not here with us, for doing those interviews at KubeCon.
I didn't get to go this year, so I'd love to hear more about how the event went. Tell me about it, Mofi. Yeah, I mean, this was the biggest KubeCon to date. KubeCon EU 2025, I think, had roughly 13,000 attendees, or more than 13,000 attendees. You might have the actual number, which you can put in the notes later. I could probably ask someone for it, but the transparency report will come out later, which will tell us exactly how many came. So yeah, I haven't seen that yet. One of the things they had done this year is that the show floor was actually split in half; the venue itself had the show floor divided into two sections. So even though we had a lot of people, some people actually talked about how it felt a little emptier, because it was spread across two whole sections. I think it was a good idea, because too many people in the same place would have been crowded, but it was a lot of walking for those few days. Every KubeCon needs a lot of walking. You're just walking everywhere. But other than that, I think the big thing for me, and this will probably not come as a surprise to anyone, is that AI had a big presence again this year. A lot of the keynotes talked about AI. A lot of the keynotes talked about how AI can be done better on Kubernetes and a lot of other CNCF projects. Also, on the show floor, in the project pavilion, there were a lot of projects that were either showcasing running AI/ML or using AI for some sort of benefit, either in the CI platform or code generation or monitoring and observability. So a lot of the tooling that is being built around Kubernetes and the sub-projects in the CNCF is using AI/ML. If you walked around the show floor, you would see the word AI everywhere this year as well, as is to be expected. Yeah. And at the last KubeCon, KubeCon North America, we also announced 65,000 nodes on a single cluster. I actually got to do a demo on a live cluster this KubeCon to showcase running batch workloads. And these kinds of features are also being built with AI/ML and batch in mind. So a lot of the new features that are coming out of Kubernetes, the project, as well as from a lot of the vendors, are in service of running AI/ML and batch workloads. Some of these features are obviously making the project itself better, because this kind of stress testing of a cluster at that scale is not something we usually do in the Kubernetes project; we test, I think, up to 5,000 nodes in the open source. But to allow for massive training like this, we are finding edge cases that otherwise probably wouldn't have been found. So even though the purpose of this is to enable AI/ML use cases, I think the project itself is getting better and more robust because we're trying to test for these hyperscale cases. It's the Pokemon Go story all over again. Yeah, yeah. Any time a moment like that happens, everybody's trying to do something that the tool was, I'm not going to say not ready for, because Kubernetes is one of the most scalable things in the world, but you can still find the upper limit of certain things. Then we push that boundary and make it better. This is why we test in production, right? Yes, absolutely.
But also, this year is going to be the first year, I think, with five official KubeCons. Yes. EU, China, Japan, India, and NA. Yeah. So basically, in a given year of 12 months, you are roughly having one KubeCon every two months and one week, which is fantastic. A lot. It's a lot for sure. Yeah. Every nine weeks there will be a KubeCon, basically. Which is more chances for announcements spread out throughout the year, and involving more different regions too. So I'm curious to see if we hear about different things from the different regions as India and Japan especially get spun up. Yeah, I think so. Basically, Kubernetes has three releases a year, and you're going to have five of these KubeCons, so you're not going to be able to coincide a release with a conference as much anymore. Also, a lot of the maintainers and the organizers are involved in constantly organizing the next one. I hope it doesn't cause any sort of burnout, because organizing a conference is a huge undertaking. And it's a nonprofit, how could that happen? Yeah, a nonprofit and a team of volunteers that work on making this happen. Maintainers have to keep up. So it's almost like a pressure to have some sort of an update to share at every KubeCon. But yeah, hopefully that doesn't happen. Overall, I think KubeCon EU, in my experience at least, was a great event. There were a lot of great talks that I got to attend, and I got to speak to people. That's kind of the main goal for me: to just speak to the attendees and hear how people are doing and dealing with Kubernetes. So yeah, it was a strong few days at KubeCon. And for us, the GKE team, we were also traveling before KubeCon for some container events that we have in the area, not just in London. We actually had a couple of events in Norway as well as Sweden. So we had basically a two week roadshow of a bunch of things. It was a lot of talking, a lot of learning, and I'm excited for the next installment of KubeCon, wherever I get to go. Yeah, a lot of folks forget that there are so many co-located events that happen on site at KubeCon, and when companies can, they'll also create other events in the local area around the KubeCon. So KubeCons bring in a lot of stuff, and a lot of stuff goes on around them. Absolutely. And it's so great to hear from the community, as we did in this episode. We hope you enjoyed hearing from folks at KubeCon EU. Thank you, Mofi. Thanks, Kaslin. That brings us to the end of another episode. If you enjoyed this show, please help us spread the word and tell a friend. If you have any feedback for us, you can find us on social media at KubernetesPod, or reach us by email at kubernetespodcast@google.com. You can also check out the website at kubernetespodcast.com, where you will find transcripts, show notes, and links to subscribe. Please consider rating us in your podcast player so we can help more people find and enjoy the show. Thanks for listening, and we'll see you next time.
