
A
Modern software platforms are increasingly composed of diverse microservices, third-party APIs, and cloud resources. The distributed nature of these systems makes it difficult for engineers to gain a clear view of how their systems behave, which can slow down troubleshooting and increase operational risk. Groundcover is an observability platform that uses eBPF sensors to capture logs, metrics, and traces directly from the kernel. Critically, Groundcover runs on a bring-your-own-cloud model, so all data remains within the user's own environment, which provides increased privacy, security, and cost efficiency. The company is also focused on adapting to how AI-generated code is changing software development. Code can now be produced at superhuman speed, which increases the challenge of reviewing code before it enters production. This means observability is likely to play a growing role in code validation and in providing guardrails. Yechezkel Rabinovich, or Chez, is the CTO and co-founder of Groundcover. He joins the podcast with Kevin Ball to discuss his journey from kernel engineering to building an eBPF-powered observability company. The conversation explores the power of eBPF, the realities of observability in modern systems, the impact of AI on software development and security, and where the future of root cause analysis is headed. Kevin Ball, or K. Ball, is the Vice President of Engineering at Mento and an independent coach for engineers and engineering leaders. He co-founded and served as CTO for two companies, founded the San Diego JavaScript Meetup, and organizes the AI in Action discussion group through Latent Space. Check out the show notes to follow K. Ball on Twitter or LinkedIn, or visit his website, K. Ball LLC.
B
Hey Chez, welcome to the show.
C
Hey, thank you. Nice to be here.
B
Yeah, I'm excited to have this conversation. Let's maybe start with a little bit about you. Can you give our listeners just a little bit of your background, and then how you got to Groundcover, where we are today?
C
Yeah, sure. More than a decade in software engineering, specializing in Linux and distributed systems. I mainly worked on kernel modules until I got sick of it, and then kind of fell in love with eBPF. Then we founded Groundcover, where I'm the CTO and co-founder, and that's what I've been doing for the last four years.
B
eBPF is really cool. We did another episode about that, and I hadn't been exposed to it before. My previous Linux background had been 15, 20 years ago, and then coming in and being like, wait, you mean to integrate with the kernel I don't have to go through this arduous kernel patch process? It was mind-blowing. So actually, let's maybe even start a little bit there. How are you utilizing eBPF at Groundcover?
C
Yeah, so as you said, after a few years with kernel modules, you kind of fall in love with the power of extending the kernel, which is very cool, but it's also very, very hard. The development cycle is so slow, because any mistake you make can basically crash the system. So after a few years as the R&D manager and leader, me and Shahar, my co-founder, kind of realized there is a big problem with instrumentation. You know, we all know SDK instrumentation. You have all the classic vendors, you have OTel, the open source standard, but you still have to instrument your application, right? And it sounds very easy, you have a lot of tutorials, but in real life it's very, very hard. Most companies have a lot of different runtimes and different versions, and you need to keep track of all those SDKs. So why wouldn't we do it with eBPF, which basically lets us instrument the application from the kernel side without any risk for the application itself? You run in a sandbox, you have the kernel verifier that will not load your program if you're doing something wrong or something that could potentially harm the application, but you still get 95% of the value. So we can inspect any syscall, we can intercept HTTP requests, SQL, Redis calls, whatever you're doing, we can probably see it. So that leads to Groundcover, which is an observability company, I would say more than an observability company. We utilize eBPF alongside classic instrumentation, but our main sensor is based on eBPF. So you deploy the sensor in less than a minute, and you get traces, logs, metrics, everything in one place, without changing your code. The other cool thing that we do is our backend is based on bring your own cloud, which means your data stays inside your cloud account. Very nice in terms of privacy and security, and it also allows us to reduce costs because we're not charging by volume. And customers are usually happy with it.
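To make the kernel-side idea concrete, here is a minimal sketch in the spirit of what Chez describes, using the open source BCC toolkit rather than Groundcover's actual sensor: it hooks a kernel function with a kprobe and observes network activity with zero changes to any application, and the verifier Chez mentions checks the program before it loads.

```python
# A toy eBPF tracer using the BCC Python bindings (iovisor/bcc); run as root.
# This is NOT Groundcover's sensor -- just a minimal illustration of hooking
# the kernel: every outbound IPv4 TCP connection on the host gets logged.
from bcc import BPF

prog = r"""
#include <uapi/linux/ptrace.h>

int trace_connect(struct pt_regs *ctx) {
    // Upper 32 bits of the id hold the process (tgid) id.
    u32 pid = bpf_get_current_pid_tgid() >> 32;
    bpf_trace_printk("outbound TCP connect, pid %d\n", pid);
    return 0;
}
"""

b = BPF(text=prog)  # the in-kernel verifier rejects unsafe programs at load time
b.attach_kprobe(event="tcp_v4_connect", fn_name="trace_connect")
print("Tracing tcp_v4_connect... Ctrl-C to stop")
b.trace_print()  # stream bpf_trace_printk output from the kernel
```

A production sensor goes much further, parsing protocol payloads like HTTP, SQL, or Redis out of intercepted buffers, but the deployment model is the same: no SDK, no code changes.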
B
Yeah, for sure. Well, one of the things that you alluded to, there is an area I think we can dive into a lot. So you said a modern observability company. And I think a lot is changing right now in terms of how we build applications, how we deploy applications, how we need to be observing them. So what is needed for modern software development from the observability side?
C
I think nowadays the average platform is so complex, with integrations, third-party cloud resources, different SaaS vendors that you're using for feature flags or hosting, that the average engineer kind of struggles to even know what components the platform relies on. So that's where classic instrumentation fails, right? Because it's the unknown unknowns, right? You don't know what you don't know. And this is a nice experience we see with customers that deploy the sensor for the first time. All of a sudden they see those links between their application and third parties. It could be even something very mild or very small, like fetching an avatar from a third-party website that they didn't even know about, or things like that. So I think nowadays the basis of modern observability is to have all the information in one place. You'd be surprised how many of our customers, before they used Groundcover, used five, six, seven different tools. So their signals were spread across different platforms, and just the correlation between them is so hard. So I think having all the data in one place is the very bare minimum of an observability platform these days, where most companies have 100, 200, 300 microservices, and the number is just increasing.
B
Yeah, and the fact that you're able to gather all that data without the engineers having to add the instrumentation to their code means you're able to capture those unknown unknowns. Because if an engineer didn't think of it, if they had to instrument it, it wouldn't be there.
C
It's 100% visibility into what your application is doing. It's very nice and very mind-blowing the first time.
B
Yeah, and I think there's another big trend which makes this very interesting, which is that, in particular with the way software is changing in the AI development world, the volume of change going on is so high that even keeping track of what's going on has just gotten harder.
C
You can see it even from how the development lifecycle changed. You know, a few years ago, not that many, it would take you a lot of time to write the code, something that product could explain in a few words. It just takes time. You know, engineers optimize for typing speed, right? There are competitions around how fast you can type. Currently that barrier basically does not exist. Any software engineer can basically write code at superhuman speed and velocity, and this is no longer a barrier. You basically can print unlimited lines of code in minutes. Is it doing what you think it's doing? Is it the architecture that you planned it to be? Maybe. And how do you code review it?
B
I am feeling that pain tremendously right now. So what's your answer?
C
How do you code review it? With an AI, of course. No, I'm half joking, but to be honest, obviously at Groundcover we use AI to write code, we use AI to do code reviews. One of our engineers even wrote a utility that uses AI to create the pull request with a Groundcover flavor, right? So it collects the information from the ticket, maybe the Figma that correlates to that ticket, and basically generates a PR with AI, so the engineer can just edit the last mile. So obviously the short answer is, of course we use AI for code reviewing. But at the end of the day, engineers still need to be accountable for the software that we ship. So I personally think that testing should be very, very mindful, because I've seen tests that are actually making sure there are bugs, because the tests are also written by AI. Code review in the AI era, for me, starts with, first of all, let's look at the tests. That's been true forever, but with AI it's just sometimes impossible to read all the code, and maybe sometimes it's unneeded. There are different kinds of software and different requirements. For instance, I personally just wrote a very simple library that can parse MetricsQL into a logical representation. We integrated it in the platform. This did not exist. I was shocked that it did not exist in TypeScript, and I wrote it, and I don't know TypeScript. I've never coded TypeScript. So I manually crafted all the scenarios. Half manually, right? I described all the scenarios and then pressed tab. But I manually crafted the scenarios that I think would logically challenge the library, then followed the implementation, and it seemed reasonable, and I've never read the code. But for this kind of library it doesn't really matter, right? The input is a simple string, the output is very simple, there are no side effects from using that library, and it's not something in the hot path of the platform. So that makes sense. But on the other hand, if we implement a new parser inside the sensor, which runs at high throughput with zero allocations, the code itself matters. You have to know, how much do you allocate? Do you have any memory leaks? So I think the question is very dependent on the context of what you are building. And that's something that we at Groundcover started to differentiate. Like, let's think, what are we checking now? Does the code matter? Maybe it doesn't. Maybe we can replace it in a week and it doesn't matter.
B
Yeah, I mean, I think to some extent what you're describing there reminds me of a metaphor I've used before, which is that increasingly the codegen is essentially like a compiler, right? Like, when was the last time you read the binary that a compiler generated for your code? You didn't, probably, but you did check: did it do what I expected? Did it pass the tests? Does it behave as I anticipate? And for a lot of code now, that's essentially what it is. Does it pass my tests? You described a few other things that might matter, like performance, functional pieces. How do you validate those in a world in which agents are building all of your code?
C
Yeah, I think the difference with transpiling, or maybe compiling C code to assembly, is that you don't miss out on the architecture when you translate C to assembly. But when you only test for input and output, you do make sure that piece of software does what it needs to do, but you can't make sure it does it in the way you want it to. And it's also not deterministic, unlike compilers, which tend to be very deterministic in most cases. And I think it is different in that sense. What about the architecture? How does it fit with future features that you want to integrate? I think when you talk about architecture, the comparison to classic compilers doesn't represent it well enough.
B
Yeah, well, that's definitely true. Or at least you need a new source code, right? You need technical documentation or tests or some sort of validation. But yeah, you're right, not all the metaphors align. I think this does get you into an interesting thing with what you all are doing, though, which is that tests are one form of validation. Reading the code is another form of validation. Another form of validation is, what do the logs say? Is it allocating memory? Is it performing? Is it doing all of these different things? Having some sort of feedback loop between the actual running code and whatever's writing it, whether it's an engineer or an agent, is extremely important.
C
Yeah, the problem with an LLM is that it can lie, right? It's statistics. So I don't trust logs that are being generated by an LLM, because, you know, it can just make up logs. Maybe it refactored that code and forgot to rename that variable. Maybe it thought it would write it and it eventually did not. So I personally rely less on logs, and also markdowns or Cursor rules. I tend to use them less, because I think if you want to embrace AI, with the possibility that it will fail, you have to put in some guardrails that are very deterministic. So you mentioned tests. Tests are brilliant for that. Tests usually don't lie. I've seen cases where, you know, the AI kind of injects every scenario into the code base to falsely pass the test, but it's very rare. It's very rare, and I anticipate we're going to see it less. But I think linters should be very fashionable now. Like, I think we should have more complex rules to make sure the complexity of functions is something that we can live with, or even conventions are something that we want to enforce. Because eventually you are going to read that code. There is a chance you're going to read that code. We have to make that assumption. Because at the end of the day, when you wake up at 3am and something is wrong, you can tell anyone that you're trying to craft a prompt to fix it, but it's your responsibility, and you're not sharing it with that AI bot, right? And we're still relying on humans at the end of the day, for the future, for the near future. It still looks like it. So I personally think eBPF fits very nicely here, because eBPF will tell you the truth.
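As one hedged illustration of the "better linters" idea, here is a minimal sketch of a deterministic CI check built on Python's standard ast module; the branch threshold and the rule itself are made up for illustration.

```python
# Minimal complexity-linter sketch: fail CI when a function has too many
# branch-like nodes. Threshold and rule are illustrative, not a real tool.
import ast
import sys

MAX_BRANCHES = 8  # arbitrary limit chosen for the example

class ComplexityChecker(ast.NodeVisitor):
    def __init__(self):
        self.offenders = []

    def visit_FunctionDef(self, node):
        # Count branching constructs nested anywhere inside this function.
        branches = sum(isinstance(n, (ast.If, ast.For, ast.While, ast.Try))
                       for n in ast.walk(node))
        if branches > MAX_BRANCHES:
            self.offenders.append((node.name, node.lineno, branches))
        self.generic_visit(node)

    visit_AsyncFunctionDef = visit_FunctionDef

checker = ComplexityChecker()
checker.visit(ast.parse(open(sys.argv[1]).read()))
for name, line, branches in checker.offenders:
    print(f"{sys.argv[1]}:{line}: '{name}' has {branches} branches (max {MAX_BRANCHES})")
sys.exit(1 if checker.offenders else 0)  # deterministic pass/fail for CI
```

Unlike an LLM reviewer, a check like this cannot be talked out of its verdict, which is exactly the property Chez is after.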
B
This is the HTTP request; it's not dependent on your code. This is what happened.
C
Exactly. We've even started to instrument our testing environment, getting the traces back to the AI and saying, look, this is what happened, what do you have to say about that? But you have to create guardrails that rely on a solid ground truth.
B
And it's a good observation for why your observability should be separate from your code base itself, because you don't want the LLM to write lies into it. That's really cool. Now, a challenge I've seen before with observability is it can be very verbose, right? There's a lot of stuff that happens in a modern system, and you get thousands and thousands of lines of logs, or what have you, from relatively simple interactions. That's one challenge on the pure data storage and transfer side. But if we're feeding these back into LLMs, there's a context management challenge too. How do you all think about that?
C
Yeah, I've seen a lot of people saying, hey, let's just send all those logs to, you know, OpenAI and they will tell us what happened. And then you realize in those five minutes there were 20 million logs, right? And when we introduce AI-based code, it actually increases the data that we're sending, so that just makes things worse. What we're heavily trying to do at Groundcover is to be able to summarize data in a stream aggregation fashion, to basically represent trends or patterns. So for instance, logs are the simple use case: you know, you have 50 log lines, and if we can nail the patterns, we can actually convert them to a kind of time series. A time series is very compact. Any AI agent will happily look at a graph and say, oh, this is interesting, and then you can narrow it down to a service or a timeframe that will allow you to dive deeper. So this is one way, and this is maybe the simplest signal to summarize. But we're also doing it for APIs. We create baselines on interactions. This is a bit more tricky, because the question of which dimensions make up the same communication pattern is a very complex issue, especially when you look at it from the network perspective. Just to give you an example, think about a query parameter that generates 1 million different routes. You need to understand that those are the same API and this is a variable. That's a very simple example, but we are trying to compact all those signals into baselines. And then this is something that you can feed the AI for trends, and then narrow it down until you find the raw data, but still at a scale manageable to send to an agent. Another interesting idea is to look for other signals, for instance, change management. So think about an image change: something happened, so around that time is probably a good place to start looking. You basically narrow the time, and with that you narrow the context needed.
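Here is a minimal sketch of that pattern-extraction step, with assumed masking heuristics rather than Groundcover's real pipeline: variable tokens are masked into a template, and raw lines collapse into per-minute counts per template, the compact time series an agent can actually scan.

```python
# Sketch: collapse raw log lines into (minute, template) counts.
# The masking rules below are illustrative heuristics, not a product's.
import re
from collections import Counter
from datetime import datetime

MASKS = [  # most specific first, so UUIDs aren't shredded by the number rule
    (re.compile(r"[0-9a-f]{8}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{12}"), "<uuid>"),
    (re.compile(r"\b\d{1,3}(?:\.\d{1,3}){3}\b"), "<ip>"),
    (re.compile(r"\b\d+\b"), "<num>"),
]

def template(line: str) -> str:
    for pattern, token in MASKS:
        line = pattern.sub(token, line)
    return line

def to_time_series(records):
    """records: iterable of (datetime, raw_line) -> Counter over (minute, template)."""
    counts = Counter()
    for ts, line in records:
        counts[(ts.replace(second=0, microsecond=0), template(line))] += 1
    return counts

records = [
    (datetime(2025, 10, 16, 12, 0, 5), "GET /avatar/42 took 131 ms"),
    (datetime(2025, 10, 16, 12, 0, 9), "GET /avatar/7 took 88 ms"),
]
for (minute, tmpl), n in to_time_series(records).items():
    print(minute, tmpl, n)  # both lines collapse to "GET /avatar/<num> took <num> ms", count 2
```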
B
So if I understand, and just to replay it back to make sure I'm on the same page: you're looking for a set of patterns, the ability to build a baseline, the ability to pull in any sort of external information, like when an image changed, to show here's a relevant area and a high-level description. And then you expose some sort of explorability, so that the agent you've passed that off to can say, okay, great, this looks like it's a problem, give me more data for this spot.
C
Yeah, yeah, exactly. And you can also go wrong with this process and then go back, because sometimes, every engineer knows this, you see some suspicious log and you think you found it. And two hours later someone comes into the office and says, oh no, this is not the issue, I know this. And the same thought process will happen to the AI. It's an iterative process. But to make it efficient, you have to start with some kind of patterns, or signals represented as time series, and then...
B
Do you expose those to your LLM via an MCP server or how do you approach that?
C
Yeah. So Groundcover was one of the first observability vendors that exposed MCP. Actually, we started before MCP had OAuth authentication, and we were almost ready to release, and then that got announced, and then we recreated the entire authentication mechanism. So that was fun. Yeah. And while we worked on the MCP, we learned this kind of stuff. At the beginning we were very, very naive. We thought, hey, we have the Swagger, the OpenAPI spec, very simple, we're just going to convert it to MCP, which is basically just another HTTP server. But eventually we realized agents are not that good at reading OpenAPI docs and using them, especially if you have...
B
A very large API. Right. With a lot of different things.
C
Yeah, yeah. And you can imagine that our APIs are very sophisticated, in the sense that you can use a bunch of operators and conditions with recursive groups, and some kinds of operators between those groups, and, you know, things that the UI is doing constantly. And even as a human, when you want to write a query language, you pour a lot of intuition into how you use those conditions. And we found out that the agent didn't really like it. We had to limit the amount of options in order for the agent to make a reasonable call. So we kind of encouraged the agent to use simpler APIs and let it narrow the search, and only then expose more complex APIs. But you have to keep the API somewhat restricted, relative to what you would do with an SDK, where you want the developer to have all the options in the world to find what's relevant for their scenario.
B
Yeah, no, I've definitely seen something similar. If you have too many options available, it just gets confused and starts throwing random stuff at you.
C
Yeah. And also I must admit that at the beginning I thought we could also feed an LLM with the OpenAPI spec and tell it to create an MCP that works well for it. But that didn't work well either. It didn't understand how an agent would effectively consume those APIs. We were very disappointed with that process. We had to go back and manually craft the endpoints and think about the use cases. And we were worried: are we leading the agents to do things that we think are the right thing, when maybe they would prefer to do something else? But at the end of the day, we saw 100% better results when we closed some APIs and limited the number of results. We forced the agent to get up to 20 results, for instance. Otherwise it just got 2,000 log lines and kind of got stuck with random nonsense.
B
Yeah, that's interesting. So if I'm hearing correctly, some of what you did is you kind of, one, applied your expert judgment. Here's a set of things that probably will be helpful. We're just going to expose those. And two, kind of gave it like this progressive disclosure where it's like, here's where you start. Okay, now we're going to expose a little bit more. Now we're going to expose a little bit more along the way.
C
Yeah. We also made some parameters required, although they're not required in the official SDK. For instance, it's meant to tell the agent: if you are looking for traces and you don't know what cluster you're looking at, something is off, you need to do something else. Like, if you're that clueless, you're using the wrong API. So it has to go through a certain way of thinking, because it has to know first what cluster it's looking at. So now it can think, oh, how do I know what cluster I want to check those traces in? And that led to some kind of flow where eventually it got the right cluster. And for some APIs, where the output was less verbose and more high level, we allowed a more primitive set of variables. So you could ask, you know, questions like, what changes happened in my entire production? Or, what incidents do I have? And then get the labels of those incidents and then think, where do you want to check, right? So some kind of leading indicators for where to look with the heavy stuff, where to look for the actual raw logs and raw traces, which could be very overwhelming. That kind of API could easily return 10, 20 megabytes of response.
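A minimal sketch of that shaping, using the FastMCP helper from the official MCP Python SDK; the tool names, the cluster-first flow, and the result cap are assumptions for illustration, not Groundcover's real endpoints.

```python
# Sketch of an agent-shaped MCP server: a cheap discovery tool first,
# then required parameters and a hard result cap on the expensive one.
from mcp.server.fastmcp import FastMCP

mcp = FastMCP("observability-demo")

MAX_RESULTS = 20  # hard cap so the agent never drowns in raw data

@mcp.tool()
def list_clusters() -> list[str]:
    """High-level, cheap entry point; the agent is steered here first."""
    return ["prod-eu", "prod-us"]  # placeholder data

@mcp.tool()
def search_traces(cluster: str, service: str, limit: int = MAX_RESULTS) -> list[dict]:
    """Raw traces require knowing the cluster -- required here even though a
    human-facing API might happily default it."""
    limit = min(limit, MAX_RESULTS)
    # placeholder for a real backend query
    return [{"cluster": cluster, "service": service, "trace_id": f"t-{i}"}
            for i in range(limit)]

if __name__ == "__main__":
    mcp.run()  # serves the tools over stdio by default
```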
B
That's fascinating. So essentially you're making the API much more restrictive than you would for a human because you are saying, hey, there's a right way and a wrong way to do this. And if you're calling this without these variables, you probably have not thought about or figured out enough to get a useful response here.
C
Yeah. And when you think about it, this is very human nature. So imagine you're standing behind a junior software developer looking at some kind of incident in production, and all of a sudden you realize that person is just reading all those log lines randomly. You're like, hey, stop for a moment. What are you doing? Let's figure it out. High level, where is the issue? Let's filter all those logs, let's focus on what we know, that's probably going to lead us to some kind of realization of what happened. So we tried to put that notion into the flow of the APIs. It's not 100% successful, it can still get lost, but it helps to keep the wild investigations moving in the direction of narrowing down.
B
Yeah, that makes a ton of sense. Now, we've talked a little bit about identifying patterns and doing that very deterministically. I was curious if you had explored any sort of statistical pattern recognition or LLM-based pattern recognition internally that you can then expose up to an end-user agent, or anything like that.
C
We've played with it, and we're still playing with it, because I think this is obviously the future. We have a feature that is copy to agent, which, you know, modern tools today have this kind of copy to agent, where you get a prompt that can guide the AI agent through what you're doing. We started doing that for more than just a section. So imagine that I can represent a flow you did in the platform visually and explain it to an agent, saying: you went from traces to that workload page, and then you clicked on logs, you filtered those, you added those keys, and you landed here. And then take a screenshot or something like that and give it to an AI. This is like an alternative universe to MCP. Instead of letting it use APIs to your platform, it kind of lets it experience the UI, in a markdown way or in a screenshot way. And the results were actually surprisingly good. Because you would think, we put so much effort into UX, right, to make the app human friendly and help you slice and dice a lot of data, and then you throw it all away and say, okay, the AI agent will just use MCP or APIs. But if you just take a screenshot of your application and send it to an AI agent, you'll be amazed how much it understands from that scenario. So we're definitely looking at this kind of stuff. Nothing production ready yet, but we are playing with it.
B
It is interesting, right? These things are trained on human thought processes, and they can only incorporate so much information. And so all of this thinking we've put into how we help a person reach the right things they're looking for can be applicable. Switching threads a little bit, another hot topic in this AI expansion space is privacy, security, those sorts of things. As you're building an observability solution for this modern era, what types of privacy and security needs are there that are maybe more relevant today than they were a few years ago?
C
Yeah. So logs and traces have the potential to contain PII, at a minimum. We all try not to do those kinds of things, but it happens, right? Someone just puts an object inside a logger, and all of a sudden you've printed some PII into your logs, and now you need to understand how to delete it or how to contain it. Groundcover is built on bring your own cloud, which I think is becoming very, very popular in the LLM era. You know, you have non-humans sniffing into your code or your data, your most private data. So all the data stays contained inside your environment. Imagine you have a very sophisticated agent that runs in some kind of SaaS. You don't know where it is, and it learns from your data, analyzes it, summarizes it. Now you're more exposed to prompt injections and even just bugs, right? Even just something someone didn't think would happen. So Groundcover is built to use your LLM inside your account, so the risk is a lot smaller. You basically have all the data, with the agent, inside your cloud provider. So at least you know the physical perimeter where data is being served and saved. I think this is the future for agents in general. You don't want a very sophisticated agent to have your data externally. It's the same as we do with humans, right? When we onboard new employees, when we onboard contractors, we want them to use our laptops, we want them to stay in our offices or in our network environment. With AI agents, the potential risk is just a lot bigger.
B
It does feel like that's kind of the next generation of SaaS is everything can be done within your cloud, wherever it is, at least if you're on the big cloud providers. But increasingly whatever cloud you're in.
C
Yeah. And while we're at it, I think Kubernetes changed the world in standardizing where you deploy code. So now Kubernetes is commoditized, like RDS. It's easy to assume you're going to have a managed Postgres instance, and compute, and maybe even object storage. And to be honest, that's 90% of what you need to build a very complex platform. So this is what Groundcover is actually doing: use very simple services to run that production, and we make sure we can run it everywhere. We have, obviously, the big cloud providers, and we also have on-prem and air-gapped. Once we relied on Kubernetes, a lot of things got way simpler.
B
Well, now that we're sort of talking about the way things are evolving, what do you see as like what needs to happen in the observability space over the next five or ten years? Where are we not yet solving the problems that are really there?
C
The holy grail is root cause analysis. Everyone wants the agents to tell us what went wrong and how to fix it, right? You see it everywhere, everyone is targeting root cause analysis, and I think this is the very expected future. We're still not there; it's still very complex. So you can see root cause analysis for very basic scenarios, right? You have some kind of error log, and you can explain it, you can correlate it with infrastructure changes. That's very basic, and this is already happening. But what about a two-hour research session? You want a copilot, you want someone to be there with you and do the research with you while you are investigating an incident in production. And we've all been through the incident where you are six hours, eight hours in, trying to figure out not only what happened, but also how to remediate, how to recover from it. And for that we need a deep understanding of the architecture, a deep understanding of the changes, and also a lot of world knowledge on engineering, and that's still not there. And I think this is definitely what we are targeting, but it's going to take some time to get there.
B
What would you say are kind of the underlying components that would go into making that possible?
C
Good question. We need to have this model of what production looks like. We need to have the notion of how we learn production, right? How we learn the architecture. This is still something that's usually represented by a simple graph. But think about how long it takes for an engineer to understand the entire architecture. It takes years in a modern company, and at some companies there has never been one person who can understand everything. So we need a way to create that knowledge and also keep it up to date, because things change so fast. It has to get the live data from production to understand how it's built, and also the consequences of the architecture. So I think, for instance, eBPF is a very good way to understand the behavior of applications, the interactions between applications, the dependencies between applications. So you can definitely build that graph, represented with some kind of degrees of what the side effect is of this component failing. We can also inspect disks and network calls. So that's a very good start. But you also want to understand the interactions of people with those components. You want to understand the teams and the responsibility of the teams for those components. So we're thinking, a lot of things happen on Slack, right? We need to understand the interaction there. A lot of planning happens around new features, so okay, we need Jira or Linear to also understand the planning for the future. Maybe someone already wrote, this is going to break. We need that plan, and we need to understand that as well. Like, what's the future? A lot of organizational knowledge from R&D and product should be there as well. That's a lot of knowledge.
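To sketch what that graph might look like in miniature, here is a hedged example using the networkx library, with made-up services: edges come from observed caller-to-callee flows (the kind of thing eBPF can see), and a component's blast radius is everything that transitively depends on it.

```python
# Sketch: a service dependency graph built from observed network flows,
# with a "blast radius" query. Services and rates are invented examples.
import networkx as nx

observed_flows = [  # (caller, callee, calls per minute) as a sensor might report
    ("frontend", "checkout", 200),
    ("checkout", "payments", 120),
    ("checkout", "inventory", 80),
    ("payments", "postgres", 300),
]

g = nx.DiGraph()
for src, dst, rate in observed_flows:
    g.add_edge(src, dst, rate=rate)

def blast_radius(component: str) -> set[str]:
    """Everything upstream that transitively depends on `component`."""
    return nx.ancestors(g, component)

print(blast_radius("postgres"))  # {'payments', 'checkout', 'frontend'}
```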
B
That's interesting. So I was thinking, you brought up Kubernetes and how that has really shifted deployment, and I think one of the things it does is take at least one part of your stack and make it kind of declarative. You can see some piece of your architecture and pull that knowledge out. Terraform is similar; any of these declarative infrastructure-as-code types of solutions give you the ability to analyze that part of the architecture. And then, as you highlight, eBPF gives you live behavior data, so that's another piece. The organizational side is an interesting one, right? How are these systems used? What's going on with them? I wonder if there are other places where, like Kubernetes, there's some sort of declarative structure we can put in place that would let us short-circuit that a little bit and just get a sense of, oh, here's how info flows in this system.
C
I think you can learn a lot from code reviews. If you read code reviews, you probably get a lot of insight into where the weak spots of the components are. You basically want to understand why things are the way they are, because you can always jump to conclusions, right? We maxed out the database connections, and that's why it broke. But there is something more than that, right? There is a reason why we limited that number of connections. Maybe the developer created this new kind of query that consumed a lot of connections, and we actually limited that to protect the database from over-consuming CPU or something like that, in order to serve other applications. So there is a lot going on between the APIs and the applications and the Kubernetes. I think we need to find a way to understand why things are built the way they are built. What were the architecture decisions? Did we make them intentionally? Maybe we didn't. Maybe we just didn't know this is the default behavior. Is it default behavior? There are a lot of things to know that sit between the code and the application and the documentation of those services, and we still need to figure out how to get that information. But I think what comforts me is that at the end of the day, engineers solve those problems. They go to Slack, they search everywhere, they read GitHub issues, they open AWS documentation to understand if that's the default behavior, they open Groundcover to see how many connections they have currently. What's the storage throughput at this moment? Is it different than it was yesterday? So the knowledge is there, we just need a way to represent it, and also to make it efficient, because humans usually have intuition about issues. We need to understand how that works.
B
Yeah, I think the modeling piece is really interesting, right? In some ways, even as you mentioned with log data, you're finding that time series is a very effective way to model things for an LLM to utilize and use in different ways. If LLMs become the glue that is running through these different inferences, it matters how we are representing this data to them. They're often quite linear thinkers, from what I can see using them. And so if there are non-linearities, that needs to be represented somewhere else in how you're presenting that info to them. Yeah, it's fascinating.
C
I think we also need to decide how much we are going to help it, because most use cases, most incidents, can be represented as 20, 30 questions that you have to answer, right? Most incidents are missing resources, noisy neighbors, a bad version, exceeding quotas, some kind of infrastructure failure. We can help by crafting, as you said, 50, 60, 100 scenarios that the AI can go through linearly, and then it starts to think on its own. Basically that's what we would do as an engineer, right? We're going to have those 10, 20 playbooks. If it fits none of those, you know you're in trouble, but you still have some kind of learning that you did, because you know it's not this, it's not that, it's not that, so now you're looking for something weird. Something weird means you're going to call X, Y or Z and ask for help. And I think even if we just do that, right, we can do that before an incident, maybe, because we can have more questions answered, or even just present a report saying: here is a custom dashboard I created for those scenarios, this is what I checked, blah, blah, blah, and that's what I know, I'm stopping now, and now you help me help you. That's a big milestone. That's a big milestone, if you can have a coworker that is actually a bot that helps you investigate and takes branches. I'm thinking of mentioning, you know, Groundcover agent, go look for storage issues in the cluster, and it comes back after a minute saying, you know what, I found this, I don't think it's relevant, with a graph. That's a big thing.
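A minimal sketch of that linear playbook idea, with hypothetical probes standing in for real queries against observability data: deterministic checks run first, and only an unmatched incident gets escalated as "something weird".

```python
# Sketch: walk known incident scenarios linearly before escalating.
# Every probe here is a stub; a real one would query metrics, logs, or traces.
from dataclasses import dataclass
from typing import Callable

@dataclass
class Check:
    question: str
    probe: Callable[[], bool]  # deterministic yes/no against real data

playbook = [
    Check("Any pods OOM-killed in the last hour?", lambda: False),
    Check("Did a new image roll out near the incident start?", lambda: True),
    Check("Are DB connections at their configured limit?", lambda: False),
]

def run_playbook(checks: list[Check]) -> list[str]:
    findings = [c.question for c in checks if c.probe()]
    if not findings:
        findings.append("No known scenario matched: something weird, escalate.")
    return findings

for finding in run_playbook(playbook):
    print("-", finding)
```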
B
No, that would be super nice. And to your point, right, Like I think a lot of the folks who are most effectively using AI for whatever purpose right now, they're building out playbooks, they're building out scenarios where someone has thought through it, maybe using an LLM, maybe on their own. They've sort of described what they think should happen and that can be then reused and built upon. And so yeah, you could have these 60, 100, 200 scenarios. Some of them might be purely deterministic. You could just go and look and see like, hey, did this happen or not? Others might require an agent to do some decision making. But that, yeah, that would be a really nice roll up.
C
Well, we're working on it.
B
You are. Okay, so that actually feeds into this question, right? What is coming next from Groundcover, or coming soon? What are the dimensions on which you guys are pushing this forward?
C
Yeah, so there are two main verticals. One is observing LLMs, and the other one is LLMs for observability. We just released the LLM observability piece. We use eBPF, we talked about it, and it's very easy for the sensors to pick up LLM calls. We can now see every API call to an LLM inside your production or dev or staging. So we can now track token consumption, compare between models, correlate that to workloads, and even show you what caused that API call. So this is the track of LLM observability. We're just getting started, and this is a huge milestone for us, because until now we focused on classic APIs, and now we're trying to get the Bedrock API calls, or Azure OpenAI, and all that. And it's very interesting and very fascinating. We're hearing customer requests for rendering images, sending them to LLMs, and a lot of wild stuff. So this is one vertical that we're going to invest in. The other one is using LLMs for observability. We are taking a very different approach. We started by making sure the data, the observability data, is clean. You would think most data is JSON logs. No, that's not true. Most data is just random printing of logs with weird formats and multi-lines and a lot of, sorry to say it, garbage data. And no matter how smart your agent is, if the data is garbage, you know, garbage in, garbage out. So we are actually now focusing on using LLMs to create observability pipelines to clean the data. So we offer to parse those log lines into a much, much more meaningful format. We suggest parsing specific fields that we think are meaningful, to create time series from them and represent them in a much more sophisticated way that will allow you, the user, to create more advanced queries. So that's the first step. The second step, after the data is organized and clean, is obviously to use LLMs to analyze the data. We already started doing that with patterns and, as we talked about before, the baselines of the signals. But the target is to have that copilot running with you when you investigate, or even when you're just doing research. It doesn't have to be just in a crisis. I found myself using Claude Code for understanding repositories. Not even writing code, just asking questions about a repository. Sometimes it's just, you know, making things up, but sometimes I can actually get very good answers from it, and that saves me a lot of time. So imagine you can have that on your observability data. You can ask questions like, I'm looking for a cross-AZ consumer that is inflicting $10,000 a month of cross-AZ network costs. That can easily be a two-hour research task for an engineer, and maybe two minutes for an AI bot that you can just mention on Slack and get a dashboard answering your question. So yeah, that's the goal.
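To ground the token-tracking piece, here is a small hedged sketch of just the aggregation step, with invented records standing in for whatever an eBPF sensor would actually emit for intercepted LLM API calls.

```python
# Sketch: roll intercepted LLM API calls up into per-workload, per-model usage.
# The records below are invented; a sensor would emit them from real traffic.
from collections import defaultdict

calls = [
    {"workload": "checkout", "model": "gpt-4o", "prompt_tokens": 900, "completion_tokens": 300},
    {"workload": "checkout", "model": "gpt-4o", "prompt_tokens": 1100, "completion_tokens": 250},
    {"workload": "support-bot", "model": "claude-sonnet", "prompt_tokens": 4000, "completion_tokens": 1200},
]

usage = defaultdict(lambda: {"calls": 0, "tokens": 0})
for c in calls:
    key = (c["workload"], c["model"])
    usage[key]["calls"] += 1
    usage[key]["tokens"] += c["prompt_tokens"] + c["completion_tokens"]

for (workload, model), stats in sorted(usage.items()):
    print(f"{workload:12} {model:15} calls={stats['calls']} tokens={stats['tokens']}")
```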
B
No, I love it. And I think the dashboards, and being able to surface the right dashboards at the right time, is a really nice thing. I read an article about looking for more AI-enabled heads-up displays rather than copilots, where it just shows you what you need to see at that time, so that it's easy to understand what's going on in your system.
C
Yeah, I love it.
B
Awesome. Well, we're getting close to the end of our time. Is there anything we haven't talked about today that you think would be important to leave folks with?
C
I'm going back to the beginning. We want to leverage AI to write a lot of code. We need to find smarter ways to make sure, in a very deterministic way, of what the code actually does, how it behaves, and what architecture it represents. And no matter where we end up, it has to be deterministic. Otherwise we're just, you know, lying to ourselves with another non-deterministic layer judging another layer. So I don't know who's listening, but we need better linters. I feel like this is the answer. We need new tools. I feel like this is the era for having another stage in the CI to make sure the things we care about are being enforced on the AI agents that write the code. So tests are very good, linters are a good start, but I feel like something smarter needs to be created.
B
Yeah, there's something interesting there. I almost wonder, like, I observe a wide difference in how well LLMs write particular software languages. And I wonder if there's a designed-for-LLM-generation software language that needs to happen, where you can kind of enforce those architectural pieces at the lint level.
C
Interesting. Probably there are a lot of developers that will code in just English, right? Because if you think about it, programming languages are built to be compact, but for specific tasks, right? Very efficient for specific tasks. But now with AI, I feel like English is probably a good way to represent an idea.
B
But they're not only compact, they're also, to your point, deterministic. Right. This thing means exactly one thing that can be reproducibly created. English is terrible for that.
C
Yeah, I agree.
B
So I love English as the first specification level, but you need something that you can, as you highlight, deterministically verify, evaluate, enforce restrictions on. Something where you can draw very clean lines.
C
Yeah, sounds interesting.
B
Well, this has been an absolute pleasure. Thank you, Chez, for the time today, and we'll call that a wrap.
C
Thank you.
Episode: Engineering in the Age of Agents with Yechezkel Rabinovich
Release Date: October 16, 2025
Host: Kevin Ball (K. Ball)
Guest: Yechezkel “Chez” Rabinovich, CTO & Co-founder of Groundcover
This episode explores how observability and engineering practices are rapidly evolving in a world increasingly shaped by AI-generated code and agent-driven development. K. Ball and Chez discuss the power of eBPF (Extended Berkeley Packet Filter), the challenges and opportunities AI brings to software development and operations, new approaches to observability, and the future of root cause analysis. Throughout, the conversation is grounded in Chez’s journey from kernel engineering to building Groundcover—a novel, eBPF-powered observability platform.
“EBPF... lets us instrument the application from the kernel side without any risk for the application itself...you still get 95% of the value.” —Chez ([02:57])
“The basic of modern observability is to have all the information in one place...this is the very bare minimum.” —Chez ([05:30])
“EBPF will tell you the truth.” —Chez ([16:14])
“It has to go through a certain way of thinking…Because end of the day, when you wake up at 3am, something is wrong...you're not sharing [responsibility] with that AI bot.” —Chez ([16:14])
“Ground cover is built to use your LLM inside your account. So the risk is a lot smaller.” —Chez ([30:12])
“You want a copilot, you want someone to be there with you and do the research with you...And for that we need a deep understanding of the architecture, a deep understanding of the changes, and also a lot of world knowledge on engineering and that's still not there.” —Chez ([33:36])