Podcast Summary: Software Engineering Daily
Episode: Engineering in the Age of Agents with Yechezkel Rabinovich
Release Date: October 16, 2025
Host: Kevin Ball (K. Ball)
Guest: Yechezkel “Chez” Rabinovich, CTO & Co-founder of Groundcover
Overview
This episode explores how observability and engineering practices are rapidly evolving in a world increasingly shaped by AI-generated code and agent-driven development. K. Ball and Chez discuss the power of eBPF (Extended Berkeley Packet Filter), the challenges and opportunities AI brings to software development and operations, new approaches to observability, and the future of root cause analysis. Throughout, the conversation is grounded in Chez’s journey from kernel engineering to building Groundcover—a novel, eBPF-powered observability platform.
Key Discussion Points and Insights
Chez’s Background and the Power of eBPF
- Chez’s journey:
- Over a decade in software engineering focused on Linux and distributed systems ([02:13])
- Early career spent on kernel modules before discovering eBPF
- Why eBPF?
- Traditional kernel extension is arduous and risky ([02:57])
- eBPF allows for sandboxed, dynamic, and safe kernel instrumentation—empowering organizations to observe their systems without modifying application code.
“EBPF... lets us instrument the application from the kernel side without any risk for the application itself...you still get 95% of the value.” —Chez ([02:57])
- Groundcover’s approach:
- Main observability sensor is eBPF-based; it deploys in minutes and collects logs, traces, and metrics in one place, with a “bring your own cloud” model for privacy and cost efficiency.
- Allows full-stack visibility, even for third-party interactions engineers may not know about.
Observability for Complex, Modern Systems
- System complexity is the new norm:
- Modern stacks involve hundreds of microservices, third-party resources, and integrations ([05:30])
- Unknown dependencies often fall outside the scope of classic instrumentation, but eBPF-based observability captures these “unknown unknowns”
- Prior customer pain: using 5–7 tools to piece together a complete picture
“The basic of modern observability is to have all the information in one place...this is the very bare minimum.” —Chez ([05:30])
- eBPF-based instrumentation is “mind blowing” for first-time users ([07:13])
The AI Code Generation Surge—Opportunities & Challenges
- Unprecedented code velocity:
- AI accelerates code generation to “superhuman” speeds ([07:40])
- Reviewing and validating rapidly generated code poses new challenges
- AI for code reviews—and its limits:
- Groundcover uses AI to both generate and review code ([08:34])
- Engineers remain accountable; context matters in deciding review rigor
- “I think the question is very dependent on the context of what are you building.” —Chez ([10:58])
- For simple libraries: manual scenario crafting + tests are often sufficient
- For performance-critical code: close inspection, resource awareness, and traditional review remain essential
Validating in an Era of Agents and LLM-Supported Engineering
- Tests, not logs, are the ground truth:
- LLMs can “lie” (hallucinate logs, mis-refactor), making logs or markdown less trustworthy ([14:06])
- Deterministic guardrails (tests, linters, code conventions) are critical in the AI coding era
- “You have to create guardrails that rely on a solid ground truth.” —Chez ([16:18])
- eBPF as a trustable observer:
- Operates independently of application code, so it is unaffected by mistakes that AI-generated code may introduce
“EBPF will tell you the truth.” —Chez ([16:14])
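The guardrail idea above can be sketched in a few lines. This is a hypothetical illustration, not Groundcover's code: an LLM-refactored function may emit a reassuring log line even when its behavior is wrong, so a deterministic test asserts the behavior itself rather than trusting the log.

```python
# Hypothetical sketch: deterministic guardrails vs. trusting generated logs.
import logging

logging.basicConfig(level=logging.INFO)
log = logging.getLogger("checkout")

def apply_discount(price: float, percent: float) -> float:
    """Return price after a percentage discount."""
    # A log line proves nothing -- an LLM could "hallucinate" this message
    # while the arithmetic below is subtly wrong.
    log.info("applied %s%% discount", percent)
    return round(price * (1 - percent / 100), 2)

def test_apply_discount():
    # The test, not the log output, is the ground truth.
    assert apply_discount(100.0, 10) == 90.0
    assert apply_discount(19.99, 0) == 19.99

test_apply_discount()
print("guardrail tests passed")
```

A linter or CI stage running such tests gives the deterministic signal the conversation calls for, regardless of who (or what) wrote the code.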
Managing Data Volume and Summarization for AI Agents
- Observability at scale brings information overload:
- AI-generated code increases log and trace volumes ([17:13])
- Summarization is essential—stream aggregation, time-series analysis, and building baselines/patterns from data ([17:13])
- “If we can nail the patterns, we can actually convert them to kind of time series.” —Chez ([17:13])
- API and change management for targeted exploration:
- Baselines for API traffic and external events (e.g., image changes) help drastically narrow investigation timelines
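The pattern-to-time-series idea can be sketched as follows (an assumed approach, with hypothetical names, not Groundcover's implementation): collapse raw log lines into templates by masking variable tokens, then count each template per time bucket, turning an unbounded log stream into a small set of series that can be baselined.

```python
# Assumed approach: turn a log stream into per-template time series.
import re
from collections import Counter

def template(line: str) -> str:
    """Mask hex ids and numbers so similar lines share one pattern."""
    line = re.sub(r"0x[0-9a-f]+", "<HEX>", line)
    return re.sub(r"\d+", "<N>", line)

def bucketed_counts(events):
    """events: iterable of (epoch_seconds, log_line).
    Returns {(minute_bucket, template): count}."""
    counts = Counter()
    for ts, line in events:
        counts[(ts // 60, template(line))] += 1
    return counts

events = [
    (120, "request 4831 took 12ms"),
    (130, "request 4832 took 9ms"),
    (190, "request 4833 took 11ms"),
]
series = bucketed_counts(events)
print(series[(2, "request <N> took <N>ms")])  # -> 2
```

Once logs are series like this, standard anomaly detection on the counts can flag when a pattern's rate deviates from its baseline, which is far cheaper than handing raw logs to an AI agent.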
Exposing Observability Data to Agents
- API Design for Agents:
- Groundcover exposed its data to agents via MCP (Model Context Protocol), but found that too much complexity confused LLMs ([20:52])
- Needed progressive disclosure: start with simple APIs and only then reveal more complex interfaces ([24:23])
- Forced agents towards disciplined investigation flows akin to how a human mentor would steer a junior engineer ([26:23])
“It has to go through a certain way of thinking…Because end of the day, when you wake up at 3am, something is wrong...you're not sharing [responsibility] with that AI bot.” —Chez ([16:14])
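The progressive-disclosure flow described above might be modeled as follows. All names here are hypothetical, not Groundcover's API: the agent initially sees only a minimal tool set, and richer endpoints are revealed only after the simpler ones have been used, steering it through a disciplined investigation path.

```python
# Hypothetical sketch: progressive disclosure of agent-facing tools.
TOOLS = {
    0: ["list_clusters", "get_service_health"],           # always visible
    1: ["query_traces", "query_logs"],                    # after orientation
    2: ["raw_promql_query", "cross_cluster_correlation"]  # expert tier
}

class ToolGateway:
    def __init__(self):
        self.level = 0

    def visible_tools(self):
        """Return only the tools disclosed so far."""
        return [t for lvl in range(self.level + 1) for t in TOOLS[lvl]]

    def call(self, name):
        if name not in self.visible_tools():
            raise PermissionError(f"{name} not yet disclosed")
        # Using a tool at the current frontier unlocks the next tier,
        # forcing the agent to orient itself before going deep.
        if name in TOOLS[self.level] and self.level + 1 in TOOLS:
            self.level += 1
        return f"called {name}"

gw = ToolGateway()
print(len(gw.visible_tools()))  # 2 simple tools at first
gw.call("list_clusters")
print(len(gw.visible_tools()))  # 4 after the first orienting call
```

The gateway plays the role of the human mentor in the analogy: it refuses shortcuts until the agent has established basic context.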
LLMs as Observability Users: Learnings and Patterns
- Constraining agent access works best:
- Crafted endpoints/use-cases thoughtfully; progressive complexity limits ([24:23])
- Made some API parameters “required” for agents (e.g., cluster specification), even if optional for humans
- Experiments with alternative agent interfaces:
- Tried screenshot-based, UI-driven observational agents with surprisingly strong results ([27:47])
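The "required for agents, optional for humans" constraint is easy to picture in code. This is a hypothetical sketch with invented names, not Groundcover's API: an agent must state which cluster it is querying, while a human can still issue a broad, unscoped search.

```python
# Hypothetical sketch: stricter parameter requirements for agent callers.
from typing import Optional

def fetch_traces(service: str, cluster: Optional[str] = None,
                 *, agent: bool = False) -> str:
    """Simplified query entry point shared by the human UI and the agent API."""
    if agent and cluster is None:
        # Humans may browse broadly; agents must scope every query.
        raise ValueError("agents must specify 'cluster' explicitly")
    scope = cluster or "all-clusters"
    return f"traces for {service} in {scope}"

print(fetch_traces("checkout"))                         # human: broad query ok
print(fetch_traces("checkout", "prod-eu", agent=True))  # agent: scoped query
try:
    fetch_traces("checkout", agent=True)                # agent: rejected
except ValueError as e:
    print(e)
```

Tightening the contract this way is cheaper than prompt engineering: the API itself rejects the unscoped queries that, per the quote above, signal the agent is "off" track.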
Security and Privacy in an AI-Agent World
- Risks of PII in logs and traces persist ([30:12])
- Bring-your-own-cloud (BYOC) design:
- Keeps sensitive data in customer environments—critical for privacy when non-humans (agents, LLMs) process data
“Groundcover is built to use your LLM inside your account. So the risk is a lot smaller.” —Chez ([30:12])
- Kubernetes and standardized deployment make BYOC easier and more widespread.
The Future of Observability & Root Cause Analysis
- Root cause analysis is the 'holy grail':
- Current solutions handle simple log-to-error correlation; multi-hour, highly complex investigations still defy automation ([33:36])
- Ideal: AI copilots able to investigate incidents alongside engineers, not just after the fact
“You want a copilot, you want someone to be there with you and do the research with you...And for that we need a deep understanding of the architecture, a deep understanding of the changes, and also a lot of world knowledge on engineering and that's still not there.” —Chez ([33:36])
- What’s needed?
- Up-to-date, live model of production architecture—including services, teams, feature plans, social/organizational context ([35:05])
- Blend static (Kubernetes, Terraform), behavioral (eBPF), and organizational/documentation sources for holistic understanding
- Build playbooks and model human intuition for efficient incident triage ([40:48])
What’s Next for Groundcover
- LLM Observability:
- New features to observe LLM call patterns, token consumption, correlation to workloads ([43:27])
- Increasing demand for monitoring AI-specific behaviors (e.g., prompt, image generation)
- LLM for Observability:
- Using LLMs to clean/massage log and observability data, create advanced queries, and ultimately serve as investigative copilots
- Stepping toward dashboards and “heads up displays” surfaced by AI at the right time ([47:02])
Notable Quotes and Memorable Moments
- “Any software engineer can basically write superhuman code. Speed, velocity...this is no longer a barrier.” —Chez ([07:40])
- “I don't trust logs that being generated by LLM because, you know, it just can make up logs.” —Chez ([14:06])
- “If you are looking for traces and you don't know what cluster are you looking at, something is off. You need to do something else.” —Chez ([24:23])
- “Imagine you can have it on your observability data. You can ask questions...that easily can be a two hour research for engineer and maybe two minutes for an AI bot that you can just mention on Slack and get dashboard answering your question.” —Chez ([46:23])
- “We need better linters. I feel like this is the era for having another stage for CI to make sure things we care are being enforced on the AI agents that write the code.” —Chez ([47:30])
- “Probably there are a lot of developers that will code in just English...But now with AI, I feel like English is probably a good way to represent an idea.” —Chez ([48:51])
Key Timestamps
- [02:13] — Chez’s background and discovery of eBPF
- [05:30] — Modern observability needs and pitfalls of classic instrumentation
- [07:40] — Changing dev lifecycle and AI code generation pace
- [08:34] — How Groundcover uses AI for code review and code generation
- [14:06] — Why tests matter more than ever; LLMs and trustworthy signals
- [17:13] — Summarizing observability data for AI agents; baseline/trend modeling
- [20:58] — Using MCP, designing better APIs for agents vs. humans
- [24:23] — Progressive API disclosure and constraints for agent effectiveness
- [27:47] — Experiments with UI-based agent guidance
- [30:12] — The critical role of privacy/security, bring-your-own-cloud (BYOC)
- [33:36] — The long-term vision: root cause analysis and AI copilots
- [35:05] — What’s required for comprehensive, up-to-date architectural understanding
- [43:27] — What's next for Groundcover: LLM observability and AI-powered pipelines
- [47:30] — Final thoughts: AI guardrails, the need for new tooling
Final Takeaways
- Software observability is becoming increasingly important, especially for AI-generated, agent-driven systems.
- eBPF provides a secure, high-fidelity mechanism to gain deep insight into complex, distributed platforms without modifying application code.
- The future of code review, validation, and incident response will blend deterministic guardrails, AI/agents, and human expertise.
- The industry must develop smarter, more context-dependent tools—linters, test frameworks, and architectural models—to ensure that AI and engineers ship high-quality, understandable, and secure software in the age of agents.
