Transcript
A (0:00)
Enterprise IT systems have grown into sprawling, highly distributed environments spanning cloud infrastructure, applications, data platforms and increasingly AI-driven workloads. Observability tools have made it easier to collect metrics, logs and traces, but understanding why systems fail and responding quickly remains a persistent challenge. As complexity continues to rise, the industry is looking beyond dashboards and alerts towards agentic AI systems that can reason about operational data, reduce toil and take action when things go wrong. SolarWinds offers solutions to monitor, understand and remediate issues across complex distributed systems. The company began as a leader in network and infrastructure monitoring and has evolved to support modern applications, cloud environments, containers and AI workloads with a growing focus on reducing operational toil. Krishna Sai is the Chief Technology Officer at SolarWinds. He joins the show with Sean Falconer to discuss how SolarWinds is rethinking observability in the age of AI, what it means to design agentic systems for mission-critical environments, how AI-assisted programming is reshaping engineering workflows, and why the future of operations depends on building platforms where humans and autonomous agents work together. This episode is hosted by Sean Falconer. Check the show notes for more information on Sean's work and where to find him.
B (1:44)
Sai, welcome to the show.
C (1:45)
Thanks Sean, it's great to see you and meet you. Big fan of the show, so thanks for having me here.
B (1:51)
Oh well, thank you so much. That's nice to hear. Yeah, I'm looking forward to this as well. So I wanted to start off talking a little bit about SolarWinds and kind of set the stage there, because I think a lot of people know SolarWinds maybe as a single tool that they used years ago, but you guys do a lot of different things. So given where you are today, how would you describe what SolarWinds actually is to someone who hasn't looked at the space in a while?
C (2:16)
No, absolutely. Let's take a step back and look at how IT and ops teams, who typically use SolarWinds products, have been using SolarWinds for the past 25 years or so. Our product portfolio broadly spans three domains: observability, incident response and service management. And to put it simply, IT and ops teams use us to help detect and remediate issues across a variety of workloads in their environments: network and infrastructure, which is where we started and have been a leader for a very long time, but also applications, databases, containers, ML workloads, et cetera. Our solutions cover this from a horizontal perspective, meaning they give you the ability to look at the general basic health of the typical workloads, compute, storage, network, et cetera, but also vertical cross-cutting concerns like performance, reliability, cost, security and so on. Right. And what happens is typically IT and ops teams are accountable for SLAs and SLOs. Right. And that kind of drives your day-to-day behavior. More mature teams, of course, manage error budgets at scale, and they have nuances of that same dimension. But all of this is much more simply said than done. I was talking to a CIO recently, as part of a customer call, at a major system integrator, responsible for running big managed global services for an organization. And, you know, he said it well. He said, "I'm responsible for SLAs, but honestly I can't tell you everything that contributes to an SLA." Right. Which is a statement of the complexity in these environments. And increasingly these teams have to deal with large, you know, microservices, distributed systems, et cetera. And so complexity is very real. And so what we target, especially in the context of AI and so on, is to reduce toil. We've all been there: waking up at 3 a.m. to alert storms and getting into a war room.
And the problem with that is that even today a lot of the tools just ingest a whole lot of data and show you a lot of dashboards with red lights and so on. But finding out why something is red is still a big challenge. So we've been thinking about this challenge. When we think about AI assisting with this, traditionally we've gone from statistical approaches, things like anomaly detection, machine learning, basic stuff, to now, where there's a very clear shift to agentic AI, not just in our industry, but across the board. And so that's something that we want to focus on and increasingly index on. And the way we talk about that is we just call it SolarWinds AI more broadly, but in particular the agentic portion of it we call SolarWinds AI Agent. It often gets confused with AI observability, which is something that comes up a lot. The way we think about AI observability is as more of a vertical use case, right, rather than a horizontal thing. But that's also something that we're starting to do here.
