
A
Observability emerged from the need to understand complex software systems and involves tracking metrics, logs and traces so engineers can detect and diagnose problems before they affect users. However, modern applications often encompass hundreds of services, containers and dependencies, generating more observability data than dashboards and alerts alone can effectively surface. New Relic is a leading observability platform with a history that spans the full arc of modern software operations. Today, they are working to apply AI to move observability beyond passive monitoring toward active intelligence, where systems can surface what matters, reduce alert noise, and ultimately take autonomous action before problems reach engineers or users. Nick Benders is the Chief Technology Strategist at New Relic, where he has worked for 16 years. In this episode, Nick joins Lee Acheson to discuss the evolution of observability from dashboards and alerts to AI-driven intelligence, how LLMs and statistical tools work together to surface meaningful signals from massive data sets, the emerging challenge of observing AI systems themselves, and what the rise of AI means for the future of software engineering as a profession. This episode is hosted by Lee Acheson. Lee Acheson is a software architect, author and thought leader on cloud computing and application modernization. His best-selling book, Architecting for Scale, is an essential resource for technical teams looking to maintain high availability and manage risk in their cloud environments. Lee is the host of his podcast Modern Digital Business, produced for people looking to build and grow their digital business. Listen DB FM, follow Lee at softwarearchitectureinsights.com and see all his content at leeacheson.com.
B
AI and observability: how exactly do they work together? My guest today is Nick Benders. Nick is the Chief Technology Strategist for New Relic, one of the major observability platforms that is now focused on AI. And Nick is also a personal friend of mine. So Nick, welcome to Software Engineering Daily.
C
Thanks Lee. It's great to be here.
B
Now, you and I go back a long time from New Relic days. Early in the New Relic days. Matter of fact, we've just spent a little bit of time talking about some of those earlier days. But can you catch listeners up on what you've been doing since those early days in New Relic and what your role as Chief Technology Strategist really involves?
C
Absolutely. So when I think about New Relic's journey, and really in some ways the industry's journey, back when we started at New Relic we were very much solidly in this instrumentation era. The thing that we sat down at our desks and tried to figure out every day was: how can we instrument more of the systems that matter to people? We started with Ruby. How do we get Java? How do we get .NET? How do we get Python? How do we get into the browser, or onto mobile apps, or add a new library? And pretty soon we were instrumenting so many things that there was more data than we could deal with. And so it moved from this instrumentation era into this data platform era. For New Relic, that shift was around 2013, 2014, when we introduced NRDB. NRDB gave people a way to ask questions of a system that you didn't know you needed to ask. All the data goes into it, and then after the fact you're like, oh, where are my slow queries coming from? Oh, well, that's mostly a test system. Exclude that test system. Where are the rest of the slow queries coming from? Can you break that out by country? And so this type of interactive questioning system powers dashboards, it powers a kind of interactive data explorer, it powers alerts. There are all these things you can do with a data platform, but that was over 10 years ago now. And what we've seen is that the ability to have all the data and to put it all in one place so you can ask any question is no longer enough, because you have so much data you don't even know what to ask of it. So instead of just being about the ability to ask something or to make a dashboard out of anything, you need a tool that tells you what questions you want to ask and what things you want to look at. And so that's that shift from the data platform era into this intelligence era. And intelligence, everybody jumps on immediately: oh, it's AI. 
I'm like, yes, AI is a piece of intelligence, but it's also about product design. It's about the way that we use something, having those built-in opinions, those flows. Because when somebody sits down at a tool, they don't want to just see a prompt and say, oh, I can dashboard anything. Great. They want answers. They want to know what's important in their system. And so that's that intelligence shift. That's the era we're in now. I'm going to talk about this a little bit later, but the intelligence era won't last forever either. It may already be drawing to a close as we need to move into an action era, as an industry and for New Relic. And so New Relic's journey since those early days has been getting into each of these eras, kind of pioneering it, figuring out what has to be done, and then asking the question of what has to be done next. As the Chief Technology Strategist, that's where I come in. I've been with the company now for 16 years, working with every piece of the system. And I've been using observability tools for 30 years now, back from when we used to call it monitoring and it was just ping.
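The after-the-fact questioning described in this answer (find the slow queries, exclude the test system, break the rest out by country) can be sketched in a few lines of Python. This is only an illustration: the event fields (`duration_ms`, `host`, `country`), the sample data and the helper `slow_queries_by` are all hypothetical, and none of it is NRDB's actual query interface.

```python
from collections import Counter

# Hypothetical raw events; a real data platform would hold billions of these.
EVENTS = [
    {"duration_ms": 900, "host": "test-1", "country": "US"},
    {"duration_ms": 1200, "host": "test-1", "country": "US"},
    {"duration_ms": 1500, "host": "prod-1", "country": "DE"},
    {"duration_ms": 40, "host": "prod-1", "country": "US"},
    {"duration_ms": 1100, "host": "prod-2", "country": "DE"},
    {"duration_ms": 950, "host": "prod-2", "country": "JP"},
]

def slow_queries_by(events, field, threshold_ms=500, exclude_hosts=()):
    """Count slow events grouped by `field`, optionally excluding some hosts."""
    slow = (
        e for e in events
        if e["duration_ms"] > threshold_ms and e["host"] not in exclude_hosts
    )
    return Counter(e[field] for e in slow)

# Question 1: where are my slow queries coming from?
by_host = slow_queries_by(EVENTS, "host")
# Question 2: exclude the test system and break the rest out by country.
by_country = slow_queries_by(EVENTS, "country", exclude_hosts=("test-1",))
```

The point of the sketch is that neither question had to be decided before the data was stored; each is just another filter and grouping over the same raw events.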
B
Right.
C
The question isn't where are we, as much as where are we going. And that's my job.
B
Got it, got it. That makes sense. So I was going to ask you, what's the one biggest change that occurred in your mission? But it sounds like there are really two big changes. There was a change from instrumentation to data, and then from data to intelligence. Are those really the major changes in New Relic that have occurred over time? I mean, in the New Relic product, obviously.
C
In the product, there are a million things that go on in any tech company. But I think there was that realization originally, back a decade ago, that the secret was to have a strong data platform and that you had to be able to ask anything of your data platform at any scale. And then more recently, just a few years ago, there was the realization that we've built this industry, and like I said, I've been using these tools for decades, where the dashboards that we build today are fancier, they're easier to build, they monitor different things, but fundamentally they're not that different from the dashboards I built in the 90s, where you sat down and you're like, I don't know, I'm going to put a dashboard together. It's going to monitor: here's how much free memory we have, here's my CPU usage, here's my network usage. I mean, I did it in Tcl/Tk then, instead of doing it in NRQL in a browser.
B
The difference, though, is that instead of having three or four graphs, we have three or four hundred graphs.
C
Yeah. Oh, yeah, right. And people want to build these charts, but there's no widget I can make that's small enough to actually watch everything in the system. And when that struck us as a company, we realized that dashboarding and alerting as the way you do observability has reached its conclusion. You can't go forwards anymore with "I'm going to make fancier dashboards or tinier widgets," or "I'm going to give people more tools to set alerts." Nobody wants to set alerts. Nobody wants to build dashboards. They want answers. They want to sit down and know what's going on in their system. Dashboards and alerts are a tool, and you can use that tool, but it's not really people's objective. To me, that's the change that occurred to us a couple of years ago, and I think you're going to see it all across the industry, especially with these AI-powered capabilities.
B
So I remember back in the early days of New Relic, in the early NRDB days in particular, we talked a lot about machine learning and the value of alerts generated from some magic machine learning algorithm that did magic things and said, oh, this is a problem, because such and such pattern exists. And we were very clear. I remember Lou standing up at conference after conference saying, this isn't AI, this is not AI, this is something else. I forget exactly the term we used. He had a term that he used for it, but basically machine learning is really all it was. So what's changed? Obviously, machine learning isn't the core of what you do anymore from an intelligence standpoint. What really changed? Why did this change occur?
C
Yeah, it's a good one. I like to think of those techniques as falling into three buckets. Obviously it's all computers, it's all just software. There's no magic here. But when we look at simple things, like you're going to do some type of component analysis on a signal, or you're looking for baseline deviations, you've got mathematical formulas and you apply them. The parameters are static, even if the math is fancy. So I'd say the first category is: this isn't AI, it's just math, it's just statistics. You can look it up and you can execute it, and you'll see a lot of functionality like that across every product, the kind that tells you when there's a significant deviation or things like that. Then there's machine learning, where we're taking those parameters and instead defining hyperparameters. We're saying, oh, hey, could you tune these baseline alerts so that there's only one alert per period, or kind of make these hyperparameters work out, with some simple algorithms there. And that made up a lot of what people thought of as MLOps or this AI functionality up until about two or three years ago. And then everything just got clobbered by the return of neural nets. Neural nets obviously are not a new technology. We've been kicking them around in the CS department since the 1960s. But suddenly they started to work, with the transformer architecture and some of these other optimizations, and infinite money poured into it. Now we have neural nets that work. And so I think of the neural-net-based systems as being what we currently call AI, although maybe five years from now we'll feel differently. Those are more complicated and less predictable than machine-learning-based systems, which have more constraints set and tend not to actually use any neural nets. And those in turn are simpler, but a little bit less predictable than static statistical systems, which are just using math. A good product is going to have all three. 
I think that if your view of AI is "I send everything to OpenAI and they send me back answers," you get a lot out of that, and that's cool, but it's not the right tool for every job. There's also a place for the more traditional machine learning approaches, and even for just good old-fashioned statistics. I think that as a company and as an industry, we're exploring where those boundaries exist and how to tie them together. Today, the neural net, and especially the foundation model API, so calling OpenAI, calling Gemini, calling Anthropic, is the piece of the system that's in charge. But the tools that it has access to in an intelligent system should include lots of traditional ML and statistical tools.
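The first bucket described above, a baseline deviation check with static parameters, might look like the following minimal sketch. The function name, the trailing-window z-score approach and the sample latencies are assumptions chosen for illustration; this is not a description of how any particular product computes baselines.

```python
from statistics import mean, stdev

def baseline_deviations(values, window=5, z_threshold=3.0):
    """Return indexes whose value deviates from the trailing-window baseline.

    `window` and `z_threshold` are static parameters: nothing here is learned.
    """
    flagged = []
    for i in range(window, len(values)):
        baseline = values[i - window:i]
        mu, sigma = mean(baseline), stdev(baseline)
        if sigma == 0:
            # Flat baseline: any change at all counts as a deviation.
            if values[i] != mu:
                flagged.append(i)
            continue
        if abs(values[i] - mu) / sigma > z_threshold:
            flagged.append(i)
    return flagged

# Hypothetical latency samples in milliseconds with one obvious spike.
latency_ms = [100, 102, 99, 101, 100, 98, 250, 101, 100]
spikes = baseline_deviations(latency_ms)
```

Here only the spike at index 6 is flagged. Because the parameters are fixed, the same input always produces the same answer, which is the predictability the speaker attributes to plain statistics.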
B
So it's kind of easy to imagine this transformation from the static to the ML to the AI when you think about an individual stat. I have a server and it's got memory usage, right? And the simple analysis is a static threshold, 80%, right? Above 80%, send me an alert. Fine, you hit 80%, you get 50 alerts an hour and you try to deal with the noise. What ML did then is say, well, 80% is not the right number. Let's make it 86.2 and we'll leave it there for five hours, and then we'll make it 82.1, and we'll make those adjustments as we go and get rid of more of the noise. And that's great as well. But what the AI, the LLMs of the world, are now trying to do is say, well, you know what, this looks kind of odd. What does it mean for this pattern to occur? And maybe I should alert on that. These are the growth patterns that occur. But I think one of the big values, and I'm not sure we're seeing it yet, but I think it could be coming and I'd like your opinion on this, is the value of LLMs in the breadth of data. Instead of "tell me everything about this piece of data," you say "here's a whole ton of data." In my mind, observability, not monitoring but observability, is really about making a complex system understandable. Right? That's really what you're trying to do. And what's better at making a large complex system easier to understand than an LLM? That's one of the things they're very, very good at: summarizing and explaining what's going on in a large system without a human having to go through and analyze it bit by bit and understand everything that's going on. I don't see that yet, but I see it coming. What do you see as the role of LLMs in this large-scale system understanding model, versus just the "yeah, this is anomalous data" model?
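The step from a fixed 80% threshold to a tuned one can be illustrated with a deliberately simple search: pick the lowest candidate threshold that keeps historical alert volume within a budget. The helper names, the candidate list and the memory samples below are hypothetical, and real systems would tune continuously rather than once.

```python
def count_alerts(samples, threshold):
    """Count upward crossings of `threshold` (one alert per crossing)."""
    alerts, above = 0, False
    for value in samples:
        if value > threshold and not above:
            alerts += 1
        above = value > threshold
    return alerts

def tune_threshold(samples, candidates, max_alerts):
    """Pick the lowest candidate threshold that stays within the alert budget."""
    for threshold in sorted(candidates):
        if count_alerts(samples, threshold) <= max_alerts:
            return threshold
    return max(candidates)  # Nothing fits the budget; fall back to the strictest.

# Hypothetical memory usage history, in percent.
memory_pct = [78, 81, 79, 83, 79, 82, 80, 85, 79, 88, 81, 79]
static_alerts = count_alerts(memory_pct, 80)
tuned = tune_threshold(memory_pct, candidates=[80, 82, 84, 86], max_alerts=2)
```

On this sample the static 80% rule fires five times, while the tuned threshold settles at 84% and fires twice. The alert budget plays the role of the hyperparameter: a human states the outcome they want and the algorithm finds the number.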
C
Yeah, I mean, I completely agree with you. I think that's the key step to move from that data era to the insights era. As I said, you can't make dashboard widgets small enough to watch everything that's important in your system. Today we have these Kubernetes clusters that have hundreds of nodes. We're running thousands of pods. We have many, many clusters. There's so much data in the system today that a human can't possibly search through it all. Nor can you write alert rules or build dashboards ahead of time that will show you everything that matters. The thing that matters might be a signal you've never looked at before, but you could see a major excursion on one particular metric that correlates with this failure on the user side and you say, ah, that's what I was looking for. So that's the place where you want to combine those statistical tools and the LLM tools. Searching across huge amounts of data is tricky for LLMs. I'm not going to tokenize a petabyte of data and then feed it into an LLM. I mean, certainly not without making Anthropic even richer than they are. But what I might do is perform a statistical analysis, look for anomalies using well-understood methods, then take those anomalies, take those time ranges, send them into my reasoning system and say, hey, are any of these interesting? So the missing pieces to do this, if you wanted to sit down and build this this afternoon, if we're going to vibe code up our future system, what do I need? I need lots of data in, but I need structure to the data. I need to understand which metrics are about which systems and how the systems relate to each other. It has to be all temporally organized, so that I know when something happens, but I also need it to be spatially organized. The user is on this browser, it's calling this service. This service relies on these other services. They rely on this infrastructure. 
They rely on those. You want to draw across that graph when you are performing, whether it's a root cause analysis of a failure or even a predictive fault analysis, where you say, what's the linkage between response time and database? How do I piece this together? I think that there's a ton of work that can be done there with general-purpose LLMs. I think that you need a lot of tooling. When I say tools in this case, I mean MCP tools, to give it that statistical capability. That's where you need that statistical understanding and the ability to feed relevant context into an LLM. LLMs are fabulous at summarizing, and they have huge context limits on some of these systems now, you know, 100,000 tokens, a million tokens. But the data world we live in is billions.
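The "spatial" organization described here is essentially a dependency graph that can be walked from a user-facing symptom down to infrastructure, combined with the "temporal" question of what was anomalous at that moment. The sketch below assumes a hypothetical topology, hypothetical entity names and hypothetical anomaly timestamps; it shows the shape of the lookup, not any vendor's entity model.

```python
from collections import deque

# Hypothetical topology: each entity maps to the things it depends on.
DEPENDS_ON = {
    "browser": ["checkout-service"],
    "checkout-service": ["payments-service", "orders-db"],
    "payments-service": ["payments-db"],
    "orders-db": ["host-17"],
    "payments-db": ["host-22"],
}

def downstream(entity, graph):
    """Breadth-first walk of everything `entity` transitively depends on."""
    seen, order, queue = {entity}, [], deque([entity])
    while queue:
        current = queue.popleft()
        for dep in graph.get(current, []):
            if dep not in seen:
                seen.add(dep)
                order.append(dep)
                queue.append(dep)
    return order

def suspects(entity, graph, anomalies, start, end):
    """Dependencies of `entity` that had an anomaly inside [start, end]."""
    return [
        dep for dep in downstream(entity, graph)
        if any(start <= t <= end for t in anomalies.get(dep, []))
    ]

# Hypothetical anomaly timestamps (seconds) per entity.
ANOMALIES = {"payments-db": [1005], "host-17": [400], "host-22": [1010]}
candidates = suspects("browser", DEPENDS_ON, ANOMALIES, start=1000, end=1060)
```

With this data, a browser-side failure between t=1000 and t=1060 narrows to `payments-db` and `host-22`; the anomaly on `host-17` is in the graph but outside the time window, so it is dropped.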
B
Billion tokens is nothing.
C
So we've got to go, right? We've got to go from the billions down to the hundreds of thousands, or thousands, in order to make it processable.
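The reduction from billions of data points to something that fits in a model's context can be sketched as a statistical prefilter followed by a size-limited summary. Everything in this example is an assumption for illustration: a single fixed threshold stands in for a real anomaly detector, characters stand in for tokens, and the metric names are invented.

```python
def anomaly_windows(series, threshold, pad=1):
    """Return merged (start, end) index ranges around points above `threshold`."""
    windows = []
    for i, value in enumerate(series):
        if value <= threshold:
            continue
        start, end = max(0, i - pad), min(len(series) - 1, i + pad)
        if windows and start <= windows[-1][1] + 1:
            windows[-1] = (windows[-1][0], end)  # Merge overlapping or adjacent windows.
        else:
            windows.append((start, end))
    return windows

def build_context(metrics, threshold, max_chars=2000):
    """Summarize only the anomalous windows, stopping at a rough size budget."""
    lines = []
    for name, series in metrics.items():
        for start, end in anomaly_windows(series, threshold):
            line = f"{name} t={start}..{end} values={series[start:end + 1]}"
            if sum(len(l) + 1 for l in lines) + len(line) > max_chars:
                return "\n".join(lines)
            lines.append(line)
    return "\n".join(lines)

# Hypothetical metrics; only one of them misbehaves.
METRICS = {
    "db.response_ms": [20, 21, 19, 95, 97, 20, 22],
    "cache.hit_pct": [50, 51, 50, 49, 50, 51, 50],
}
context = build_context(METRICS, threshold=90)
```

Only the one misbehaving window survives, as a single line of text that could be handed to a reasoning system along with the question "are any of these interesting?" The quiet metric never reaches the model at all.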
A
Today's episode of Software Engineering Daily is brought to you by Unblocked. Your coding agents have access to your code base. Maybe you've even connected other tools via MCPs. But access doesn't mean context. Agents can't reason across MCPs. They don't know your architectural decisions, your team's patterns, or why the API was shaped the way it is. So agents look in the wrong place and deliver bad outputs. Then you spend time correcting, turn after turn. Unblocked is the context layer your agents are missing. It synthesizes your PRs, docs, Slack and tickets into organizational context that agents actually understand, so they make better plans, write higher quality code, use fewer tokens and require fewer correction loops. If you're running Claude Code, Cursor or any agentic workflow, Unblocked is worth a look. Get a free 3 week trial at getunblocked.com Sedaily.
TurboPuffer is how companies like Anthropic, Cursor, Notion, Atlassian and Ramp ship their most ambitious search features. TurboPuffer is a serverless vector and full text search engine built on object storage. It's up to 95% cheaper than traditional search databases and just as fast. With TurboPuffer, you can index and search 50 million documents at 10 millisecond P90 query latency for less than $100 per month. Head to TurboPuffer.com sed to get your first month free.
B
So one of the things this sounds like to me: back in the early days of monitoring, the problem was alert fatigue, right? We could get all sorts of alerts and all sorts of notifications of all sorts of anomalies, but we just got so many of them that we ignored them. Is the type of thing you're talking about essentially a system that can take all of those alerts and not be fatigued by them, and find the patterns that exist and find what's really going on? Is it that simple, or is there a lot more to it than that?
C
I think of it as occurring at two levels. It's funny you said "in the early days." It's still one of the top problems I hear from every customer and from every team I talk to. They say, when something goes wrong, we do a retro. And what happens at the end of the retro? Every retro ends kind of the same way. It says, oh, well, one of the things that we should do to not repeat this incident is establish alerting or monitoring for X, Y and Z so that we can spot it earlier next time. And by the time you run that forwards for a few years, there's an alert for everything in the universe. I remember Aaron Bento, who we used to work with, gave a fabulous talk one year on how additional alerts do not improve responsiveness, because what they do is train the users so that, instead, you get an alert and you're like, I'm going to give this a minute and see if it resolves. And what you've done there is you've delayed your time to response. So noisy alerts corrode that response time, even though teams think it's going to help. So one place that we can use intelligence is to consume those alerts and try to determine if they're serious or not before signaling a human. Right, great. You can look at that and say, well, what are the things that every on-call engineer knows? If I just see the one blip, I'm like, oh, that's kind of funny. If I get six alerts from six different systems, I'd better run to the keyboard. So, okay, AI can do that, an ML system can do that. It's not rocket science. Another thing we can do is actually look at the setup of the alerts. Are a lot of your alerts useless? Do they always go off at the same time as something else? Or are they never actionable, where they always come and go and they're not associated with real downtime? Let's look across the whole system. Let's tune some of those down so we can improve the human-configured alerts. But both of those are kind of after the fact. 
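The on-call heuristic mentioned above (one blip is a curiosity, six alerts from six systems means run to the keyboard) is easy to express as a sliding-window check. The function name, the five-minute window and the three-system minimum are assumptions picked for the example.

```python
def should_page(alerts, window_s=300, min_systems=3):
    """Page only if `min_systems` distinct systems alert within `window_s` seconds.

    `alerts` is a list of (timestamp_seconds, system_name) tuples.
    """
    ordered = sorted(alerts)
    left = 0
    for right in range(len(ordered)):
        # Advance the left edge until the window spans at most `window_s` seconds.
        while ordered[right][0] - ordered[left][0] > window_s:
            left += 1
        systems = {name for _, name in ordered[left:right + 1]}
        if len(systems) >= min_systems:
            return True
    return False

# Hypothetical alert streams.
one_blip = [(100, "api")]
storm = [(100, "api"), (130, "db"), (150, "queue"), (160, "cache")]
spread_out = [(0, "api"), (1000, "db"), (2000, "queue")]
```

`should_page(storm)` is true, while the single blip and the alerts spread over half an hour are not. As the speaker says, this part is not rocket science; the harder question is whether the alerts should exist in the first place.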
The real root of this is: why do we even have these damn alerts? What are alerts for? We create the alerts because we are afraid that something is broken in the system and we don't know about it. Can we get to the root of that issue instead, using these techniques, and say, well, when I see an anomaly, I'm going to evaluate it and figure out whether it's interesting? If so, is it actionable automatically? Can I just roll something back and notify the user? Is it actionable but it needs a human, and I should grab a user now? Or is it interesting but not yet over that threshold, and I should feed it to my accumulator and then wait for a few other signals to see what's going on? I believe that in the near future, when a user sets up their observability solution, which as you correctly point out should be called an understandability solution, because nobody actually wants to observe, you want to understand, I don't want to go through and have to configure a lot of alerts. I want to say, here are my signals that matter the most. These are people using the system. These are the source of truth for whether it's working. Walk through it yourself and tell me when something's going to break before it breaks. And sometimes this sounds really pie in the sky. Oh, you're going to have that HAL 9000 telling you that the sensor array is going to fail in two days. But a lot of this is actually really straightforward. It's the same stuff that we do today as humans. We say, oh well, here's a CPU threshold that we're going to set before it's a problem. Here's a memory threshold that we're going to set before it's a problem. But then we are responding to those on human-scale time and not on machine-scale time. If we can detect it, respond on machine scale and correct it, it stops looking like an on-call rotation and starts looking like other systems that we consider so boring that they're barely even technology. 
Like when programs crash and they get restarted automatically. When did we start doing that, in the 90s? When Kubernetes says, oh well, that node went away, put a pod back up, that's self-healing. But it's just so nuts and bolts that we don't even give it credit anymore. We're like, oh well, Kubernetes just took care of that. All of our systems should be like that. So many things that require a runbook today are going to be bridged over into just something that you find out about. When I lose nodes and my pods get rescheduled, I don't get paged, I don't even get emailed. It's just a thing that happens. And I think that a lot of incidents that exist today belong in that category. They should be things that I don't get paged about. Maybe I get emailed about it. Maybe it's just a thing that shows up in the log: oh yeah, and then there was a problem and we had to deal with it.
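The triage questions posed in this answer (is it interesting, can it be fixed automatically, does it need a human, or should it go to an accumulator) map naturally onto a small routing function. The scores, thresholds and the `Action` names below are illustrative assumptions; how an anomaly would actually be scored is left open.

```python
from enum import Enum

class Action(Enum):
    AUTO_REMEDIATE = "auto_remediate"  # e.g. roll back, then notify
    PAGE_HUMAN = "page_human"
    ACCUMULATE = "accumulate"          # keep as evidence, wait for more signals
    IGNORE = "ignore"

def triage(score, has_known_fix, accumulated=0.0,
           interesting=0.3, actionable=0.7):
    """Route an anomaly by score and return (action, new accumulator value)."""
    if score < interesting:
        return Action.IGNORE, accumulated
    total = accumulated + score
    if score >= actionable or total >= actionable:
        action = Action.AUTO_REMEDIATE if has_known_fix else Action.PAGE_HUMAN
        return action, 0.0  # Acting resets the accumulator.
    return Action.ACCUMULATE, total

# A weak signal is held; a second weak signal pushes it over the line.
first, carried = triage(0.4, has_known_fix=False)
second, _ = triage(0.4, has_known_fix=False, accumulated=carried)
```

In this run the first anomaly is accumulated and the second triggers a page, which mirrors the "wait for a few other signals" behavior. An anomaly with a known fix and a high score would be remediated without paging anyone, the way a rescheduled pod is today.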
B
Kind of like customer support in the olden days: did you try turning it off and back on again? I mean, that's really what we are getting to, the model where doing a runbook equivalent for a problem is literally: what things should I turn off and restart? This one, this one and this one. Did that solve the problem? Nope. Let's try this one. Okay, that solved the problem. Done. The cloud did a lot of that. Kubernetes did a lot of that. Both of them working together really were the core of that change. But that's really the way most of our infrastructure works nowadays, right? Even our networks. The network's causing problems, so we'll reboot our network. You can't imagine doing that in the past, but you can imagine doing it now. So you're basically saying that it can all be automated so that it just happens naturally, with AI obviously being a central part of that.
C
You mentioned cloud, which I think is a great example of this. With cloud, especially with container orchestrators like Kubernetes, and even system management software, there are so many things that we expect will just take care of themselves. And we were able to do that because those problems became really well defined. We said, okay, these are all the things that can happen. If this happens, then this. If this happens, then this. And what are we left with? Our operations engineers, our SREs, say, well, the parts of the system that can't be automated. It's like, I've got to pay attention, because we ran out of IPs and I had to get called in, because there was an IP blockage or there was a certificate issue upstream and I had to go and jostle it. There's a bunch of stuff that's kind of fuzzy. You would never put it in a runbook. If we had to sit down and build a piece of software that looked at every possible case, we would finish out our lives covering a tiny percentage of all the things that can go wrong. But if you can have a reasoning system that provides a fuzzy interface to the structured data, and that's what LLMs do, that's what these neural net systems do, they say, well, this looks pretty similar to this, then we can start grouping problems. There are things we know how to fix, and we just take action. There are things that we believe are interesting but not critical yet, and so we'll put them on a list for humans. We say, oh, I've created a ticket for you, that certificate's going to expire. And there are things that are an emergency and do require human intervention because we don't know what to do, and then we're paging people. But I think we can take a lot of what people believe today is just unavoidable work and move it into that automatable bucket. Because of this, we've redefined what qualifies as toil, and we say, oh, you know, a lot of things that we thought of as uniquely human are actually just automatable toil. 
Let's go automate it.
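The three buckets described above (known fix, ticket for later, page a human) depend on a fuzzy "this looks pretty similar to this" match. In the sketch below, simple token overlap stands in for whatever similarity an LLM or embedding model would provide; that substitution, the catalog of known fixes and the remediation names are all assumptions for illustration.

```python
def tokens(text):
    """Lower-cased set of whitespace-separated words."""
    return set(text.lower().split())

def jaccard(a, b):
    """Token-overlap similarity between two strings, in [0, 1]."""
    ta, tb = tokens(a), tokens(b)
    return len(ta & tb) / len(ta | tb) if ta | tb else 0.0

# Hypothetical catalog: past problem description -> remediation name.
KNOWN_FIXES = {
    "tls certificate expired on upstream gateway": "renew_certificate",
    "subnet ran out of free ip addresses": "expand_subnet",
}

def route(incident, critical, min_similarity=0.4):
    """Return (bucket, remediation) for a free-text incident description."""
    best = max(KNOWN_FIXES, key=lambda known: jaccard(incident, known))
    if jaccard(incident, best) >= min_similarity:
        return "automate", KNOWN_FIXES[best]
    if not critical:
        return "ticket", None   # Interesting, not urgent: put it on a list.
    return "page_human", None   # Emergency with no known fix.

known = route("upstream gateway tls certificate expired", critical=True)
later = route("disk latency spike on host-9", critical=False)
urgent = route("kernel panic loop", critical=True)
```

The reworded certificate incident still lands in the automate bucket even though it does not match the catalog entry exactly, which is the property that a rigid runbook lookup lacks. Word overlap is a crude proxy, and a production system would need something far more robust before acting on a match.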
B
That's an interesting take. Because one of the first things that I come to is that this means we'll need less human toil in order to maintain our systems. But on the other hand, our systems become more and more complex. And so ultimately the amount of toil that we go through ends up being the same, or maybe still increasing. We're just more dependent on these things to make things happen, which is normal. That's the way things work, that's the way things grow and expand. But that's really the case, right? AI is going to make this easier for us so we can do harder things. That's really the way it works.
C
Exactly. Now, I remember talking to an engineer. We have implemented all this fabulous platform work, so many things. You would barely recognize the way deploys work at New Relic now. Everything is so much better. And we're like, if everything's so much better, why are we still working so hard? And everyone's like, oh, well, because we just do more. The constraint on what we can achieve is that human bandwidth. And so if we let our humans accomplish more, then we can just get more done. We've never backed down and said, oh well, we're just going to work less now. Somehow that never pans out.
B
I wish I could get the mainstream media to start reporting AI and job use in those sorts of terms. Right. It's like AI doesn't remove jobs, it makes it so each person does more in their job and so therefore we do more. It's not that we do less, we do more. So anyway, that's a whole different discussion than this discussion.
C
Yeah. But it touches on a good subject. I don't want to be overly rosy about the economy at large. I do think AI will change jobs, and while it won't remove jobs, it may move jobs. And so people have to change the way they do their jobs. The scary side of this, thinking back, would be the Industrial Revolution 200 years ago, where obviously the economy today is massively larger than it was before, so industrialization created so many jobs. But for a lot of people whose jobs were directly, you know, impacted...
B
It wasn't that people were afraid their jobs are going to go away.
C
And they did. Their jobs did go away, and it didn't turn out well for them, even though it turned out well for society. And so I think that we always have to be a little bit wary on that one. But you and I were talking a bit before we started about how that changes the software engineer's job. I think that's an important question. Thankfully, I'm not writing production code for New Relic anymore. I'm safely away from the production keyboard, but I still keep my hand in, and I've been working with AI tools and things like that. I feel like it shifts the level at which you need to think. And none of this should be a surprise. The march of higher and higher level languages, of virtual machines, of cloud, just in our careers, in a few short decades, has been radical. The amount that I have to think about the infrastructure I run my software on has already changed entirely. And I couldn't tell you when the last time I inspected assembly language was. Whereas when I started...
B
Doing your TRS-80 Model I?
C
Please. I was a TI-99/4A guy.
B
Oh no, okay.
C
I couldn't afford a TRS-80.
B
Well, I worked at Radio Shack, so I could use the computer.
C
There you go. But you'll remember that there was a very long time where I would feed something into a compiler and it would misbehave, and I would pop open the hex editor or I would just be stepping through. I'm like, okay, what code did you generate, you dummy? And we don't do that anymore. When's the last time you looked for a compiler bug? It's super rare, and that's great. That's been freeing. I let those parts of my brain be filled in with bigger architecture patterns. How do we apply these data structures? How do we move data between systems at scale? How do we work with petabytes? And that, I think, is the shift that we have to be ready for. And, you know, looking back at New Relic, the purpose of our business is to make life easier for developers and software operators. So as their jobs are changing, it's not just that we ourselves change what we build. It changes who we build for and what problems they're facing. If people are in a world where it's easier to create software, well, did we also make it easier to operate software? Because all that software that everybody is banging out with vibe coding is running somewhere, and it's going to break. And now you can't just ask the person who wrote it, is it supposed to do that? Because maybe no person wrote it. It changes the field because it changes the problems that our users face.
B
Right, exactly. I was talking about that in terms of growing people from developers into more of software architects, and how that's going to be critical. But it's the same thing. You move up the value curve because your tools are now smarter and more involved. I feel that's the way we're going to go. I worry when I see so many people say this is just going to replace all of that. It's like, no, no, no. There are always still going to be humans involved in developing software, but what we do to develop software is going to change drastically. We're still going to be involved in writing software. That's just kind of the way it works. By the way, we're changing subjects a little bit here. When I think about AI and observability, those two things together, I actually think about two different roles. We've talked about one of those roles, and that is using AI to help improve your visibility into an application. Great. That's wonderful, and that's obviously, from a product offering standpoint, something New Relic is very interested in. But let's talk about the reverse of that, which is that AI itself is still an application that needs to be observed. So how do you monitor AI systems?
C
Yeah, we somewhat confusingly try to, because people love to lump this stuff together. We've tried to distinguish between them. There's AI for observability, which is what are all the ways that we can use AI to make a better observability product. And that's important because the job of the software developer, the job of the software operator gets harder every single year. And AI gives us a chance to make that life a little bit better for them and to try to claw back some of that complexity in today's super complex systems. And then the other side of this is, as you pointed out, it's observability for AI. So for people who are building these non-deterministic, kind of weird AI systems, often at an API length, but it's a whole new tool chain. And so how do we help them? So we look at that. There's a couple of different levels, similar to how we were just talking about the shifts of AI to the industry, that although they're unprecedented in speed and the types of things we're doing, the pattern smells awfully familiar. Like if history doesn't repeat, at least it rhymes. You can see that also on the observability for AI side, that we've introduced disruptive new technologies in the past. Cloud, the web, some of these NoSQL databases. Each of these has required a different approach to what matters. What are the real golden signals of an AI system? They're going to be different than those of your static web system. Just the same way static web systems are different than Kubernetes, where the way we deploy today has changed what we consider to be those, like, golden signals. So we're working to develop those AI golden signals. We're working this time in concert with the OpenTelemetry groups. There's so much enthusiasm and so much breadth in this and so much stuff that's just on the other side of an API where you won't necessarily even be running the software systems yourself, but you're relying on them.
And so OpenTelemetry gives us a way to work together with the other members of the industry and to have something where each new framework that comes out can be instrumented from day one. It can be born observable, whether it's by New Relic or by another open source system or another commercial system. So for our initial offerings for AI monitoring, that observability for AI, we started with our own agents, which are open source but are proprietary to New Relic. And now we are looking to evolve that as we've been working with the OpenTelemetry group on generative AI.
B
So I'm not sure that answers one of my base questions yet though, which is there's fundamentally different signals you need to monitor AI, but is there fundamentally different observability models in general, or are they the same model or just different triggers?
C
I believe it's the same model, it's just different triggers. An AI system is no more different to a web system than a database is. They have their own language, they have their own metrics. Like, we need to track tokens, we need to track sentiment, be set up for performing judges against. Like, I want to know, am I getting the right answers? Are my users seeing the things that I expect them to see? I also know I'm going to be paying costs. I know that there's parameters to this that are a little bit different, and that for a lot of people, when they sit down and they build AI into their platform for the first time, it is their first time. They're not experts, they don't necessarily know what they're going to be looking for. And they're probably, and very reasonably, a little bit worried, like, is this going to do something weird? Is it going to, like, you know, eat all my money or, like, anger my users? Like, I got to keep it in a. That's right. This mode of we're doing this for the first time, that's a place where as a tool vendor, you have to come in with a set of opinions. You can't just say, oh, users, you can do whatever you want. Yeah, of course you can observe any aspect of this. That's not helpful to somebody who's trying to bring a weird new technology into their production app for the first time. You got to say, hey, this is what you look for. These are the most important things. This is where a reasonable limit is. Structure it like this. Watch this data. Here's the type of thing you should be worried about. And walk them through that process. I think that that's a tremendous opportunity for everybody in this space. It's not just, are you providing that observability of the data, but are you structuring the understanding so that people know what they're supposed to even be worried about?
B
So when I build an application that relies on AI to do some things for me, one of the things that I buy into when I do that is the AI system is non-deterministic. And that's both a benefit and it's also a curse, both at the same time, right? This non-deterministic nature is critical to how AI systems function. It creates the ability for it to come up with unique ideas and hallucinate, both at the same time. Those both become viable outcomes. So is there a role in observability, in monitoring or observing, excuse me, or understanding, there we go, that aspect? Like, my system is hallucinating more today. Is that a problem? Or the data I'm getting in is causing more variation in my responses than is typical. I mean, are these the sorts of signals we're talking about here or is that a level higher than what we're really thinking about?
C
No, I think that's exactly correct. That's one of the key signals. Let's say I'm feeding all of my questions to a third party API. I'm using their fast and affordable model. Maybe I want to sample one out of a thousand of these and run it against somebody else's model or a more expensive model to say, like, hey, this question, did it get answered well? And so to judge it. And so that action of continuously supervising, right, you've created essentially like a call center with a bunch of pretty unreliable actors here with all these AI agents. You need to have a different model, a supervisor of some form that's walking down the aisles of this virtual answering system and making sure that those agents are giving the answers you want. And I think that this is one of the new signals that's different than, as I said, a database or a piece of cloud infrastructure, in that you have to evaluate quality of answers. But we can think of it in some ways as being equivalent to looking at response times or error rates. It is just a signal. And in the same way that AI is capable of turning these unstructured questions into unstructured answers, it also gives us a tool that we can use to take those unstructured answers and map them back and evaluate them, on something where in the past you'd say, well, I don't know, how am I going to tell whether these answers are any good? Go feed it to a human and see what they think of it. We want to flag them so that a human could review them. But if we're going to do this at scale, this has to also be done by AI.
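The sample-and-judge supervision Nick describes could be sketched roughly as follows. This is a minimal illustration, not New Relic's implementation: `Interaction`, `Verdict`, `make_sampler`, `supervise`, and the 0.5 flagging threshold are all hypothetical names and values, and the `judge` callable stands in for whatever second, more capable model would actually score the answer.

```python
import random
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class Interaction:
    question: str
    answer: str  # produced by the fast, affordable model


@dataclass
class Verdict:
    score: float             # 0.0 (bad) to 1.0 (good), as returned by the judge
    flagged_for_human: bool  # low scores are queued for human review


def make_sampler(rate: float, rng: random.Random) -> Callable[[], bool]:
    """Return a function that says 'yes' for roughly `rate` of calls."""
    def should_sample() -> bool:
        return rng.random() < rate
    return should_sample


def supervise(
    interaction: Interaction,
    judge: Callable[[str, str], float],  # hypothetical: wraps a second model
    should_sample: Callable[[], bool],
    flag_below: float = 0.5,             # assumed threshold, tune per system
) -> Optional[Verdict]:
    """Judge a sampled interaction; return None when it was not sampled."""
    if not should_sample():
        return None
    score = judge(interaction.question, interaction.answer)
    return Verdict(score=score, flagged_for_human=score < flag_below)


# Roughly one in a thousand interactions goes to the judge.
one_in_a_thousand = make_sampler(0.001, random.Random())
```

The resulting scores can then be recorded like any other signal, alongside response times and error rates, while flagged interactions go to a person.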
B
So do you see companies like New Relic, do you see it as part of the observability space to do that level of analysis or simply to report on that level of analysis being done.
C
This is a dynamic question, because it's such a fast moving industry at this point. We think of it as, we want to guide users towards doing this and give them reporting tools and kind of a structure, so that you say, hey, you should be doing these types of sampling and evaluation or judging of your answers, and we'll give you a way to, like, fold that in. It's not something that we are today doing for users, and some of this is just because things are moving so quickly and some of it is around data privacy. It's just this question of, like, what do you want to send upwards? But I do think that this is something that we have to consider very closely as more and more people go into this field and need an easy getting started, that they want that kind of five minutes to joy of: I set this up, I started running it, and now it told me that when I moved from Sonnet 4.5 to 4.6, all of the questions that got asked of my system about this particular, like, you know, purchase flow started to get weird. Oh, okay. That's really important to me. I need to go back and look at my prompts, or to look at maybe I've done a fine tuning or something that doesn't work anymore. And so I need to pay attention to that quality. And I wouldn't know that other than by getting angry customer tickets. And, right, the goal of being an understanding or observability company is you should know what your system is doing. You should understand it before your customers complain. And I think that that extends not just into technical elements, like are you using CPU, or traditional web elements, like is your page slow, but also into quality of responses.
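One way to picture the model-upgrade check Nick describes is to group judged quality scores by model version and user flow, then report the flows whose average dropped. The sketch below is an assumption-laden illustration: the record shape `(model_version, flow, score)`, the function names, and the `max_drop` threshold are all hypothetical, and the scores are assumed to come from some judging step like the one discussed earlier.

```python
from collections import defaultdict
from statistics import mean
from typing import Dict, Iterable, Tuple

Record = Tuple[str, str, float]  # (model_version, flow, judged score in [0, 1])


def mean_scores(records: Iterable[Record]) -> Dict[Tuple[str, str], float]:
    """Average the judged scores for each (model_version, flow) pair."""
    buckets = defaultdict(list)
    for model, flow, score in records:
        buckets[(model, flow)].append(score)
    return {key: mean(values) for key, values in buckets.items()}


def regressions(
    records: Iterable[Record],
    old_model: str,
    new_model: str,
    max_drop: float = 0.1,  # assumed tolerance before a flow is reported
) -> Dict[str, Tuple[float, float]]:
    """Return flows whose mean score fell by more than `max_drop`."""
    means = mean_scores(records)
    flows = {flow for (_, flow) in means}
    found = {}
    for flow in sorted(flows):
        old = means.get((old_model, flow))
        new = means.get((new_model, flow))
        # Only compare flows that were observed under both model versions.
        if old is not None and new is not None and old - new > max_drop:
            found[flow] = (old, new)
    return found
```

Called with records from before and after the switch, `regressions(records, "old-model", "new-model")` would return only the flows, such as a purchase flow, whose answers got noticeably worse.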
B
You could expand that well into beyond just is your AI giving your customers answers that are not nonsensical? But you could expand that into where are my customers focused on my webpage and why are my customers going from here to there? And what I'm really trying to say is that once you expand into the how your application works with your customer, you open up a whole realm of areas beyond just the AI communication piece. Do you see observability moving into all of that as well?
C
Absolutely. I see observability as already being in all of that, not moving into it. I go back to my start. This is increasingly a while ago, but I remember working at a startup in the mid-90s or early 2000s. It's foggy, probably early 2000s for this one. And we had so many different alerts and all of these things that would tell us when something broke. But there was one TV up in the corner and it had the one chart that mattered, which showed our sales per minute. And we knew that if you broke something, that number was going to go down. The business impact would be there. It didn't tell you what you broke. That was the job of all of the CPU and memory and database alerts. The job of that chart was to tell you, were you achieving your business goal? And I think that that's still fundamentally the most important question. More important than any of the stuff, whether it's AI or cloud or data, it's, are you achieving your business goal? And do I understand what people are doing and whether they're successfully accomplishing it? And if you're doing that, then everything else is just diagnostic. It's just helping you understand how to improve it or if something has hindered it. But the source of truth is, if you're in e-commerce, are you getting sales done? If it's like a social network, are people clicking on things? Like, whatever it is your business exists to do, the reason you have this software, that's the most important thing. That makes sense.
B
Thank you. This has been a great conversation. I'd like to end with one completely off topic question for new developers. People just out of college, just out of trade school, or just about to enter into their career and worried about whether AI is taking away their jobs, all that sort of stuff, all those normal things that are going on. What's the one thing you want to tell them?
C
If I can distill it down to one, I would say the first thing is I really do feel for them. I have friends who are just now getting started in the industry. It's a really rough time to get started because companies feel so much uncertainty. And that uncertainty, whether it's macroeconomic, global trade, it's AI, all these things are stacking up, and it makes companies really reluctant to take risks and to take on people. They don't want to hire people who they're going to feel like they have to let go soon. And so every company right now is really slow on hiring. And I think that that's dispiriting. It's tough. It's just radically different than, I'm sure, the stories everybody heard about, oh, it's great in tech, there's always a job for you. It's got to feel really disappointing. And so my advice to people is going to be, first off, there will be a job. It will be there. You probably, though, are going to have to pound pavement and work your network and do all the things that it took to find a job 30 years ago, before tech was hot. I think that that's the first one. And when it comes to skills building, yeah, spend time, spend time with Claude Code, spend time building software with this new tool set. But as you do it, take advantage of the ways in which those tools can be set to explain what they're doing. To be not just a magic machine that prints software, but a little bit more of the, like, The Diamond Age, right, the Young Lady's Illustrated Primer. Like, you want it to work with you as a customized teacher and to walk you through some of the things it's doing, and take advantage of the ability of these LLM systems to give feedback and to explain. And this is something that even today, I tell you, lots of people, when they're writing, they say, don't use your AI tool to write prose. I don't want to read an AI written document.
Do use your AI tools to read what you've written before you send it and say, what questions are unanswered as a reader? Like, what's confusing to you? And use those tools to hone yourself and to get sharp. You'll probably never end up building software that's at the same foundational levels that Lee, you and I started at when we started our careers. And I wouldn't expect you to. I would expect you to start today and to go to places that we can't even think of. But it is a bumpy time to start and so I do feel for you.
B
Great. Thank you. I appreciate that. My guest today has been Nick Benders. Nick is the chief technology strategist for New Relic. Nick, thanks again. Thank you so much for joining me on Software Engineering Daily.
C
Thanks, Lee. It was great to talk.
Episode: New Relic and Agentic DevOps with Nic Benders
Date: April 14, 2026
Host: Lee Acheson
Guest: Nick Benders, Chief Technology Strategist at New Relic
In this episode, Lee Acheson interviews Nick Benders (Chief Technology Strategist at New Relic) on the evolution of observability—from the early days of basic monitoring and dashboarding to the present landscape dominated by AI-driven intelligent systems and the dawn of agentic operations. The discussion covers New Relic’s journey, the integration of AI/LLMs into observability, the future role of humans in operations, the unique challenges of monitoring AI systems themselves, and career advice for developers entering the field amidst rapid technological shifts.
Instrumentation Era:
Data Platform Era:
Intelligence Era:
Action Era (Emergent):
Three Approaches (Math → ML → AI):
Best Practice:
Reducing Alert Fatigue:
From Observability to Understandability:
LLMs & Summarization at Scale:
Data Structuring and Graphs:
Automating Toil:
Complexity vs. Toil:
Job Change, Not Job Loss:
Historical Parallels:
AI for Observability vs. Observability for AI:
AI’s ‘Golden Signals’:
Non-determinism as a Signal:
Proactive Guidance:
Current Market Difficulties:
Skills Strategy:
“Dashboarding and alerting as the way you do observability has reached its conclusion. You can’t go forwards anymore with a, ‘I’m going to make fancier dashboards or tinier widgets...’ Nobody wants to set alerts. Nobody wants to build dashboards. They want to know answers.”
– Nick Benders ([07:33])
“I believe that in the near future, when a user sets up their observability solution—which as you correctly point out, should be called an ‘understandability’ solution because nobody actually wants to observe... You want to understand.”
– Nick Benders ([19:10])
“If we let our humans accomplish more, then we can just get more done. We’ve never backed down and said, ‘Oh well, we’re just going to work less now.’”
– Nick Benders ([27:14])
“AI doesn’t remove jobs, it makes it so each person does more in their job and so therefore we do more. It’s not that we do less, we do more.”
– Lee Acheson ([27:51])
On the human side of AI in software:
“You’ll probably never end up building software that’s the same foundational levels that Lee, you and I started at… I would expect you to start today and to go to places that we can’t even think of. But it is a bumpy time to start and so I do feel for you.”
– Nick Benders ([44:50]-[47:17])
On business-oriented observability:
“The most important question—more important than any of the stuff, whether it’s AI or cloud or data—it’s, are you achieving your business goal? …Everything else is just diagnostic.”
– Nick Benders ([44:24])
The conversation is candid, practical, and occasionally nostalgic, balancing technical depth with approachable analogies and real-world anecdotes. Nick Benders is reflective and passionate about the value—and limits—of both humans and AI in the software operations lifecycle.
This episode provides a comprehensive lens into the fast-evolving world of observability and AI’s transformative impact on DevOps and software engineering. Nick Benders articulates New Relic’s course through the layered evolution from monitoring to proactive, agentic, AI-powered systems and offers a grounded vision of the future: humans working at even higher levels of abstraction as automation expands, along with pragmatic advice for new engineers facing a changing field.