Transcript
Narrator (0:00)
A distributed system is a network of independent services that work together to achieve a common goal. Unlike a monolithic system, a distributed system has no central point of control, meaning it must handle challenges like data consistency, network latency, and system failures. Debugging distributed systems is conventionally considered challenging because modern architectures consist of numerous microservices communicating across networks, making failures difficult to isolate. These challenges and maintenance burdens grow as systems increase in size and complexity. Julia Blaise is a product manager at Chronosphere, where she works on features that help developers troubleshoot distributed systems more efficiently, including Differential Diagnosis, or DDx. DDx provides tooling to troubleshoot distributed systems and emphasizes automation and developer experience. In this episode, Julia joins Shawn Falconer to talk about the challenges of troubleshooting distributed systems and the emerging strategies for doing so. This episode is hosted by Shawn Falconer. Check the show notes for more information on Shawn's work and where to find him.
Shawn Falconer (1:18)
Julia, welcome to the show.
Julia Blaise (1:19)
Thanks, Sean. So nice to be here today.
Shawn Falconer (1:21)
Yeah, absolutely. So I wanted to start off by digging into your background a little. Can you talk about your journey into the world of microservices observability, what led you to Chronosphere, and why you're interested in these issues around troubleshooting?
Julia Blaise (1:38)
Yeah, absolutely. Well, I started out as a librarian, which is maybe not the most traditional career path into tech. I got a fellowship at the Library of Congress, and from there I went on to work at the Smithsonian. I will say it was less like what you think of as traditional, written-word librarianship and more digitally focused librarianship. I was working with scientists and researchers, helping them store and organize their data so that they could ask questions of it and get the answers they needed. You know, going from information to insight, as I used to say. I was working in D.C. at the time, and I eventually moved over to a company called Palantir. It was maybe less well known in 2014 than it is today, but the reason I moved over is that, at least at the time, they really talked about their software as a fundamental tool to help the government do something very similar to what I had been doing as a librarian, right? That is, understand, organize, and analyze their data in a central location with a central toolkit. And I think government agencies faced challenges really similar to those faced by the scientists I had been working with: the data was stored in silos, each silo was organized differently, and you had different tools to work with each silo. Very few people had really put in the manual effort to understand how to work with all those different data silos and bring that data together to provide insight. So you can probably see where the through line is, from my information-to-insight role as an individual contributor to a company where that seemed to be their whole purpose. It was really exciting, and I really enjoyed that path from librarianship into tech.
While I was at Palantir, I started out, again, on that government-facing side of the business in a customer-facing role. So the first time I actually engaged with observability, I was on what we would call the high side. I was in a customer's secure computing facility, I was on call, and it was late at night. Our developers for that software weren't always able to get onto those government sites, right? They couldn't always come out there and get hands on keyboard to see what was happening when something went wrong. So they would rely on people like me to sit at the computer, be on the one phone line that could connect to the outside world, and be their hands and follow their instructions. I think my first engagement was, "Hey, I need you to grep for something that looks like this." And I was like, cool, what is grep? I don't know. So they really walked me through what it means to SSH somewhere, what it means to grep, what a log is, what a metric is, and how to describe what's on a metric dashboard so that they could guide me through what else to look for to help them diagnose the problem. It was really interesting. I really enjoyed engaging with that side of software, and it demystified software a lot for me, which I appreciated. As I spent time at Palantir, and as I grew and the company grew, I moved into their product org, and that's where I started learning about the differences between monoliths and microservices, and between on-prem infrastructure and containerized infrastructure, because I was working with teams that were doing both. Some had started natively building their services in K8s; others had built services as a monolith and were trying to migrate them to a more containerized environment and split them up into microservices.
It was really challenging, and there were challenges on both sides. I enjoyed helping people work through them, which brought me to Palantir's central observability team, which at the time was called the signals team. That was the whole purpose of that team: our challenge was to take all the telemetry data from all of the software, whether it was on-prem, commercial cloud, GovCloud, or secure cloud, monolith or microservice, whatever it was, and develop tools and methods to bring that data into a central place where people could use it to troubleshoot issues. That did bring me to Chronosphere pretty naturally, I think. We actually interviewed Chronosphere as a vendor at one point when I was in that role at Palantir, and they were honestly some of the most transparent, expert, nicest vendors I had ever interviewed. I was just like, this company gets me. They really understand my problems. They understand my engineers' problems. At the time, I had been at Palantir for six years, and it had gone public. I felt like I had reached the end of what I had wanted to do at that company, so I was looking for a new role, and Chronosphere seemed like a natural fit.
