
Every year kicks off with an air of expectation. How much of our professional life in 2025 is going to look a lot like 2024? How much will look different, but we have a pretty good idea of what the difference will be? What will surprise us...
Tim Wilson
Welcome to the Analytics Power Hour. Analytics topics covered conversationally and sometimes with explicit language.
Michael Helbling
Hey, everybody, welcome. It's the Analytics Power Hour. And this is episode 262. Hey, happy new year. You know, 2025, that'll probably be the year of. Well, what exactly? There is a pretty steady flow of prognostications every year about the things that will define the coming year. And we're not, you know, completely immune to desire to define the future. I didn't say that very clearly, but we do want to define the future. So what will 2025 bring? It's probably the year of Tim Wilson still being frustrated with people calling stuff the year of.
Tim Wilson
That's fair.
Michael Helbling
Probably accurate. Yeah.
Tim Wilson
You could have just ended with him being frustrated with people. Further qualifiers not necessary.
Michael Helbling
We still like you. And 2025 will probably be the year of Mo still liking Adam Grant and Brené Brown. Hey, Mo.
Mo
Yeah, probably. Actually, that's a very good prediction.
Tim Wilson
There's going to be a huge scandal with one of them between, like, recording this and it coming out.
Michael Helbling
Oh, geez. All right, well, and I'm Michael Helbling. Well, some attempt at categorizing the future that is coming at us awfully fast is definitely warranted. So what better time than the first episode of 2025? You know, insert Zager and Evans pun here. And to do this right, we wanted to have a guest who has a great track record of observing our industry and seeing where the puck is going. Bar Moses is the co-founder and CEO of Monte Carlo, the data reliability company. As part of her role as CEO, she works closely with data leaders at some of the foremost AI-driven organizations, like Pepsi, Roche, Fox, American Airlines, and hundreds more. She's a member of the Forbes Technology Council and is a returning guest to the show. Welcome back, Bar.
Bar Moses
Thank you so much. I am honored and pleased to be a returning member.
Michael Helbling
No, we're serious. We love the way that you take such an interest in really having, from your level, a real good, clear view of where our industry is and the data industry is going. Before we get started, let's just get a recap of what's going on with you and Monte Carlo.
Bar Moses
Yeah, it's been a whirlwind couple of years, not only for Monte Carlo, but I'd say for the entire data industry. Like, I'm just reflecting. Last time I was here, this was 2021. We were just kind of, you know, coming out of COVID. I think we were all getting comfortable behind the camera and feeling comfortable at home. And, you know, the world is obviously very different today. But maybe just to give a quick recap: Monte Carlo was founded to solve the problem of what we call data downtime, periods of time when data is wrong or inaccurate. And, you know, five, 10 years ago, that actually didn't seem important at all. Like, I think people spent some time thinking about quality of data, and you guys know this better than I do, but it probably didn't get the diligence that it deserved back then. Like, you could kind of skirt around the issue. It was very common at the time to just have, like, extra eyes on the data to make sure that a report is accurate. And if it was wrong, you'd kind of be like, ah, shucks, so sorry, and kind of move on.
Mo
Sorry to interrupt, but I also think it maybe wasn't as complex. And so, you know, as complexity has grown, troubleshooting and digging into why it's not reliable is even harder. But sorry to break your stride there.
Bar Moses
Not at all. No, I think that's spot on. And maybe just to unpack that a little bit, I think it was less complex because, one, the use cases were limited, right? So today we call it data products and give it very fancy names, but the use case was maybe just revenue reporting to the street, right? So the use cases were fewer, and the data was used less often. So, you know, you maybe used data, like, once a quarter to report the numbers. And also there were fewer people working with data. So maybe it's, like, a couple of analysts under the finance team. And so you really had a lot more time, fewer use cases, less complexity, and the stakes were lower, right? And so in all of those instances, it kind of didn't really matter if the data was accurate or not. And then there was this big wave of people actually starting to use data. Remember when people would say, oh, we're data driven, and you kind of, like, didn't really believe them? There was a period back in time... it's still happening, still happening. Totally agree with you. I think there was this big push, and that's when Monte Carlo created the category of data observability, which is basically allowing people creating data products, whether those are data engineers, data analysts, data scientists, anyone working with data, to make sure that they are actually using trusted, reliable data for that. And helping when someone's looking at the data and is like, WTF, the data here looks wrong. You know, helping those people answer the question of what's wrong and why. That was sort of the reason Monte Carlo was born. Now, fast forward to today. I can't believe it's almost 2025. It's, like, four years since. You know, I like to say that I think the data industry is a little bit like Taylor Swift. We kind of reinvent ourselves every year. We need to, like, do an Eras Tour and kind of go through all of the, you know, periods of time of the data industry. 
And I think with the most recent era being swept by generative AI, the implication is that bad data is even worse for organizations, and we can kind of unpack what that means. But, you know, at a very high level, what Monte Carlo does is help organizations, enterprises, make sure that the data that they're using to power their pipelines, power their dashboards, power their generative AI applications is actually trusted and reliable. And we do that by first and foremost knowing when there's something wrong, like knowing if the data is late or inaccurate, but then also being able to answer the question of why is it wrong and how do I actually resolve an issue. I'll sort of pause there. Sort of a long answer, and there's a lot more that we can go into, but it's been a fun couple of years.
Michael Helbling
Nice.
Tim Wilson
Well, but also, I mean, one, I guess just to clarify, we're not saying that in 2021 people weren't using data. I mean, that's been ramping up for a while, I think. Also, the modern data stack: I'm not sure where that phrase was relative to the peak of inflated expectations, but I definitely feel like since the last time you were on, the modern data stack as a phrase has slid into the trough of disillusionment, at least a little bit, which is kind of interesting. I don't know exactly how that applies to where we're going from here, but I feel like there was a point where it was like, if we just have all these modules plugged in together with the right layers on top of them, then all will be good. And it feels like we're a little past that. That nirvana, even if we got there, wouldn't actually necessarily yield the results that were being promised.
Bar Moses
But yeah, I mean, I think, look, putting myself in the shoes of data leaders today, you're facing a really tough reality because, like, every 12 to 18 months you're having a new concept thrown at you. Call it modern data platform, call it, you know, generative AI, call it whatever you want. You're sort of expected to be on top of your game and understand the word or trend du jour. But I think if you unpeel that for a second and go back to fundamentals, there are a couple of things that I think remain true regardless and have remained true for the last 10, 15 years. First and foremost, organizations want to use data, and data is a competitive advantage. How you use it and in what ways can vary, but I think that is indisputable. Like, strong companies have strong data practices and use that to their advantage. For example, you can use it for better decision making internally. That was sort of one of the dominant use cases in the beginning. You can use it to build better data products. Like, for example, you can have a better pricing algorithm. And I think today, and we can talk more about this, data is the moat for generative AI products and solutions. And so regardless of where the hype cycle is, I think one core truth is that data matters to organizations. What we do matters. And so data continues to be a core part for organizations. I think the second fundamental truth that we believe in is that reliable data matters. The data is worthless if you're working with... you know, this even goes without saying, but having something that you can trust is sort of fundamental to your ability to deliver it. And then I think the third thing that has always remained true is that innovation matters. Like, you have to be at the forefront. 
And so organizations that are doing nothing about generative AI, or doing nothing to kind of, you know, learn what's next, will be in a difficult position. I'm curious for your takes about the modern data platform in particular. I think one of the benefits of that was that data leaders were met with many solutions for many problems, but they were actually inundated with perhaps too many solutions, and so ended up in a position where they had to make bets on a variety of solutions and ended up with maybe a proliferation of tools. And now there's a big movement to actually consolidate that or cut back to what's necessary. And so if you're not solving a core fundamental truth, then you probably don't deserve to live in the modern data stack, if that makes sense.
Tim Wilson
You don't deserve to live in the modern day.
Bar Moses
I'm sorry.
Mo
I so deeply love when the podcast intersects with things that are, like, completely churning through my brain at the moment. And it is like this beautiful, like, chef's kiss. Because these are all kind of concepts that I've been giving a lot of thought to over the break. I want to dig into what you mentioned: data can be a moat. Can you say more about that? Especially, you said, I think, relative to gen AI.
Bar Moses
Yeah, for sure. I'm happy to. Let's think about the last, I want to call it, year or two in generative AI. I'll actually start by sharing a survey that we did that I thought was really funny. We basically interviewed a couple hundred data leaders and asked them what percentage of data leaders are building with generative AI. Can you guess what percentage of data leaders?
Michael Helbling
Probably all. All of them are saying that they are at least.
Mo
Really?
Bar Moses
Yeah. So, like, I think like, 97%. Like, not a single person. Yes. That's. You're spot on, Michael.
Michael Helbling
Oh, no, we're all doing it. For sure.
Bar Moses
We're all doing it. We're all doing it. Everyone.
Michael Helbling
2025 is the year of maybe building with AI.
Tim Wilson
Maybe.
Bar Moses
Maybe we're all doing it, right? How often do you do a survey and get almost 100% of people giving the same answer to a question? It's a pretty big outlier. The second question that we asked was: do you feel confident in the data that you have? Do you trust the data that's running it? What percentage of people do you think trust the data that they're using for generative AI?
Tim Wilson
70%.
Bar Moses
That's not bad.
Michael Helbling
It was the 70. Okay. Because usually the Duke Business school used to do a CMO survey every year, and they would ask data questions like that, and there was usually about a 60% gap between how important it is versus how much they trusted it. It was always a very big delta.
Bar Moses
So, yeah, that's exactly right. So 60% said they don't trust it. So I think that's exactly the delta. So only one out of three trust and two out of three don't trust the data. So it's interesting that everyone is building generative AI, but no one has the core component to actually deliver said generative AI. I think that speaks more to kind of human nature, right? And where we want to be versus where we are.
Mo
Can I ask something? This concept has been rolling around, and I've been, like, digging up old blogs on it, but it just seems to have dropped off. There was a lot of hype about a metrics layer, right? I feel like it was probably two years ago, but, I mean, the last four years have blurred together, so it could be anywhere between two and six years ago. And I feel like I've had to do all this, like, mental processing around how a metrics layer or semantic layer differs from, like, a star schema data warehouse as a way to have a reliable data set. But it doesn't seem like anyone is talking about that right now. And I'm curious to hear your perspective.
Bar Moses
Wow, that's a really good question. You know, I'm curious for your opinions, but I think, going back to the Taylor Swift analogy from before, there's this desire to, like, chase the shiny object right now. And going back to this survey, like, if you're not talking about generative AI, you're going to be left behind. And I think there's a lot that goes into delivering generative AI right now. We can talk about what those things are. And I'll go back to your question, Mo, for a second as well. But I think if you're not on track, or don't have a really strong, solid answer to how you're on track, you're kind of on the hot seat right now as a data leader. And so I think that has just sucked the air out of every single room where there is a data leader or an executive leader. And I'll explain what I meant by data is the moat. If you think about what a data leader needs to do now, basically the first thing that's being asked is: what models are you using, what foundational models are you using, what LLMs are you using, et cetera. Between OpenAI, Anthropic, et cetera, there's lots of options. The thing is, every single data leader today has access to the latest and greatest model. Everyone has access to that. And so I have access to that. Mo, you have. Michael, you have. Everyone here has access to models that are, like, supported by 10,000 PhDs and, you know, a billion GPUs, right? And that is true for me and every other company around me. So in that world, how do I create something that's valuable for my customers? How do I create something that's unique? Like, what is the advantage? Like, I can create a product just like you can create a product. And so what's the distinction here? Like, for example, if I'm a bank,
How can I offer a differentiated service if I have access to the exact same model as you do and the exact same ingredients of a generative AI product, if that makes sense? And so I think what we're learning is that in putting together these generative AI applications, which are today really limited to chatbots, if you will, or sort of agentic solutions, et cetera, in all of those instances, the way in which companies make those products personalized or differentiated is by introducing their enterprise data, basically corporate data. And so let's just take a practical example. Let's say I'm a bank and I want to build a financial advisor solution. I want to be able to help Tim fill out his taxes. And so I'm going to be able to do that better if I have data about Tim's background, his car, his house, whatever it is. I can offer you a much better, differentiated product if I have reliable data about Tim that I can use. That's the only difference between bank one and bank two. It's what kind of data do we have to power that product? Just to summarize, we all have access to the latest, greatest models, but the only thing that differentiates different generative AI products is the data that's powering them. And so that's why data is actually the moat in the world of generative AI.
Tim Wilson
But, I mean, I'll, I guess, counterpoint. I feel like that is coming from a super data-centric perspective. And I guess this is what terrifies me: that 2025 could be the year of supercharging this obsession with more, more, more data. As you throw more data in, it's harder to keep it clean. You've got more things that can conflict. And we fought this battle in the past, where you chase all this data because anytime something isn't seen as valuable, the easy thing to default to is to just point to some data that's not clean enough, or data that's missing. It may be clean enough, but it's never going to be perfectly clean. And so that can feed this horrendously vicious cycle where we completely lose sight of what we are trying to do. And, oh, what we're trying to do is get as much data as possible. The counterpoint is those banks could differentiate by thinking about, with way less data, what their customers really value, what they most need. Right? And it's not an either/or, but if there is a deep understanding of their customer and what they value, it may need very little data. It may be using data they already have in a different way. So I think there has to be that balance. I would hope that we get to that point of, like, we can't just be in this arms race for more and more models, more data, more whatever. So.
Mo
Okay, well, okay, so my visceral reaction is, like, I can absolutely see that some people would use, like what you're saying, the gen AI hype train to be like, we need more data. I don't think that's what Bar is saying. But I will obviously give you the opportunity to speak for yourself, because my reaction is: it's not about the quantity, it is about the quality. Like, it is not about, let's collect more data. It's that the last few years have been all about, like, let's have fucking data lakes. Let's just dump data from backend services into anywhere. And it's created, I mean, I think we've said a swamp before. But you can't ask important questions like what do my customers value if the data that's there is a complete trash fire. And I don't think it's about quantity.
Tim Wilson
There's also this distinction of, like, it is so easy to say, I found an error in the data, this field is missing or this field is incorrect, ah, fix it. As opposed to, you just said, if your data is a dumpster fire, a trash fire. There is a gradation. So put aside the more and more and more data and bring in the pristine data point. It is so easy to find a problem in the data and chase that and extrapolate from that. So absolutely we need proper governance. But you can replace one with the other. With more, more, more data, you can absolutely Google for it and find all sorts of articles that say the ones who are going to win are the ones who collect all the data. You will find them. And I completely grant you garbage in, garbage out. I mean, that is, like, pap. That may become my next favorite thing to hate on after "In God we trust. All others must bring data." Like, it's so easy to say garbage in, garbage out. It's like, well, people are not pouring garbage in. Yes, there are errors. Yes, there is process breakdown. Yes, there needs to be governance and observability. But it is so easy to say that if we're not getting value out, oh, it's a data quality issue. And now you can get equally obsessed with chasing that. So, Mo, I feel like you were, again, putting words in my mouth, and, like, well, it's not that at all. But no, no, no.
Mo
I just, I think sometimes that, like, when we're discussing this concept, there are, like, extremes, and it's...
Tim Wilson
Says the one who said dumpster fire.
Mo
Or, like, it sometimes is interpreted as a binary thing. And it's not. Like, I do think there is a spectrum. It just often happens that you're at one end of the spectrum and I'm at the other end. But let me just elaborate on what I mean by quality, because I again can see a situation where a business goes, we must have perfect data. And that's not what I'm saying. I'm saying the data has to be meaningful, so that you can create connections between different data sources, and the way they relate to each other has to be consistent, so that different areas of the business are not, like, tripping over themselves making mistakes because it's fundamentally so unstructured. And so, to me, it's about how all those things connect together. It's not just about, like, is this number accurate to the 99th percent or whatever. I don't know. I'm gonna just shut up and let Bar talk because I feel like she probably.
Bar Moses
No, I love this. I love hearing y'all's thoughts. Yeah, I love it. Well, okay, so a couple of thoughts. One, obviously I'm biased, right? Like, I have a very data-centric view. I will not for a minute pretend that I have anything but bias, right? And I think my bias comes from a place of, like, yeah, I think data has been the most interesting place to be in the past five, 10 years, and will be in the next five, 10. I think it's, like, the coolest party that everyone wants to be a part of, and they should. And, you know, I'll continue thinking that. I wake up every day and choose to be part of the data party. And I think it's where we're having fun. So, yes, I'm 100% biased, and I agree with you. I think data hoarding has been a huge issue, a huge problem, and I think it's been a strategy that has largely failed. Like, oh, let's just collect all the data and hope that it solves things, or, you know, think that more data is more helpful. It's actually interesting. I was just sitting down with the founder of a data catalog company a couple of days ago, and we were talking about how 95% of the questions that people have of data have already been answered. And so their challenge is just finding the answer and surfacing it. There are very, very few net new insights being created, if that makes sense. And so really their challenge is about how do we help companies, or help users, discover the answer versus create a new answer. Which is actually mind blowing if you think about what a small percentage of new insights are generated. Like, it sort of made me a little bit sad for, you know, the human race, but also happy that maybe we can solve this. But I digress here. My point is, I think the point that you're making, Tim and Mo, is an important point. I don't think that more data is necessarily better. 
In fact, I think there are a lot of areas where, like, less is better and more precise answers are better. Not for a minute am I advocating for more data. Not at all. I think what I am saying is, if you look at ChatGPT or the kind of things that anyone has access to, that is trained on data that everyone has access to. It's funny, you know, people used to say, let me Google that for you. And I was trying to think, what's the new one? Like, let me Perplexity that for you? I don't know, it doesn't roll off the tongue quite as much.
Tim Wilson
Yeah, well, "let me ask Claude" would work, you know, so.
Bar Moses
Exactly. Let me ask what Claude says. But I think the point is, from that perspective, everyone has access to that, and also everyone can use those models to, you know, train on their data. And so everyone sort of has access to that. But if you have some data about your users, right, let's take, I don't know, a hotel chain that's trying to create a personalized experience for their users. Like, no one knows as much as they do about, I don't know, how you like to travel, the kind of food you like to eat, the kind of, you know, ads that would speak better to you. Not that I'm advocating for, like, an ad-centric world. But my point is, the power today, and where I think the leverage lies, is in having things that not everyone has access to. And the reality is everyone has access to the latest and greatest LLM. So that cannot be your moat or your advantage. It by no means means that we have to have too much data or a lot of data. I'm not advocating for that, and I think it's a very important clarification. I actually will say that oftentimes in the companies, at least, that I work with, one of the biggest challenges is that they have so much data they don't even know where to get started. And so a lot of the work is actually saying, let's try to, you know, you can think of layers of important data, tier 1, tier 2, tier 3, and think about what the core data sets are that we care about, making sure that those are really pristine and reliable. So oftentimes, actually starting small is the winning strategy. I find when we work with a company and the company is like, I want to observe everything wall to wall, I'd be like, whoa, whoa, whoa, hold on, that's going to be really hard. Like, tell me, are you actually using all of that data? And that strategy often fails. 
And so I'd much rather start with what's a small use case that you actually really are using the data for and that's really important for your users. Let's start with making sure that that's really highly trusted and reliable. So I agree with you is my point here. And I think it's an important clarification.
Tim Wilson
Mo, are you gonna...
Mo
No, I am like waiting for the next rant.
Bar Moses
We can rant, by the way. I'm happy to rant about garbage in, garbage out. I think that is a great rant. I'm happy to carry the torch on ranting against that. Tim, if you'd like. I don't know if you want to share why you want to rant. I'm happy to share my rant about it. Go for it.
Mo
So I'm curious, Tim, when I said that stuff about connectivity, what are your views on that? Because I feel like you can only answer important questions if the data is, like, kind of, I don't want to say structured, but I'm thinking about Bar's comment that the competitive advantage that you have is your data set. Like, it's not the models, right? So how that all works together then, to me, becomes the most important bit. And I really like Bar's concept. Actually, someone in my team did this recently, where they went through what's tier 1, tier 2, tier 3. And I think it's such a great framework to help the business understand the different levels of importance. But, Tim, what are your thoughts on that connectivity piece?
Tim Wilson
So, one, I mean, there is nuance. I try to not say things like it all has to be connected, or it's a dumpster fire, or it's perfectly pristine, and maybe I fell into it a little bit with chasing the more and the more and the more. But I would love for there to be a little bit more discipline and nuance. Like, Bar, when you said starting small: there is no pressure, no force in business right now that says when doing anything with your data, you should go lock yourself in a room with some smart people and a whiteboard and then come out with a mandate that it's an absolute minimalist approach, and then you build from there. And I feel like I see this, I mean, I'm spending too much time on LinkedIn and reading articles: if someone says this is data that we uniquely have as a bank or a hotel chain, they make the leap to, we have it, therefore we need to feed it in and connect it, because that is something unique to us and therefore it provides competitive advantage. That's the default position: it's our unique data, we must use it. And where I see that going wrong is there's a missed step to say, like, really, just because we have it uniquely doesn't mean it's necessarily valuable. If somebody says, here's why we think it can be valuable, then what's our minimum viable product? What's our minimum way to test that it would be valuable? But instead there is this tendency to say, it's ours, put it in the system, make sure that it's pristine. Which, when you flip it around to LLMs, like, they're doing stuff probabilistically. Hallucinations are coming out. All of that's getting better. But even with pristine data going in, it's going to give kind of inconsistent results. And we're kind of like, oh, that's cool. Well, then, I can't remember who wrote it. 
Might have been Ethan Mollick or somebody who pointed out, like, yeah, with data that's got noise in it going into something, it's not that if you put pristine data in, you're going to get a definitive, deterministic answer out. If you put pristine data in, you're going to get a probabilistic answer out. If you put noisy data in, you're going to get a probabilistic answer with a bigger range of uncertainty. And I just think there's thought and nuance in saying, what if you had a bias towards less? And it's not saying don't do it, it's just saying move with deliberation. So that, like, you figure out something is a tier one, and then you say, that's tier one, it's a differentiator, lock that in and make sure that it is clean. And when you're connecting it to something else, you know, so that's... Well, I guess that was, I was like, I'm not going to rant about this, I'm going to have a very nuanced thing to say, and then, whoop, here it comes.
Mo
That was very eloquent. No, that was eloquent. But okay, can I add some color to the situation? Right. Like, I feel like there are some companies that still have a highly centralized model for how they store their data or how it's built, that sort of stuff. My world is very different to that. Everything's done completely decentralized. So, like, in marketing we have marketing analytics engineers and data scientists creating data sets. And then over in the growth team there are people creating data sets, and over in teams and education. And even if you start with, like, let's do something small, it's often created in isolation. And the problem is it's really hard to answer a cross-cutting business question, like what's important to our customers or what do our customers value, when everything is built in this completely decentralized model. Because if I take my tier one tables and data sets, they will be completely different to another department's tier one data sets, and you might not be able to answer that question. Just to be clear, I totally agree. I love this idea of starting with less, but you can only start with less if it is, I don't know if the right word is company wide, or centralized. Like, I feel like there's this tension in how technology is built in some companies.
Tim Wilson
Unfairly, and I'm going to admit this is unfairly picking on an example that you just threw out. If it's, like, what do our customers value, and it's like, well, I have to have all the data and hook it all together. Or I could field a study and ask them, you know? Like, there's this story out there of, I'm going to plug in, I'm going to launch my Internet, and I'm going to say, what do our customers value the most? And then through all of this magic, it's going to generate it. And you say, well, it has to connect all of this stuff. If that's a fundamental question, then there are alternative techniques that have been around for 50 years, which is usability testing or focus groups or panels, for some of that. That's unfair because you just yanked that out as one example. So I'm going to acknowledge, just a random example.
Mo
But yes, I agree that there are other research methods that would be more appropriate there. Again, I'm going to shut up and let Bar speak.
Bar Moses
No, not at all. I love this. I feel like I'm asking questions that I haven't thought of in a while. So that's good. No, I mean, look, listen to this. My reaction is a couple of things. One is going back to sort of data is being faced with sort of a really tricky part of their journey, I think. And you talked a little bit about sort of what does a great model look like for a team? Is it sort of centralized or decentralized? And I think organizations go back and forth on that. And it also is a little bit of like a function of the environment in which they operate. So we work with, you know, highly regulated companies who operate in a highly regulated environment. So think like financial services or healthcare or anything like that. And in those instances they're actually, you know, subject to significant regulations and audits. And in those instances you really need to have really strong data management, data quality controls in place, and oftentimes that needs to be across your entire data estate. And that is sort of, it's sort of like table stakes. You can't really operate without that. I think that's very different from, you know, like a retail company or, you know, an e-commerce company. So, you know, first and foremost I think this is really dependent on what environment you're operating in and also what problem you are trying to solve. When we say data products or generative AI applications, it's very broad, and I think if you really think about what actually is being used, there's a couple of things. One is creating a personalized experience for your customers. But it can also be inwardly looking, for a company automating internal operations. So an example: a Fortune 500 company that we work with has a goal for their IT organization. 50% of their IT work needs to be either completely AI automated or AI assisted. That's sort of their goal. And that's in terms of internally automating sort of human manual tasks.
And so, you know, I think it sort of depends on what you're trying to solve. And I think that that's sort of what data leaders need to ask themselves today. Maybe sort of one thing that's coming out of that is I think there's this sort of blurring line between different people working with data. So, you know, in the past there's sort of, you know, you could really draw the lines, I think, more clearly between engineers, data engineers, analysts, data scientists. All of that is becoming a lot harder to distinguish. And I think my view is sort of in, you know, the teams that will be building generative applications will be a mix of that. So it will include both engineering and data people. Like, I don't think, I think, you know, how does this work? Like someone wakes up in a company and is like, hey CTO, go build a generative application. And so like a bunch of engineers like run off and build something. And then someone is like, hey CDO, chief data officer, like, go build a generative application. And like the data team runs off and like builds stuff. And so you end up having data teams trying to build stuff that software engineers should be doing and software engineers trying to build data teams. But at the end of the day, like a strong generative application or any data product needs a good UI, which should be built by software engineers. Like, you're not going to like, that's not the data team's job. And it also needs like good data pipelines and reliable pipelines. And that doesn't make sense. Like you don't need a front end engineer to build like a data pipeline. And so I think at the end there will be some convergence of like what the roles are. But right now there's a lot of people sort of crossing the lines and lots of blurry lines in between.
Mo
And what's your perspective on data products being more like a platform product versus, I don't know. I feel like there are many kind of ways you could cut it, right? Like, sometimes data products seem to sit more in like a marketing technology space or whatever. But it seems at the moment there is kind of a lot of perspective about it really sitting in like that product platform sphere. And like product PMs are quite different as well to like a customer facing product manager.
Bar Moses
Yeah, I mean, I think if you look at like the product. Oh, go for it, Tim.
Tim Wilson
Well, I just, I just want to clarify. So when you say a platform, are you saying the data product is a platform that then gets kind of winds up serving a bunch of different use cases or are you saying just where are you saying organizationally or are you saying what the. The data product is a platform with a bunch of features? Like, what do you, what do you mean by.
Mo
Yeah, when I say platform product, I'm more meaning like the products that you build, suppose in house, that serve as like the platform for internal stakeholders and like the tools that you're building to service your organization. And I suppose like, as I'm saying this out loud, I'm like, I suppose you could have data products that would be doing that and you could also have customer facing data products, and those things would probably be different. Oh, wow. I've really answered my own question there, haven't I?
Bar Moses
No, it's okay, I can elaborate. But I think you did, you did answer parts of it. So maybe also just like taking a step back for a second, if you think about data products and where they are in the hype cycle, like, I think they're sort of like, you know, it's like there's this hype and then they plateau and then you're like, oh, now I can actually make use of this. And I think that's where data products are. It's like, oh, now I can actually really use this thing, which is good, I think. I think data products can really mean whatever you want. It could be, you know, let's walk through a simple example, like an internal dashboard that, you know, the chief marketing officer is using every day. Right. And so it's basically like a set of dashboards or a set of reports. And then there's a lot of like tables with this, you know, followed by a particular lineage that feed into that report. And so it could be a combination of, you know, user attributes and sort of different information about those users and also some user behavior, and could be a bunch of sort of different third party data sources. And so all of that can be part of a data product. So you can describe that as basically like all the assets that are contributing to said report or dashboard that the CMO is looking at. My point is you can basically use data products as a way to organize your data assets and to also organize your users and data teams. And so to me it's less of a question of is this part of a platform or not, because that varies, as I mentioned, by the organization, the size, the maturity of the organization. For me, it's more a way for companies to organize what they care about. And so oftentimes, if we work with a data platform team, we'll say, hey, what's the data that you care about? And then they might tell us, oh, we have a marketing team and that really focuses on our ads business.
And the CMO there looks at this dashboard every morning and they are so sensitive to any changes that they have there. And so we want to make sure that all the data pipelines, from ingestion of third party data sources through transformation, all the layers through to that report, we want that to be very high quality and accurate. So we want to make sure that that entire data product is trusted. That's like one way to think about it. Now the ownership of those assets can be by the data platform itself, or it could be by the data analysts that are actually running the reports. Oftentimes it's a combination of both. So you might have data analysts looking at the reports, the data platform running the pipelines, and a totally separate engineering team that's owning the data upstream and sort of the different sources. And so oftentimes it's actually all of them that are contributing to this sort of, you know, said data product, if you will. But to me, where data products are most useful is as a way to organize data assets and organize a view of the world for a particular domain, for a particular use case, for a particular business outcome, if that makes sense.
Tim Wilson
Do the data product, this is, I guess for both of you data product product managers, like, what's the breadth? Do they engage all the way up to the upstream engineering, owning the data creation all the way through to the use case and the need? Or does it like where did, is there a natural cutoff where they say this is now, this is engineering's problem. They're just, they need to be managing the data coming in. Or like how broad does that role go, assuming it? I guess maybe there's a precursor question. Does that role get defined and exist as you are a data product product manager for this data product or set of data products? And if so, what's the scope of that role?
Mo
Yeah, doesn't it depend on the organization? Like, I mean, we're having lots of conversations at the moment because Like I said, we have a decentralized model which is quite unique, right? Because like, well, it's not unique, but like, it, it creates different layers of accountability. Right? Because like if you have engineers that have a backend service and they're pushing that data to you and then you're building a data product off it, like, the question that comes to mind for me is like, who's accountable? Well, like it's not an easy answer in that, in that model, I think it's a responsibility of the team that own the backend service to make sure that the data is getting pushed correctly out. But then likewise for the people who are receiving it, like they have layers of accountability as well as the people that are using that data. But like in a completely different model where you don't have that, like you have a more centralized model, those lines of ownership could be different. Right. And so I think it's so dependent on the, on the company and how they're structured to understand where something starts and ends. I think it's probably impossible to think that a data product PM would own everything completely end to end. Like, I, I, I can't envisage a world where that would happen just because there are so many different parts of the bit. Like, I don't know anyway, I'm not making a lot of sense now.
Bar Moses
Yeah, yeah. I mean, this is a maybe, you know, not, not what you'd want to hear, but I think it's a, it depends answer. Like it depends on, on the maturity of, I mean, I don't want to repeat what Mo said, but I, I strongly agree with that. It's, it's hard to draw the lines. I think some of the, the teams that do this better are those that are able to have a strong data governance team that can actually sort of clearly sort of lay out what that looks like. The most common model is something like a federated model where you have a centralized data platform. Like what you said, Mo, the centralized data platform sort of defines what excellence looks like, what great looks like. And so they might define, like these are the standards for security, quality, reliability and scalability. And so whenever you're building a new data pipeline or adding a new data source, you need to make sure that it passes these requirements on each of those elements. And so in that way, the centralized data platform defines what great looks like. And then no matter what team you're on, this could be the data team serving the marketing team or finance team or whatever use case it is. We adhere to the same requirements that the Centralized team has defined. So we see a lot of that. I think that's again, with generative AI, we will see more of that because maybe going back to what we said at the very, very beginning of the call, how we use data 10 years ago was a lot simpler. There were very few use cases and very few people using data. But today, because there's so many more use cases, so many more people using it and more in real time, the need for a centralized, you know, sort of governance definition is more important. I mean, this is also, you know, you kind of see this. I think the sort of, you know, LLM or generative AI stack is still being defined. But, you know, one of the questions you raised this, Tim, was, you know, hallucinations are very real, right? 
And you know, when you release a product and the data is wrong, it could have, you know, colossal impact both on your revenue and your brand. You know, maybe the example that I like to give the most is, I don't know if you all saw this, it sort of went viral on Twitter or X. I'm not going to get used to that thing. But it went viral on X. Someone did this thing on Google. Basically the prompt was something like, what should I do if my cheese is slipping off my pizza? And the answer was like, oh, you should just use organic superglue. And it's obviously a bad answer, right? And honestly, I think Google can get away with it because of the strong brand that Google has these days. And so, yeah, I'll probably continue to use Google even though they gave me a shit answer about organic superglue for my pizza. But most brands, if I'm an esteemed bank or an airline or a media company, I can't afford to have those kinds of answers in front of my users. And so actually getting that in order is, you know, again, Google can get away with it, but like 99.9% of us cannot.
Michael Helbling
Nice. I want to switch gears just a little bit and talk about something else that kind of obviously ties in, but also kind of reintroduces a lot of challenges, which is unstructured data. And going into next year, one of the articles I was reading that you wrote, Bar, was kind of saying that was going to be one of the things. Could you kind of give a perspective about that? Okay, so we're going to be using a lot more unstructured data, but then how do we take all the things we've just been discussing about how challenging data is, and now we're just going to slam a new set of challenges on top of that that are going to kind of redo the whole thing? Like what, what do people do about this?
Bar Moses
Yeah, great question. We should do at some point, like, a "2025 will be the year of" and see what we come up with. I don't know if it'll be a real round robin. Yeah, exactly.
Tim Wilson
You ask Claude, I'll ask Perplexity. You ask ChatGPT, please.
Bar Moses
Yeah, exactly, exactly. I mean, honestly, if, like, if we could foresee that, we probably wouldn't be in this business, right? We'd be doing something else if we could be forecasting that. But I think, will 2025 be the year of unstructured data? I don't know. But I can tell you this. For the last 10, 15 years, most of the data work has been done with structured data. And structured data is very easy. It's like data that's in rows, columns, tables that you can analyze in a pretty straightforward way with a schema. And most of the modern data stack and whatever solutions that we all use and love day to day have been focused on structured data. That being said, if you look at where the growth is, I think there are some crazy estimates from Gartner. 90% of the growth in data will come from unstructured data, or something like that. And just to define it, when we talk about unstructured data, it's things like text, images.
Tim Wilson
Et cetera, when 80% of that unstructured data will be generated by an LLM.
Bar Moses
So no, it's turtles all the way, if you know what I mean. I think the former founder of OpenAI said something like, we're at the peak data of AI now. We're at the time where this is the most data that we have to train. And from now on we're going to have to like, rely on synthetic data in order to do that. So, you know, and that goes back to your question of like hoarding data. But going back to the unstructured point, I think, you know, unstructured data is becoming more and more important and we're seeing, you know, organizations not only, you know, start to collect more of that, but also understand how to use it, you know, what to do with it. You know, I think this is very early days for this space and I think we're still sort of watching and kind of understanding what's happening. But just to make this really concrete with an example, and I think it's a cool example: we work with a company that's a Fortune 500 insurance company. And one of the most important types of data for them, unstructured data, is actually customer service conversations. So like, let's say I have a policy or something that I'm upset with and I want to chat with someone and then I have this conversation and, you know, you can analyze that conversation to understand my sentiment. You know, how pissed off am I? Like, am I yelling "representative," like, I don't know, "get me your manager," or whatever it is, or, you know, like super happy, thank you so much. Right. Like that's what I mean by sentiment. So you can sort of analyze what a conversation is like, and basically, you know, you can also ask the user for feedback, right? Like sort of scoring that. One of the things that this customer does is actually use an LLM to create structure for this unstructured data. What do I mean by that? They basically take a conversation and then score that conversation. So like 0 to 10, this conversation was a 7 or an 8 or something like that.
Now what's the problem? The problem is that LLMs sometimes hallucinate and they might give a score that's, let's say, larger than 10. What does that mean if a conversation scored a 12, for example? Right. And so actually the way in which we were working with this company is allowing them to observe the output of the LLM to make sure that the structured data is within the bounds of what a human would be expected to score the unstructured data, which is the customer conversation. In that instance, we're using automation in a way that we maybe hadn't expected before in order to add value and, in this case, to actually reduce the cost and improve the experience for the users.
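The bounds check Bar describes can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the insurer's or Monte Carlo's actual implementation: the `ScoredConversation` type, the `split_by_bounds` function, and the field names are hypothetical, and the only detail taken from the conversation is the 0 to 10 scale.

```python
from dataclasses import dataclass


@dataclass
class ScoredConversation:
    # Hypothetical record: one customer conversation plus the score an LLM gave it.
    conversation_id: str
    score: float


def split_by_bounds(scored, low=0.0, high=10.0):
    """Separate scores inside [low, high] from ones that need human review.

    The 0-10 range comes from the example in the conversation; everything
    else here is an assumption made for illustration.
    """
    valid, flagged = [], []
    for item in scored:
        # A NaN score fails this comparison, so it is flagged as well.
        if low <= item.score <= high:
            valid.append(item)
        else:
            flagged.append(item)
    return valid, flagged


if __name__ == "__main__":
    batch = [
        ScoredConversation("conv-1", 7),
        ScoredConversation("conv-2", 12),  # the impossible 12 from the example
    ]
    ok, needs_review = split_by_bounds(batch)
    print("in range:", [c.conversation_id for c in ok])
    print("flagged:", [c.conversation_id for c in needs_review])
```

Flagging rather than silently clamping the out-of-range score keeps a person in a position to look at the cases the model got wrong.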
Tim Wilson
But it's one of those that brings up the case of, say that scoring model, it shits the bed 10% of the time, but it does way better 60% of the time and it does about the same as a human and it's overall a little bit cheaper. I think there are the trade offs and I mean, maybe this goes back to earlier, the discussion that if it's like, well, we're gonna pull out the one that it said was a 12 and say you gotta stop that from happening. That's one approach, make this never happen. The other option is it's gonna happen. So the process needs to be human in the loop or human on the loop. Like don't, don't completely hand this over, so that you can catch the ones, because a human would catch it. And those are the trade offs. And you know what, maybe they're even, you know, it's okay, you're going to have a small percentage who are totally pissed off, even if you're just running humans, because their wait time was too long or something else. Is your goal to have every customer have a delightful experience or is it to actually have fewer customers have a horrible experience? It may be a different set of customers that are having a horrible experience and then probably Moti or Connected. You want to make sure, with the ones with the highest predicted lifetime value, you're not saying, great, we have way fewer customers who are pissed off. Unfortunately, it tends to skew towards the ones that are the highest lifetime value.
Bar Moses
So I think that's. Yeah, I mean, I think that's spot on. And I think it's. I mean, one of the questions that I remember sort of thinking through is like, you know, what's worse? Like no answer or a bad answer, you know? And I'm not sure. I can tell you we're not creating, you know, sort of agents, if you will, in order to say, oh, I don't know, right, that's not how you create them. But oftentimes that actually might be the better answer. I think Tomasz Tunguz, who, you know, we sort of collaborated with on our predictions for next year, sort of mentioned to us that, you know, what you'd expect is like 75 to 90% accuracy is considered state of the art for AI. I mean, on the face of it, 75 to 90% seems really legit and reasonable. But what's often not considered is, if you have three steps and each is 75 to 90% accurate, the combination of that is actually an ultimate accuracy of only about 50%, which is, by the way, worse than a high school student would score in that sense. And so is 50% acceptable? Probably not. And so what ends up happening, what I think we're seeing in the market, is that the market actually took this big step back. Like I think a year ago there was this huge rush to adopt AI and to try to build solutions. But as we are seeing that the accuracy is sort of, you know, at those ranges, companies did take a step back and actually are reevaluating or rethinking where to place their bets or place their chips, if you will. I still find that most companies evaluate a solution with a human thumbs up or thumbs down. Like, was this answer good or not? Allowing users to just mark like, yep, this was great, or no, this kind of sucked. Companies still have that and I don't think we're moving away from that, you know, unless there's sort of a big change in the near future.
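The compounding arithmetic behind that point can be sketched as follows. It is a minimal illustration that assumes each step succeeds independently of the others, which the conversation does not state; the function name `chained_accuracy` is hypothetical. Under that assumption, three steps at 75% to 90% each land somewhere between roughly 42% and 73% end to end, and three steps at 80% come out at roughly 51%, close to the 50% figure quoted.

```python
import math


def chained_accuracy(step_accuracies):
    """End-to-end accuracy of a pipeline whose steps must all succeed.

    Assumes the steps fail independently, so the per-step accuracies multiply.
    """
    for accuracy in step_accuracies:
        if not 0.0 <= accuracy <= 1.0:
            raise ValueError(f"accuracy must be between 0 and 1, got {accuracy}")
    return math.prod(step_accuracies)


if __name__ == "__main__":
    # Three-step pipelines at the low end, a midpoint, and the high end
    # of the 75-90% range mentioned in the conversation.
    for per_step in (0.75, 0.80, 0.90):
        overall = chained_accuracy([per_step] * 3)
        print(f"{per_step:.0%} per step -> {overall:.1%} overall")
```

The same multiplication is why adding more chained steps makes the end-to-end number fall quickly even when each individual step looks acceptable.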
Mo
I have a totally unrelated random question, Bar. With the companies you're working with, is the focus of reliability and the work you do quite different depending on whether data is structured or unstructured? Like in the use case you just gave, it sounded like it was quite different, but what are you seeing across the industry?
Bar Moses
Yeah, 100%. Like, I think the use cases that we cover vary tremendously based on industry and company. And I think that's a reflection of the variability in what you can do with the data across the industry. So it can range, you know, the types of products that we work with can be, you know, data products that are more in a regulatory environment where, you know, one mistake in the data could actually put you at risk of regulatory fines. You know, if you are using data in some incorrect way or not following what is defined as sort of best practices for data quality. It's sort of like this blanket statement that's very high level, but is actually very important in these environments. That's like one. The second could be where you have a lot of internal data products, so, you know, like a lot of reporting or, you know, product organizations that are, you know, doing analysis based on cohorts or segmentation of your user base. You know, a third could be data products that are sort of customer facing. So for example, you know, the easiest thing is like Netflix, you know, recommending your next best view, for example. And then a fourth use case could be, you know, a generative AI application. So for example, like an agent chatbot that helps you ask and answer questions about, you know, your internal process or your internal data. So you can ask really basic questions like, you know, how many customers do we have? And, you know, how many customers have had renewals in the last few years? Or if I'm in support, I can ask how many support tickets has this customer submitted in the last year, and in what topics, and, you know, what was their CSAT, sort of questions like that. And so each of these can include structured or unstructured data and each of these can cover very, very different use cases and very different applications of the data.
So if anything, I see less homogeneous applications of the data, if that makes sense. And I actually anticipate that this will carry through to the generative AI stack. So, you know, people create software in a multitude of different ways in a multitude of different stacks. The same can be said for data. Like there's not one single stack that rules it all. There's not one single type of data that rules it all in order to create data. And I think the same will be true for generative AI. There's not one single stack or one single preferred language of choice, and there's not one single preferred method, whether it's structured data or unstructured data. This does very much sort of vary. I will say, from my biased point of view, the thing that is common, sort of going back to the foundation of truth and what is very important, is that every organization needs to rely on their enterprise data and make sure that it's high quality, trusted data so that they can actually leverage and, you know, capitalize on that. And I think it's a messy, messy route to get there. Maybe 2025 will be the year of messiness. Sometimes you just got to like, lean into the messiness, you know, on this random path to kind of figure it out. But there's a lot more to figure out there. But I don't see us sort of converging on like one single path or use case or even type of data.
Michael Helbling
All right, we've got to start to wrap up. This is so good.
Tim Wilson
And yeah, oh, we figured it all out. We're good to wrap.
Bar Moses
We can before exactly 2025 will just.
Michael Helbling
Be the year of leaning into the mess. And maybe that's the best we can do right now. Anyway, one thing we love to do is go around the horn and share a last call, something that might be interesting to our audience. Bar, you're our guest. Do you have a last call you want to share?
Bar Moses
Sure. So this concept that someone has shared with me recently, which I'll call sort of watching the avocado, if you will. So I don't know if you experienced this, but, you know, you, you buy an avocado and it's like, it's not ready, not ready, not ready. Boom, you're too late. It's already like you can't eat it anymore. Right? That happens to you, right? And so, you know, I think the idea is like a lot of sort of new technologies and trends are like that. And in this case, sort of this is like Generative AI, like, we're too early, we're too early, we're too early. Boom. You know, you missed the boat. And so I think one of the things that I take away from that is, like, as data leaders, as sort of data practitioners, how do we keep watching the avocado? And we gotta hit the avocado before it's too ripe. But the timing matters here, especially for a lot of these sort of trends and technologies.
Michael Helbling
Nobody likes bad guacamole.
Tim Wilson
If any listener now uses that when they're talking somewhere internally, if they use the analogy, please let us know. I want to.
Bar Moses
I like that.
Tim Wilson
We gotta watch the avocado.
Michael Helbling
Yeah. Awesome. All right, Mo, what about you? What's your last call?
Mo
Okay. I've been doing lots of thinking about how I make 2025 really great. And I think one of the tensions I've found is that, like, I'm naturally inclined to, like, want to go fast and get to the place that I want to get to. And so this is not anything other than just kind of a personal learning or a personal goal that I've set for myself. It is the start of 2025, after all, that I want to be more intentional about enjoying the journey. And the analogy I have is, I love going to the beach. Going to the beach with two small humans is really fucking hard. There's all this shit to pack. You've got to cart it all down there. Everyone needs sunscreen on, like. And so sometimes the bit of getting to the beach is so unpleasant that by the time you get there, you're all, like, flustered and hot and you don't want to be there. And you're like, oh, fuck it. Let's all just go home. So I'm trying to enjoy the journey to get there more. So, like, I went to the beach the other day. It took us an hour to get there. My kids wanted to stop at this playground. They wanted to look at the bird, like, you know, they wanted to have a snack. And I'm like, you know what? That's okay. I am just going to lean into letting. Enjoying the bit to get there and not focusing so much on kind of the end state. And it's. It's not just about kids. It's also about work. Right? Because, like, if you're constantly trying to, like, come up with this huge, amazing strategy and deliver this project, but, like, you're miserable in the months delivering it, that kind of, you know, defeats the purpose. So, anyway, that's just my intention for the year. Thought I'd Share what about you Tim?
Tim Wilson
Well, my publisher is going to hurt me if I don't plug Analytics the Right Way. So, depending on when you're listening to this, it is 15 or fewer days from actually being available. But Analytics the Right Way is available for pre-order until January 22, at which point it will be available as a print book or an ebook, and the audiobook's coming out four or five weeks after that. So that does have a section talking about human in the loop versus on the loop versus out of the loop and some of the AI trade offs. But it is not an AI-heavy book at all. So that's my obligatory log rolling last call. But for fun, I've definitely last called stuff from The Pudding before, but one that they recently had, it's at pudding.cool. It was Alvin Chang, who got a data set that looked at a whole bunch of different roles and how much of their time they spent sitting versus standing. So it's kind of one of those scrolling visualizations. You enter some stuff about your job first so it can then kind of locate you on it. But it's just a simple x axis that goes from sitting all the time for work to standing all the time for work, and what the y axis is varies as you scroll through it. So it's kind of just a fun visualization, and it also starts to call out how tough on bodies a lot of professions are because they're required to crouch or stand all the time. They can't take breaks and that sort of thing. But it's just kind of a fun interactive visualization. So worth checking out to relax. What about you Michael? What's your last call? I mean it was going to be the book, Tim.
Michael Helbling
I was actually ready to do one on the book for you just in case you didn't cover it. So good job. We'll report back to your publisher. You're doing it. You're doing what you can do. No, no. So actually mine is Recast, who I think are some of the best in the game when it comes to media mix models. They've recently started publishing a series of YouTube videos on how to think through the creation of those models, and I think it's a great watch for anybody who's engaging with that kind of data. So I'd highly recommend it. And they've put a couple out already and then I think there's some more to come. So that would be my last call. All right, so what is 2025 the year of? We just have one word. Everybody has to go around and do like a one word. It's. Or like a fast. No, nothing.
Tim Wilson
Moderation, I think.
Michael Helbling
I think 2020. Yeah, there you go. I think 2025 is going to be the year of being thoughtful, keeping with the work, increasing insights, maybe helping with process that none of that's actually going to happen, but I just sort of like wish it were. So that's my take on it.
Tim Wilson
So you use the one word for all of us. You just. You kind of took. We all deferred or.
Michael Helbling
Well, nobody answered Tim, so I just figured we were.
Tim Wilson
I yielded my one word to you.
Michael Helbling
So good. Yeah.
Tim Wilson
Like it.
Michael Helbling
So I couldn't think of a better person to help us kick off 2025 with than you. Bar, thank you so much for coming on the show. It's been awesome.
Bar Moses
Absolutely. I hope 2025 will be, you know, even better and greater than 2024. And, you know, I would probably be remiss if I didn't say that '25 would be the year of highly reliable data and AI.
Michael Helbling
That's right.
Tim Wilson
Hey.
Michael Helbling
What's the saying? From your mouth to God's ears or whatever. That's the. We absolutely would want that.
Bar Moses
Amen.
Michael Helbling
Thank you so much. Awesome. Thank you so much for coming on the show again. And of course, no show would be complete without a huge thank you to Josh Crowhurst, our producer, just getting everything done behind the scenes. As you've been listening and thinking about 2025, we'd love to hear from you. Feel free to reach out to us. You can do that via our LinkedIn page or on the Measure Slack chat group or via email at contact@analyticshour.io. We'd love to hear your thoughts, other things that you think are big topics for 2025 in the world of data and analytics. So once again, Bar, it's a pleasure. Thank you so much for taking the time. We really appreciate having you on the show again and, you know, you're on track now. There's. We keep talking about the Five Timers jacket. That's going to be a thing. So you're in the running. There's only been people that done this.
Tim Wilson
A couple of times. Are you prepared to have five kids? I guess is the question. Like we might need to.
Michael Helbling
Oh, yeah, yeah. Anyway, so of course, I think I speak for both of my co-hosts, Tim and Mo, when I say: no matter where your data is going, no matter the AI model you're using, keep analyzing.
Tim Wilson
Thanks for listening. Let's keep the conversation going with your comments, suggestions and questions on Twitter @analyticshour, on the web at analyticshour.io, our LinkedIn group and the Measure Chat Slack group. Music for the podcast by Josh Crowhurst. So smart guys wanted to fit in, so they made up a term called analytics. Analytics don't work.
Bar Moses
Do the analytics say go for it no matter who's going for it. So if you and I were on the field, the analytics say go for it. It's the stupidest, laziest, lamest thing I've ever heard for reasoning in competition.
Tim Wilson
So my, yeah, my smart speaker decided to weigh in on that.
Bar Moses
I love it. What do they have?
Michael Helbling
It's the perfect little end note to that particular thing.
Mo
Yeah.
Tim Wilson
Yeah. Probably be Cuban to thumbs up or thumbs down, man, the background was saying, nope, I don't think I can actually. It basically said, I don't know. Now that I think about it, it was like whatever it decided it had heard, which was nothing. Yeah, perfect. Rock flag and lean into the mess.
The Analytics Power Hour - Episode #262: 2025 Will Be the Year of... with Bar Moses
Release Date: January 7, 2025
In the latest episode of The Analytics Power Hour, hosts Michael Helbling, Tim Wilson, and Mo Kiss engage in a thought-provoking conversation with returning guest Bar Moses, co-founder and CEO of Monte Carlo. Titled "2025 Will Be the Year of...", this episode delves deep into the evolving landscape of data analytics, the challenges of the modern data stack, the rise of generative AI, and the critical importance of data reliability and governance.
The episode kicks off with the hosts humorously contemplating what the year 2025 will be defined by, setting a lighthearted tone for the in-depth discussions to follow.
Michael Helbling [00:13]:
"I didn't say that very clearly, but we do want to define the future. So what will 2025 bring?"
Bar Moses is introduced as a seasoned expert in data reliability, leading Monte Carlo in ensuring data accuracy and trustworthiness across organizations leveraging AI.
Michael Helbling [02:04]:
"Bar Moses is the co-founder and CEO of Monte Carlo, the data reliability company... Welcome back, Bar."
Bar reflects on the journey since her last appearance in 2021, highlighting how data quality has risen in importance as the complexity and usage of data have surged.
Bar Moses [02:30]:
"Monte Carlo was founded to solve the problem of what we call data downtime... periods of time when data is wrong or inaccurate."
Mo Kiss [03:33]:
"I think it maybe wasn't as complex and so like, you know, as complexity has grown... it's even harder to troubleshoot."
Bar elaborates on the inception of data observability, emphasizing its role in maintaining trusted data pipelines essential for accurate reporting and AI applications.
Bar Moses [04:15]:
"Data observability is basically allowing people creating data products... to make sure that they are actually using trusted, reliable data."
The discussion transitions to the modern data stack, with Tim expressing skepticism about its initial promises and Bar agreeing on the necessity to revisit fundamental data principles amidst the influx of new trends.
Tim Wilson [07:36]:
"The modern data stack is a phrase that has slid into the trough of disillusionment."
Bar Moses [10:07]:
"There are core truths like data being a competitive advantage, reliable data matters, and innovation is essential."
Bar introduces the concept of data serving as the moat in the generative AI landscape. She shares insights from a survey indicating nearly universal adoption of generative AI among data leaders but a significant lack of confidence in data reliability.
Bar Moses [10:42]:
"The only thing that differentiates different generative AI products is the data that's powering them."
Michael Helbling [12:25]:
"Only one out of three trust and two out of three don't trust the data."
The conversation delves into the complexities of data governance within organizations, debating centralized versus decentralized data management models. Bar emphasizes the importance of federated models where centralized standards guide decentralized teams.
Bar Moses [33:23]:
"Centralized data platform defines what excellence looks like... all teams adhere to the same requirements."
Tim Wilson [26:19]:
"Is it our unique data, we must use it. There's a missed step to say, like, really, what's our minimum viable product?"
Bar discusses the anticipated surge in unstructured data, predicted to account for 90% of data growth, and the challenges it poses for data reliability and AI training.
Bar Moses [48:34]:
"Unstructured data is becoming more and more important... I think it's the year of messiness."
The episode highlights the critical need for robust data reliability to prevent costly errors in AI applications. Bar recounts an example where Monte Carlo helped an insurance company ensure the accuracy of AI-generated sentiment scores from customer service interactions.
Bar Moses [51:09]:
"A conversation scored a 12... we allow them to observe the output of the LLM to make sure that the structured data is within the bounds."
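The kind of check described in that quote can be sketched in a few lines of Python. This is only a minimal illustration, not Monte Carlo's product or API: the 0 to 10 score range, the `ScoredConversation` record, and the `find_out_of_bounds` helper are all assumptions made up for the example, since the episode does not state the actual bounds or data model.

```python
from dataclasses import dataclass

# Assumed valid range for a sentiment score; the episode does not give the real bounds.
SCORE_MIN = 0
SCORE_MAX = 10


@dataclass
class ScoredConversation:
    """Hypothetical record: one conversation and the score an LLM assigned to it."""
    conversation_id: str
    sentiment_score: float


def find_out_of_bounds(rows, low=SCORE_MIN, high=SCORE_MAX):
    """Return the rows whose score falls outside the inclusive range [low, high]."""
    return [row for row in rows if not (low <= row.sentiment_score <= high)]


rows = [
    ScoredConversation("c-001", 7),
    ScoredConversation("c-002", 12),  # the kind of value the quote describes
    ScoredConversation("c-003", 0),
]

flagged = find_out_of_bounds(rows)
print([row.conversation_id for row in flagged])  # prints ['c-002']
```

A range check like this only catches values that are structurally impossible; it says nothing about whether an in-range score is actually right for the conversation.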
As the conversation wraps up, the hosts and Bar share their predictions for 2025. Bar confidently asserts that the year will be defined by highly reliable data and AI, underscoring the industry's shift towards prioritizing data trustworthiness.
Bar Moses [65:28]:
"2025 would be the year of highly reliable data and AI."
The episode concludes with the hosts sharing personal reflections and recommendations. Bar uses the metaphor of "watching the avocado" to illustrate the importance of timing in adopting new technologies like generative AI.
Bar Moses [58:56]:
"As data leaders, how do we keep watching the avocado? We gotta hit the avocado before it's too ripe."
Mo Kiss [60:05]:
"I'm trying to enjoy the journey more... not focusing so much on the end state."
Data Reliability is Paramount: As organizations increasingly adopt generative AI, the reliability and trustworthiness of their data become critical competitive differentiators.
Generative AI Demands High-Quality Data: With the widespread use of generative AI, ensuring data accuracy is essential to prevent costly errors and maintain brand integrity.
Evolving Data Governance: Organizations are moving towards federated data governance models, balancing centralized standards with decentralized data management to handle the complexity of modern data usage.
Unstructured Data's Rising Role: The bulk of data growth is expected to come from unstructured sources like text and images, presenting new challenges for data observability and reliability.
Organizational Adaptation: The roles within data teams are blurring, necessitating a mix of engineering and data expertise to build effective generative AI and data products.
Michael Helbling [00:55]:
"2025 probably be the year of Tim Wilson still being frustrated with people calling stuff the year of."
Bar Moses [10:42]:
"What's the advantage? I can create a product just like you can create a product. The only difference is the data."
Bar Moses [24:13]:
"We all have access to the latest, greatest models, but the only thing that differentiates different generative AI products is the data that's powering them."
Bar Moses [58:56]:
"As data leaders, how do we keep watching the avocado? We gotta hit the avocado before it's too ripe."
Bar Moses [65:28]:
"2025 would be the year of highly reliable data and AI."
Episode #262 of The Analytics Power Hour explores the challenges and opportunities facing the data analytics industry as it navigates a rapidly evolving technological landscape. With insightful commentary from Bar Moses and thoughtful moderation from the hosts, listeners gain a nuanced understanding of why data reliability and governance are set to be defining themes in 2025.
For those eager to stay ahead in the data analytics field, this episode underscores the importance of building robust data infrastructures, embracing generative AI responsibly, and fostering organizational structures that prioritize data trustworthiness and innovation.
Note: For further discussions, listener interactions, and more insights, connect with The Analytics Power Hour on LinkedIn, join the Measure Chat Slack group, or email contact@analyticshour.io.