
A
Welcome to our second 3 Smart Guys podcast, sponsored by Directions on Microsoft, your authoritative source of information on all things Microsoft enterprise software, cost management and licensing. Visit us at directionsonmicrosoft.com. I'm Barry Briggs and I'm here with George Gilbert. Wave, George. And Peter O'Kelly. Wave, Peter. Together we've got something like over a half century of hard-earned experience in enterprise IT and enterprise computing. Digging into our topic for today, it's become axiomatic that data is the fuel for AI. So in a lot of ways, AI is the catalyst for evolving the enterprise data landscape. What does that all mean for Microsoft? What does it all mean for its enterprise customers? What does it all mean for you? Let's jump right in. Today it's almost become obligatory, when talking about a successful enterprise AI deployment, to say that you have to clean up your data estate. A long time ago, when I first started as CTO of Microsoft's own IT unit, we had over 2,500 applications, meaning 2,500 databases of one form or another. Now, we did go through a big app simplification and consolidation effort, but still, that's a lot of data and a lot of data to clean up. So, Peter, I'll start with you. How do you clean up a data estate?
B
Well, it's a very timely topic, and I think first it's important to stand back and say the combination of AI and data, I think, is lining up to be the most consequential case of the adage of garbage in, garbage out. Because if you attempt to use AI natural language query for democratized data analytics, as an example, it's just not going to end well if your data estate is in bad shape. The AI tools will confidently compile and present sometimes incorrect answers, usually consuming a lot of resources along the way, kind of adding insult to injury. But another high-level challenge is something that might be called inadvertent oversharing, which is to say, if security by obscurity was working well for you in the past, it's not going to work anymore when AI tools are added to the mix. And if we expand the scope from data estate to digital estate for an enterprise, adding the world of documents and communication-related resources like email messages, text messages, meeting summaries and transcripts, it's even scarier than the stark reality of enterprise data today. And so it's not a great picture. Now I'll get into just briefly some of the challenges and then get into some good news as well. So just to briefly review some of the challenges that I think got us here, a big one is data-related technology churn. So over the last 50 years we've cycled through a bunch of different things. In the interest of time, I'll skip the main chapters, such as the dozen or so times that both SQL and relational database management systems have been prematurely proclaimed dead. But along with that churn, enterprises get these very large cumulative collections of data-related technologies, and they rarely get retired. And then when new things do come in, in many cases enterprises will just do the convenient lift and shift and bring along all of the idiosyncrasies and limitations of the earlier deployments. 
And a third common problem is the inconvenient truth that data modeling has become something of a lost art and a topic we're going to revisit momentarily. But just to recap, the bad news is that the situation is pretty dire today for many organizations when they're adding AI to the mix if their data estates or digital estates are not cleaned up. The good news is that AI provides a very strong incentive to make the investment required to clean things up. And that's often been a very big challenge for data advocates to get approval to get funding resources to do this. Also, AI tools can be very effective in a collaborative intelligence sense in assisting in the cleanup. And last but not least, the leading data platform vendors have a very strong incentive to help enterprises address these opportunities and challenges because otherwise their data platform customers are likely to do some serious self inflicted damage with AI.
A
Yeah, I used to think that, you know, getting senior leaders to address the problem of data hygiene and data cleanliness and data health and all that was almost a Sisyphean challenge, you know, pushing the rock up the hill. But we suddenly have somebody pulling the rock, which is AI. I have to ask this because I've known you for like 30 years, Peter, and you have always been a passionate advocate of data modeling. And I sort of feel like it's kind of gotten out of favor, out of style lately. But I imagine you think that data modeling is a big part of this equation.
B
Yeah, I'd say definitely, now more than ever. And an important thing for enterprise planners to consider is that it's not rocket science. Data modeling has never been a super obscure kind of thing. We talk about conceptual data modeling, which is making sure that all of your human and now AI stakeholders are in sync on which parts of the real world we want to describe. Logical data modeling, which is more, let's take that and put it into a database platform, today usually extended relational and/or document. And physical data modeling, which we used to do a lot before the advent of modern cloud data platforms, going in and optimizing for performance and stuff. And as with the data estate picture, many organizations today just don't have this under control. So for conceptual data modeling, it's the people. It's really challenging to get people, especially with different priorities in different parts of an organization, to collaborate. For logical data modeling, the data technology churn has been a big problem, although I think extended relational is building momentum again. Another reason that this has been super challenging historically is that it's seen often as kind of anti-agile. So it's more of a perpetual program than it is a readily pointable project. And last point on this, a lot of the database technology vendors have taken some liberties over time in implicitly or explicitly suggesting that you don't need to do data modeling anymore. If we remember things like schema on read, that was an example, and that was definitely an actual-results-may-vary kind of proposition. But again, accentuating the good news, LLMs such as Claude and Gemini are actually very productive data modeling collaborators and they can help to streamline what was otherwise often a very daunting proposition. And that includes analyzing existing systems to determine which resources and databases are actually being used and if they have room for improvement. 
And there are also some great resources coming out. Again, historically it's been a bit of a lost art, but now you've got really popular and influential authors like Joe Reis with his Practical Data Modeling Substack and a forthcoming book on the same topic. So for people who have been long-term advocates of data modeling and who are kind of waiting for the right moment to make the case for it, things are lining up pretty favorably now.
A
Passionate for sure. So let me ask you, George, I'm going to send this to you. When we think about this problem of cleaning up the data estate, do you have to have a pristine data environment before you can even think about AI?
C
I think it's something like a combination of Dante's seven levels of Hell and something like a stairway to heaven. It's like you have to ascend from the seven levels of hell. But the thing to remember is, in the age of AI, you program AI with data. And so the richer your data model, the more you can get out of your AI. And as Peter was saying, we can't do an end-to-end enterprise data model like in the past. This has been a dream, you know, over many decades and many generations. But to make this work for agents, for example, you probably have to support pulling together the data for one outcome at a time and then have some sort of central governance mechanism so that the individual use cases don't step on each other. And as you build out this model, it gets richer. But in the past we just modeled entities like people and resources. And in the future we need to model the processes so we understand why things happen and what's likely to happen and even what should happen. And you know, that's much, much harder than the entity-centric stuff that we did. So, you know, going back to it, I think we're going to have to do one business outcome at a time.
A
So continuing on that theme for a second, you know, for years it seemed everybody thought that a data estate, a little quote signs there, you know, meant taking all your transactional data and just throwing it all into a data warehouse. And then eventually we also threw all of our unstructured data and called it a data lake. And you know, that was our data estate. And all that did, at least in my view, was to create frankly, a big mess. In fact, I was talking to a very senior analyst recently who said data warehouses are where data goes to die. I don't think that's true anymore, especially since with the dawn of AI. But George, you built a whole maturity model for how we should be thinking about data and data governance. Can you give us a brief overview of that?
C
The gist is that, if you go all the way back to the beginning, the lowest level, you should think about an XY set of axes, with maturity going up and to the right. On the X axis, there's the scope and fidelity of what you can represent in your model. And on the Y axis is the analytics sophistication, what questions you can ask of that model. And most enterprises are in the lower left. And that's like functionally siloed aggregates of what happened, like cubes. And I say siloed because when you put them in the aggregate, you lose a lot of the perspective, the ability to ask questions fanning out across different perspectives. And as you move up this maturity model, you start to track things in real time and you start to be able to ask why things happen. And you have to move even a few more steps up and to the right, in the data model scope and the analytics sophistication, to get to a knowledge graph where you can ask not just what happened, but why and what's likely, across the entire enterprise's operations. Because you're harmonizing.
A
Yeah. You know, and then just to kind of just interrupt for a second so, you know, the whole notion of why this happened is kind of a new concept in managing data. And so how do you think about that? Where does that, where does that come into play?
C
Yeah, it's a great question because there are two levels. There's the deterministic process where you want to capture, you know, this has to happen before this, and then this. But the deeper why is all the tacit knowledge, the decision points, and that we're just beginning to grapple with. And I think we need both because they inform each other. And so what I was talking about still was in the deterministic, you know, like these rules have to be followed.
A
Peter, do you agree that, you know, most enterprises are still down that lower left and have a long way to go to reach the kind of levels of maturity that George is talking about?
B
Yeah, I think that's true. Unfortunately, at an overall level, there are pockets of excellence, I think within many organizations and certainly some really fast moving newer companies have their house in order pretty well. But I do think overall many organizations are challenged. And now as we're discussing, the advent of AI tools is really going to make it imperative to get further up that maturity curve and fast.
A
So going up the maturity curve for a second. Yeah. I can't help but remember that a decade or so ago we talked about things like metadata repositories and nobody knew what that was and everybody thought it sounded good because it had a lot of syllables in it. But nobody was willing to really invest in it. But today, the cool kids, it seems like they're all talking about semantic layers and ontologies, context graphs, knowledge graphs. And Peter, you mentioned that earlier this month there was a seminar, a webinar actually, from Atlan. It featured a panel of data domain leaders debating whether the term semantic layer is already passé, suggesting that context graph or context layer should replace it. Peter, can you disambiguate what all this stuff really means?
B
I can give it a shot. And not surprisingly, my general take on this is that all of these terms flying around have done a really good job of making it possible to have a checkpoint on the extent to which organizations have their data models in order. Because if you have your data models in order, you just don't need a lot of the semantic spackle that's being promoted these days. So if we step back and just look at three facets of this from technology.
A
Sorry, can I interrupt? So do you think that these context layers and so forth are band-aids for data models?
B
So I think in many organizations they are trying to address what might be considered presenting problems. They're trying to address immediate symptoms or short-term needs instead of getting to the core of basically going out and getting your data estate in order. Yes, but to continue on the terms and where the terms are coming from, as I'm trying to make sense of it, the first thing is there's a need to create a comprehensive logical data model across all of your data silos, your application silos. And again, this isn't new, and as you pointed out, master data management is a term that used to be used for this, but I think there were a lot of dashed hopes for MDM initiatives a long, long time ago. And again, this is now not a nice-to-have or do-it-as-time-permits. It's something that you really have to get in order, because many organizations just don't even have consistent definitions of fundamental things like customer and product. That was a big part of the Atlan discussion a couple of weeks ago. So, second of three things. Another facet, as we've touched on, is extending the scope of this model. So now you want to be able to have a comprehensive repository that includes facts, dimensions and metrics. This is also not new, but historically you probably did that in your BI tools or in your custom application. So these are extensions, and the goal is now to do that once, instead of doing it once per tool type or application. And then third, which we also touched on already, is expanding the scope. So when we say resources, digital resources, these days it's not just traditional databases, but now also documents, email messages, text messages, meeting recordings and summaries. So this used to be referred to as unstructured or semi-structured information resources. It's perhaps easiest to think of that as the digital stuff you have that is probably not under the control of a data platform today. 
So from my perspective, the terminology is sometimes kind of amusing when you hear software salespeople coming in and talking about terms from metaphysics and other things. But if you say semantic context, ontology, knowledge graph, if this is ultimately going to lead to a renewed focus on conceptual and logical data modeling, it's a huge step in the right direction, even if in the transition the terminology can get a little confusing.
A
So George, this notion of the context graph is something I want to come back to. But let me just start by asking this. In his book Crossing the Chasm, your friend Geoffrey Moore argued that systems of intelligence will ultimately supersede systems of record like ERP and systems of engagement like CRM. And you yourself have written about systems of intelligence. Now coming back to the context graph, is that what's going to get us to the systems of intelligence? What are your perspectives on this shift?
C
Yeah, it's a really good question, because we talked about this semantic layer endlessly for years. It started out as the business intelligence metrics and dimensions, but there was a richer definition, which was the entities and processes for the entire enterprise, and that's a much, much harder problem. But the context graph captures stuff that you're not going to capture even in the broader definition of the semantic layer. It captures the reasoning traces, the tacit knowledge of why people make decisions, and you really need them both. And I want to say that in a lot of the discussion, from Atlan, the folks at Foundation Capital, all the different approaches to capturing it were like, let's somehow in a lightweight, non-intrusive way capture these reasoning traces. And I think the guys who have made the most progress in doing this in a way that can really move the needle, where we have measurable evidence that it moves the needle in how the models work in their training, and in the same way that these reasoning traces can be used as precedent at inference time, it's how the Mercors of the world capture this stuff. It's much more rigorous. It's in an environment where, you know, the expert has to set up: here's what the conditions are, here's how I thought through it, here's the counterfactual, here's how to grade it so you know what a good answer looks like. And that's really heavyweight, and no one has tried to undertake that inside the enterprise because it's too demanding on the domain expert. But if you combine that semantic layer, the deterministic digital twin, with this way of capturing reasoning traces, it's much less burdensome. But you get these high-fidelity reasoning paths. And so again, the deterministic semantic layer and the context graph reinforce each other. And that, I think, is what's ultimately going to be called the system of intelligence.
A
Are the two together the same thing as the digital twin of the enterprise?
C
Well, I think one layer, the semantic layer is going to be the deterministic twin and the context graph is going to be the cognitive twin.
A
I just have to follow up. Geoffrey Moore talks about the next level up above that being the systems of autonomy, when essentially the enterprise runs itself. Are we still light years away from that?
C
I think that's aspirational. It's sort of like, I don't want to say the Red Queen effect, but you know, it's always receding into the distance.
A
I see.
C
Because remember, the scope and fidelity of what you're modeling is never complete, you know, and so you never quite make it there. But you hold out that goal to give you directional guidance.
A
Let's conclude by talking a bit about vendors and products. And you know, when I think about the battle of the data titans, I think of obviously the hyperscalers like Microsoft, Amazon and Google, and Snowflake, which I notice is now valued at around $60 billion, and Databricks, now valued at $134 billion after a Series L round. If we agree that the maximum value of data will occur when the semantic layer or context graph is finally materialized, and I'm not sure we do agree on that, let's just pose the question: of these, who's going to win? And let me just add one other thing too. When we think about Microsoft, Microsoft at Ignite last November introduced its IQ family of products. There's Fabric IQ and Work IQ and Foundry IQ. Work IQ in particular leverages the Microsoft Graph in Microsoft 365. So in a sense it's almost a context layer in and of itself. Fabric IQ does similar sorts of things on Fabric data. But there are others. I mentioned Snowflake and Databricks, and there's Palantir, and there are others. Who do you think is important and who do you think is going to win?
B
I'll take a run at that first. So from my perspective, I think it's really, in the western world, down to five for the data platforms on this. So they're the hyperscalers, the big three as you mentioned, and then Databricks and Snowflake. It's interesting that Databricks and Snowflake are kind of ending up in the same place but starting from different directions. So Snowflake was a built-for-the-cloud data management platform focused on analytics initially, expanding to include other workloads and addressing things like data science and now machine learning. And Databricks was coming from the other side. So it's more of a data processing framework focused a lot on data science, now machine learning, and adding a more traditional data platform to it. Both of them, interestingly, within the last year have acquired and integrated built-for-the-cloud PostgreSQL. So now they have OLTP on top of it as well. And that's going to be sort of interesting, kind of going back to the hyperscalers and their respective Swiss army knives of every data platform, every data tool that you can imagine in there. So I think that's going to change the competitive dynamic. Microsoft, I think, may be able to assert that it's on track for delivering a better bundled value proposition, but at this point I think it's lagging the market leaders in AI and data in several respects. And I see this as another opportunity to test what's been called Microsoft's good-enough moat. It's still TBD whether Microsoft's enterprise customers are going to believe Microsoft has a credible story and path to getting to a sufficiently competitive position relative especially to Databricks and Snowflake. And of course this is complicated because those are also Microsoft partners. So Microsoft has kind of a win, place and show set of bets in this. But in the immediate future, I think most enterprise IT planners are going to have two or more of these data platforms. 
So certainly the hyperscaler or hyperscalers they're working with and their data platforms, but also Databricks.
A
And with the acquisition of Postgres into Snowflake and Databricks, and with the fact that Microsoft of course already has that, and of course they have SQL Server and Cosmos and a lot of others, but in particular with Snowflake and Databricks, do you think there's a sense of, we're positioning ourselves as one data management system to rule them all?
B
Exactly. And in fact I'd say they would go even further. I don't think they would define it this way, but if you say information management equals data plus all of those other types of resources we've been talking about, Snowflake in particular has made a lot of investments and delivered a lot for bringing the best of data management capabilities to those traditionally semi-structured and unstructured things: PDF documents, you know, productivity application documents and other things. So I would say yeah, emphatically they do. And if they're successful, that means that the hyperscalers will be more relegated to a role as a cloud platform provider instead of a cloud data platform provider. So yes, I do think this is going to be a pretty intense competition.
A
George, is there any room for smaller companies in all of this?
C
You know, I'm going to come at this with a non-consensus take, which is, I think we're so early in this. For instance, with Snowflake, you know, you originally put all your data in for the great analytics and then they added other workloads. But now that we've opened up the data formats, you don't own all the follow-on workloads. The new point of control is the catalog, where you add more and more metadata to try and define the data. But we're so early in that process that someone can come along, you know, like a Palantir or a RelationalAI, where they're adding definitions that are so rich that you get to this digital twin. Now we know Palantir is very expensive and very labor intensive, and RelationalAI has still not got product-market fit. But the point is that we're so far away from even modeling that deterministic twin, and we're at the context graph where we were with semantic layers five years ago. So I think it's too early to say. I agree that the hyperscalers and the big data platforms, by go-to-market presence, have the sort of pole position in trying to build out these workloads, this modeling layer. But I think we're so early that we could be surprised by someone coming in, you know, from left field.
B
If I could just respond to that for a second. I think it's good to distinguish between two layers. At just the base data platform layer, there has been a lot of sort of fire all weapons, try anything once, you know, like Hadoop, maybe not a super good thing to have on your resume anymore. And a lot of these things, like I mentioned with the churn, have been problematic over the years. But now I think there's real convergence where you're saying I will have columnar relational, I will have row-based relational, it will be extended, so I can have, you know, scalar data types but also documents, JSON and XML, and I will have Postgres if I'm going to be doing OLTP. I think that battle is over and Postgres has won. So I agree with you on the things above that, where, you know, we're getting to the long-aspired-for integration where we're saying we're going to, as you say, bring the benefits of data management to process management as well. And the context graph layer, like the runtime, the application runtime in things like Palantir, I think that is still in more of a formative stage. But I think at the data platform layer the patterns are pretty clear.
C
But I guess the point I was trying to make is that at the data platform layer, it's open, no one owns the data. The vendors don't own the data. What they own right now, the choke point, is the catalog that defines the data and governs it. But those definitions are very primitive. That's why I'm saying someone could come in from left field.
A
Let me add a Microsoft perspective here too. I think that there's a war for the catalog, and Microsoft's armor in this is Purview, which is aspiring to be an enterprise-wide data governance product. Now whether it subsumes something like Databricks Unity or the other catalogs, we'll have to see. But I think George has got a good point that there is a choke point there with the catalog and with the governance tools.
C
Yeah.
B
And I just think it's going to converge a little bit more rapidly. And basically the question for an enterprise is going to be, where is your center of data gravity? And then what is going to be your strategic data platform for that? And the catalog will be coming along with that. It will not be a separate thing that you layer on top. I mean, maybe if you're back in the, hey, I need spackle to tie together a whole bunch of systems and I can't make the investment to try and consolidate things, then you need more of an intermediate layer. But Purview is a great choice for a brand name on this. To have purview over everything, under the control of a data control plane, is going to prove compelling. And again, this is not just because data is fun and everybody should do data modeling. It's back to the top, where if you don't do this and you start introducing AI tools, it's going to get chaotic fast.
A
Guys, we're out of time. It has been, as always, a real pleasure chatting with you. Thank you so much, and thanks to our sponsor, Directions on Microsoft. And again, if you need a reliable information source about anything Microsoft enterprise, whether it's software, services, costs or licensing, come visit us at directionsonmicrosoft.com. And if you like this podcast from the three of us, give us a thumbs up and subscribe. And if you have any feedback on anything we talked about, including this wrap-up, please add a comment on the session's YouTube page. Thanks very much and we'll see you again in a few.
Date: March 4, 2026
Host: Barry Briggs
Guests: Peter O’Kelly, George Gilbert
Theme: How data estate modernization is now imperative for enterprises in the age of AI—what it means, the challenges, and Microsoft's prospects.
This episode brings together long-time Directions on Microsoft analysts Barry Briggs (host), Peter O'Kelly, and George Gilbert to discuss why “it’s all about the data” in the enterprise—especially as AI, Microsoft technologies, and competitive pressures compel organizations to overhaul, manage, and derive more value from their data estates. The conversation moves from the classic problems of data hygiene and modeling to hot topics like semantic layers, context graphs, data maturity, and the high-stakes 'battle of the data titans' among Microsoft, AWS, Google, Snowflake, and Databricks.
“It’s lining up to be the most consequential case of the adage of garbage in, garbage out.” — Peter O’Kelly [01:52]
“Data modeling has never been super obscure... now, LLMs such as Claude and Gemini are very productive data modeling collaborators.” — Peter O’Kelly [05:35 & 07:22]
“Data warehouses are where data goes to die.” — (Citing a senior analyst, paraphrased by Barry Briggs) [10:00]
“We need to model the processes so we understand why things happen... much, much harder than entity-centric stuff.” — George Gilbert [08:31]
“If you have your data models in order, you just don’t need a lot of the semantic spackle.” — Peter O’Kelly [14:55]
“The context graph... captures the reasoning traces, the tacit knowledge of why people make decisions... the deterministic semantic layer and the context graph reinforce each other.” — George Gilbert [19:12]
“In the western world, [it’s] down to five... hyperscalers, then Databricks and Snowflake.” — Peter O’Kelly [23:12]
“At this point I think [Microsoft is] lagging the market leaders in AI and data in several respects… Is Microsoft’s ‘good enough’ moat sufficient?” — Peter O’Kelly [24:20]
“There is a war for the catalog and Microsoft’s armor in this is Purview.” — Barry Briggs [29:53]
“If you don’t do this [clean up and consolidate], and you start introducing AI tools, it’s going to get chaotic fast.” — Peter O’Kelly [30:25]
On the impact of AI on data hygiene:
“Getting senior leaders to address the problem of data hygiene... was almost a Sisyphean challenge... but we suddenly have somebody pulling the rock, which is AI.” — Barry Briggs [04:56]
On data modeling’s ‘anti-agile’ perception:
“It’s seen often as kind of anti-agile. So it’s more of a perpetual program than it is a readily pointable project.” — Peter O’Kelly [06:26]
On data warehouse criticisms:
“Data warehouses are where data goes to die.” — (Cited by Barry Briggs) [10:00]
On system convergence:
“At the data platform layer... the patterns are pretty clear... For OLTP, I think that battle is over and Postgres has won.” — Peter O’Kelly [28:16]
The panel frames AI as an irresistible force finally driving enterprise leaders to take data hygiene, modeling, and governance seriously—after decades of neglect. They anticipate major changes to how businesses define, manage, and draw insights from their data, with technical leadership (and vendor supremacy) hinging on mastery of not just data storage, but model richness, process modeling, and context-capturing reasoning traces. Microsoft is a strong contender but faces agile, innovative competition as the “war for the catalog” unfolds. The call to action: invest in data estate modernization before AI multiplies your mess—and your risk.