
Podcast Announcer
Welcome to the Analytics Power Hour. Analytics topics covered conversationally and sometimes with explicit language.
Michael Hoeble
Hey, everybody, it's the Analytics Power Hour. This is episode 286. You know, it's Tuesday, and I know what you're thinking: I sure hope revenue and active customers still mean the same thing as they did yesterday. A lot of you know firsthand the pain I'm describing, as data ping-pongs around the business taking on shapes and definitions that were never really intended. Well, the semantic layer was supposed to take care of all that. And to be fair, there are some nice, tidy businesses out there doing a great job, but most of us are still trying to figure out where it should live, what it should be written with, and who should own it. So I think we should dig into it. But first, let me introduce my co-host, Mo Kiss. How you going?
Mo Kiss
I'm going great. I'm really excited about this.
Michael Hoeble
I'm excited too, and excited to do the show with you and Tim Wilson. Howdy.
Tim Wilson
I think it's just all semantics.
Michael Hoeble
It's all so good.
Well, that's an interesting potential cop out. Okay.
Tim Wilson
No.
Michael Hoeble
And I'm Michael Hoeble. Well, to really get into this topic, I think we found the perfect guest. Cindy Howson is the Chief Data and AI Strategy Officer at ThoughtSpot. She was previously a Vice President at Gartner, along with many other distinguished roles throughout her career. She is the host of the Data Chief podcast and has authored many books on BI and data. And today she is our guest. Welcome to the show, Cindy.
Cindy Howson
Thank you for having me, everyone. I'm so excited to be here.
Michael Hoeble
I am excited that you're here too. You're kind of, to me, sort of an OG of the data space. And so I love people who can provide as much depth and background and historical perspective on all the things we're struggling with in the world of data today that were struggles 20 years ago and still remain with us today, but with different tools and names and things like that. But today we're talking about sort of. Oh, go ahead, Tim.
Tim Wilson
Well, I was going to say, having done my little forensic sleuthing, I saw Cindy speak at a TDWI summit back in 2004, which I think is amazing since we're both like 35 years old.
We just look it, Tim. I was moving from technical writing to marcom, slightly into web analytics, and that conference was my entrée into the world of analytics and BI.
So, yeah, I'm a fan.
Mo Kiss
Big fan is the summary, Tim.
Cindy Howson
Now I feel like I have to send you an original BI scorecard black bear that I used to use as giveaways for class participation. We'll see if I still have one.
Michael Hoeble
That.
Tim Wilson
Now, I couldn't remember the topic, but I feel like when I was coming back, there was some other lady who spoke, and it wasn't Jill Dyché. And the other person was, like, the only... and I cannot remember her name, because...
Cindy Howson
Claudia Imhoff.
Tim Wilson
Yes. We actually wound up, like two months later, calling and having Claudia come out and just spend two days explaining data warehouses to our team. She was doing the conference circuit as a consultant, and you're like, oh yeah, nobody's really just going to call you up. She's like, what do you want me to do? And we're like, you're smart. Please come, just sit in a room, and answer our questions for two days. And our dev team was thrilled. And I'm so glad that you can remember that was who it was. Okay. Yeah. Okay, Michael.
Michael Hoeble
Okay, now we're going to do the episode, too.
Cindy Howson
All right, fast forward just a few years.
Michael Kaminski
Yeah.
Michael Hoeble
So nowadays. No.
Tim Wilson
So let's talk about star schema. No. Yes. Well, I mean, we can start there.
Michael Hoeble
So semantic layers, Cindy, obviously everyone talks about those, but there's a history here. And maybe just to get us started and for people who aren't as familiar with the concept, maybe just a quick primer on sort of what does that even mean? What are those? And we can kind of use that as a launching point.
Cindy Howson
Sure. So in the simplest terms, a semantic layer provides a representation of the business model, in business terms, mapped to the physical structures in your data warehouse, data lake, cloud data platform, whatever you want to call it. And it is important that it is in business terms. So, if I think about it, my German language has only served me well when I look at original SAP tables: VBAP was the customer table in SAP R/3. You could never show a field called VBAP to a business person. Instead you would say, this is customer name.
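As a rough illustration of the mapping described here, the following is a minimal Python sketch, not any vendor's implementation. The table `ZCUST01`, its columns, and the `build_select` helper are all hypothetical names invented for the example; the point is only that the user picks business terms and the layer supplies the cryptic physical names.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Field:
    business_name: str  # the label a business user sees
    table: str          # hypothetical physical table name
    column: str         # hypothetical physical column name


# Hypothetical model: business terms mapped to cryptic physical names.
MODEL = {
    "Customer Name": Field("Customer Name", "ZCUST01", "NAME1"),
    "Order Amount": Field("Order Amount", "ZCUST01", "NETWR"),
}


def build_select(business_names):
    """Generate SQL from business terms so the user never types a physical name."""
    fields = [MODEL[name] for name in business_names]
    tables = {f.table for f in fields}
    if len(tables) != 1:
        # A real semantic layer would also know the join paths between tables.
        raise ValueError("this sketch only handles fields from a single table")
    columns = ", ".join(f'{f.column} AS "{f.business_name}"' for f in fields)
    return f"SELECT {columns} FROM {tables.pop()}"


sql = build_select(["Customer Name", "Order Amount"])
print(sql)
```

The single-table restriction is a deliberate simplification; handling joins, aggregation, and multiple fact tables is where real semantic layers differ from one another.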
Mo Kiss
And I keep hearing the word context when people talk about semantic layers. When you say business terms, is that what you mean? Like the context of the data, how it relates to each other, what the definition is? Is that the same thing, or when you say business terms, do you think about something different?
Cindy Howson
No, I think it's both. Because if we get precise about something like revenue: in an inventory and supply chain context, I'm going to look at revenue based on when somebody placed an order. But if I take it in terms of finance, the office of finance, they're going to want to know when that invoice was paid, or whether I'm on a cash basis or an accrual basis. So the context of that revenue field matters. Did that answer your question, Mo?
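To make the revenue example concrete, here is a small Python sketch under stated assumptions: the order records, the context names `supply_chain` and `finance`, and the `revenue` function are all invented for illustration. It shows the same orders producing different January revenue depending on which date the context cares about.

```python
from datetime import date

# Made-up orders: each has an order date and, if paid, an invoice-paid date.
ORDERS = [
    {"amount": 100.0, "ordered": date(2025, 1, 20), "paid": date(2025, 2, 5)},
    {"amount": 250.0, "ordered": date(2025, 1, 28), "paid": date(2025, 1, 30)},
    {"amount": 75.0, "ordered": date(2025, 2, 2), "paid": None},
]

# Assumed contexts: supply chain counts by order date, finance by payment date.
DATE_FIELD_BY_CONTEXT = {"supply_chain": "ordered", "finance": "paid"}


def revenue(orders, context, year, month):
    """Sum order amounts for a month, using the date that the context cares about."""
    date_field = DATE_FIELD_BY_CONTEXT[context]
    total = 0.0
    for order in orders:
        when = order[date_field]
        if when is not None and (when.year, when.month) == (year, month):
            total += order["amount"]
    return total


jan_supply_chain = revenue(ORDERS, "supply_chain", 2025, 1)
jan_finance = revenue(ORDERS, "finance", 2025, 1)
print(jan_supply_chain, jan_finance)  # 350.0 250.0
```

Neither number is wrong; they answer different questions, which is why the definition needs to travel with the field.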
Mo Kiss
Yeah, absolutely.
Cindy Howson
Okay.
Michael Hoeble
Totally nodding my head.
Tim Wilson
I struggle with this, and revenue is a great example, because part of the challenge is that we're trying to find a technology or a tool or a process to solve for something where the person in finance, when they think revenue, is always thinking in a revenue recognition world. And when somebody's in inventory, they're always thinking of it another way. And they both may complain: I see reports from the other department, and they have revenue, and it's wrong. It becomes a business understanding challenge that a data processing mechanism is trying to solve. Or am I just being cynical about that?
Cindy Howson
No, it is. I could get annoying and say, yes, it is semantics, Tim. But it is.
Sometimes people conflate data literacy with technical literacy, which is wrong. We're really talking about what the data means in a business context and where the data originates. If I'm talking about an order system versus an invoicing system, sometimes that's different. A finance person is always going to assume I am talking about when it was invoiced. A salesperson is going to come from the context of, when is my commission going to get paid? So we come to the data already thinking about it through our own lens, our own business function, and yet the same terms may have very, very different meanings.
Even somebody that I was working with, I won't name them, but it was hysterical. We were both working off the same data set, you would have thought. And I'm like, why are your numbers different than mine? Like, sales: here's what I thought the number was, and you're coming up with a different number. It turned out that in his data set he only included software licensing and did not include professional services. And I was like, oh, why would you exclude that? I'm really just looking for total revenues related to this particular segment.
Mo Kiss
Can I then ask, do we risk... I mean, I'm definitely going to come full circle on this, because it's been a topic that's on my mind a lot. But one of the things that I've seen play out is this very, like, precise, I don't want to say business domain, but this very specific interpretation of a metric by a particular area of the business.
Cindy Howson
Right.
Mo Kiss
And I'm going to give the typical example in my world, which is, let's say you have 12 different products. One team is like, well, we're going to talk about video MAUs, and another team is like, we're going to talk about search MAUs. And then another area of the business is, I don't know, template MAUs. I'm making up stuff that's relevant to my world, of course. And then we come up with this fundamental problem: if you summed every department's version of their metric, we would never end up with all MAUs. But we also end up with these very precise definitions that might work within the business context they're in. The thing that I struggle with from that viewpoint is that sometimes I feel like we over-orchestrate things for a specific domain, and then we can't roll up and think about the bigger picture across the whole company. When we say MAU, what do we mean? Because users might have had interactions with lots of different products, for example. And I feel like there is an overlap with semantic layers here that I'm sure we'll get to. But do you see that problem playing out a lot? And is that part of why semantic layers are becoming the new hot topic of the moment?
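The roll-up problem described here is easy to see with a toy data set. The sketch below uses invented users and product names, and the helper functions are hypothetical; it counts distinct active users per product and overall, and shows that the per-product numbers do not sum to the company-wide number once a user touches more than one product.

```python
from collections import defaultdict

# Made-up activity for one month: (user, product) pairs.
EVENTS = [
    ("ana", "video"), ("ana", "search"),
    ("ben", "video"),
    ("cy", "search"), ("cy", "template"),
]


def mau_by_product(events):
    """Distinct active users per product (each team's own MAU)."""
    users = defaultdict(set)
    for user, product in events:
        users[product].add(user)
    return {product: len(members) for product, members in users.items()}


def mau_overall(events):
    """Distinct active users across all products (the company-level MAU)."""
    return len({user for user, _ in events})


per_product = mau_by_product(EVENTS)
summed = sum(per_product.values())
overall = mau_overall(EVENTS)
print(per_product, summed, overall)
```

Here the three product-level MAUs add up to 5 while only 3 distinct people were active, because distinct counts are not additive across overlapping groups.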
Cindy Howson
Well, let's go back, because you just used a term, Mo. If I was a new employee at Canva or at any kind of SaaS startup: what the heck is an MAU?
Michael Hoeble
Or what's a MOU?
Cindy Howson
A madly active user. Well, and maybe I'm really going to split hairs here and say, if I only clicked on the video, so it was a one-second interaction, are you going to count me as a user? Or should I have actually watched at least two minutes of the content? So we can parse these definitions a lot of different ways. But I want to come back to why semantic layers started more than 30 years ago and why they are coming back now.
Thirty years ago, Business Objects patented, and won in court, the first semantic layer, and Cognos at the time, with Impromptu, had to actually pay a license fee to them. Prior to semantic layers, you had to code your own SQL. You would have to say some cryptic name like vbap.l333 from this table, and that was terrible. The semantic layer gave report writers a way to click on business terminology to generate the SQL. That was the first purpose of the semantic layer.
Now, as the industry moved to in-memory tools with the likes of Qlik and Tableau, there's a whole generation of, let's say, 10 years, maybe 15 years, where people didn't think about this. They just loaded their data, did maybe one big SQL extract, and loaded it into an in-memory file. So they were only working with their subset of data, and of course the MAUs meant what I wanted them to mean. There was this loss of knowledge about what semantic layers are.
Now here we are in 2025, and we're all trying to build agentic AI systems. What we're learning is that without this context, without clear business definitions, we have hallucinations; we get incorrect results. The more context you give the LLM, the more accurate your answers will be. So I think semantic layers have become more important because of agentic AI. But also, before that, cloud data platforms and the whole modern data stack gave rise to: hey, I don't have to subset my data. I don't have to load just a small data set into an in-memory engine. Let me get to all of it, whether it's in Snowflake or Databricks or Google BigQuery. People don't want to move the data, but they do want to trust it, whether they're doing agentic or not.
Tim Wilson
Well, so there's this whole notion of context. Using agentic AI as an example, is it moving down that path? Will a semantic layer help AI demand some explicit context? If I ask, tell me how many customers we had last month, will a semantic layer start to say, that's not enough: I know, Tim, what role you're in, and I can guess what your definition of a customer is, but I'm going to require that you give me more business context in order for me to pull the right information? Is that a feature of the semantic layer, or is that something that's got to be built into the intermediary tool that's using the semantic layer to interface with a business user?
Cindy Howson
Yeah, I follow you, Tim. And this is where I think what people want is one semantic layer to rule them all. And I just think that's a fallacy. Will I ever see that? I don't ever see that.
What the industry, at least right now, is trying to get to (and I will also say this is the second attempt, maybe the third attempt, in the industry) with Snowflake's Open Semantic Interchange is at least a common set of standards so that everyone can interoperate. That already would be a huge sea change. Otherwise everyone's building proprietary integrations. I will say, working for ThoughtSpot: ThoughtSpot integrates with the Looker metrics layer and LookML. ThoughtSpot integrates with the dbt Semantic Layer, and that has gone through different incarnations. There are a few others: some have built integrations with Cube.js, some have built integrations with AtScale. There are others, but let's just take those. Those are all point solutions. We have to keep up with what is dbt's latest protocol, what is Looker's latest protocol. And it would be great if we all just said, here are the approaches that we're going to use, so it's all common rather than point solutions. That is the vision and the hope of Snowflake's Open Semantic Interchange.
Mo Kiss
However.
Cindy Howson
So this is a very long-winded answer, but we will have separate incarnations. And I have to say, in every customer conversation I've had about this in the last month, they're like, we only have to have one instantiation. I'm like, no, you don't. You do need separate instantiations, because every downstream tool, and even every backend database, has its own limitations. Say I'm going to create a metric called top 10 customers. Well, there are some databases that don't support a ranking function. Even Denodo virtualization tried to do this for a while, and it's like, great, in ThoughtSpot we have an object called top 10. If I hit the Snowflake database, it's working on the back end. But if it's hitting (I'm going to forget which database didn't support it, some variation of SQL Server or whatever), then Denodo is not working, not giving an answer. Or in Looker we have a very cool visualization, my favorite visualization, a KPI chart. It's too complicated for the Looker metrics layer. So you're always going to have these separate instantiations of a semantic layer, because nobody is going to want to dumb down their semantic layer for the least common denominator.
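The top 10 customers example can be sketched as one metric compiled differently depending on what the backend supports. Everything below is an assumption made for illustration: the `Backend` capability flags, the engine names, and the `customer_revenue` table are hypothetical and do not describe any real database or product.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Backend:
    name: str             # hypothetical engine name
    supports_rank: bool   # assumed capability flag: window ranking functions
    supports_limit: bool  # assumed capability flag: ORDER BY ... LIMIT


def compile_top_customers(n: int, backend: Backend) -> str:
    """Compile a 'top N customers' metric into SQL the backend can run."""
    if backend.supports_rank:
        return (
            "SELECT customer, revenue FROM ("
            "SELECT customer, revenue, "
            "RANK() OVER (ORDER BY revenue DESC) AS rnk "
            "FROM customer_revenue) ranked "
            f"WHERE rnk <= {n}"
        )
    if backend.supports_limit:
        # Fallback: note this cuts ties off at n rows, unlike RANK().
        return (
            "SELECT customer, revenue FROM customer_revenue "
            f"ORDER BY revenue DESC LIMIT {n}"
        )
    raise NotImplementedError(f"{backend.name} cannot express a top-{n} metric")


full = Backend("engine_a", supports_rank=True, supports_limit=True)
basic = Backend("engine_b", supports_rank=False, supports_limit=True)
minimal = Backend("engine_c", supports_rank=False, supports_limit=False)

sql_full = compile_top_customers(10, full)
sql_basic = compile_top_customers(10, basic)
print(sql_full)
print(sql_basic)
```

The fallback is not semantically identical to the ranked version (ties are handled differently), and the third engine cannot express the metric at all, which is the kind of gap that pushes tools to keep their own instantiation.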
Mo Kiss
Okay, I've got to make sure I'm following this, though. What I'm observing is that folks seem to want to pull their semantic layer further and further up the chain, right? You want it to sit less in a downstream tool and more internally. I obviously have a biased view, but people want to bring things like the semantic layer in house so that they keep their options open about which way to go with whatever AI they choose to leverage. But what you're saying is that it's kind of unavoidable that we're going to end up with a semantic sandwich, or cake, or whatever you want to call it. You might have to have something at one layer, and then when you go to a BI tool or some other type of tool or integration, you might end up having to have a second layer, just because they have different features or attributes that you want to leverage. Am I hearing that correctly?
Cindy Howson
Yes. And if by downstream and upstream you mean the database: people want it closer to the database because that's where the data lives. But as you get closer to the business decisions, you're going to have derivations and metrics and context that may not exist in the database. I would also say we have to think about how these things get defined. Working with one team, they're like, okay, we're going to build everything out in the database. Great, so your DBA is going to do all this? Or say I have a really strong SME, and we bring in data mesh operating principles and domain ownership. I have this great marketing person, and they know the differences between a video MAU and a web click MAU, and I'm going to want them to add a little more context. So I'm going to want an easier interface. And guess what? That interface does not exist in the one that was designed for the DBA.
Mo Kiss
Okay. But I've got to. I don't want to take things in a totally different direction, but I am, like, dying a little bit.
Cindy Howson
I see Mo.
I opened another can of worms. I can tell.
Mo Kiss
The thing I'm really struggling with in this whole discussion about semantic layers is that it doesn't feel new, and I feel like what you've written about it makes that very clear. But part of me is also really grappling with: is it actually that it's not net new? Is it the way we want to use agentic AI on top of our data? Or is it that we have gone towards this data mesh approach with less... I don't know if structured is the right word, Cindy. You can definitely insert better terminology, because you are the queen of exceptional terminology. But we used to have such structured data sets. We had star schemas; they had context. Is part of this just our own doing? Because we wanted to move faster and have less structure in our data, and so this is just the consequence.
Cindy Howson
So that was a two-part question. Is the semantic layer new? It's not new. It has gotten more robust over time, and not all semantic layers are created equal. I can show you one semantic layer that only supports a single star schema, or maybe even worse, only one big table. I can show you another semantic layer that supports multiple fact tables and different design approaches, star schema and snowflake schema, and even includes capabilities for aggregate table navigation or query compilation so that the most efficient query path is taken. So not all semantic layers are created equal, and I do think that has changed over time. And for sure the openness has changed over time. If I go back to the original query tools, again Business Objects, Cognos, whatever, those were largely closed. Some boutique consultancies had open APIs to access them. With OBIEE, the model was open and nobody used it. You could expose it as an ODBC connector to other BI tools, but nobody used it; performance was not good. So what we have now is definitely more openness. But I do believe the agentic part is why we're demanding them, why we need them more. It'll just make AI better. The second part of your question was, are we decentralizing these things? And yes, I think that's part of it too.
Tim Wilson
This makes me feel like this is going to be just so obvious that it's dumb. Are there people out there who, if some generic person came in and looked at what they'd built, would be told, you have built a wonderful semantic layer, and the people who built it would say, I built something that functioned for what I need, and I didn't know that's what it was called?
And on the flip side, that has me thinking that semantic layer sounds kind of cool, and it gets treated as this binary: if you have one, things are good, and if you don't have one, they're bad. It sounds like what you're saying is you could try to boil the ocean with one grand semantic layer and it would probably be bad. We treat it as though there's this label, and if you have it, then things are fixed. But there are always gradations of whether you do it well, well architected and appropriately. And that probably happens with everything that gets a fancy new label.
Cindy Howson
Yeah, yeah. I don't know, Tim. Do I want one mega semantic layer? Oh, please, no, because it becomes overwhelming to maintain. Now, maybe if I'm just using natural language to ask questions, I don't care what it's hitting on the back end. But I would be skeptical that that would work.
There is a belief in the industry that we're going to go towards verticalization of some of these semantic layers. And maybe this is where I kind of bristle: we throw new terms out there, like ontologies. Well, can we just talk about domains? That makes more sense to me, and that aligns with the data mesh. But could we have an insurance industry semantic layer? Could we have a marketing web analytics semantic layer? I think we could. We would get to common metrics. The physical pointers, ultimately back to which table it is hitting, might change a little bit, but that business representation, I think we could possibly get to.
Mo Kiss
Okay, one thought, just asking for a friend, of course, that has been on my mind: we can approach this from a business domain perspective, just like the examples you gave. You might have one that's more marketing and acquisition, and another that's, I don't know, finance, or insert whatever other business domain. The thing that I keep wrestling with, though, is: are we just doing this again, where we're overlaying our thoughts about what a domain is rather than the business user and how they want to interact with data? What I mean is, if I'm a business user, what's my business question? What are the questions that I want to ask? Let's say the theme might be, I want to ask a question about our users, or I want to ask a question about...
I don't know. Now I'm going to struggle to think of a comparative example. Someone help me with an answer.
Tim Wilson
A marketing channel.
Mo Kiss
A marketing channel. Sure, a marketing channel. Or, yeah, I want to understand something about how experiments have done. Are we trying to make semantic layers representative of business domains that maybe make business sense but don't reflect the way our business users want to interact with data when they have questions?
Cindy Howson
Well, to me, if you build a semantic layer that doesn't work that way, what is the point? Go home. Because if you know SQL and you want to code your SQL, you don't need a semantic layer. You might want it for some reusability. But the semantic layer gives the business user the ability to ask the questions without knowing SQL. And then it gives the LLM more context to generate better SQL. All these companies that have tried to do text-to-SQL without a semantic layer are largely failing, and guess what? They're adding semantic layers so that they work.
Semantic layers bring reusability. That was the original purpose. Then it is a business-friendly interface. And now, in agentic AI, it's the context for the LLMs to ensure accuracy. So if you're going to give me a semantic layer that is just a bunch of cryptic, technical names, and it's not giving it to me in the way that the business sees it, it's a waste of time. It's a poorly architected semantic layer.
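One way to picture the semantic layer as context for an LLM is as a set of plain-language metric definitions rendered into the prompt ahead of the user's question. The sketch below is only that, a picture: the metric entries, the prompt wording, and the `build_prompt` function are assumptions for illustration, and no model is actually called.

```python
# Hypothetical semantic-layer entries: business name, definition, SQL expression.
METRICS = [
    {
        "name": "Revenue",
        "definition": "Sum of invoiced amounts, counted when the invoice is paid.",
        "expression": "SUM(invoices.amount)",
    },
    {
        "name": "Monthly Active Users",
        "definition": "Distinct users with at least one session in the month.",
        "expression": "COUNT(DISTINCT sessions.user_id)",
    },
]


def build_prompt(question, metrics):
    """Render metric definitions as plain-text context ahead of the question."""
    lines = [
        "You translate business questions into SQL.",
        "Use only these metric definitions:",
    ]
    for metric in metrics:
        lines.append(
            f"- {metric['name']}: {metric['definition']} "
            f"Expression: {metric['expression']}"
        )
    lines.append(f"Question: {question}")
    return "\n".join(lines)


prompt = build_prompt("What was revenue last month?", METRICS)
print(prompt)
```

If the entries were only cryptic column names with no definitions, the prompt would carry far less usable context, which mirrors the point about poorly architected semantic layers.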
Mo Kiss
So hypothetically, if you just like took all your YAML descriptions, that probably wouldn't be good enough because it's been written by a data scientist in their domain, their own specific domain, for use by someone who deeply understands their area.
Cindy Howson
Well, if they deeply understand their area, there might be a lot of useful context in there. But if it's a lot of code and technobabble, then I think it's going to be less useful.
Tim Wilson
Back on the... I may blend two things together. The reference to Snowflake's... what is it? Open Semantic Interchange. That does feel like it brings to mind the XKCD cartoon about people complaining that there are 13 different standards, we need one standard, and then the next panel is, well, now we have 14 competing standards.
There does need to be a first mover or a dominant player. Is there a race? Obviously Snowflake wants to be the owner, the driver of that. And I guess it's the same thing when you talk about verticalization. Say something like digital analytics, and you're like, let's just have one common marketing digital analytics layer. Well, now the players in that space are all going to say, yes, the way that we think about that data is the way the industry should. How does the effort to have some sense of standards not lead to self-interested competition to pull the market towards whoever's on point for defining the standard? Or maybe my third example would be the W3C. We go back 30 years trying to define what HTML is supposed to do, and Microsoft doesn't even conform to the W3C standards. So, thoughts?
Cindy Howson
Tim, you just answered that question. Microsoft would love us to revert to MDX instead of SQL, for the most part. But it is true. So look at who was not part of that effort. Was Databricks invited to the party? Was Google BigQuery invited to the party? Will they invite themselves? Will they become part of it? Standards get adopted based on who leads them, but then also on who uses them and who asks for them. When I look at how we prioritize our product strategies, we are very much listening to the customer.
Sometimes we've gone down rabbit holes, and I'm like, why did we build that integration? I won't say which integrations, to me, were a waste of time, but with some of them I'm like, why did we do that? Because we thought something would have legs. We listened to the customer, and it never really took off. And then some will change strategies. We thought dbt's initial effort would take off, and instead they're now on version two.
So Snowflake is hugely influential in the industry. We're very proud to be part of the committee defining these standards, but we have to see how broadly adopted they are. The market will decide.
Michael Hoeble
And certainly right now, AI is kind of a forcing function for the industry where maybe that hasn't been or there hasn't been an imperative like that for a lot of companies. Does that seem fair?
Cindy Howson
I think that seems fair, yeah. And there's more willingness to be open and to focus on where you add the value in this data-to-insight-to-action chain.
Mo Kiss
That actually triggers an interesting thought. One of the things I've observed is this push for semantic layers. I feel like it's kind of come out of left field; I don't know if that's fair or not. It just seems to have swirled up very quickly, and the products maybe aren't at the state of maturity they need to be for what people want. I almost feel like a lot of companies are gathering requirements and building as customers are trying to build out with them.
Do you think that's a fair representation? And has this happened before with a particular tool that's had to develop very quickly because of pressure? I feel like AI is the pressure: everyone suddenly needs these semantic layers to make AI, quote unquote, work. Do you feel like that's happened in product development before, or is this a net new thing that data companies are dealing with, where they're trying to build at pace while customers already want to leverage and use it?
Cindy Howson
Yeah, so I don't want to sound like a commercial, and you can edit this out afterwards. All semantic layers are not created equal. Fortunately, whether it was purposeful or luck, ThoughtSpot always generated SQL on the back end, so the semantic layer was always super robust.
So did we get lucky, or was it intentional? The cloud data warehouse and agentic AI have just helped that. Others have only just started to embark on natural language processing and agentic AI. They tried to do it without a semantic layer, and that's why now they're dabbling in it, and they're like, oh, it takes a lot to build this. Some of them are starting out simple: one big table, that's all they can handle, and code based.
And I think about a blog that our co-founder, Amit Prakash, wrote about four years ago, I think it was, on the metrics layer, which is just a subset of the semantic layer: the metrics layer has some growing up to do. Even as a former Gartner research vice president, I have to give credit to Gartner. They still say that the time to maturity for metrics layers is five to 10 years. That's a long time. Yeah.
Tim Wilson
Well, how unfair is this parallel: to point to master data management as something that I remember having a moment, which was, oh, things are getting fragmented, we need to just do an MDM initiative. And to my earlier point, it was kind of a binary: if we do MDM, all these problems get solved. The companies that already sort of had MDM under the hood, because they'd architected their setup well, could do MDM. The ones that had built kind of a hot mess, and were then trying to apply a whole bunch of duct tape and baling wire to do MDM, never really got there. Is that a fair parallel, or is it too much of a stretch?
Cindy Howson
Yeah. Well, is it a fair parallel? I would just say it's valid. I remember, the first eight years of my life in this industry were at Dow Chemical, and we had a master data management system called INCA. I don't even remember what it stands for. It was homegrown. Then I worked at Deloitte, and I was like, wait, you don't have clean product codes? You don't have a single product hierarchy? You don't have clean customer data? It was a foreign concept to me that not everyone had clean master data.
And Mo, you asked this earlier, so I wanted to come back to this point. Semantic layers right now are mainly for structured data. But I think there's a time in the not-too-distant future when they will also encompass semi-structured data. And I would say that data is a hot mess, frankly, because we've never applied all of the data governance and data management disciplines that we have been applying to structured data. So I think the organizations that are best positioned for the agentic AI era got to cloud, had clean data, had good master data, and then of course handled the culture and the people change management. If they already did that, they have such a leg up. Now we're throwing in generative AI, agentic AI, semi-structured data, a lot more data that we couldn't get to before. And yeah, it's not that easy.
Michael Hoeble
It's nice to know we'll continue to have jobs going into the future though.
Cindy Howson
Yeah, that's why I'm like, what's everyone worried about not having jobs? They just will be different jobs. Different jobs.
Michael Hoeble
That's right.
Mo Kiss
Yeah.
Tim Wilson
I got one more. This could be a complete non sequitur, but I feel like Cindy could tee off on this. And I want more color because it was from the post that you'd written where the quote was, our industry has also now raised a generation of data analysts who never learned proper data modeling. And I kind of wanted you to elaborate on that.
Cindy Howson
Well, I'm going to say first, tell me if you disagree or not. But I follow the work of people like Joe Reis and Sonny Rivera, a Snowflake superhero. And I work with a lot of, let's say, visualization experts who are just used to one-offs: let me load the data and let me visualize it. And they never really learned proper data modeling techniques.
Tim Wilson
Well, so I guess my question is, is that a way of saying that there are analysts who aren't really thinking about the structure of the data and the ramifications for how the data fits together? They're just kind of trying to get to an output. I don't know if I agree or disagree. I probably agree because I'm just generally negative. And that's like a negative statement.
Mo Kiss
But.
Cindy Howson
Let's not take it negative. Yeah, let's challenge these people. To me, empower them. Say, you know what, you're great at visualization and you're great at building dashboards, but if you want to continue to have a career in this space, I want you to learn some data modeling fundamentals. And I don't care, you know, which methodology you follow. Learn some data modeling. That's on the technical side. But also we talk about data literacy. We also need to bring in business literacy. To me, it's not just about where is the data coming from. It is also how is it used and that there really might be two different definitions. I mean, when I talk to somebody in airlines, I'm like, oh, wow. I think of on-time performance. Did it leave the gate on time or did it arrive on time? Which one is really more important to you? By the way, when you're crossing international date lines, it gets a little more complicated still. So I would say I want these analysts to learn both skills.
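To make the "two different definitions" point concrete, here is a minimal Python sketch of on-time performance computed both ways. Everything in it is an assumption for illustration: the `Flight` fields, the four sample flights, the `at` helper, and the 15-minute grace period are made up and do not come from any airline's actual definition. Timestamps are timezone-aware, so a flight that crosses the date line still compares correctly.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone


def at(day, hour, minute, utc_offset_hours):
    """Timezone-aware timestamp in January 2025 (illustrative dates only)."""
    tz = timezone(timedelta(hours=utc_offset_hours))
    return datetime(2025, 1, day, hour, minute, tzinfo=tz)


@dataclass
class Flight:
    sched_dep: datetime
    actual_dep: datetime
    sched_arr: datetime
    actual_arr: datetime


# Assumed grace period; the real threshold is a business decision.
GRACE = timedelta(minutes=15)

# Hypothetical flights, each written as: sched_dep, actual_dep, sched_arr, actual_arr.
FLIGHTS = [
    # Crosses the date line: leaves 40 min late, arrives 10 min late.
    Flight(at(10, 10, 0, 11), at(10, 10, 40, 11), at(10, 6, 0, -8), at(10, 6, 10, -8)),
    # Leaves 5 min late, arrives 45 min late.
    Flight(at(10, 8, 0, -5), at(10, 8, 5, -5), at(10, 10, 0, -5), at(10, 10, 45, -5)),
    # Leaves on schedule, arrives 5 min late.
    Flight(at(10, 9, 0, 0), at(10, 9, 0, 0), at(10, 11, 0, 1), at(10, 11, 5, 1)),
    # Leaves 30 min late, arrives 12 min late.
    Flight(at(10, 14, 0, -8), at(10, 14, 30, -8), at(10, 17, 0, -7), at(10, 17, 12, -7)),
]


def on_time_rate(flights, event):
    """Share of flights on time for event 'dep' (left the gate) or 'arr' (arrived)."""
    if event not in ("dep", "arr"):
        raise ValueError("event must be 'dep' or 'arr'")
    on_time = sum(
        getattr(f, f"actual_{event}") - getattr(f, f"sched_{event}") <= GRACE
        for f in flights
    )
    return on_time / len(flights)


if __name__ == "__main__":
    print("departure on-time rate:", on_time_rate(FLIGHTS, "dep"))
    print("arrival on-time rate:  ", on_time_rate(FLIGHTS, "arr"))
```

The same four flights give different answers depending on which event the metric is defined on, which is the kind of ambiguity a semantic layer has to resolve explicitly.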
Mo Kiss
I have one last question. Just hypothetically, if you were implementing a semantic layer, what would be, like, the top three things you'd want to avoid?
Cindy Howson
The top three things. Okay, well, I'm gonna start with the first thing I would want to do. So I'd have to flip it. Avoid what, or what do I want to do?
Mo Kiss
Or you can do the top three things to make it successful. Either way, whichever way your brain works.
Cindy Howson
You want to avoid bringing in absolutely everything in the physical storage and exposing that to mere mortals, because that'll be overwhelming. So I always start with who is going to use this and what are the top questions they're going to want to be able to ask of it. Not because I'm going to hard-code that, but because then I'm going to get an idea of the context in which they're operating.
Michael Hoeble
Cindy, wow. So cool to talk to you. Thank you so much. This has been really, really good. I've got a ton of notes that I've been writing down, so I know that our listener is probably also gaining a lot from this episode. All right, well, let me switch gears really quickly because I need to talk about a quick break with our friend Michael Kaminsky from Recast. They're the media mix modeling and GeoLift platform, helping teams forecast accurately and make better decisions. Michael's been sharing bite-sized marketing science lessons over the last couple of months, and they'll help you measure smarter. Okay, over to you, Michael.
Michael Kaminsky
Multicollinearity strikes fear into the hearts of many analysts and executives, but it's also one of the most commonly misunderstood concepts in analytics. Some amount of correlation across variables is expected in most real world analyses. So it's critical to understand what multicollinearity is, why it causes issues, and whether or not it's a problem for your particular analysis. Multicollinearity means that two of your variables share some of the same signal. This causes problems for a regression model which will not know how to allocate credit between the two variables. This can cause challenges when it comes to interpreting the results of your regression. Let's imagine you're modeling the drivers of home prices in some geography and you want to include home square footage and the number of bedrooms as predictors. These two variables share some amount of signal, namely about the bigness of the house. If you include both variables in a simple linear regression, you'll often get strange results where one of the two variables is highly impactful with a large coefficient and the other might be very small or even negative. Slightly different data sets might even cause the variables to flip which one is positive and which one is negative. This happens because the model doesn't know how to apportion credit for bigness, which is present in both variables. So you get these strange results. So the core problem of multicollinearity is that when there's shared information across variables, a simple regression won't know how to apportion credit between them. This means that you either need to accept more uncertainty in results or try to change the variables you're using to account for the shared information.
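The home-price example can be reproduced with a small simulation. The sketch below is illustrative only: the data is synthetic, and the names (`simulate_homes`, `fit_two`, `coefficient_spread`) and every number in it are assumptions, not anything from Recast. A hidden "bigness" factor drives both square footage and bedroom count, and the code refits the regression on many slightly different datasets to see how much the coefficients move.

```python
import random
import statistics


def simulate_homes(n, rng):
    """Synthetic homes: a hidden 'bigness' factor drives both predictors."""
    sqft_500, bedrooms, price = [], [], []
    for _ in range(n):
        bigness = rng.gauss(0.0, 1.0)
        # Square footage in units of 500 sq ft, so both slopes share a scale.
        sqft_500.append(4 + bigness + rng.gauss(0.0, 0.1))
        bedrooms.append(3 + bigness + rng.gauss(0.0, 0.1))
        price.append(300_000 + 100_000 * bigness + rng.gauss(0.0, 20_000.0))
    return sqft_500, bedrooms, price


def centered(values):
    mean = statistics.fmean(values)
    return [v - mean for v in values]


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def fit_one(x, y):
    """OLS slope for y ~ x (the intercept is absorbed by centering)."""
    xc, yc = centered(x), centered(y)
    return dot(xc, yc) / dot(xc, xc)


def fit_two(x1, x2, y):
    """OLS slopes for y ~ x1 + x2, solved from the 2x2 normal equations."""
    a, b, yc = centered(x1), centered(x2), centered(y)
    s11, s22, s12 = dot(a, a), dot(b, b), dot(a, b)
    s1y, s2y = dot(a, yc), dot(b, yc)
    det = s11 * s22 - s12 * s12  # approaches 0 as the predictors overlap
    return (s22 * s1y - s12 * s2y) / det, (s11 * s2y - s12 * s1y) / det


def coefficient_spread(n_datasets=300, n_homes=30, seed=0):
    """Refit on many simulated datasets and measure coefficient variability."""
    rng = random.Random(seed)
    sqft_joint, combined, sqft_alone = [], [], []
    sign_flips = 0
    for _ in range(n_datasets):
        sqft, bedrooms, price = simulate_homes(n_homes, rng)
        b_sqft, b_bed = fit_two(sqft, bedrooms, price)
        sqft_joint.append(b_sqft)
        combined.append(b_sqft + b_bed)
        sqft_alone.append(fit_one(sqft, price))
        if b_sqft < 0 or b_bed < 0:
            sign_flips += 1
    return {
        "sqft_sd_with_bedrooms": statistics.pstdev(sqft_joint),
        "sqft_sd_alone": statistics.pstdev(sqft_alone),
        "combined_sd": statistics.pstdev(combined),
        "share_with_negative_coef": sign_flips / n_datasets,
    }


if __name__ == "__main__":
    for name, value in coefficient_spread().items():
        print(f"{name}: {value:,.3f}")
```

In this setup the square-footage coefficient varies far more across datasets when bedrooms is also in the model than when it is fit alone, while the sum of the two coefficients stays comparatively stable. That is the "credit for bigness" being split arbitrarily between the two variables.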
Michael Hoeble
Thanks, Michael. And for those who haven't heard, our friends at Recast just launched their new incrementality testing platform, GeoLift by Recast. It's a simple, powerful way for marketing and data teams to measure the true impact of their advertising spend. And even better, you can use it completely free for six months. Just visit getrecast.com/geolift to start your trial today. Okay, well, we've got that done. One thing we'd love to do is go around the horn and share something we call last call, something that might be of interest to our listeners. Cindy, you're our guest. Do you have a last call you'd like to share?
Cindy Howson
Well, I want to ask a question, if I can, on the last call. When you think about how quickly our industry is moving and innovating, what do you see as your best method or medium to keep up? Is it listening to podcasts, reading Substack or Medium articles, or how do you feel about books?
Michael Hoeble
Are we supposed to answer that?
Cindy Howson
Well, I'm looking for feedback because, you know, even though I'm a podcast host, I'm a writer at heart. And yet is the industry moving too quickly for another book?
Mo Kiss
Yeah, I mean, I can speak for myself. I listen to podcasts and host a podcast. That's a big part of how I stay up to date. But I also, I love books. I'm a book person, probably books more than articles.
Tim Wilson
But you listen to a lot of the books, right?
Mo Kiss
Yes, I do, but that's just because of my life stage of being time poor. I end up listening to books on Audible a lot. Yeah, for sure. What about you?
Michael Hoeble
I would say, I would say my number one source is articles. So in my day to day travels, I'll run across an article and then bookmark it and read it later. So I'll do that. I buy a lot of books and then don't read them.
Mo Kiss
Oh boy.
Michael Hoeble
In fact, that's right behind me.
Tim Wilson
Have you finished, Michael? Have you finished, have you finished the book?
Michael Hoeble
I have not finished your book, Tim. Well, you haven't finished that either here.
So. Yeah, but I, so I don't. Because for me, reading is sort of like an enjoyable pastime and, unlike Mo, I can't pay attention if someone's reading it aloud or to audiobooks, so I have to sit down and read it. And then when I do finally get a chance to read, I end up reading like sci-fi or fantasy novels instead of business books. So it's a tough one. And then of course, podcasts are very important. I have to believe that. Right?
Cindy Howson
Right.
Michael Hoeble
So there you go.
Cindy Howson
This feels like confessions of a podcast host.
Tim Wilson
That's right.
Michael Hoeble
Exactly.
What do you think?
Tim Wilson
Ton of podcasts and yeah, he does. I listen to a ton of podcasts, and very few of them are business or data or analytics related. So I am very much the subscribe to, I mean, a Medium, Substack, daily, weekly newsletter fiend. Which starts to feel a little overwhelming. But yeah. So with the occasional book, the books feel like a chore though.
Michael Hoeble
Well, I feel like if someone else.
Tim Wilson
Is doing, just to be clear. So I don't listen to the podcast even though I make one and I don't tend to read. I struggle to Read the books, even though I wrote one. So, yeah, I'm the first.
Cindy Howson
So I think Tim summed it up. Wait, are you telling me two thirds of our time spent is like a waste of time? Why am I writing books? And why am I hosting a podcast? I'm just gonna get on with building stuff. Okay.
Michael Hoeble
I don't like the data that we've uncovered here. No.
Tim Wilson
I mean, I get a lot of value out of hosting the podcast because we get to have excuses to say, hey, why don't you come on and explain semantic layers to us? So, yeah.
Michael Hoeble
Actually, doing a podcast is one of the ways I learn new things. So that's something you could add to the mix. Yeah.
Tim Wilson
So when is your next book coming out?
Cindy Howson
I don't know. Can I take a break from the podcast to stop something? I don't know. I don't know. This is what I was trying to figure out. What should I do next? Yeah, yeah, fair point.
Michael Hoeble
All right, Tim, what about you? What's your last call?
Tim Wilson
Well, I guess follow on. There is a Substack that I discovered a couple of months ago from somewhere that is We Have the Data. It's kind of silly. It's kind of data visualization candy, but it's We have the data.net. I think it's a couple of times a week, and it's just kind of like Numlock News, but data visualizations instead. So they're pretty lengthy. They're a collection of often kind of trivial data visualizations, but it's kind of a fun scroll in my inbox.
Michael Hoeble
Outstanding. All right, Mo, what about you?
Mo Kiss
I want to do a plug for Cindy's podcast. I was lucky enough to be a guest back in October, and it's called the Data Chief. And as you can tell, I ended up hanging out after the show and picking Cindy's brain for like another 30, 40 minutes about all of these topics, which is why she's here today. And she just has such a range of, like, really incredible guests. It's a really different format to our show, so really encourage you to go check out the Data Chief podcast.
Michael Hoeble
Outstanding. And yeah, we'll put a link to that in our show notes as well, so people can find it.
Tim Wilson
Pandering. You're supposed to pander at the beginning.
Michael Hoeble
It's fine. We'll pander all over the place.
Tim Wilson
What's your last call?
Michael Hoeble
Well, I'm so glad you asked, Tim. So a good friend of mine, Mary Gates, actually made me aware of this. So INFORMS, which I'm sure we're all familiar with, they have an initiative called Pro Bono Analytics. I'm a big fan of any analytics initiatives, and I've been able to be part of them over the years, that help nonprofits and allow people to give of their skills in data and analytics to nonprofits, and mentorship and things like that. And so Pro Bono Analytics is an initiative run by INFORMS. And so I just wanted to give that a shout out. I was not familiar with this before, but it looks like a very cool organization. And so if you're a nonprofit and you're listening, that might be an amazing place to partner with them to get help with data initiatives. And if you're a professional and you're working in data and you want to find a way to give back, that might be an amazing way to do that. So we'll put a link to that in the show as well.
Podcast Announcer
Okay.
Michael Hoeble
As you've been listening on this topic of semantic layers, I'm sure you have thoughts, I'm sure you have questions. We would love to hear from you. Go ahead and reach out to us. And there's three main ways you can do that. You can do that through LinkedIn or the Measure Slack chat group, or you can email us at contact@analyticshour.io. And yeah, we'd love to hear from you. Cindy, once again, this has been a very information-rich and awesome episode, primarily because of your deep knowledge and expertise in this field. So thank you again so much for joining.
Cindy Howson
Thank you for having me. I feel like we should do this over, you know, a cup of coffee or a glass of wine at some point.
Michael Hoeble
Yes, I wholeheartedly agree. That's how this whole podcast started was cause. Mary, Mary, we're all drinking, drinking at an analytics conference and said we should put this on the radio. No, we didn't see that. Great, great idea. That's right. Another drunken.
Tim Wilson
Great ideas.
Michael Hoeble
All right. Also, if you are somebody who puts, and this is not directed at you, Cindy, this is back to the audience. If you're someone who puts stickers on your laptops or whatever, we do have stickers and we'd love to send you one. You can actually request one on our website. So you can go and do that. And then obviously no show would be complete without saying a huge thank you to all of you listeners who go out and share ratings and reviews with us and tell us how you're enjoying the show. So please continue to do that. We look forward to that feedback. We appreciate it very much. All right, as we wrap up, I know that no matter if you're trying to build one-ring-to-rule-them-all type of semantic layers or if you're spreading it out across verticals, I know both of my co-hosts, Tim and Mo, would agree with me. You should keep analyzing.
Podcast Announcer
Thanks for listening. Let's keep the conversation going with your comments, suggestions and questions on Twitter @analyticshour, on the web at analyticshour.io, our LinkedIn group, and the Measure Chat Slack group. Music for the podcast by Josh Crowhurst.
Tim Wilson
Those smart guys wanted to fit in.
Podcast Announcer
So they made up a term called analytics.
Tim Wilson
Analytics don't work.
Podcast Announcer
Do the analytics say go for it no matter who's going for it. So if you and I were on the field, the analytics say go for it. It's the stupidest, laziest, lamest thing I've ever heard. For reasoning in competition, we'll just do.
Michael Hoeble
Our best with it. It, you know, that's why we have an audio engineer. Hi, Tony.
Tim Wilson
Hi, Tony.
Rock Flag and semantic layers are 30 years old.
Release Date: December 9, 2025
Host(s): Michael Helbling, Moe Kiss, Tim Wilson
Guest: Cindi Howson (Chief Data & AI Strategy Officer, ThoughtSpot)
This episode dives deep into the persistent and evolving challenge of semantic layers in analytics—what they are, why definitions of even simple business metrics (like “revenue” or “active users”) create confusion, and how semantic layers play into the future of generative AI, data mesh, and modern data architectures. Featuring Cindi Howson, a long-time thought leader in business intelligence and analytics, the panel debates whether the dream of a single “semantic layer to rule them all” is visionary—or a fallacy.
“In the simplest terms, a semantic layer provides a representation of the business model in business terms to the physical structures in your...data warehouse, data lake, cloud data platform, whatever you want to call it. And it is important that it is in business terms.” — Cindi Howson (04:31)
“A finance person is always going to assume I am talking about when it was invoiced. A salesperson is going to come from the context of when is my commission going to get paid?” — Cindi Howson (07:26)
"Prior to semantic layers...you had to code your own SQL...The semantic layer gave report writers a way to click on business terminology to generate the SQL." — Cindi Howson (11:24)
“The more context you give the LLM, the more accurate your answers will be. And that is why...semantic layers have become more important because of agentic AI.” — Cindi Howson (13:00)
“What people want is one semantic layer to rule them all. And I just think that's a fallacy.” — Cindi Howson (15:08)
"You want to avoid bringing in absolutely everything in the physical storage and exposing that to mere mortals, because that'll be overwhelming." — Cindi Howson (43:25)
“If you build a semantic layer that doesn't work that way [reflecting how business users ask questions], what is the point? Go home.” — Cindi Howson (28:15)
“Our industry has also now raised a generation of data analysts who never learned proper data modeling.” — Cindi quoting her own writing (40:00)
“We talk about data literacy. We also need to bring in business literacy. To me, it's not just about where is the data coming from. It is also how is it used and that there really might be two different definitions.” — Cindi Howson (42:16)
There’s real risk that attempts to standardize semantic layers result in even more fragmentation (the “now we have 14 standards” XKCD effect).
“We have the XKCD cartoon...There are 13 standards, we need one more—and now we have 14 competing standards.” — Tim Wilson (30:18)
True adoption will depend on both vendor coalitions and customer demand. Snowflake, ThoughtSpot, dbt, and others are moving, but ecosystem buy-in is uncertain.
Cindi’s tips for success:
“The more context you give the LLM, the more accurate your answers will be. That is why semantic layers have become more important because of agentic AI.”
— Cindi Howson (13:00)
"What people want is one semantic layer to rule them all. And I just think that's a fallacy."
— Cindi Howson (15:08)
“If you build a semantic layer that doesn't work that way, what is the point? Go home.”
— Cindi Howson (28:15)
“Our industry has...raised a generation of data analysts who never learned proper data modeling.”
— Cindi Howson (40:00)
“…You want to avoid bringing in absolutely everything in the physical storage and exposing that to mere mortals, because that'll be overwhelming. So I always start with who is going to use this and what are the top questions they’re going to want to be able to ask of it.”
— Cindi Howson (43:25)
For more: