
We're seeing the title "Analytics Engineer" continue to rise, and it’s in large part due to individuals realizing that there's a name for the type of work they've found themselves doing more and more. In today's landscape, there's truly a need for...
Michael Helbling
Before we start the show, we have a special announcement. This fall, the Analytics Power Hour crew is headed to Measure Camp Chicago.
Mo Kiss
That's right. Even your co-host from all the way in Australia will be there on Saturday, September 7th, to join in on the Measure Camp unconference.
Michael Helbling
Fun.
Val Krul
I'm so excited that we're all going to be together. Well, except we'll be missing Josh, but we'll have him there in spirit. But I'm curious, I've never been to a Measure Camp. What's it like?
Tim Wilson
What's it like? Okay, well, I've been to one in Europe and, I think, all of the in-person ones in the US, and to me the most iconic feature is that the schedule is created on the day of the event, and everyone who attends is encouraged to actually lead a session on whatever they're finding most interesting or most useful, or maybe even what's been vexing them the most of late. So it's really all about an exchange of ideas and having some really in-depth, rich discussions with your peers.
Mo Kiss
I've also been to quite a few and I've also helped with planning the one we run in Sydney. And the truth is, it's just phenomenal. It's better than Christmas Day, honestly. And one of my favorite parts of Measure Camp is that they're held on a Saturday so it doesn't interfere with your work and the tickets are always free.
Michael Helbling
Yeah. And I loved my experience at Measure Camp Austin earlier this year. I mean, it was so accessible to everybody and it was so fun. Okay, so what are we going to be doing there?
Val Krul
So we're going to be doing a couple of things. First, we're going to have a room booked for us all day long where you can stop by, visit a couple of the co-hosts, and talk about what you've been discussing throughout the day or maybe one of the sessions you're presenting. We're also going to have a couple of questions posted on the board on the day, and you can come in and give us your answers to those prompts. And then at the end of the day, during the happy hour, we're also going to do a short live show.
Tim Wilson
Will there be shots?
Val Krul
So mark your calendars for Saturday, September 7th at 9am at the Leo Burnett Building in downtown Chicago, right on the river and just a couple of blocks from Michigan Avenue. Get your free tickets now by heading to bit.ly/aph-chicago. And start thinking now about what you might like to present or talk about.
Michael Helbling
Awesome. We're headed to Chicago, but now let's start the show.
Josh Cohearst
Welcome to the Analytics Power Hour. Analytics topics covered conversationally and sometimes with explicit language.
Michael Helbling
Hey, everyone, welcome. It's the Analytics Power Hour. This is episode 251. You know, they're everywhere now. It's like the hottest job in analytics. It's the analytics engineer. Even at companies that already have analytics engineers, it always seems like they could use a couple more. The analytics industry has been in a state of near-constant evolution for more than 20 years now, and in the last five years we have seen this role grow massively. So we want to talk about it. In light of that, let me introduce my co-hosts. Val Krul, how you doing?
Val Krul
Pretty good.
Michael Helbling
Awesome. And Mo Kiss. How are you?
Mo Kiss
I'm doing wonderfully.
Michael Helbling
Yeah. Mo, you started talking about this before any of us, so I'm really excited to bring this full circle. We can all act like we're on board now, but you were definitely the one pushing this on us as a podcast. And I'm Michael Helbling. We also have a guest today to join us in this conversation. Dumke Dewald is a senior analytics engineer at Xebia, and he's held numerous other analytics engineering and data roles throughout his 10 years in the analytics industry. He is a co-author of the new book Fundamentals of Analytics Engineering. And today he is our guest. Welcome to the show, Dumke.
Dumke Dewald
Thank you, thank you. Happy to be here.
Michael Helbling
Well, we're glad you could come. This is something pretty exciting that we want to keep covering, because it's grown so much and it's a really important part of the analytics ecosystem. There are a number of our listeners who are analytics engineers, but quite a few who are not, so maybe a great place to start is with your perspective: what is analytics engineering?
Dumke Dewald
Yeah, this is a really good question. And I will say that the jury is still out on this. What we define in the book is that the analytics engineer is a bridge between the business and data engineering. The way I compare it is to when building websites was a new thing: you had this concept of a full-stack developer, one person who did everything, designing the website, building the front end, building the back end. Then as time went on, websites got bigger and bigger, and no one person could do the job anymore. Similarly, in analytics and the data field, you traditionally had a data engineering role where people would ingest data and make it available in a data warehouse. Then you had your BI team or BI developers who worked with that data, and every request for changes, whether to a data model or for new sources, had to go through that data team. What we've seen in the last couple of years is that as data teams at companies grow, and the number of companies using data and analytics grows as well, a new kind of role emerges: analysts who are more technical and who adopt best practices from software development, like version control, in their way of working. That is also when tools like dbt or Dataform come up, which make that kind of development workflow a lot easier and support the growing need for data analytics at companies.
Val Krul
One thing that would be super helpful for me, because that was a great definition: could you talk about the differences between that and a data engineer? I have to admit that I sometimes get those two confused. I would love to hear your thoughts.
Dumke Dewald
Yeah, that's a really good point. For me, a data engineer is usually a person who does a bit more of the low-level work. They set up ingestion: taking data from a source system to a destination system, which is usually your data warehouse or a data lake. They'll either use existing tools or connectors, something like Fivetran or Stitch or a similar tool, to get data from point A to point B, or sometimes they'll write custom connectors to facilitate the process. That's something I often see for old on-premise systems where you want to get data into your data warehouse. So they do a lot of that kind of work, but also things like network connectivity and input validation, basically validating the data that's coming in. On the other side you have your analytics engineer, who works more in, let's say, SQL. They take that input, the raw data that's coming in, and map it to their business processes and the architecture of their organization, so that in the end you have models coming out that are semantically meaningful to the people who will actually do the analysis. So for an analytics engineer there's a lot of knowledge required around data modeling. When I first started in this space I thought, how hard can it be? But it's the same for any analyst, right? You think, how hard can it be to answer this question? And then you run into all these different processes: between teams in an organization the processes are different, and so the definitions are different.
Basically, what analytics engineering tries to do is take that away from all the different analysts and put it into one central place so that everyone can use the same definition. Making that happen requires a lot of data modeling techniques.
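A minimal sketch of the idea described above: raw source rows are mapped once, centrally, onto business-friendly names, and a single shared metric definition is built on top. The source field names, status codes, and the rule that net revenue counts only completed orders are hypothetical assumptions for illustration; in practice this layer is usually written in SQL (for example as dbt models) rather than Python.

```python
# Hypothetical raw order rows, shaped the way a source system might deliver them.
RAW_ORDERS = [
    {"ord_id": "A1", "cust": "c1", "amt_cents": 1250, "status_cd": "C"},
    {"ord_id": "A2", "cust": "c2", "amt_cents": 800, "status_cd": "C"},
    {"ord_id": "A3", "cust": "c1", "amt_cents": 300, "status_cd": "R"},
    {"ord_id": "A4", "cust": "c3", "amt_cents": 4000, "status_cd": "P"},
]

# Assumed meaning of the source system's status codes.
STATUS_LABELS = {"C": "completed", "R": "refunded", "P": "pending"}


def model_orders(raw_rows):
    """Rename cryptic source fields and decode values into business terms."""
    return [
        {
            "order_id": row["ord_id"],
            "customer_id": row["cust"],
            "revenue": row["amt_cents"] / 100,
            "status": STATUS_LABELS[row["status_cd"]],
        }
        for row in raw_rows
    ]


def net_revenue(orders):
    """The one shared definition every team reuses (assumed rule: completed orders only)."""
    return sum(order["revenue"] for order in orders if order["status"] == "completed")


orders = model_orders(RAW_ORDERS)
print(net_revenue(orders))  # 20.5
```

Because every analyst calls the same `net_revenue` on the same modeled rows, a change to the definition happens in one place instead of in each team's own query.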
Mo Kiss
Yeah. It's funny, I feel like I've been on this real journey myself. I've worked in, I don't know if traditional is the word, but a previous iteration of job titles where it was very much a star schema with a data engineering team, and we would put in requests to them. Then I probably went through a phase where I actually thought the change from data engineering to analytics engineering was just people calling themselves something different, but that it was the same job. I really enjoyed reading the book you co-wrote, Fundamentals of Analytics Engineering, and I probably hadn't sufficiently grasped the difference in skill set, particularly from a language perspective: Scala and Java and so on for data engineers, versus the analytics engineer being so much more SQL-based and so much closer to the business. I think that is probably the biggest difference I've personally noticed. In that previous model, the data engineering team sat so far from the business that they'd produce something but didn't really understand the reports you wanted to build with it or the measures you wanted, so you'd end up going back and forth a lot. I feel like analytics engineers have really filled that gap: once the data is brought in, how do you actually get it to the state the end user needs? And I'm calling the data scientists and data analysts end users in this case, though I'm sure that's up for debate too. I just wanted to say how much I enjoyed learning more of the detail about some of these topics that I took for granted. I don't know, there wasn't really a question in there.
Dumke Dewald
Well, thanks, but it is a really interesting point, right? I work for a consultancy, so I see a lot of different businesses from the inside, and they all have their different processes. But you do indeed see that the people most suited to be analytics engineers are the technically minded analysts, because data engineers usually come from a software development background. And this is generalizing, right? It's not to say that people in either of these roles can't do different things. But in software development there's a traditional way of working where you get your Jira tickets, you work on your ticket and that's it, and then someone else checks your work. Whereas in the analytics space, I think people have been forced to find ways to answer their questions. They have this urge to answer the question, and they find ways to make that work regardless of whether the data is currently available in their system. That mindset of, hey, my stakeholder really needs an answer to this question to make a well-informed decision, really helps you to then ask: why can't we do this a little bit faster? Why can't we do it this way or that way, or hack it together a little bit? I think that really helps in this type of work, to think along with your business stakeholder about how to make it work instead of saying, you know, file a request with my product owner and he'll handle everything.
Michael Helbling
It's time to step away from the show for a quick word about Piwik PRO. Tim, tell us about it.
Tim Wilson
Well, Piwik PRO has really exploded in popularity and keeps adding new functionality.
Michael Helbling
They sure have. They've got an easy-to-use interface and a full set of features, with capabilities like custom reports, enhanced ecommerce tracking, and a customer data platform.
Tim Wilson
We love running Piwik PRO's free plan on the podcast website, but they also have a paid plan that adds scale and some additional features.
Michael Helbling
Yeah, head over to Piwik PRO and check them out for yourself. You can get started with their free plan. That's Piwik PRO. And now let's get back to the show.
Val Krul
So I know we've jumped straight into all these definitions, differences and comparisons, but in the book there's an amazing supermarket analogy that I think our listeners would love to hear. Would you mind walking through it? I'd feel selfish if I only read it myself and we didn't have you walk through it on the show.
Dumke Dewald
Yeah, I think that's a really interesting one, and full credit for it goes to my colleague Ricardo, who came up with it. If you imagine a supermarket, there are a lot of different things that happen before you have your groceries in your kitchen. In a supermarket you could have, let's say, an analyst who wants to understand more about which articles are being sold, maybe which articles are sold together, or where to find new articles. Then the articles, the groceries, the fresh produce, have to come in somehow, so your data engineer is the one who brings in that fresh produce from its source. But that's not enough, because then you have this analyst who says, hey, I have this idea about how to organize my supermarket, and there's this guy who shows up with a truck and says, here's a bunch of fresh produce. What the analytics engineer does is be the man in the middle who organizes your shelf space and thinks about how to facilitate that. The shelves also need to be organized in order, like first in, first out. I've never worked in a supermarket, to be honest, so I'm not too familiar with the first-in, first-out stuff. But you can imagine there are different levels to think about in how the produce and items in the supermarket go from one step to another. Yes, it's very important to have the analysis: how are my customers navigating the supermarket? What are they actually buying? What do they need, and can I predict that? And it's super important to have the produce coming in. But you can really make a difference as a supermarket by bridging that gap: we're not just going to put everything out.
We're going to have someone who thinks about how to organize it, how to model it, how the customer is going to walk through the store, and where to put things so they're in the right place. That, to us, is the role of the analytics engineer.
Mo Kiss
So like I said, at some stage I definitely had the misconception that data engineers and analytics engineers were the same thing, and I think that has shifted through my own experience at work. But one could argue that the scope has then had to narrow for both specialties, right? Less full stack and more specialization. But then, particularly because I have three analytics engineers in my team in marketing, I also look at the breadth of tools in marketing and the modern data stack, how complicated it is, the privacy and governance stuff. And I'm torn: yes, there's specialization, but there's maybe also more complexity or difficulty. Do you feel that tension as well?
Dumke Dewald
Yeah, that's definitely been a key point in the things I think about, especially what I want to help people understand when I'm on a job. I want them to understand the space they're in and the problem in that space in their organization, and find the right solution for it. Analytics engineering is basically a solution to a problem that didn't really exist before. It has only come into existence in the last few years, and I think part of this may even be due to Covid accelerating businesses going online and growing in their data space. So there's a need for bigger teams and a more structured way of working. But that doesn't mean you can't have a single-person data team in, let's say, a startup or a very small company that does a lot of work with the tools currently available. One place where I think you can really see this, and we discuss it in our chapter on observability and data quality, is that you don't necessarily have a data quality problem at first. So it doesn't always make sense to say from the start, I need this data quality or observability tool that does everything for me, and then pay a ton of money to make that work. It's only when you see that something really influences the way you make decisions in your organization, or that the quality of those decisions is impacted, that you start to think, okay, maybe I do need a data quality or observability tool. And even then you can ask: do I actually need observability for everything I own in my stack, or do I just need to write some tests in dbt to say, hey, the values in this output column need to be unique, or there shouldn't be any null values?
You really need to think through the problem you have and then identify the tools that will help you solve it. Now, I will say that identifying those tools is a lot harder than it seems at first, because there are just so many of them, and part of the job is keeping up with that. But what we try to do in the book is say that it's not always about the tools, even though some of the technology changes have been very revolutionary; we can talk about that later. It's about the concepts or problems those tools solve. For example, getting data from a source system to a destination system, for specific types of data like marketing data or sales data: those are problems many companies have, and they have the exact same problem. So you see ETL vendors like Fivetran, Stitch and Talend fill that space. If you understand that your company is not necessarily a unique snowflake in that sense, and that your problem is similar to the problem other companies have, you can start using tools that cater to the needs of many, and those can be way more efficient than what you could build on your own. I think this is very similar to what you've seen in the web analytics space as well. I've been at a few companies that tried to build their own web analytics tool, and yes, it works, but it's way more efficient and effective to take one of the off-the-shelf web analytics tools, because you just can't think of every possible use case out there, your requirements will change, your team will change, and it's just not your core business.
So for a couple of these elements, data ingestion, data warehousing, quality and observability, you want to identify the problem you have, look at what's already available out there, and check whether it solves your problem.
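The dbt tests mentioned above (dbt ships generic `unique` and `not_null` tests) boil down to queries that should return nothing when the data is healthy. A minimal sketch of that idea using Python's built-in sqlite3 module; the table, columns, and helper names are hypothetical, and this is not the exact SQL dbt generates.

```python
import sqlite3


def unique_violations(conn, table, column):
    """Return (value, count) pairs for non-null values that appear more than once."""
    # Identifiers are interpolated directly; only use with trusted table/column names.
    sql = (
        f"SELECT {column}, COUNT(*) FROM {table} "
        f"WHERE {column} IS NOT NULL "
        f"GROUP BY {column} HAVING COUNT(*) > 1"
    )
    return conn.execute(sql).fetchall()


def not_null_violations(conn, table, column):
    """Return how many rows have a NULL in the given column."""
    sql = f"SELECT COUNT(*) FROM {table} WHERE {column} IS NULL"
    return conn.execute(sql).fetchone()[0]


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE customers (customer_id TEXT, email TEXT)")
conn.executemany(
    "INSERT INTO customers VALUES (?, ?)",
    [("c1", "a@example.com"), ("c2", None), ("c2", "b@example.com")],
)

print(unique_violations(conn, "customers", "customer_id"))  # [('c2', 2)]
print(not_null_violations(conn, "customers", "email"))  # 1
```

A check "passes" when the first helper returns an empty list and the second returns 0, which is why a couple of cheap assertions like these can come well before any paid observability tooling.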
Val Krul
Love that. I have a quick question on org structure, based on what you were just sharing, because unfortunately I haven't had the pleasure of working directly with someone with the title capital-A, capital-E Analytics Engineer. Although now that I understand more about what it is, I think some people had that as part of their role. You started to tee this up a little, too, with the end users. Traditionally, I thought one of the first analytics roles you would hire, if your business saw this need, would be the analyst, and they would be working with people in IT, similar to what Mo said. But when you talk about the business, I wonder if you sometimes even mean the analysts who sit inside those business roles. So maybe the first hire on the team could be an analytics engineer, and that would be the layer that acts as the center of excellence for those analysts. The analytics team could really be stocked more with engineering roles servicing the analysts embedded within those teams. I'm just curious if you could talk about how you've seen the org models of teams change in organizations that are really ramping up their hiring of analytics engineers, and where they're even sitting inside the organization. Is it in a marketing org like at Canva, or is it within IT? I'd just love to hear all your thoughts and reactions.
Dumke Dewald
Yeah, that's really interesting, and I've seen them all over the place, to be honest. Maybe start from the smallest possible organization. Sometimes it's fine to start with something like emailing Excel files around, with an analyst who analyzes those Excel files. Excel's been great because it, you know, solves those problems with one tool. Then as the team starts to grow, you'll usually have some kind of data engineer, because it gets harder to get the source data in, and those will often be in IT teams, at least at first. As it grows you get more analysts, and then you get the analytics engineers in between. Traditionally I've seen a lot of analytics, or rather BI people, in finance departments. But organizations that come from a more web-oriented approach will have a lot of analysts in the marketing, web and sales teams, and sometimes that's where the analytics and analytics engineering organization tends to sit. This is actually a really interesting point, because we've also seen organizations where two teams grow in different ways and start competing with each other. You have a BI team building out its own BI needs and finance dashboards and that kind of thing, but then you have a marketing team with a very strong analytics approach. Sometimes they're further ahead, sometimes it's the other way around, but they need to find a way to consolidate these different platforms, and that can be very interesting, so to say. In general, though, they can sit in different departments, and in the end every department will usually have some kind of analytics role for reporting or dashboarding.
And for some departments, like marketing or finance, those roles tend to grow a bit bigger than in others. When you go up another level, you get into the whole data mesh thing, where all these departments are independent of each other but have shared definitions, or at least agree on the definitions of their outputs through data contracts. But that gets very complicated very quickly, and I don't see a lot of organizations doing it very well.
Val Krul
Not quite ready.
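Data contracts, mentioned above, are essentially agreements about the shape of a team's output. A minimal sketch of checking rows against such a contract in plain Python; the `ColumnSpec` structure, the orders contract, and the checks are hypothetical illustrations, not the format of any particular data contract tool.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ColumnSpec:
    """One column in a (hypothetical) data contract."""
    name: str
    dtype: type
    nullable: bool = False


# Hypothetical contract a producing team might publish for its orders output.
ORDERS_CONTRACT = (
    ColumnSpec("order_id", str),
    ColumnSpec("customer_id", str),
    ColumnSpec("revenue", float),
    ColumnSpec("coupon_code", str, nullable=True),
)


def contract_violations(rows, contract):
    """Return human-readable problems; an empty list means the rows honor the contract."""
    expected = {col.name for col in contract}
    problems = []
    for i, row in enumerate(rows):
        extra = set(row) - expected
        if extra:
            problems.append(f"row {i}: unexpected columns {sorted(extra)}")
        for col in contract:
            if col.name not in row:
                problems.append(f"row {i}: missing column {col.name}")
                continue
            value = row[col.name]
            if value is None:
                if not col.nullable:
                    problems.append(f"row {i}: {col.name} is null")
            elif not isinstance(value, col.dtype):
                problems.append(
                    f"row {i}: {col.name} expected {col.dtype.__name__}, "
                    f"got {type(value).__name__}"
                )
    return problems


rows = [
    {"order_id": "A1", "customer_id": "c1", "revenue": 12.5, "coupon_code": None},
    {"order_id": "A2", "customer_id": None, "revenue": "8"},
]
for problem in contract_violations(rows, ORDERS_CONTRACT):
    print(problem)
```

The consuming team can run a check like this on every delivery, so a producer's schema change surfaces as an explicit contract violation rather than a silently broken dashboard.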
Mo Kiss
Have you seen more centralized or more embedded models? Which would you say the industry is leaning towards, from your experience?
Dumke Dewald
I might be a little biased, because we get brought in to centralized teams to build central data platforms. But I do see a mix, really. When I speak to others in the analytics engineering community, there's a lot of decentralization; like I said, a lot of it comes from marketing and web analytics, or from BI teams. And BI teams are either centralized or sit within the finance organization. So it's really a mix.
Michael Helbling
Yeah. And it seems like a lot of companies grow into this organically, as opposed to through top-down planning, so you end up with little spots here and there. It's fascinating that you brought that up, Dumke, about how finance analytics teams or traditional analytics organizations and these newer ones with, say, marketing data could really do a lot for each other, but somehow have never met in most companies. It keeps blowing my mind. I'm like, well, you have this whole other analytics org, why aren't you talking to them? And they're like, oh yeah, well, they just don't know anything about what we do. I hope as time goes on we'll see that coalesce into more robust analytics organizations that are holistic in their approach to the business. I do understand it to a certain extent, because the data sets people are working with and the methods are quite different on different sides. If you're a data scientist on the operational side or the finance side, you're not solving the same problems as a marketing data scientist or a marketing analytics engineer. But yeah, it's really strange how they're usually not well connected.
Dumke Dewald
But this is a really interesting point to me, and here's what I've seen. A lot of us come from a web analytics background, right? And we're very familiar with more event-driven data.
Michael Helbling
Streaming data.
Dumke Dewald
Yes, streaming data as well. Interestingly enough, a lot of data engineers are not as familiar with that type of data, and even the BI teams are not as familiar with it. So I also see, and this is really a change in the last couple of years, these different worlds coming together, so that all of a sudden data engineers also need to quickly learn how all this web analytics and event streaming data pipeline stuff works, and how to integrate it with the bigger, more static types of data. And how does that affect the way we view our customer? Because all of a sudden a customer is no longer just a row saying, this is my customer, this is their revenue, and five orders. Exactly, yeah. Now you have to think: if I have 2,000 events for this customer in the last week, how do I make a meaningful aggregation of that over a certain period of time? That requires data modeling these data engineers are not familiar with. So it's like a big pot, I feel, that's being stirred at the moment, and the dust really hasn't settled on how this should be organized.
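The question raised above, how to turn thousands of events per customer into something meaningful over a period of time, can be sketched as a simple weekly roll-up. The event tuple layout, event names, and chosen metrics (event count, purchases, revenue, active days) are assumptions for illustration; real event schemas vary by tool.

```python
from collections import defaultdict
from datetime import datetime, timedelta

# Hypothetical raw events: (customer_id, ISO timestamp, event_name, value).
EVENTS = [
    ("c1", "2024-06-03T10:00:00", "page_view", 0.0),
    ("c1", "2024-06-03T10:05:00", "add_to_cart", 0.0),
    ("c1", "2024-06-05T18:30:00", "purchase", 40.0),
    ("c2", "2024-06-04T09:00:00", "page_view", 0.0),
    ("c2", "2024-06-12T09:00:00", "page_view", 0.0),  # falls outside the week below
]


def weekly_customer_summary(events, week_start):
    """Collapse raw events into one row per customer for the 7 days from week_start."""
    week_end = week_start + timedelta(days=7)
    acc = defaultdict(lambda: {"events": 0, "purchases": 0, "revenue": 0.0, "days": set()})
    for customer_id, ts, name, value in events:
        t = datetime.fromisoformat(ts)
        if not week_start <= t < week_end:
            continue  # ignore events outside the reporting window
        s = acc[customer_id]
        s["events"] += 1
        s["days"].add(t.date())
        if name == "purchase":
            s["purchases"] += 1
            s["revenue"] += value
    return {
        cid: {
            "events": s["events"],
            "purchases": s["purchases"],
            "revenue": s["revenue"],
            "active_days": len(s["days"]),
        }
        for cid, s in acc.items()
    }


summary = weekly_customer_summary(EVENTS, datetime(2024, 6, 3))
print(summary["c1"])  # {'events': 3, 'purchases': 1, 'revenue': 40.0, 'active_days': 2}
```

The modeling decisions live in the choices this sketch hard-codes: the window boundaries, which event counts as a purchase, and which measures survive the aggregation.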
Michael Helbling
And I think that's a hook into something else we can talk about in this space, which is the transition from traditional ETL to what's now being called ELT. Maybe you could talk a little bit about the terms, as well as why that transition is happening.
Dumke Dewald
Yeah, exactly. To me, the fundamental thing here is understanding a bit about the technology that drives this change. Back in, let's say, 2012, if you wanted to bring in large amounts of data, you needed some kind of distributed system to transfer that data from place A to B. You'd have to split the data into chunks, and different computers, different servers basically, would move that data into your data warehouse. And your data warehouse would be row-oriented, so your 1 million transactions would go in row by row. What we're currently seeing is that there have been two really big changes. The first is that we went from row-based storage systems to column-based storage systems. Why is that important? Think about how data is physically laid out on your hard drive: imagine a row with a transaction ID, a transaction amount and lots of details about that transaction. If you store that block by block, all these rows sit one after another on your hard drive, so if you want to aggregate those transactions, say the sum of the total transaction value for a customer, for all our customers, or for all customers in a specific segment, you have to skip over lots of columns in between. So what the bigger analytics data warehouses decided to do is create a columnar format where all the values of a specific column sit together. All of a sudden your hard drive is very fast at going from the first transaction to the last, because they're all together in the same space. That paved the way for BigQuery and Snowflake to make use of it. Then, of course, you get all of these data warehouses in the cloud, so they're way more accessible for a lot of customers.
And the last trend we're seeing is with a database like DuckDB; not sure if you've heard about it before, but it's super important. What they're saying is: hey, we used to have computers that were not as powerful as we needed, and we've abstracted away the whole compute layer. But right now computers are so powerful that you can have one server with a giant amount of memory and processing power, and it's more than enough to accommodate something like 95% of all analytics use cases. And if you're in the top 5% of analytics use cases, you're probably Google or Netflix or Meta, and you have the engineering capacity in house to handle that. So I hope we'll see a bit of a trend where things get simpler, because you don't have to think about how to distribute your workloads. I've done this in the past with tools like Databricks, where you need to actively think about whether the analytics function or computation you want to do is more memory-intensive or more compute-intensive, and how to make sure your cluster scales perfectly for that without spending too much money on it. Those are all very low-level thought processes that I don't necessarily want to be bothered with, and I think a lot of the good tools out there just abstract that away. So hopefully that gets a bit easier. So those have basically been the three trends I've seen in the past, like these columnar databases and the move away from distributed systems, right.
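The row-versus-column storage point above can be illustrated with Python's `array` module, which stores numbers contiguously. A toy sketch, assuming three numeric fields per transaction (ID, customer number, amount): in the row layout the amounts are interleaved with the other fields, so summing them means striding over the whole buffer; in the column layout the amounts sit next to each other. Real warehouses add compression, pages and much more, so this only shows the layout idea.

```python
from array import array

# Row-oriented layout: the fields of each transaction are stored one after another.
# Assumed record layout: transaction_id, customer_number, amount.
FIELDS_PER_ROW = 3
AMOUNT_OFFSET = 2
row_store = array("d", [
    1, 101, 12.5,
    2, 102, 8.0,
    3, 101, 40.0,
])

# Column-oriented layout: all values of one column are stored together.
column_store = {
    "transaction_id": array("d", [1, 2, 3]),
    "customer_number": array("d", [101, 102, 101]),
    "amount": array("d", [12.5, 8.0, 40.0]),
}


def total_amount_rows(store):
    """Stride through every record, stepping over the fields we don't need."""
    return sum(store[AMOUNT_OFFSET::FIELDS_PER_ROW])


def total_amount_columns(store):
    """Read just the one contiguous block holding the amount column."""
    return sum(store["amount"])


print(total_amount_rows(row_store), total_amount_columns(column_store))  # 60.5 60.5
```

Both functions give the same answer; the difference is how much unrelated data sits between the values being read, which is what makes columnar scans cheap for aggregations.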
Mo Kiss
Now, that has then shifted the order in which we do ETL or ELT, though, because of the...
Josh Cohearst
Yeah, exactly. So that was the original point, right? Because you were restricted on that initial load of your data, you had to think about what types of transformations you wanted to apply, since you didn't have the capacity to duplicate your entire source system. Well, it turns out that right now we do have that capacity. So you don't need to think about all your use cases in advance; you can just bring in all that data. It's still good practice to think about it: don't bring in too much, you want to prune things a little bit. But you can have a non-destructive mechanism where you keep that original data and then build your transformations on top of it. That's why you do the extraction and the loading first, and then the transformations. Because when someone says, hey, actually we made an error here, or we want a different type of transformation, you can just go back in and say, let me adjust that for you. You run your SQL code again, it uses the same data, and it builds the new system with the new transformation in it.
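A minimal sketch of that extract-load-then-transform flow, using Python's built-in `sqlite3` purely as a stand-in for a cloud warehouse. The table names, the order data and the two transformation versions are hypothetical. The raw table is loaded once and never modified; each transformation is plain SQL that rebuilds a derived table from it, so fixing a mistake means changing the SQL and rerunning it.

```python
import sqlite3

# Two versions of the same (hypothetical) transformation. v1 accidentally
# counts refunded orders; v2 is the corrected logic a stakeholder asked for.
TRANSFORMS = {
    "v1": "SELECT SUM(amount_cents) / 100.0 AS total FROM raw_orders",
    "v2": "SELECT SUM(amount_cents) / 100.0 AS total FROM raw_orders "
          "WHERE status = 'paid'",
}


def extract_and_load(conn):
    # Extract + Load: land the source rows unchanged in a raw table.
    conn.execute(
        "CREATE TABLE raw_orders (order_id INTEGER, amount_cents INTEGER, status TEXT)"
    )
    conn.executemany(
        "INSERT INTO raw_orders VALUES (?, ?, ?)",
        [(1, 1250, "paid"), (2, 800, "refunded"), (3, 4300, "paid")],
    )


def transform(conn, version):
    # Transform: rebuild the derived table from the untouched raw data.
    conn.execute("DROP TABLE IF EXISTS revenue")
    conn.execute("CREATE TABLE revenue AS " + TRANSFORMS[version])
    return conn.execute("SELECT total FROM revenue").fetchone()[0]


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    extract_and_load(conn)
    print(transform(conn, "v1"))  # 63.5, includes the refunded order
    print(transform(conn, "v2"))  # 55.5, same raw data, new logic
```

Because `raw_orders` is never altered, the correction in `v2` needs no new extraction from the source system, which is the non-destructive property described above.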
Mo Kiss
And the funny thing is, the thing that has been rolling around in my mind for a while is privacy and governance. And I'm like, wait, why does it sit with the analytics engineer? I don't feel like I ever had an answer that clicked in my mind. And now, reading your book and also understanding the difference between the two, like, I knew that it was ETL and now it was ELT. No, wait, it was ETL and now it's ELT. There we go. But I couldn't articulate why, and I didn't understand why that would then mean that analytics engineers would be managing a lot of that privacy and governance piece, which was a really interesting takeaway for me. Here I go again without an actual question.
Val Krul
I mean, the number of visualizations I've seen about the differences between the two, with my brain going, okay, but... Yeah, that made so much sense. So I appreciate that walkthrough.
Josh Cohearst
I'm glad. But yeah, to your point, Mo, about the privacy part: I think it's not necessarily the responsibility of the analytics engineering team to take care of privacy, but they are the ones who can identify where privacy issues might arise, right? So what we do a lot is sit down with a data owner, which is usually the team that provides the data or is responsible for inputting it, for example if it's a sales system. And then we can say, hey, actually these here are email addresses, or these are home addresses for people. So we need to apply some kind of masking strategy so that analysts from other departments or other teams will not be able to see those values, while we still might want to be able to use them for certain computations. So our underlying system needs to be able to see them, but the analysts within a specific group or a specific team are not supposed to. That is a very technical challenge in that sense. And it also requires a process where there is one person, or a role, that identifies a specific field in the database and assigns a value to it to say, hey, this should be masked, or this should be deleted after X amount of time, for example.
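The field-tagging process described above could be sketched like this in Python. Everything here is a hypothetical illustration rather than a specific warehouse feature: the `POLICIES` mapping, the role names and the key are assumptions. A data owner tags each column as masked and/or with a retention period; analysts get a keyed hash (HMAC) instead of the raw value, which still lets them join on it or count distinct values, while the underlying pipeline role sees the original. In practice you would more likely lean on your warehouse's built-in masking or policy features and keep the key in a secrets manager.

```python
import hashlib
import hmac
from dataclasses import dataclass
from typing import Optional

# Hypothetical secret; a real setup would load this from a secrets store.
MASKING_KEY = b"replace-with-a-real-secret"


@dataclass(frozen=True)
class FieldPolicy:
    mask: bool = False                    # hide raw values from analysts
    retention_days: Optional[int] = None  # delete after this many days, if set


# Tags a data owner might assign to columns of a (hypothetical) sales table.
POLICIES = {
    "email": FieldPolicy(mask=True, retention_days=365),
    "home_address": FieldPolicy(mask=True),
    "amount": FieldPolicy(),
}
DEFAULT_POLICY = FieldPolicy()


def pseudonymize(value: str) -> str:
    # Keyed hash: the same input always maps to the same token, so joins
    # and distinct counts still work, but the raw value is not exposed.
    digest = hmac.new(MASKING_KEY, value.encode("utf-8"), hashlib.sha256)
    return digest.hexdigest()[:16]


def view_for(row: dict, role: str) -> dict:
    # The underlying pipeline ("system") sees raw values; every other role
    # gets masked columns replaced by their pseudonymized token.
    if role == "system":
        return dict(row)
    return {
        col: pseudonymize(str(val)) if POLICIES.get(col, DEFAULT_POLICY).mask else val
        for col, val in row.items()
    }


def should_delete(column: str, age_days: int) -> bool:
    # True once a value in this column is older than its retention period.
    policy = POLICIES.get(column, DEFAULT_POLICY)
    return policy.retention_days is not None and age_days > policy.retention_days
```

For example, `view_for({"email": "ana@example.com", "amount": 10}, "analyst")` keeps `amount` as 10 but replaces the email with a 16-character token, and `should_delete("email", 400)` is true under the assumed 365-day retention.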
Val Krul
Super interesting. The one thing that's in the back of my mind is how people find themselves in this role, because I saw you wrote somewhere about how so many of us in the analytics field, whether on the data engineering side or the analyst side, really pride ourselves on being self-taught in a lot of ways, learning on the job and reading those blog posts. And I think you said something like, this is the book you could have used five years ago to give you some of those hard skills or explain some of those concepts. So I'm curious about your thoughts on that, and the role this book can play to help people who are interested. But my second loaded question is: do you see a lot of people coming into the analytics engineering role from data engineering roles, or more from the data analyst side? I'm curious which one is merging, or if it's a completely different background or industry altogether.
Josh Cohearst
Yeah, it's a really good question. What I see is that a lot of people already have this type of role; they're just not aware that there's a name for it yet. So that's the first group of people. I've had quite a few people come up to me, even before we organized analytics engineering meetups here in Amsterdam, and say, hey, this is really interesting. I've been doing this, I just didn't know there was a name for it. And having a name for it makes it easier to find resources online and find other people. So that's been one part of it, which I think is really great: you create this kind of community around it. And then you also have people from both sides, data engineering or analytics. You'll have analysts who say, actually, I kind of like this more technical stuff, so I'm going to go in that direction. Or a data engineer who says, you know, I kind of miss that connection with the business, with talking to stakeholders. So I see that quite a lot as well. In general, I think it's easier for people to say, hey, I like talking to stakeholders, and I'm going to brush up on my technical skills. But it really depends. Some data engineers are more like traditional software engineers, who work in big teams and are just comfortable in their team with tasks assigned to them. But I also see a lot of data engineers who go in the other direction and say, hey, I just enjoy the best of both worlds. So maybe that's my final answer to your question: it's more of a role that doesn't necessarily apply to one single person. One person in an organization can be, let's say, both a data engineer and an analytics engineer, as in that role, and you can play around with the terminology there. And so it helps to shape a role.
And you as a person can identify whether that is the type of work you want to do. I hope it will give people direction to say, hey, this is the type of work that I want to do, and now I have a guide to understand which direction I need to take and what kind of skill set I need to add. And, Val, in my team...
Mo Kiss
Well, at Canva in general, we've started to realize that our data scientists actually do a lot of analytics engineering. Now that we have analytics engineers, the data scientists tend to do the layers further down, more the model and report layers, less the transform and source layers. But quite a few of our data scientists who really enjoy the work in the data warehouse and really enjoy data modeling have transitioned from data science into analytics engineering. It's been really interesting to see it build out as a function. Now, selfishly, there's a great question I should ask on behalf of some of the analytics engineers at my work. Dumky, it does sound like you have a great community where you are in Amsterdam. But one of the things I have found really difficult is professional development, because the role itself only became a thing, what, five or six years ago? So we have really struggled with everyone kind of progressing at the same level. There are obviously lessons we can take from data engineering or analytics or traditional software engineering, but I've definitely struggled with how to help people with their professional development in a space that is quite new. What advice do you have?
Josh Cohearst
Yeah, I mean, of course I can say buy the book, but that's kind of cheating. There are a couple of areas, and again, it depends on the profile you already have. Maybe you know Python, for example, but you don't know SQL. Or you know Python and SQL, but you're not as good at managing stakeholders or setting up workshops with your stakeholders. So it depends a little bit on what you as a person already have in terms of skill set. In general, what we look for when we hire new analytics engineers in our team is, first, that they have an idea of what software best practices are, so version control and testing your code. We want them to have an idea of what data modeling means and how to apply that in SQL: an understanding of what a star schema is, what dimension and fact tables are, and maybe some other types of data modeling if you really want to show off. Then we look at your experience with cloud computing, because I think that's not for everyone. But it is good to have an idea of where your code actually runs, where your transformations run, and what it means to do that in a data warehouse like BigQuery or Snowflake. How do I set that up? How do I manage it in terms of costs and permissions? And then on the other side, it's more like consulting skills. Are you able to ask the right questions? Are you able to really work with your stakeholders to summarize their needs and requirements? And are you entrepreneurial enough to set that up yourself and manage your way around an organization? So those are the five pillars that we use for assessing analytics engineers.
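For readers newer to the data-modeling and testing pillars, here is a minimal star-schema sketch, again using Python's built-in `sqlite3` as a stand-in for a warehouse such as BigQuery or Snowflake. The table names (`fct_transactions`, `dim_customer`) and the data are hypothetical. The fact table holds one row per transaction plus a key pointing at the dimension; the dimension table holds descriptive attributes such as the customer segment. The two check functions are simple data tests in the spirit of the uniqueness and relationship tests that tools like dbt ship with.

```python
import sqlite3

# Keys are plain columns here because many cloud warehouses don't enforce
# key constraints; uniqueness and relationships are checked by tests instead.
SCHEMA_AND_DATA = """
CREATE TABLE dim_customer (customer_key INTEGER, segment TEXT);
CREATE TABLE fct_transactions (transaction_id INTEGER, customer_key INTEGER, amount REAL);
INSERT INTO dim_customer VALUES (1, 'retail'), (2, 'business'), (3, 'retail');
INSERT INTO fct_transactions VALUES
    (10, 1, 20.0), (11, 2, 35.5), (12, 3, 12.25), (13, 1, 2.25);
"""


def build_warehouse():
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA_AND_DATA)
    return conn


def revenue_by_segment(conn):
    # Typical star-schema query: aggregate the facts, slice by a dimension attribute.
    return dict(conn.execute("""
        SELECT d.segment, SUM(f.amount)
        FROM fct_transactions AS f
        JOIN dim_customer AS d ON d.customer_key = f.customer_key
        GROUP BY d.segment
        ORDER BY d.segment
    """).fetchall())


def check_dimension_key_unique(conn):
    # Each customer_key should appear exactly once in the dimension.
    dupes = conn.execute("""
        SELECT customer_key FROM dim_customer
        GROUP BY customer_key HAVING COUNT(*) > 1
    """).fetchall()
    return dupes == []


def check_no_orphan_facts(conn):
    # Every fact row should point at an existing dimension row.
    orphans = conn.execute("""
        SELECT f.transaction_id
        FROM fct_transactions AS f
        LEFT JOIN dim_customer AS d ON d.customer_key = f.customer_key
        WHERE d.customer_key IS NULL
    """).fetchall()
    return orphans == []


if __name__ == "__main__":
    conn = build_warehouse()
    print(revenue_by_segment(conn))  # {'business': 35.5, 'retail': 34.5}
    print(check_dimension_key_unique(conn), check_no_orphan_facts(conn))  # True True
```

Running checks like these on every change, and keeping the SQL under version control, is one lightweight way to practice the software best practices mentioned above.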
Val Krul
That's a great summary. I think I saw that each of the authors was responsible for different chapters in the book, depending on your expertise and your areas of interest. So I'm curious, which ones were yours, or which ones were you super excited to write about? And a part of me wonders, were you writing this book as a letter to your former self, like, these are the things you're going to fall in love with in this role? But I'm just curious, what parts of the role excite you the most?
Josh Cohearst
Yeah, that's an interesting one. To me, what was really great about having a different set of authors is what I already said, right? It's almost too hard to do this as a single person, especially if you want to have the latest and best of everything. You need someone with a specialty in, let's say, BI or dashboarding or the history of data warehousing, that kind of stuff. So in that sense, it was really great for us to come together as a team and have discussions about the structure of the book and how to do that. For me personally, I wrote the chapter on data ingestion and on data quality and observability. Data ingestion especially has been really interesting for me, because for the last 10 years I think I did my job pretty well, but I've always struggled to understand why we need to move data from point A to point B in the first place, and what's happening in between. So this is kind of the culmination of that thought process: what happens there and what you need to think about to make it work. Because so often I've done that and thought, yeah, it makes sense now, but then a week later it breaks or something's missing. So this is almost my personal guide, a checklist for myself to say: if I go through these steps, I can make sure I have a fail-safe data ingestion pipeline to move my data around.
Michael Helbling
Yeah, your data ingestion pipeline is always throwing off errors, whether you're looking at them or not.
Josh Cohearst
Exactly.
Michael Helbling
This is great. All right, well, this. Oh, go ahead Mo. I have a question and I want you to ask.
Mo Kiss
Just because it is something that I think a lot about. To be honest, I think about this trade-off for many areas of data work, including machine learning models and all sorts of stuff. But how do you know, as an analytics engineer, when you've got the balance right between the business logic requirements, the optimization to run faster, especially when compute keeps getting cheaper and cheaper so you might not have as much drive to go back and optimize old code, and also the time to build and deploy? Is there just a Spidey sense you have, or how do you feel like you know when you've got it right?
Josh Cohearst
Yeah, that's always really hard. I think the key point is to understand where business value lies. If you have an idea of whether something is going to add business value or not, that's your first starting point. That being said, I do like the idea, which we apply quite a lot, of having a dedicated percentage of your work for eliminating tech debt. There's always going to be a little bit of stress, and there are always going to be deadlines you need to meet, but you can account for the tech debt you build up by saying, hey, I'm okay with doing this now, but over the next two or three months I'm going to slowly work on turning this into a more generalized approach so we can also use it in other places, or I'm going to clean up these tables that we're not using anymore and at the same time create a process around that. So yeah, as a team I do feel it's good to make a conscious decision: you can't allocate 100% of your time to building out use cases. You have to account for some percentage of cleanup time.
Mo Kiss
Yeah, I like that.
Michael Helbling
Yeah, that's a, that's a really good thing to bring up because I. I find a lot of organizations really struggle with the balance of that. That. And it's hard as an analytics engineer to. To get people to understand that work because they're sort of like, well, you're not creating any new reports or you're not creating any new insights or anything, so what are you doing? And it's like, well, I'm making it run. I'm making it keep running. Okay, so.
Mo Kiss
Well, so, tip for young AEs out there. I was helping one of our AEs go through a promotion application, and they were kind of like, oh, I've got nothing to put on there. And I was like, are you mad? You are responsible for these models that drive tens of millions of dollars of, I don't know, marketing budget or revenue or whatever for the business, that have not had any issues come up, that have been totally reliable, or that are used to feed this, I don't know, marketing program that's then retargeting users. You are responsible for that being able to happen because of what you do as an AE. And they were like, oh, I never really thought about my impact that way. And I'm like, yes, tens of millions of dollars right there. Write it down on your application.
Michael Helbling
Yeah. All right, well, this is good. We could talk about this for a while because there's so much to it, and it's crazy how much this has blown up in the past few years, but this has been a really good start to the conversation. Thank you so much, Dumky, for joining us to talk about it. It's been really good. All right, well, one thing we like to do is go around and share a last call, something that might be of interest to our audience. So, Dumky, you're our guest. Do you have a last call you'd like to share?
Josh Cohearst
Thanks. Yeah, last week I was reading an article by Gergely Orosz, if I pronounce his name correctly. His newsletter is called The Pragmatic Engineer, on the internets. The article is actually a revisit of something he's done before, called The Trimodal Nature of Tech Compensation. He looks at how people in tech are compensated, what their salaries are, and why there seems to be this disconnect between some companies paying a lot and other companies saying, where does this come from? I'm not paying that. What he identifies, or what his original theory was, is that there are local companies that want to address a local market, maybe in their own language. Then there are the kind of super A players in that market that can pay a lot more. And then you have these global companies like Google and Netflix and Meta that bump up the market and also the hiring prices for other companies. Now he's gone back and collected data since his first article, and he has this great analysis, because he originally did it in Amsterdam and now he's looking at other places as well, and he sees that it holds in all kinds of cities across the world.
Val Krul
Very cool, very cool.
Michael Helbling
All right, Val, what about you?
Val Krul
So no one's surprised: another Medium article published on the UX Collective, which is just my favorite email to open every week. This one is about emerging UX patterns in generative AI experiences. It's a long one, but there are so many awesome visuals in the way they break things down, and it covers the historical context going back to, you know, the command-line interface, and where we are today with different graphical user interfaces and the experiences we've become accustomed to. One of the things they break down is that a lot of AI tools say they're conversational, but the article is basically saying, no, you're not; this is not conversational yet. But it talks about where things might go, which I think is really interesting. And another one, which I hadn't really thought about, but after I read it I see it so many times, is that so many AI features that software tools are adding are really just combining a lot of features that already existed. So it's doing things faster, but you lose some of the fine-grained control of each of those individual features. That's one of the ways they're trying to assist, to be working alongside you so you can do your job faster. But anyways, the breakdown of some of those systems, the comparison to legacy, and the predictions of the future: it was just a really well done piece. So.
Michael Helbling
Nice. All right. And Mo, what's your last call?
Mo Kiss
So I stumbled upon this article on the Farnam Street blog (fs.blog) about first principles, through our engineering handbook. I was deep in something to do with our engineering values; it was a whole thing. Anyway, first principles is generally a concept that people are pretty used to. It's about reasoning and removing assumptions and conventions, that sort of thing. But that's not the bit that I actually found interesting about the article. The interesting bit was about the coach and the play stealer. It comes from an anecdote in the article about how not everyone who's a coach is really a coach. Some coaches are play stealers. The coach who is really a coach is the one who creates new plays, who deeply understands the game and their team, and how to shuffle things around to get the outcome they want. Then there are the coaches who are play stealers, who are just like, oh, that other team did that thing and it worked, and I'm going to try it out. This is not to throw shade on either, because I am definitely a play stealer, not a coach. But the bit that really blew me away was when it talked about what happens when a play isn't working: the play stealer can't figure out why it's not working, because they don't know why it was created, whereas the coach can. In my mind I'm thinking about this in terms of the team and the different people you have on your team. And I guess the takeaway is that you need to have enough coaches on your team and not just play stealers; otherwise everything kind of becomes derivative. And there's a lot of argument in tech that that's what's happening, where everyone's just moving between the different tech companies and taking the playbook from their previous company with them.
And it just kind of got me thinking and it also made me really appreciate the people on my team that are the coaches that come up with those really crazy, audacious plays and are like, we're gonna try this crazy new tech or this bonkers idea over here. And I'm like, yes, let's do it.
Val Krul
But I still don't understand why.
Mo Kiss
So anyway, it was just a really interesting read and it just made me self reflect a bit and reflect on the team and that was really nice. So, story. Yes.
Michael Helbling
All right, well, my last call is from Cedric Chin, who was on the show a couple of months ago. We talked to him about becoming data driven from first principles; that was the article we discussed. He's now finished that series, which was about the Amazon Weekly Business Review, and posted the final article, which is an excellent read, as most of his stuff is. So I highly encourage you, if you've been following that thread at all, to go read those articles, especially the capstone article, over on commoncog.com.
Mo Kiss
I've got two more things.
Michael Helbling
Yes, go ahead, Mo.
Mo Kiss
Sorry, I know, but I like him. Completely breaking the rules, because I didn't even do the two during my turn. Okay, firstly, I really do want to give a shout-out to the book that Dumky co-authored, Fundamentals of Analytics Engineering: An Introduction to Building End-to-End Analytics Solutions, because it really is a terrific read. We got to see a couple of things that weren't in the book, and I'm still like, oh, that should have been in there. That was so good. But yeah, I thoroughly enjoyed it. The other thing I wanted to quickly give a shout-out to is Measure Camp. It is happening in October in Sydney, which is just a couple of months away: Saturday, October 26th, if you are ANZ-based.
Michael Helbling
Awesome. Yeah, a lot of Measure Camp going on. We'll be at one in September in Chicago, so... All right, well, this has been excellent, Dumky. Thank you so much for taking the time to come on the show. Really appreciate you sharing your experience and expertise with us today.
Josh Cohearst
Yeah, thanks. It's been a great pleasure, and it's been nice as a longtime listener, first-time guest.
Michael Helbling
Oh, now you see how the sausage is made, so to speak. So yeah, nothing to it. All right. And of course, no show would be complete without a huge shout-out to Josh Crowhurst, our producer; we really appreciate him. And as you've been listening, you may have been thinking, oh, I'd like to learn more or read more about this. We would love to hear from you, so please reach out. You can reach us on our LinkedIn page, via email, or on the Measure Chat Slack group, which you mentioned, Dumky, and which we're also very active on. So we'd love to hear from you, and that's a great community for sharing things together. All right, well, let's wrap this up. As you're going out there, grappling with what's happening in the data and analytics space and figuring out what to do, I know I speak for both of my co-hosts when I say: whether you're a data engineer, a data scientist, an analytics engineer, or an analyst, remember, keep analyzing.
Josh Cohearst
Thanks for listening. Let's keep the conversation going with your comments, suggestions, and questions on Twitter at @AnalyticsHour, on the web at analyticshour.io, our LinkedIn group, and the Measure Chat Slack group. Music for the podcast by Josh Crowhurst.
Val Krul
So smart guys wanted to fit in so they made up a term called analytics. Analytics don't work. I love Venn diagrams. It's just something about those three circles and the analysis about where there is the intersection. Right. How is everyone looking and sounding to everyone else? Everyone seems pretty good to me.
Michael Helbling
Yeah, it's usually Tim that's the problem. I'm just kidding.
Val Krul
It is. It usually. I mean it is.
Michael Helbling
Well I know, but it's true.
Josh Cohearst
Not just in terms of audio. I mean, I love it. Yeah, so the funny thing is, I started in web analytics, and I've been in that space for a long time. But I started out mostly as a freelancer, because I came from a little bit of a different direction; I had built some websites before. And then I got in touch with this marketing agency to get a job there, and I was like, you know, guys, I don't really know anything about this. I just follow this guy who has a blog, Simo Ahava. Yeah, heard of him. And they were all like, well, that's what we do as well. So I kind of grew into that space, and I'm like, yeah, this is actually a pretty nice community of people. And then I got onto Measure Slack just to get to...
Mo Kiss
Oh my God, that's crazy.
Michael Helbling
He's responsible for so many jobs in this industry. I have also hired someone because they mentioned Simo's blog in the interview process. So, no, it's a good...
Val Krul
Like I was going to say.
Michael Helbling
No, no, no, I was. Well, cuz sometimes I ask the question like how do you stay up to date on like what's going on in the industry and trends?
Val Krul
Yeah.
Michael Helbling
And they're like, well, there's this blog, this guy named... I'm not sure... Simo. Simo. I was like, say no more. I know you know your stuff. If you're reading Simo, you're going to be all right.
Josh Cohearst
Yeah, yeah, exactly.
Mo Kiss
Rock Flag and Supermarket Analogies.
The Analytics Power Hour Episode #251: The Continued Rise of the Analytics Engineer with Josh Cohearst
Release Date: August 6, 2024
In episode #251 of The Analytics Power Hour, hosts Michael Helbling, Mo Kiss, and Val Krul dive deep into the burgeoning role of the Analytics Engineer within the data and analytics ecosystem. Joining them is special guest Josh Cohearst, a seasoned Analytics Engineer at Xebia and co-author of the influential book Fundamentals of Analytics Engineering. The conversation explores the definition, evolution, and significance of the Analytics Engineer role, distinguishing it from traditional data engineering positions, and examining its impact on organizational structures and data practices.
Josh Cohearst opens the discussion by providing a comprehensive definition of Analytics Engineering:
“The analytics engineer is kind of a bridge between business and data engineering.”
[03:53]
He draws parallels to the evolution of web development, likening the Analytics Engineer to the full stack developer who seamlessly integrates both front-end and back-end tasks. In the analytics domain, this role has emerged to address the increasing demand for data-driven decision-making by blending technical expertise with business acumen.
Val Krul asks how Analytics Engineers differ from traditional Data Engineers, prompting this distinction:
“A Data Engineer is usually a person that will do like a little bit more low-level stuff... setting up ingestion... validating your data that's coming in. Whereas Analytics Engineers work more in SQL to map raw data to business processes.”
[06:31]
Josh elaborates, emphasizing that while Data Engineers focus on data ingestion, infrastructure, and ensuring the data pipeline is robust, Analytics Engineers are tasked with transforming and modeling this data to make it semantically meaningful for analysts and business users. This involves intricate data modeling techniques and a deep understanding of business logic.
The conversation shifts to how organizations are structuring their data teams:
Josh Cohearst observes a mix of centralized and decentralized models:
“I see a lot, I see a mix really... centralized or with the finance organization. So it's really a mix.”
[25:33]
He notes that Analytics Engineers can be found across various departments—finance, marketing, web analytics—each bringing unique perspectives and requirements. This decentralization often leads to silos, where different analytics teams may not communicate effectively, underscoring the need for a more unified analytics strategy.
Michael Helbling touches upon the dynamic landscape of data engineering:
“A lot of data engineers are not as familiar with event-driven data... you have to think about how to make a meaningful aggregation of that over a certain period.”
[27:43]
Josh discusses the convergence of traditional data engineering with modern analytics needs, highlighting how Analytics Engineers are pivotal in bridging gaps between streaming data and static data sources, ensuring comprehensive and cohesive data models.
A significant portion of the episode delves into the shift from Extract, Transform, Load (ETL) to Extract, Load, Transform (ELT) processes:
Josh Cohearst explains the technological advancements enabling this transition:
“We went from row-based storage systems to column-based storage systems... and now with powerful single-server databases like DuckDB, things are getting simpler.”
[29:24]
This shift allows organizations to load all the data first and transform it as needed. Because the original data is kept intact, a transformation that turns out to be wrong can be adjusted and rerun on the same data without extracting it again. The hosts discuss how this impacts data modeling and the workflows of Analytics Engineers, facilitating more agile and responsive data practices.
The topic of data privacy and governance surfaces as a critical responsibility of Analytics Engineers:
“Analytics Engineers can identify where privacy issues might arise... applying masking strategies to sensitive fields.”
[36:08]
Josh emphasizes that while Analytics Engineers may not own privacy directly, they play a crucial role in implementing technical safeguards and collaborating with data owners. In practice this means masking sensitive fields such as email and home addresses so that analysts cannot see them while underlying systems can still compute on them, and setting up a process in which someone tags each field to be masked or deleted after a set period.
Mo Kiss raises concerns about career progression within the relatively new field of Analytics Engineering:
“It's been really difficult to help people with their professional development in a space that is quite new.”
[41:06]
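Josh notes that the right path depends on the skills a person already has, and half-jokingly suggests buying the book. He then outlines the five pillars his team uses to assess analytics engineers: software engineering best practices such as version control and testing; data modeling in SQL, including star schemas and dimension and fact tables; cloud computing, meaning where transformations run and how to manage cost and permissions in warehouses like BigQuery or Snowflake; consulting skills, such as asking the right questions and capturing stakeholder requirements; and the entrepreneurial drive to set things up and navigate an organization. Community meetups, like the analytics engineering meetups in Amsterdam, also help people find resources and peers.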
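The episode touches on the tools and practices that define the Analytics Engineering role, from version control, testing, and SQL data modeling to cloud data warehouses such as BigQuery and Snowflake, single-server engines like DuckDB, and platforms like Databricks.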
Josh stresses that while tools are vital, the underlying concepts and problem-solving approaches are equally important for successful Analytics Engineering.
As the episode wraps up, the hosts and Josh reflect on the rapid growth and importance of Analytics Engineering in today's data-driven landscape. They encourage listeners to embrace the evolving roles within analytics, emphasizing the need for continuous adaptation and collaboration across departments.
Josh concludes with a call to action for aspiring Analytics Engineers to define their roles clearly, build a strong foundation in both technical and business skills, and actively engage with the community to stay abreast of industry trends.
Notable Quotes:
“The analytics engineer is kind of a bridge between business and data engineering.”
Josh Cohearst
[03:53]
“Analytics Engineers can identify where privacy issues might arise... applying masking strategies to sensitive fields.”
Josh Cohearst
[36:08]
“I think Analytics Engineering is trying to take that a little bit away from all the different analysts and put it into one central place.”
Josh Cohearst
[08:50]
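At the end of the episode, the hosts and guest share recommended readings and resources for further exploration:
"The Trimodal Nature of Tech Compensation," revisited by Gergely Orosz on The Pragmatic Engineer
An article on emerging UX patterns in generative AI experiences, from the UX Collective on Medium
An article on first principles, including the "coach" versus the "play stealer," from the Farnam Street blog
The capstone article of Cedric Chin's series on the Amazon Weekly Business Review, on commoncog.com
Fundamentals of Analytics Engineering: An Introduction to Building End-to-End Analytics Solutions
Upcoming Events:
MeasureCamp Chicago, Saturday, September 7, 2024, where the Analytics Power Hour crew will host a room all day and record a short live show
MeasureCamp Sydney, Saturday, October 26, 2024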
Listeners are encouraged to join these events to engage with the analytics community, share insights, and continue their professional development.
Stay Connected: For more insights and discussions, connect with The Analytics Power Hour community via LinkedIn, email, or the Measure Chat Slack group. Share your thoughts, questions, and experiences to keep the conversation thriving.
Keep analyzing and stay ahead in the ever-evolving world of analytics!