
While we don’t often call it out explicitly, much of what data we collect, and how much of it, is driven by a "just in case" mentality: we don't know exactly HOW that next piece of data will be put to use, but we'd better collect it...
Tim Wilson
Welcome to the Analytics Power Hour. Analytics topics covered conversationally and sometimes with explicit language.
Michael Helbling
Hi, everyone. Welcome. It's the Analytics Power Hour and this is episode 253. You know, almost every time I've attended the Superweek conference in Hungary over the past seven years, a major theme has been how much our industry is changing, lately especially with privacy regulations and new laws that impact our industry. The other thing I usually take from that conference is new ideas about where the industry's heading and how we adapt to these changes. And I think the conversation on this show will be similar in a lot of ways. With new constraints on how, when, and where we collect and store data, it's high time to embrace new paradigms where we can find new ways of thinking about data collection and usage in a privacy-first world. So let me introduce my co-hosts. Julie Hoyer, welcome.
Julie Hoyer
Hey there.
Michael Helbling
It's awesome. And Tim Wilson, who has been with me many times at Superweek. Welcome.
Tim Wilson
Thought we weren't going to talk about that. What happens at Superweek stays...
Michael Helbling
At Superweek, stays at Superweek. Well, you know, yeah, we won't talk a lot about it. Okay. All right. And I'm Michael Helbling. Our guest today needs no introduction, but let me do a little bit. He is the CEO of Conductrics, an amazing thinker and speaker, and is our guest again for the third time. Welcome to the show, Matt Gershoff.
Matt Gershoff
Thanks for having me. A real honor.
Michael Helbling
It's awesome. I'm thankful to have you too. And actually, as I was thinking about it, I was like, well, you've been there at most of these Superweeks as well, and your company, Conductrics, sponsors that event. And I remember that very fondly. I mean, seeing you there all those years is also really fun. But one thing, Matt, on this topic specifically: you consistently, in our industry, for me anyway, are one of the people who are about five years ahead of what a lot of people are talking about. And so I think it's really interesting that one of the things you're talking about now, around this world of new privacy laws and things like that, is adopting a mindset of just in time or just enough data, or a sort of privacy-first mindset around data. So maybe as a starting point, what got that going for you, and what spurred that as a major area of thinking and writing for you over the last couple of years?
Matt Gershoff
Sure. Well, first, thanks for having me, Tim and Julie. This is going to be fun. Looking forward to it. Well, actually, just to step back a little bit: the work that we've been looking into and working on within Conductrics around privacy engineering and data minimization is really less about privacy per se, and really more about thinking about why we're doing analytics and experimentation in the first place. I think for us, we have a slightly different view of the value of experimentation. And just so that the listener understands where I'm coming from, Conductrics is in part an experimentation platform where you might do A/B testing and multi-armed bandits and that type of thing, where you're trying to learn, basically, the marginal efficacy of different possible treatments. And for us, we really feel like the value of experimentation is that it provides a principled procedure for organizations to make decisions intentionally, to make them explicitly, and to consider the trade-offs between competing alternatives. And ultimately the reason for doing this is to act as advocates, sort of the front line, for the customer. And so we have a much more, I guess, hospitality or omotenashi approach to why one really should be doing experimentation. And I think that's true of analytics more generally. It's like, really, why are we doing it? And I think one of the issues that I've seen in the almost 25, 30 years that I've been in the analytics space is that sometimes analytics tends to kind of lose that focus, and we tend to have programs that become almost ritualized. So we sometimes start doing behaviors just to do them, and we kind of lose the focus of really why, and what the ultimate objective is.
And so for us, part of the reason why privacy engineering and data minimization is something that we've gravitated towards is, one, it's really about respect and being customer focused. But also, two, it really forces one to think intentionally. We ask the question: what is the marginal value of the next bit of data? Why should we collect this next piece of data, or the added data? And to really have some sort of editorial and expertise about why we might be getting more information about the user when we might not really need it in the first place. And so this idea of intentionality is really what underpins both experimentation for us, as well as why we were interested in moving towards a more data-minimization approach to the experimentation platform.
Tim Wilson
So you said sort of the ritualized behavior. As I recall, you came up with two, and then you added a third. You said, oh, there are these mindsets of data. I want to get data just in case, you know, just in case I need it. And that, I think, falls under that kind of ritualized behavior: gather all the data, not considering the incremental value of it. And you contrasted that with just in time. And then you added just enough, I think, a little bit later. But does that fit? We're kind of making a broad generalization in analytics, and I think even in experimentation, that there's a tendency to say the cost to collect that next bit of data is near zero, so let me collect it just in case, down the road. And that has kind of ballooned out: you add on a million additional data points, and now you're just in the habit of collecting everything and have sort of lost the idea that you're actually trying to figure out what you're doing with it.
Matt Gershoff
Yeah, that's a good question. That's a good comment. Really what it is, is that if you think about it, with the GDPR and data privacy, most of that conversation has been around compliance and what you can't do. And a lot of that is really sort of procedural thinking. Like, do you follow certain procedures for risk mitigation? And really what I think the privacy legislation is about is to encourage privacy being embedded in technology, being embedded in processes, by default. It's not that you shouldn't collect data if it's required. If you have a task and you need the data in order to achieve the task, no one's saying that one shouldn't collect that. It's really about asking, for a particular task, whether or not the data is pertinent. And it's about being respectful to users and not collecting more than is needed. Now, that privacy by default is in contrast to what I think a lot of the thinking had been, or currently is, in analytics and data science, which is really a data maximalist approach: collect everything by default. And again, as you say, the marginal cost of the next level of granularity. Right. So we can think of more data as being finer and finer levels of granularity for any particular data element, or it could be additional data elements, and it can also be additional linkage. That's that whole 360, so that every element or event can be traced back to or associated with an individual. So you kind of have those three dimensions to the expansion of data. And what I was really trying to point out is that a lot of that data collection is somewhat mindless. It's just that just in case, and underpinning it isn't really an explicit objective. Right. We don't have a particular task where we're collecting data for this particular purpose. In an experiment, what I was calling just in time, we collect because we have the task.
I need to know the marginal efficacy of one treatment over another, one experience over another. And so then I need to go out and collect data for that task. Versus just in case, where it's really: I don't know what the question is that I'm going to ask, but I'm going to collect it anyway. Now why am I going to collect it? Well, really there's sort of a shadow objective, which is one based upon magical thinking: that all of the value is in that next bit. It's almost like the gambler who's at the table and losing, and they just have to believe that the next hand is where the big giant payoff is. That often gets rationalized in data science and venture land as sort of fat tails, right? So there are huge payoffs out there lurking in the shadows, and you just need to reach some sort of threshold of, you know, critical mass in order to achieve them. And I'm not saying that that doesn't exist, but it's unlikely that it exists with the probabilities that people think. So that's one side of things, which is this magical thinking that all the value is in the data that I haven't collected. And then secondly, it's about minimizing regret. So it's like, well, I don't want to not have collected it in case I need it in the future. My boss asked for it, and so we collect it. And that's sort of collection by default. And, you know, that is not consistent with privacy by default, and that's really the law. That's not to say, though, that discovery isn't also important. So it's not about being paternalistic and saying don't collect data, or there's a certain way that you have to do it. Really all we're talking about is being thoughtful about it and being intentional.
So it's like, hey, the company may think, or you folks may think, that for this particular company or client, if they had X data, then they could solve tasks A, B, C, D, X and Z, whatever. And that seems totally reasonable to me. Then you have a reason to go collect that data and then check: okay, does it look like this data is informing these decisions or helping us make decisions? But that's entirely different from just collect everything. And I think that just-in-case, collect-everything approach, one, it being mindless, right, there is no objective to having it other than to have it, really opens organizations up to grift, you know, the sales pitch, which is, you know, can you afford not to collect it? A lot of that stuff. And that's prevalent in our industry. And so I really think it's about being mindful, and it's about this idea that the real value is not in the data or in any statistical method or any technology. It's really in the editorial and the expertise and really the taste. It's like, does the company have taste, to be thinking about what is going to be useful for their customers, and to be cognizant of what the customer's needs are, and have empathy for them, and to be using information about them in a way that's respectful? That's really all that underpins all of this.
Michael Helbling
It's time to step away from the show for a quick word about Piwik PRO. Tim, tell us about it.
Tim Wilson
Well, Piwik PRO has really exploded in popularity and keeps adding new functionality.
Michael Helbling
They sure have. They've got an easy-to-use interface and a full set of features, with capabilities like custom reports, enhanced e-commerce tracking, and a customer data platform.
Tim Wilson
We love running Piwik PRO's free plan on the podcast website, but they also have a paid plan that adds scale and some additional features.
Michael Helbling
Yeah, head over to piwik.pro and check them out for yourself. You can get started with their free plan. That's piwik.pro. And now let's get back to the show.
Julie Hoyer
Well, it's funny too, working with a lot of clients that do the just-in-case collection, because again, it is widespread. It's the norm across the industry, I would say. I have run into so many situations where they ask a very important business question. We start with that question first, and then they say, "And we have all this data that we can pull in, and we have so much, we should be able to answer this, no problem." And time and time again I start getting into the actual requirements of what the data needs to be able to do to answer this great question. And then we find out that even though, just in case, they've been collecting all of it, it's not in the right structure, or things can't be, you know, joined the right way. Whatever it is, between the tool and the actual data structure itself, we can't answer the question they care about. And so it would still be defining, in that moment, going forward: what do we actually need to be collecting for you to answer this business question? And it's funny, because one of the examples I had was actually working in Adobe Analytics, or actually Adobe CJA, and we were bringing in a data set from, let's say, Salesforce. And I started to have this conversation with my stakeholders, saying, you're asking great questions, but you're asking questions that we're used to being able to ask of the data that would come in through Adobe, that we were used to for years with Adobe Analytics. And now you have this data coming in from Salesforce, which was structured and designed to answer different types of questions. And so they don't map perfectly together. And so now we're starting to talk to them about how we could rework this and actually bring in the data in a way to answer the questions you care about and that your stakeholders coming to you actually need.
Matt Gershoff
Yeah, the main thing is to be intentional. Now, to be fair, some of those companies that you've mentioned, in the past they were sort of masters of this: collect everything and magical stuff is going to happen. And then all of the use cases wound up being, you know, error handling because the site was broken. And so, you know, that's not really a community that has been totally innocent of maybe overselling, you know, collecting data. I mean, data is not information. And I think it's important to think about, you know, the entropy of what you've collected. It's like, how compressible is the data? A lot of times you have data, but it's not information. It doesn't help you reduce uncertainty in a particular question that you're asking. And that's what information does. Just because there are bits being collected does not mean there's more information.
Tim Wilson
Well, and my concern is that it's already a problem. I mean, you said it was kind of the laziness of avoiding thinking, of saying we'll just collect everything. The number of times that I've had experiences where somebody said, "Oh, the data collection requirements are pretty straightforward. Just collect everything," you know, and it's like, well, no, that's lazy and simple for you to articulate. It's actually showing that you're not thinking through what you're going to do. I feel like we've been in that mode with lots of forces pushing that idea, that idea of "I want to have the option to look at this data, and hopefully it's structured well." Now a chunk of the world of AI, and the next generation of technology vendors jumping on that train, are kind of spinning the "Well, to do AI, the more data the better." And, you know, there are articles saying we're running out of data already to train the models. I'm afraid that's pouring kerosene on a raging, poorly functioning fire, in that now people get to wave their hands and say, "I'm doing this for the future of AI." It's just the next level of a lack of intentionality: "Surely if I get even more data, then the AI will be able to kind of run through it." But it's really just amplifying, I think, the same problem that you articulated, when very clear and concise questions, you know, may mean that you need to collect a very small amount of data for the next month, as opposed to the boatloads of data you've captured for the last five years that actually aren't that helpful. But you're going to force yourself to go wade through that, trying to do something, when instead you could have had intentionality and said, "I'll just go forward." Having that historical data actually makes it harder to have the discussion of what's the best data to collect, just enough of, just in time, to answer that question.
"Oh, that's net new data." And it's like, "Well, what net new data? What are you talking about? We have, you know, this ocean of data. What can you do with that?" Well, what I can do with that is much more complicated, messier, and actually less good at answering the question. But yes, we're checking off the box, so you can point to your just-in-case mindset as having helped me answer a question. It actually wasn't the best way to answer the question in many cases.
Julie Hoyer
Yeah. And I get that so many times, like, "Just do what you can do with the big messy historical data that we captured just in case," when I tell them, "Oh, well, to really answer this, you know, maybe it should be different data, looking forward, in a test." And they're like, "Eh, yeah, well, we don't want to do that. So what's the best you can give us from the other stuff?"
Matt Gershoff
Yeah, and just to be fair, I didn't use the word lazy. I just think maybe just unaware. Yeah, I mean, I think the value is in being aware and being explicit. That's what I think data teams and companies should be doing. And I think that's where the success is. It's not in doing analytics, it's analytics in the service of having a well-thought-out understanding and model of the customer and the environment that you're in. But again, this isn't to be paternalistic. It's not for me to say what companies in particular contexts should or shouldn't be doing. I just know, for us, when we re-architected the software back in 2015, we were aware of GDPR, and we read up on Privacy by Design, which are principles that I think came in the mid-90s from Dr. Ann Cavoukian, I believe. There are seven main principles, and the GDPR and, you know, other privacy frameworks have incorporated those principles into their legal frameworks. And, you know, one of them is principle two, which is privacy by default. And I think principle three or four might actually be about embedding. The idea is that the software and systems should be privacy by default, by design, and it shouldn't be like a bolt-on. And so customers should be able to use the services by default in a privacy-preserving way. And it's really only in cases where you need to that you move up from the default, as opposed to the current approach, which is collect everything and move down from that. It's really inverted. You should be collecting as little as possible to solve the task.
And we just realized that actually, in experimentation at least, and I'm not saying everything, but at least in experimentation, many if not most of the tasks in A/B testing and experimentation can be done following a data minimization principle, which means we really do not need to link all the information together, we do not need to collect IDs, and we can store data in what are known as equivalence classes. You can kind of think of that as like a pivot table. So the data is stored at basically an aggregate level. And storing the data in an aggregate way allows us to use ideas from privacy approaches such as k-anonymization. We can talk about that if that's of interest. We can use ideas of k-anonymity to help the client, (a), be able to audit what data has actually been collected in a much more efficient way. So it's very easy to know what you have and whether or not it's in breach of any privacy guidelines you might have. But also it means that we can do the analysis in a much more computationally efficient way. And so there are a lot of nice benefits from embedding privacy by design principles into your systems and procedures, which are beyond just having less data about the individual. The main thing is that it encourages this idea of intentionality, just being aware of what you're collecting and why. But that doesn't mean it's appropriate in all cases. That's not what I'm saying here. It's just more of an option.
Tim Wilson
Well, and Matt, because I've now read and seen you talk about this, it kind of blew my mind a little bit when it sort of clicked. And I think it was an indication of how stuck in the standard way of doing things I was. If we just talk simple A/B testing on a website, let's just go with A and B: you're treated with A, you poke around on the website some more, you convert or you don't convert. Store a row. You're B, you poke around on the website, maybe you convert, maybe you don't convert. And it seemed like, well, obviously you have to have every one of those rows. And then when you're done, you just kind of, you know, pivot it and compare the conversion rates, and yeah, you've got to do some other little t-test kind of math. And what kind of blew my mind is you were like, "Well, wait a minute, what if instead you just incremented counters?" Because that step that I just glossed over, of saying I've taken 10,000 rows of individual users and rolled them up so that I could do the actual calculations that are done behind the scenes, you were like, well, wait a minute. If what you need is a count, you can just increment how many people got A and how many got B. If you need the sum of how many converted, you don't have to have all those rows. You can just increment a counter and say, you're A, I need to track you in the session long enough to increment the counter. I don't need to store a whole row. And then where it really clicked was like, oh, and if you need the sum of squares, I can square each value and then add it to a running sum. So you're literally going from what was 10,000 rows to what winds up being two rows that you're just incrementing. And that was kind of your point.
I can give you all the results that you get from a standard A/B testing platform in a standard basic A/B test. And that's just one scenario. But I didn't gather even IDs. I just had to hold on to the user in a very limited temporal way until I could log which class they went in and what the result was. And I can just keep incrementing that. So, did I state that fairly, in case the listeners are like, what is he talking about?
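A minimal Python sketch of the counter idea Tim describes above. The names (ArmCounters, welch_t) are hypothetical and are not taken from Conductrics, and Welch's t statistic is used here only as one common form of the t-test mentioned in the conversation. Each arm keeps three running totals (count, sum, sum of squares), and no per-user row or ID is stored.

```python
import math
from dataclasses import dataclass


@dataclass
class ArmCounters:
    """Running totals for one arm of a test (hypothetical helper for illustration)."""
    n: int = 0             # number of users seen
    total: float = 0.0     # sum of outcome values (e.g. 1 = converted, 0 = not)
    total_sq: float = 0.0  # sum of squared outcome values

    def record(self, value: float) -> None:
        # The individual value updates the totals once and is then discarded.
        self.n += 1
        self.total += value
        self.total_sq += value * value

    def mean(self) -> float:
        return self.total / self.n

    def variance(self) -> float:
        # Unbiased sample variance recovered from the three totals.
        # Caveat: this formula can lose precision when values are very large.
        return (self.total_sq - self.total ** 2 / self.n) / (self.n - 1)


def welch_t(a: ArmCounters, b: ArmCounters) -> float:
    """Welch's t statistic for the difference in means (B minus A)."""
    standard_error = math.sqrt(a.variance() / a.n + b.variance() / b.n)
    return (b.mean() - a.mean()) / standard_error


arm_a, arm_b = ArmCounters(), ArmCounters()
for outcome in [0, 0, 1, 1]:  # made-up conversions for arm A
    arm_a.record(outcome)
for outcome in [1, 1, 1, 0]:  # made-up conversions for arm B
    arm_b.record(outcome)

print(arm_a, arm_b)
print(round(welch_t(arm_a, arm_b), 3))
```

However many users pass through, the stored state stays at two small records, which is the "10,000 rows become two rows" point.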
Matt Gershoff
Yeah. So I don't want to get too much into it, because, you know, I don't want to lose the listener in too much minutiae here. But yes, you're right. And so really, you know, the realization was, and some of the listeners I'm sure are aware of this, but some may not be...
Tim Wilson
To be fair, you headed down the k-anonymization path before I tried to do my summary. So I don't want it to be like, oh, Tim, you're getting too detailed.
Matt Gershoff
No, we're getting, yes. No. And really, let's blame Julie, because we said beforehand that she was supposed to keep us from this. But just at a high level, it turns out that what underpins most of the analysis tasks that folks in experimentation need to do is really regression. It's like least squares. We don't have to go into how it's done and all that stuff. But it turns out that one is able to do various regression analyses on data that has been stored in equivalence classes in a certain way. So the main takeaway is that we can store data in an aggregate way such that we can do the same analysis, or most of the same types of analysis, as if we had the data at the individual level. And so what are the types of tasks that we can do? Well, as you said, we can do t-tests, which is sort of the basic frequentist approach for an experiment, when we're trying to evaluate the treatment effect and account for the sampling error. But also things like ANOVA, analysis of variance, which you might do for multivariate tests. You might be doing something like interaction checks. So Conductrics has an alerting system where we're checking between different A/B tests whether one A/B test might be interfering with another. Underneath the hood, for the folks who know some stats in your listener base, that's really just doing a nested partial F-test between two regression models, a full model and a reduced model. All of those things can be done.
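To illustrate the claim that regression can be run on equivalence classes, here is a small Python sketch under simplifying assumptions: a single covariate x, an ordinary least-squares line, and made-up data. The function name fit_line is hypothetical. Each class stores only the shared covariate value, a user count, and a sum of outcomes, and the fit matches the one computed from individual rows.

```python
from collections import defaultdict


def fit_line(classes):
    """Least-squares fit of y = intercept + slope * x from aggregated classes.

    `classes` is a list of (x, n, sum_y) tuples: the covariate value shared by
    everyone in the class, how many users are in it, and their summed outcomes.
    """
    n_total = sum(n for _, n, _ in classes)
    sum_x = sum(n * x for x, n, _ in classes)
    sum_y = sum(s for _, _, s in classes)
    sum_xx = sum(n * x * x for x, n, _ in classes)
    sum_xy = sum(x * s for x, _, s in classes)
    slope = (n_total * sum_xy - sum_x * sum_y) / (n_total * sum_xx - sum_x ** 2)
    intercept = (sum_y - slope * sum_x) / n_total
    return intercept, slope


# Made-up individual-level rows: (covariate value, outcome).
rows = [(0, 1.0), (0, 3.0), (1, 2.0), (1, 6.0), (2, 5.0)]

# Roll the rows up into equivalence classes keyed by the covariate value.
totals = defaultdict(lambda: [0, 0.0])
for x, y in rows:
    totals[x][0] += 1
    totals[x][1] += y
classes = [(x, n, s) for x, (n, s) in totals.items()]

aggregated_fit = fit_line(classes)
row_level_fit = fit_line([(x, 1, y) for x, y in rows])  # every row as its own class
print(aggregated_fit, row_level_fit)
```

The coefficients need only the counts and sums. Standard errors and F-tests also need a per-class sum of squared outcomes, which is one more running total of the same kind.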
Tim Wilson
And even I was going to say that, but I was trying to keep it.
Matt Gershoff
I was trying to keep it. It's more than t-tests. And even, you know, there's a lot of buzz, and I think exaggeration, around things like CUPED, which is really regression adjustment in the experimentation space. Even that can be done on aggregate data. Now, the main point about it being aggregated is really about data minimization, which is, one, reducing the cardinality of any data field, which is the number of unique elements that we might want to store. So instead of storing, you know, the sales data, the pre-sales data of the user, at some arbitrary precision of cents, maybe it makes sense to put it into, say, 10 bins, each represented by the average value in that bin. So from 0 to 10, where the average value in bin 10 is like $1,000 or something. So the main idea is to reduce the fidelity, to sort of downsample some of the data that you're collecting, so that you have fewer unique elements within each data field, and to collect fewer data elements, and maybe to decide when you want to co-collect elements. So let's say there are 10 types of segment data that we might want to collect within the experiment. We can store those as 10 separate tables, so that you can do 10 separate analyses, or you can co-collect them. Maybe we want to have two or three collected at the same time, or maybe up to 10. As you co-collect data, you increase the joint cardinality, the number of unique combinations. And that's the thing that you want to manage. It's like, how many unique combinations of segment information do we want to collect? And the measure that we might want to use is the number of users that fall within each one of those combinations. Maybe we want to have at least 10 users that fall into each one of those combinations, such that we're never really collecting data on any individual user.
We're collecting data on collections of users who look exactly the same. And so that's really the idea of k-anon: how many other people look exactly the same in the data set. And so you might want to have some sort of lower bound of, say, five or 10. And that's a good way to measure. It doesn't provide privacy guarantees, but at least it's a good measure for being aware of how specific the data is, the resolution of the data you're collecting about each individual.
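The binning and k-anon check Matt describes can be sketched in a few lines of Python. Everything here is illustrative: the field names, the bin width, and the eight made-up users are assumptions, and the helper names (to_bin, class_sizes, smallest_class) are hypothetical. The sketch shows how co-collecting more fields raises the joint cardinality and shrinks the smallest group.

```python
from collections import Counter


def to_bin(value, bin_width, max_bin):
    """Coarsen a numeric value into an integer bin index, capped at max_bin."""
    return min(int(value // bin_width), max_bin)


def class_sizes(records, fields):
    """Count how many users share each unique combination of the given fields."""
    return Counter(tuple(record[f] for f in fields) for record in records)


def smallest_class(records, fields):
    """The k in k-anonymity: the size of the smallest equivalence class."""
    return min(class_sizes(records, fields).values())


# Made-up users; prior_sales is the high-cardinality field we want to coarsen.
users = [
    {"device": "mobile", "region": "EU", "prior_sales": 12.40},
    {"device": "mobile", "region": "EU", "prior_sales": 18.99},
    {"device": "mobile", "region": "US", "prior_sales": 250.00},
    {"device": "mobile", "region": "US", "prior_sales": 1040.10},
    {"device": "desktop", "region": "EU", "prior_sales": 5.00},
    {"device": "desktop", "region": "EU", "prior_sales": 99.95},
    {"device": "desktop", "region": "US", "prior_sales": 101.00},
    {"device": "desktop", "region": "US", "prior_sales": 130.50},
]
for user in users:
    user["sales_bin"] = to_bin(user["prior_sales"], bin_width=100, max_bin=9)

for fields in (["device"], ["device", "region"], ["device", "region", "sales_bin"]):
    print(fields, "k =", smallest_class(users, fields))
```

With a target such as k of at least 5 or 10, a check like this tells you whether to collect a segment on its own, coarsen its bins further, or avoid co-collecting it with other fields.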
Michael Helbling
I like what you're saying. I think one of the challenges that I'm thinking of right now, and maybe it's just dumb, but I feel like a lot of organizations lack the underlying knowledge to start making those groupings or, or buckets in the first place. And then sort of my question is sort of then how do they get that level of information or knowledge to be able to take that next step?
Tim Wilson
Or is it that they feel, emotionally, about making the buckets, they're like, "But buckets are less precise. I need to be more precise." And that's the...
Michael Helbling
That's going back to the first thing, which is that our nature is to just try to glom onto every piece of information possible. But there are also just people with a lack of knowledge. So let's say somebody said, "Hey, I'm going to fight my instincts and try to do this privacy by design. And now what I need to do is group users the way you just described and do k-anonymization." How do I know how to set those up so that they're going to be realistic?
Matt Gershoff
Well, how do you know the data you collect? I mean, first of all, you're making the decision at a certain level of granularity anyway. That's implicitly being done. Secondly, again, I just want to step back. The main takeaway here really is about at least being thoughtful about it. It may be that you don't change your behaviors at all, and maybe that's totally fine. In whatever context someone is working in, it may be appropriate. One use case: let's say you're in a financial organization or healthcare, where you're in a regulated industry, or, you know, you have to collect the data anyway. Let's say that it's private data, but you want to do analysis. There's this idea of global and local privacy that really comes from differential privacy. Global privacy is where you have a trusted curator, right? And so you have the data. A good example of this would be the US government and the census. The data that's collected by the census is extremely private information about citizens. And when that data is released, it needs to be released in such a way that private information about any individual is not leaked. In that case the trusted curator is the Census Bureau. But they have a mandate to release information for the public. And so you could be in a situation where you're an organization that has this information and you want to do analysis. So you might want to release to your analyst team a version of the private data that has been privatized in some way. One way would be to use data minimization and this idea of k-anon. But there are other approaches. There's differential privacy. I just spoke at the PEPR conference, which is the Privacy Engineering Practice and Respect conference, and Meta is there and Google is there and whatnot.
And they often have situations where they collect data and they want to build tools or do analytics on it, but they release internally data that has been subject to either differential privacy or various data minimization principles.
Tim Wilson
So can you define that one? How easy is it to give a high-level explanation of what differential privacy is?
Matt Gershoff
Well, I'm not an expert on it, and it's not super easy. But at a high level, as far as I understand it, I believe it's the one approach that actually provides privacy guarantees. So you actually have a particular privacy guarantee around it. And the main idea is that you inject a certain known amount of noise into the data. So the data is perturbed by a certain quantity of noise, which is defined by what's known as a privacy budget. Basically you inject noise, usually either Laplacian or Gaussian noise, such that when a query comes back, it's a noisy result. And it essentially has certain guarantees that you'd have a difficult time differentiating between two data sets, one that has a particular individual in it and an adjacent data set that's the same except it does not have that individual in it, based on whether the query results are consistent with or without that individual. That is probably terribly unclear to the listener, but the main idea is that you inject noise into the data set. It's actually quite complicated, and at first it looks amazing. We took a look at it and were thinking about doing it. And I believe the census now is using differential privacy. It is useful in a situation where you need to release a lump of data, like one particular query, like the census: they release the results and they've applied a differential privacy mechanism to it. It gets a lot more complicated when there are a lot of ongoing queries on the data, because there's a privacy budget and there's this idea of composition, simple composition, advanced composition. It's somewhat related, actually deeply related, to Neyman-Pearson hypothesis testing.
And so these ideas about inflation of type 1 error rates and all that stuff is, is not completely dissimilar to the idea of consuming privacy budget and whatnot. And so it's not clear to me how one would actually manage it in an organization. And two, whether or not organizations would accept noisy data. People kind of freak out about that. But there is this trade off, of course between privacy and utility. But again, the interesting bit, I think the takeaway is one, privacy by default is the law, at least in Europe and to various degrees in different states. And what I found, you know, can be often frustrating is that most of the privacy conversation is around again, procedure and compliance. It's like, you can't do this. And it's like not productive. It's like, well what like help, help. Give me some tools to think about what we actually can do. Like we care if you care about outcomes. And what is, I think of interest for the listener might be, is to look into privacy engineering, which is really more a community and approaches about design based thinking to build systems that have properties, privacy properties in them. And that gives a way forward to actually build stuff and to build stuff that has these privacy properties as part of them, as opposed to what I feel a lot of the privacy conversation is about not doing stuff and people trying to like block you from doing anything. Very sort of bureaucratic in its approach, very legalistic. And this is a much more engineering approach. And really this whole conversation that we're having is really just about providing an example of a company that has, has applied these privacy engineering principles to their software. Now it's really going to be up to everybody else to decide when and where it's appropriate for them. But it is a way to actually build stuff as opposed to just not being able to do anything.
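To make the noise-injection and privacy-budget ideas above concrete, here is a minimal sketch in Python. It is not how Conductrix or the census implements differential privacy. It assumes a simple count query (sensitivity 1), Laplace noise with scale 1/epsilon, and simple (sequential) composition, where each query's epsilon is subtracted from a fixed total. The class name `PrivateCounter` and its parameters are hypothetical, chosen only for illustration.

```python
import numpy as np


class PrivateCounter:
    """Hypothetical helper: noisy count queries under a fixed privacy budget."""

    def __init__(self, values, total_epsilon, seed=None):
        self.values = np.asarray(values)
        self.remaining = float(total_epsilon)  # privacy budget left to spend
        self.rng = np.random.default_rng(seed)

    def noisy_count(self, predicate, epsilon):
        """Return a count of matching records plus Laplace noise."""
        if epsilon <= 0:
            raise ValueError("epsilon must be positive")
        if epsilon > self.remaining + 1e-12:
            raise RuntimeError("privacy budget exhausted")
        # Simple composition: every answered query consumes its epsilon.
        self.remaining -= epsilon
        true_count = int(np.sum(predicate(self.values)))
        # Adding or removing one person changes a count by at most 1
        # (sensitivity 1), so the Laplace scale is 1 / epsilon.
        return true_count + self.rng.laplace(loc=0.0, scale=1.0 / epsilon)


ages = [23, 35, 41, 52, 67]
counter = PrivateCounter(ages, total_epsilon=1.0, seed=0)
answer = counter.noisy_count(lambda v: v > 40, epsilon=0.5)
print(answer)             # true count is 3, reported with noise
print(counter.remaining)  # 0.5 of the budget is left
```

A smaller epsilon means more noise and more privacy per query. The budget is what makes ongoing querying hard to manage: once it is spent, no further queries can be answered at that guarantee.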
Tim Wilson
So it's interesting. I never read the seven Privacy by Design principles until prepping for this episode, because you bring up principle number two a lot. But principle number seven is respect for user privacy and keeping the interests of the individual uppermost. And I feel like that may be a cudgel that I start swinging around. Like, I'm watching on LinkedIn as people are posting these diatribes: if you're not taking your first-party data and pumping it into this other system and giving it to that, what are you doing? This is insane. And you quickly watch the comment thread. Some people say, yeah, use my tool to do that. You have other people arguing about the logistical complexity of doing it. And then there's a tiny little thread that is saying, is that in the individual's best interest? Sometimes it is. I mean, I think you were using an example earlier that if you need data from somebody in order to provide them something that they want, it is in their interest to provide it. But that feels like another whole tranche of the Martech industrial complex. I mean, there is nothing in it about that principle number seven, of keeping the interests of the individual uppermost, which I think is another piece of that. Well, another hobby horse I can mount and gallop around on.
Matt Gershoff
Yeah, well, seven and two. I bring up two mostly because it's privacy as the default. I think that's the key bit, that it should be the default. And I definitely think one should not be getting their guidance from the marketing tech industrial complex. That's a problem because there are perverse incentives there. That industry is incentivized to push collecting everything, and magical thinking. You know, people will sell a box, a magic box, if people want to buy a magic box. And I think that's the antithesis of being thoughtful and mindful about why you're doing something. Unless, you know, the optics of buying a magic box have value, and that's okay. It's not for me to judge why you're doing something. It's just that one should have thought about why they're doing something.
Julie Hoyer
But it feels like this way of thinking will end up being more productive for people long term, because we are, to your point, going to continue to run into restrictions privacy-wise. Right. And I think people are still holding onto this idea that I have all this historical data, and if I can just look backwards and understand each individual and watch their entire path through my website, I'll be able to answer any question I need to make any decision about the business. But it feels like if someone could let go of some of that baggage of the way the industry's story has always been told to us, you can start by saying, what is the best question to answer right now for the business to make a decision moving forward? And what's a way to actually ask that and answer it looking forward, again by doing experimentation rather than trying to do a very complex historical analysis? And then you can go about actually designing and engineering the data, again, moving forward. And I run into this so much with my clients, where I do feel like you just get stuck in the cycle of looking backwards. It is refreshing to hear that there are tactical steps toward selling that forward-thinking mindset instead, and seeing that it could be really freeing for probably a lot of companies.
Tim Wilson
I don't think it has to be experiments. I mean, I think you could even have stuff where you're not tracking something and they're like, well, what's going on here? It's like, well, we could just keep a counter. You know, we're at our physical store and somebody's saying we want to know how many people look at produce versus, you know, toilet paper. And one option would say, well, we've got to have cameras mounted, so we've tracked all of that and we can answer it just in case. The other is, if you ask that question, or if all of a sudden that becomes a very important question to answer, to say, cool, we're going to take all that money we didn't invest in this super complicated tracking system that had to store everything, and we're just going to send some resources. You know, it's going to take me two weeks to answer the question, but very, very precisely, because I know exactly what you're looking at. And it may not even be an experiment. I mean, it is such a radical shift. I'm not optimistic that we're going to be able to effect that sort of a shift, because there are a lot of pressures that don't want it. And to your point, Matt, it's so easy to get sucked into the compliance mindset for privacy: well, my default is everything, so what do I have to turn off, or what layers do I have to put on, so that I'm backsliding at a slower rate from what I'm used to doing? And you hit it quickly, the simplicity of the computation. Well, there's a simplicity if you have no data and you have a really clear question and you say, what's the minimal data I need to collect to answer that question? That in many cases becomes a lot simpler for a lot of the questions.
Matt Gershoff
Yeah.
Tim Wilson
Now, the problem is you're. Yeah. You're leaving a few questions that you could have answered otherwise.
Matt Gershoff
And this isn't. And just to be.
Julie Hoyer
But I like that you're not tied to the old way they were collecting it. So many times you ask a good question and the data they have on that topic is not in a form you can even use. So I love that this frees you up to say, how exactly do I need the data to answer the question? Instead of, again, being married to the baggage of what's already been done, where they're like, well, I spent a lot of time and money and effort, so you've got to figure out how to use it.
Matt Gershoff
That's a. That's a great point. And also, just to be clear, this isn't like Gershoff's point. Like, this isn't like me. This is like it's encoded in the law. Like, that's what.
Tim Wilson
Gershoff's Law.
Matt Gershoff
No. Yeah. It has nothing to do. Right.
Michael Helveling
It's like 100%.
Matt Gershoff
It's not like I'm bringing this to the table. It's that, you know, privacy by design is embedded in things like GDPR Article 25 and Article 5(1)(c), I think. So it's not like I'm suggesting that people do this special thing. This is what's out there. This is part of the expected behavior, at least in Europe, I guess. And what are some ways that we might want to think about it? And, oh yeah, it also supports this idea, which I think is really the main point from my perspective, that the value is not in the technology. It's not in our software or another company's software. It's not in any statistical method or analytics method. It's really about being thoughtful about what it is you're trying to do, being thoughtful about what the customer might care about, being explicit about how you're allocating resources, and then thinking about things at the margin. And a nice added benefit of thinking about data minimization and privacy engineering is that it is consistent with thinking that way. That's really the main thing. That's what's nice about it: it helps us think through and have clarity about why we're doing stuff. What you wind up doing is not for me or any of us to say. It's really going to be ultimately for everyone in whatever context they're in. That's all. It's really just calling out that we can actually have outcomes. It's not going to be my last call, but there's Jennifer Pahlka, who wrote Recoding America. There's a really good episode with her on Ezra Klein's podcast. And I think she has great clarity where she talks about procedural thinkers versus outcome-based thinkers. She frames it in a way that I think about all the time. And a lot of the privacy conversation is really procedural.
It's like, have you followed this process? Check. Have we hit the check marks? Yeah, great. But it doesn't tell you how to do anything. It doesn't tell you how to improve your outcomes. Whereas the privacy engineering side of things is really outcomes based. It's like, how do we actually do stuff? And I think the one theme that runs through analytics, and marketing analytics specifically, is outcomes. We really should be caring about outcomes and actually being productive.
Tim Wilson
I mean, you can say that it's not you saying this, but as you're saying that, I think you're pointing it out. If you look at all of the hand-wringing around GDPR and different kinds of privacy legislation in Europe, and then, oh, these countries are saying that their interpretation is Google Analytics is not valid. As soon as that becomes the debate, it becomes, the regulators don't understand digital and that's not reasonable, and let us rationalize why the way that we're doing things is fine. So then what sucks all the oxygen out of the conversation is, what's the ruling going to be as to whether this platform is allowed in this region based on this argument? And it feels like it just by default moves four steps away from the underlying intent and the principle, and then has a debate in the wrong space, where you're pointing out that, no, no, no, where it started is valid, and let's not rip it away from there and go have an argument somewhere else that's already missed the point.
Michael Helveling
Yeah.
Matt Gershoff
And you don't have to be part of that argument. That's a decision that you make. Are you in? Is that what you care about? It's not what I care about. We just want to make good product that's respectful of our users and is consistent with some of these principles, and it has some nice benefits. And why I'm chatting with you all right now is really, A, here is an example, and then B, again, making sure we don't just mindlessly collect data. There's a reason to push back on that, which is that privacy, or data minimization, is the default. And so you make of that what you will. It's really going to be up to everyone else. But I think it's valid just to point it out. But yeah, I mean, there's a lot of nonsense out there, Tim. So what? Right? If you're getting your information from LinkedIn primarily, what's LinkedIn? It's a lot of people self-promoting their stuff. And are they really experts? You look at it, and a lot of people aren't. And there are a lot of nonsense multipliers. There are a lot of agencies out there. You've got to step back and think about what the perverse incentives are. And there are a lot of perverse incentives out there. A lot of folks are selling product and selling services, and what is new often is something that they can use to sell. And I just think, again, you know, I overuse the word intentional, but just being thoughtful and mindful is a protection against acting in a way that isn't rational. And you can bump what they're saying up against your actual needs to see if it's consistent with them. And again, I sell software, so I have my biases as well.
And I'm well aware of that. But again, this is stuff that is not made up by us, by me. It's kind of the law, and just a way of thinking about it. But again, we're not selling. There's no one way to do things and we're not being paternalistic about it. It's not for me or any of us to say how others should. Well, some of you are consultants, so I guess it is kind of for you to give guidance. But ultimately, the way we look at it, it's our job to give options. It's almost like being a doctor: there are various treatments, and we may have a preference about what type of treatment we think works, but it's ultimately up to the client to think through the trade-offs between different interventions and whether one approach works better for them. They are in a better position to know. It's just really our job to give them options. And ultimately, if they want to take an approach that isn't what we would have done, that's totally fine. It's not for us to say. It's just our job to act in good faith and give them options.
Michael Helveling
I love that we've got this conversation done now because I think we're going to be referring to it again and again over the next many years. This is good on a lot of levels, for a couple of reasons. One, because when we start seeing vendors in five years talking about this, we'll know where it came from. And two, as we seek out and pursue almost a new set of first principles as analysts around how incorporating privacy in a proactive manner works, it's starting at this juncture. It's a lot of food for thought. All right. This has been outstanding, as per usual. And thank you, Matt. Thank you very much.
Matt Gershoff
Well, thank you so much for having me. It's been a, it's been a real pleasure.
Michael Helveling
It's good. I'm I've got a lot of thoughts going on, as I usually do when we talk, and none of them are very well formed, and most of them probably don't make any sense. So it's going to take a while. But this is really good. And I think I echo what you were saying, Julie, which is sort of like, this is the first time I've sort of looked at privacy stuff and not felt sort of like this. Oh, they're just crushing our fun, and we have to follow all these rules. There's now sort of like, okay, there's a path forward, and I can get excited about that. Now I'm intrigued, and I want to go learn more about how do I incorporate that as part of a central part of my path out from here, which I think is.
Julie Hoyer
Can I just say, I do want to echo that, Michael. At the very end, I was starting to pull all my thoughts together finally into something coherent. I really like that this way of thinking gets rid of the fear of feeling like you're losing something with the privacy laws out there and the new regulations coming, because I feel like that's what the conversation is always about: we're losing this, we're losing that. Oh, no. You know, you want to hold on tighter because you feel like things are being pulled away from you. But this kind of breaks that fear cycle. And, yeah, it feels kind of like a new day. Like, oh, turn the page. There's a new way to start. You can start fresh. It's okay.
Michael Helveling
None of our tools support it yet, but then we can start going and building that future.
Matt Gershoff
No, not. Yeah, come on, Come on.
Tim Wilson
Yeah, there might be one.
Matt Gershoff
That was quick. That was. That was a quick get, Michael. That took all of 43 seconds.
Tim Wilson
If only somebody had been thinking about this.
Michael Helveling
Back in 2015. Like I said, in five to seven years, when some of the vendors start talking about this, you know where you heard it first. All right. One thing we love to do on the show is go around the horn and share a last call, something that might be of interest to our audience. Matt, you're our guest. Do you have a last call you'd like to share?
Matt Gershoff
Sure. Actually, is it okay if I have a couple? One is, since we were talking about this, and I just want to be clear that I'm sort of adjacent to it, I'm not an expert in the privacy engineering space, but there are experts there. It's just an amazing community, and I highly recommend that anyone who's interested in any of this attend PEPR, which is the Privacy Engineering Practice and Respect conference. It just happened last month and it's coming up next year. But I highly recommend it to folks, and I can give you all a link if you want to put that on the page for the podcast. Really some of the most inclusive, which.
Tim Wilson
Actually, we'll link to it. The talk you did there is available on YouTube, right?
Matt Gershoff
Yep. It's that conference, and really it's some of the smartest people you've ever met and also some of the warmest and most inclusive community. It's a very Star Trek rather than Star Wars vibe. So it's great. And then, kind of more literary, since we talked a little bit about cardinality and ideas of information and whatnot, I recommend the short stories of Borges, the Argentinian writer. The Garden of Forking Paths and The Library of Babel. Those are two of his short stories. And I think if you want to be an in-the-know data scientist, sort of a literary data scientist, those are two good short stories to have read. And then once you start reading those, you'll get hooked. So that's my last call.
Tim Wilson
Wait. I assume it will make it through the editing, but I was introduced to The Library of Babel by Joe Sutherland as we were working on this book. It's actually in the book that we're working on, as an explanation and illustration. So I should actually read the short story, I guess, instead of just the Wikipedia entry.
Matt Gershoff
Oh, no, it's great. You should read and definitely Garden of Forking Paths, which is often referenced in, you know, research design, which is, you know, people refer to that when talking about researcher degrees of freedom and reproducibility of studies and whatnot. So there's a lot of, you know, a lot of the ideas that are adjacent to what we work on are embedded in these. These great short stories.
Michael Helveling
Very nice. All right, what about you, Julie? What's your last call?
Julie Hoyer
My last call is actually inspired by a previous show not long ago with Katie Bauer. I was looking through some of her different articles and I came across one that was titled Deciding if a data leadership role is something you actually want to do. It was an interesting read overall, if that's a point in your career that you're at. But I just felt like she broke it into a lot of helpful ways that she thought about making a decision about what next role she wanted. And she talked a lot about titles and the way she thinks about titles, which I think a lot of people run into at different points in their career. So I thought that was just a great way of framing it. She then listed a bunch of great questions that she actually used when going through interviews for different roles. And I started to think about how they would be super helpful even for me as a consultant, thinking about asking my stakeholder, or figuring out the answers to, these types of questions: where does my stakeholder sit in their org, what is their actual job, what is their role compared to their peers, what is their manager like, who are they working with, what are their relationships like? And she outlined a lot of different great scenarios of how data teams fit within organizations. And so whether you're using those questions when you're interviewing for a new role, or, like I said, I'm kind of inspired to use them in different scenarios, I thought it was a great read.
Michael Helveling
Excellent. All right, Tim, what about you?
Tim Wilson
So I feel like I'm going to be pulling some of these, as we've turned in the initial full draft manuscript for the book, which means I've learned a few things that I'd either forgotten or were new things coming out of the brain of Joe Sutherland. And one of them is an oldie but a goodie. It's an academic paper available through the National Library of Medicine at the NIH. And the paper is titled Parachute use to prevent death and major trauma related to gravitational challenge: systematic review of randomised controlled trials. So it's from 2003, and it is a brief academic paper by these two people who basically kind of dared each other. The notes at the end hint at what happened. But basically they were saying, if scientific evidence really requires a randomized controlled trial for high-stakes things, then surely we should just go and do a survey of all the randomized controlled trials around the efficacy of parachutes. They had a whole plan for how they were going to find the outcomes, and their meta-analysis, and what they were going to do. And the results are that "our search strategy did not find any randomised controlled trials of the parachute." So it's a little bit of poking fun at the scientific community, but in a delightful way, with some pretty funny footnotes. And it actually did get published. So it's just a good reminder of being clear on the question you're trying to answer and what your options are for answering it. So that's random. What about you, Michael? What's your last call?
Michael Helveling
Well, it's interesting. I had a conversation recently with my niece, who was getting ready to start the school year, and she's taking an AP statistics class, which I didn't even know existed in high school. But we started talking about some of the pre-work that she got assigned, and I realized I was starting to explain some foundational statistics concepts that she was kind of struggling with. And it reminded me of this book I read early in my career called The Cartoon Guide to Statistics. Whenever I go back to those first things, I'm always reminded of that book, which was actually recommended to me by Avinash Kaushik way back in the day. So that's my last call. I think I may have done it before, but it's been many, many years, and that conversation brought it back up. So if you're getting into statistics or you just want to have a better foundation in statistics, that's actually a great book to have on your shelf to pull off and read. And some of the stuff we talked about today I kept up with because I've read that book. And it's a cartoon, so it's easy. So anyways, The Cartoon Guide is funny. There you go.
Tim Wilson
It's on my, it's on my shelf and I never could make it through it. I should, I should go back and read it now. I feel like I was, didn't. Yeah, I should try it again.
Michael Helveling
It'd probably make more sense. Yeah. What was funny was how much I realized I'd actually learned over the years about statistics in just trying to explain a couple of things. And I realized, wow, I actually know a couple of things about statistics now, which I think is important that I should know. But you know, it's.
Matt Gershoff
And I think if we're being honest, all due to the Conductrix quiz.
Michael Helveling
Oh yeah, absolutely. Absolutely.
Julie Hoyer
Full circle.
Michael Helveling
It's a full circle moment. 100%. Well, yeah. This has obviously been such a great conversation, and I know as you're listening, you may have questions, you may have input, there are things you might want to share, and we would love to hear from you. And the best way to do that is through the Measure Slack chat community, or we're on LinkedIn as well. And also you could email us at contact@analyticshour.io. And I think, Matt, you're pretty active on that community as well as on the TLC.
Matt Gershoff
Yeah. Highly recommend folks sign up for the Test and Learn Community run by Kelly Wortham. That's a great space to learn about all things experimentation in an inclusive space.
Michael Helveling
Yeah, absolutely. And we heartily recommend it as well. And it's a great place to explore these ideas and keep this conversation going. So we'd love to hear from you, and keep learning more about privacy engineering, privacy by design, k-anonymization, differential privacy. I mean, all new and amazing concepts for me today. So awesome. All right. And of course, no show would be complete without a huge thank you to Josh Crowhurst, our producer, for all you do behind the scenes to make this show happen. We thank you very much, sir. And of course, thank you, Matt, so much for coming back on the show. It's always a pleasure. Makes me reminisce about all the awesome times we've had at Superweek and other places. It's always a delight to hang out and talk.
Matt Gershoff
Thank you so much for having me. I really appreciate you all welcoming me back and it was great to meet you, Julie.
Julie Hoyer
Yeah, you too.
Michael Helveling
Awesome. And I think I speak for a random assortment of co hosts that I may have that I've incremented a couple of times. When I say, no matter how you're trying to drive forward privacy, remember, keep analyzing.
Tim Wilson
Thanks for listening. Let's keep the conversation going with your comments, suggestions and questions.
Matt Gershoff
On Twitter at @analyticshour, on the web at
Tim Wilson
analyticshour.io, our LinkedIn group, and the Measure Chat Slack group.
Matt Gershoff
Music for the podcast by Josh Crowhurst.
Tim Wilson
So smart guys wanted to fit in, so they made up a term called analytics.
Michael Helveling
Analytics don't work.
Matt Gershoff
Do the analytics say go for it no matter who's going for it. So if you and I were on the field, the analytics say, go for it. It's the stupidest, laziest, lamest thing I've ever heard. For reasoning in competition, text was like, Tim and Mo were supposed to be cool, almost like secret agents and like just had their shit together. And Michael was just kind of like, you know, did you ever see what's that movie with Matt Damon and Alec Baldwin and it's like all bosses and Wahlberg. And there's that scene where Alec Baldwin is like the police commissioner and he's all like frantic and he's sweating and he's just like totally discombobulated. That was how I thought of Michael, but just, like, totally out of sorts. Just. And then Tim and Mo would just kind of come in and just be like, cool cucumbers and like, just have their shit together. And Michael never played it correctly. And he edited it out. He wouldn't say, oh. Anyway, I said, yeah, I had, like. I had a dialogue for him. No, that was the whole bit.
Tim Wilson
But how did you really feel?
Matt Gershoff
But Michael, I can't believe, Like, I thought he would just, like, lean into it. But no, he was too embarrassed or he, like, didn't like, you know, he's like. His ego was just great to play. Yeah, he. He just didn't want to play it. I. You know, he just couldn't play it up. He's like, I'm too serious for this. I'm not going to be the one.
Tim Wilson
Who doesn't know what's going on.
Matt Gershoff
Well, you're not the one who's answering the questions. That was the whole point.
Michael Helveling
Vision. I just didn't understand the vision. Not. Not cut out.
Matt Gershoff
Julie picked up on it. Julie picked up on it. That was.
Julie Hoyer
No, Michael said that verbatim in one of the episodes. He literally stopped midway into the quiz and goes, why am I always panicking? Why am I so frantic?
Matt Gershoff
That's the whole bit that was like the. The narrative theme. Mo and Tim were like. Like the 007s.
Tim Wilson
Rock flag and there's a lot of nonsense out there.
Michael Helveling
Nice.
The Analytics Power Hour
Episode #253: Adopting a Just In Time, Just Enough Data Mindset with Matt Gershoff
Release Date: September 3, 2024
In Episode #253 of The Analytics Power Hour, hosts Michael Helveling, Tim Wilson, and Julie Hoyer engage in an enlightening discussion with Matt Gershoff, CEO of Conductrix. The conversation centers on the evolving landscape of digital analytics, particularly in light of stringent privacy regulations and the urgent need for data minimization. The episode delves into shifting from a data-maximizing mindset to a more intentional, privacy-centric approach.
Matt Gershoff opens the dialogue by challenging the conventional analytics paradigm that often prioritizes the collection of extensive data "just in case" it proves useful in the future. He explains Conductrix's philosophy:
Matt Gershoff [02:47]: “We really feel like the value of experimentation is that it provides a principled procedure for organizations to make decisions intentionally... Why should we collect this next piece of data or the added data?”
Gershoff emphasizes that the true value lies not in the volume of data but in the intentional use of data to drive meaningful decisions. This approach aligns with the principles of privacy by design, ensuring that data collection and usage respect user privacy by default.
The conversation progresses to explore the nuances of privacy engineering versus traditional compliance-based approaches. Gershoff articulates that privacy engineering is not merely about adhering to regulations but embedding privacy into the very fabric of data collection and analytics processes.
Matt Gershoff [06:12]: “We're asking for a particular task whether or not the data is pertinent... It's about being mindful and intentional.”
He advocates for evaluating the marginal value of each data point, questioning the necessity of its collection, and ensuring that data practices are inherently respectful of user privacy. This contrasts sharply with the "collect everything" mentality prevalent in the industry.
Tim Wilson and Julie Hoyer probe deeper into the practical implications of adopting a "Just In Time, Just Enough Data" mindset. They discuss how excessive data collection can lead to inefficiencies and obscure the actual business questions that need answering.
Tim Wilson [07:24]: “The marginal cost of the next level of granularity... It just has kind of ballooned out that you add on a million additional data points.”
Gershoff illustrates how Conductrix applies data minimization in its experimentation platform by aggregating data into equivalence classes instead of storing individual user data. This not only enhances privacy but also improves computational efficiency.
Matt Gershoff [28:18]: “The main takeaway is that we can store data in an aggregate way such that we can do the same analysis as if we had the data or most of the same types of analysis as if we had the data at the individual level.”
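To make the idea of aggregate storage concrete, here is a minimal Python sketch. It is an illustration under stated assumptions, not a description of Conductrix's actual implementation. It assumes each equivalence class keeps only a count, a sum, and a sum of squares, which is enough to recover means, variances, and a Welch t-statistic without retaining any individual rows. The function names, the tuple layout, and the sample records are all hypothetical.

```python
from collections import defaultdict
import math

def aggregate(records):
    """Collapse (class_key, value) rows into per-class (n, sum, sum of squares)."""
    table = defaultdict(lambda: [0, 0.0, 0.0])
    for key, value in records:
        cell = table[key]
        cell[0] += 1              # number of observations in the class
        cell[1] += value          # running sum
        cell[2] += value * value  # running sum of squares
    return {key: tuple(cell) for key, cell in table.items()}

def mean_and_variance(cell):
    """Recover the sample mean and unbiased variance from one aggregated cell."""
    n, total, total_sq = cell
    mean = total / n
    variance = (total_sq - n * mean * mean) / (n - 1)
    return mean, variance

def welch_t(cell_a, cell_b):
    """Welch t-statistic for the difference in means, using aggregates only."""
    mean_a, var_a = mean_and_variance(cell_a)
    mean_b, var_b = mean_and_variance(cell_b)
    standard_error = math.sqrt(var_a / cell_a[0] + var_b / cell_b[0])
    return (mean_b - mean_a) / standard_error

def cells_below_k(cells, k):
    """List classes with fewer than k members (a simple k-anonymity style check)."""
    return sorted(key for key, (n, _, _) in cells.items() if n < k)

# Hypothetical individual-level rows: (variant, conversion value).
records = [("A", 10.0), ("A", 12.0), ("A", 14.0),
           ("B", 11.0), ("B", 15.0), ("B", 19.0)]

cells = aggregate(records)  # after this step the raw rows could be discarded
t_stat = welch_t(cells["A"], cells["B"])
```

Once `aggregate` has run, only the three numbers per class need to be kept, and the analysis functions never touch the raw rows. The `cells_below_k` helper shows one way to flag classes that are too small to report safely.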
The discussion transitions to the complexities of complying with regulations like GDPR. Gershoff explains the challenges and benefits of implementing differential privacy and K-anonymization techniques.
Matt Gershoff [34:50]: “Differential privacy... you inject a certain known amount of noise into the data... it's deeply related to statistical hypothesis testing.”
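The "known amount of noise" Gershoff mentions can be sketched with the Laplace mechanism, a standard way to release a differentially private count. This is a minimal illustration, not the approach of any particular product. It assumes a counting query with sensitivity 1 (adding or removing one person changes the count by at most 1), and the function names and the `epsilon` value are hypothetical choices for the example.

```python
import random

def laplace_noise(scale, rng):
    """Draw Laplace(0, scale) noise as the difference of two exponential draws."""
    # The difference of two i.i.d. exponentials with mean `scale`
    # follows a Laplace distribution centered at 0 with that scale.
    return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)

def dp_count(true_count, epsilon, rng, sensitivity=1.0):
    """Return the count plus Laplace noise with scale sensitivity / epsilon."""
    if epsilon <= 0:
        raise ValueError("epsilon must be positive")
    return true_count + laplace_noise(sensitivity / epsilon, rng)

rng = random.Random(0)  # seeded only so the example is repeatable
noisy = dp_count(100, epsilon=1.0, rng=rng)
```

A smaller `epsilon` means a larger noise scale and stronger privacy, at the cost of a less accurate count. Because the noise distribution is known, an analyst can account for it when drawing conclusions from the released numbers.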
While acknowledging the technical intricacies, Gershoff underscores the importance of outcome-based privacy strategies over procedural compliance. He advocates for engineering solutions that enable organizations to respect privacy while still deriving valuable insights from data.
Michael Helveling raises a critical concern about the readiness of organizations to implement such intentional data practices, highlighting the knowledge gap in setting up effective data bucketing and anonymization.
Michael Helveling [31:18]: “How do they get that level of information or knowledge to be able to take that next step?”
Gershoff responds by reiterating the necessity of thoughtful design and suggests that privacy engineering should be seen as an integral part of the analytics strategy rather than an afterthought. He emphasizes that adopting these principles can lead to more sustainable and respectful data practices.
As the episode nears its conclusion, the hosts and Gershoff reflect on the broader implications of this shift for the analytics community. They encourage listeners to engage with privacy engineering communities and continue educating themselves on best practices.
Matt Gershoff [53:06]: “The value... is really about being thoughtful about what it is you're trying to do and being mindful about what the customer might care about.”
Episode #253 of The Analytics Power Hour offers a forward-thinking perspective on data analytics in the age of privacy regulation. Matt Gershoff provides valuable insights into how organizations can transition from data maximalism to a principled, intentional approach that respects user privacy and enhances decision-making processes. The discussion serves as a call to action for analysts and businesses alike to rethink their data strategies in alignment with privacy-by-design principles.
Key Takeaways:
Intentional Data Collection: Focus on collecting only the data necessary to answer specific business questions.
Privacy by Design: Embed privacy considerations into the core of data analytics processes rather than as an afterthought.
Data Minimization Techniques: Utilize methods like K-anonymization and differential privacy to protect user data while maintaining analytical integrity.
Outcome-Based Approach: Prioritize the meaningful application of data over the sheer volume of data collected.
Cultural Shift: Encourage organizations to adopt a mindset that values thoughtful and respectful data practices.
Notable Quotes:
Matt Gershoff [02:47]: “Why should we collect this next piece of data or the added data?”
Tim Wilson [07:24]: “It just has kind of ballooned out that you add on a million additional data points.”
Matt Gershoff [28:18]: “The main takeaway is that we can store data in an aggregate way such that we can do the same analysis as if we had the data or most of the same types of analysis as if we had the data at the individual level.”
Michael Helveling [31:18]: “How do they get that level of information or knowledge to be able to take that next step?”
Matt Gershoff [53:06]: “It's really about being thoughtful about what it is you're trying to do and being mindful about what the customer might care about.”
This summary encapsulates the essence of the episode, highlighting the critical shift towards a privacy-first, intentional data collection strategy. It provides valuable insights for analysts and businesses aiming to navigate the complexities of modern data privacy regulations while maintaining effective analytics practices.