
Why? Or… y? What is y? Why, it's mx + b! It's the formula for a line, which is just a hop, a skip, and an error term away from the formula for a linear regression! On the one hand, it couldn't be simpler. On the other hand, it's a broad and deep...
Julie Hoyer
Welcome to the Analytics Power Hour.
Tim Wilson
Analytics topics covered conversationally and sometimes with explicit language. Hi, everyone. Welcome to the Analytics Power Hour. This is episode number 267. I'm Tim Wilson from Facts and Feelings, and according to the logistic regression that I ran personally on the last 40 episodes of this show, there is a 72.3% chance that I'm joined for this episode by Julie Hoyer from Further. Julie, is that you?
Julie Hoyer
Hey, look at that. Yes, it is. Here I am.
Tim Wilson
Look at my model kicking ass. Sweet. So it also says there's a 61.4% chance that Michael Helbling from Stacked Analytics will be another co-host. Michael? Uh, Michael. Michael. Uh, okay. No, no. Michael. Let's see. Next up, the model said there was a 41.7% chance that Mo Kiss from Canva would be co-hosting. Mo, are you there?
Mo Kiss
I am. But is her model any good?
Tim Wilson
Well, I plugged in everything I had when I created the regression, but it still couldn't perfectly predict who would be co-hosting. Does that mean my model was wrong? Did a model even really exist? Well, maybe the answer is I don't think so for either one. But I have questions. And when we have questions and we have a podcast, we get to find someone to answer them. In this case, we reached back into our archives for one of our favorite past guests. Chelsea Pelaretti, also known as the Chattastician, is a statistician and data scientist who was our guest way back on episode number 149. By day, she is a consulting statistician with our friends at Recast, but she also has a passion for teaching, bringing interest and excitement about math and statistics to the masses in fun and engaging and even endearing ways. She has a PhD in Computational and Data Sciences, which the last time she was on, she was still working towards, so she has since completed that. And she was an assistant professor at Chapman University up until last year, teaching computer science, data science, and machine learning, which made for some pretty awesome social media content. She's still keeping her foot in teaching. She's actually currently teaching a math through video games seminar as an adjunct professor. And that's, you know, she just likes teaching stuff. And maybe I botched my intro. I was doing so well. But today she is our guest. Welcome back to the show, Chelsea.
Chelsea Pelaretti
Thank you. It is a pleasure to be here, but it makes me feel very old thinking how long ago it was that I was last on the show.
Tim Wilson
Okay, well, you know, so now you're.
Mo Kiss
Making me feel super, super old.
Tim Wilson
Yeah, I believe you're right. I was in my 40s, and that was a long time ago.
Chelsea Pelaretti
I was in my 20s.
Tim Wilson
Oh, okay. Okay. Well, the passage of time. So this show is actually a direct result of the listener survey we did last year, which we had a bonus episode about that came out a little while back. We had multiple respondents who requested in one way or another that we cover, like, specific statistical methods on the show. And this is really kind of our first attempt at doing that, so we'll see how it goes. I'm not ashamed to say I got pretty excited as I was thinking about this show because I realized how much I've been faking various things for a while. And this is my opportunity to ask questions as though I know the answer when I don't. And then I will know. So.
Mo Kiss
And I can ask the questions like, I don't know the answer when I really don't know the answer.
Tim Wilson
So, like, that's good.
Mo Kiss
Good compliment.
Tim Wilson
And Julie will be the only one who understands the answers. So there we go. We've got the full. The full set.
Julie Hoyer
It'll be a refresh for me, too. I don't get to do as many regressions in my day to day as I would like.
Tim Wilson
Well, what seemed like a great place to start would be with that kind of absolute workhorse of prediction, which is, you know, plain old regression. And Chelsea, if I understand all the content I read from Recast pretty well, then you're pretty deep in the world of kind of Bayesian statistics and causal inference when it comes to doing media mix modeling work. So does regression, like, come up in your day to day at all? Or is that, like, too basic? You've moved on to fancier things?
Chelsea Pelaretti
I mean, there's definitely a time and place for fancier methods. But linear regression is probably the first thing I try in any problem that it might be a good fit for, and it definitely has a place in my day to day still. And I think there's a sense in which you can think of even really complicated MMM models, like you can build with Recast or other tools, as just an extension of ideas that are present in linear regression. So even if you're not actually using a linear regression, you're really capitalizing on the ideas that using linear regression teaches you. So in that sense, it never goes away.
Tim Wilson
That checks. I'll count myself as one of the people who, when they finally understood kind of what MMM was and then decided to try to explain it, you always wind up with the slide that shows the formula for regression and says, look, so your dependent variable is..., and your independent variables..., and the coefficients mean... Even Kevin Hartman has a video on the basics of regression, and I think he uses, basically, he doesn't say mix modeling, but he uses that as an example. So, okay, that's good to know. So should we define regression and we'll see where it goes from there? Did you have to do that with students, like, say, here's Regression Intro 101? How would you explain it?
Chelsea Pelaretti
Yeah, absolutely. I mean, if we're just talking about linear regression, it's basically a model that you can look at both predictively, so trying to actually make predictions with it, or inferentially, trying to understand the relationship between variables. And it's a super simple model because all it is is that equation for a line that you learned back in, I don't know, middle school or whenever that came up for you in the math curriculum, where it's y equals mx plus b. Right? That's the definition of an equation of a line. And that's exactly what linear regression is. It uses various variables in order to predict something that you're interested in, whether it's revenue or conversions or something like that. It does that by combining the predictors that you have in a linear way. All that means is that every single predictor variable you have, you're going to multiply it by a number, add all of those together, and that's going to be the prediction of your model.
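Editor's sketch: the "multiply each predictor by a number and add them up" description can be made concrete with a few lines of Python. This is a minimal illustration rather than anything from the episode. The spend and revenue numbers are made up, and the function names `fit_line` and `predict` are hypothetical. It fits y = mx + b to one predictor by ordinary least squares, then uses the fitted line to predict.

```python
def fit_line(xs, ys):
    """Ordinary least squares for one predictor: returns slope m and intercept b."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Slope = covariance(x, y) / variance(x); the 1/n factors cancel.
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    m = sxy / sxx
    b = mean_y - m * mean_x
    return m, b


def predict(predictors, coefficients, intercept):
    """Multiply each predictor by its coefficient, add them up, add the intercept."""
    return intercept + sum(c * x for c, x in zip(coefficients, predictors))


# Made-up numbers: weekly spend and revenue, both in thousands of dollars.
spend = [1.0, 2.0, 3.0, 4.0, 5.0]
revenue = [5.1, 6.9, 9.2, 10.8, 13.0]

m, b = fit_line(spend, revenue)
forecast = predict([6.0], [m], b)
print(f"revenue = {m:.2f} * spend + {b:.2f}")
print(f"predicted revenue at spend 6.0: {forecast:.2f}")
```

The same `predict` function works unchanged with several predictors, which is the sense in which multiple regression is still "just" the equation of a line.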
Mo Kiss
Can we go back to when you said it's still such a big part of your day for problems where it's appropriate, when you are sitting at your desk doing your work, what are the problems that you're like, no, I'm definitely not trying that first.
Chelsea Pelaretti
That's a really good question. I think it depends a bit on what tools you have available. My way of working is that anytime a problem comes up, I want to try the simplest method possible to solve that problem.
Tim Wilson
So.
Chelsea Pelaretti
So if it's a problem where perhaps just a graph is going to solve the problem and answer my question, I wouldn't go as far as linear regression to answer that question. And on the flip side, if I have a problem that is super complicated and that I know, for instance, is going to violate some of the assumptions of linear regression, then I might skip over it just because I know that any answer that I get out of it might not actually be usable and I don't want to invest more time there. For instance, in marketing, one of the things that we talk about a lot is that we have, not a problem, an interesting scenario where when we spend money it doesn't have an effect right away all the time. Sometimes it takes a while for that spend to actually have an impact in your market. And that might be something that's kind of hard to represent with a linear regression. You may at that point need to graduate to more complicated models. So if I already know that something simpler is going to work, or I know for sure that my problem just doesn't fit the parameters of a linear regression, those are the times when I would say, eh, I'm not even going to try. But pretty much any other time I want to try it because it's a super interpretable model, it's super easy to run compared to other options, and most people have a little bit of knowledge about what a linear regression is. So it's really easy to communicate results even to non-technical stakeholders.
Tim Wilson
So I feel like you skipped, I mean, you kind of briefly hit it, as to whether or not it would be a fit. Using the y equals mx plus b, you have to have a y, and that y has to be singular. So if somebody said, can you look at this data and kind of put them into logical groups, then you don't have a y. You need to kind of have a dependent variable, and the framing of the problem needs to kind of lend itself to saying I'm trying to find a relationship between one or more of these things and this other singular thing. Right?
Chelsea Pelaretti
Yeah, well, you could always have like a multivariate regression where you're actually predicting a vector of values, but in its most simple form, exactly. And one of the things you mentioned kind of made me think about, yeah, it's really only for supervised questions. Supervised in the machine learning world means we already know the answer for some subset of our data that we can train on. So for instance, if you're doing customer segmentation, that's an example of something where you don't have a correct answer. Right. We don't know what these latent groups are that are in our customer data. And so we can't use even a modified version of linear regression, because that's an unsupervised problem. We don't have the answer in order to train our model. So that's an example of where linear regression, or any extension of it, wouldn't be a good fit.
Julie Hoyer
And you also mentioned, like, assumptions where you said, I have to make sure that no assumptions around a linear regression would be broken. And then I'd say, yes, it's a good fit to use. Can we talk a little bit about those? Like, obviously you just said supervised, but what are some of the other ones?
Chelsea Pelaretti
Yeah, there's a ton of them. Some of them have really funny names, so I'll give you a little bit of a warning before we get there. But one of the most important ones is something that sounds so silly, but is the assumption that the relationship between our variables and our outcome is linear in the parameters. So what that means is whatever columns of your data you're going to plug in as predictors into your model, the relationship between them and your outcome has to be linear and additive. Linear means that as your predictor variable increases, there's a constant relationship between that increase and the increase in the outcome that you're trying to predict. So, for instance, if you're trying to predict how much revenue am I going to get depending on how much I spend in Facebook, in a linear regression model your coefficient says no matter how much I spend, every additional dollar that I spend in Facebook is going to increase my predicted revenue by the exact same amount. Maybe not the best assumption in a lot of cases. And additive basically means that all of the impacts that my different predictor variables are going to have are kind of independent, and we're adding them all together at the end. And that's just reflected in that y equals mx plus b formula, right? Every single predictor is getting multiplied by a constant coefficient, and then we're adding up all of those effects together to get our predicted outcome.
Tim Wilson
But that doesn't mean the predictors can't be squared or cubed or combined. Like, interaction effects are basically taking an x1 and an x2 and multiplying them together.
Chelsea Pelaretti
Exactly.
Mo Kiss
Okay. Okay, now you've got to go a bit slower. You got to go a bit slower on that one. You lost me.
Chelsea Pelaretti
Okay, that's linear in the parameters, right? The parameters of your regression model are the intercept and a coefficient or multiple coefficients. So when we say linear in the parameters, all we mean is that whatever it is that our predictors are, we're multiplying them by a constant and adding them together. But exactly as you said, say you think that there isn't a perfectly linear relationship between, we'll stick with the example, Facebook spend and revenue. One of the things that we could do is what I like to call just feature engineering. We could take the amount we're spending in Facebook and we could add a new column that is the amount we spent in Facebook squared. And so now we have two predictors. We have how much we spent in Facebook, and how much we spent in Facebook squared. But that still fits into the mindset of linear regression because, with the actual columns you're plugging into your model, it's still linear in the parameters. Our Facebook spend squared is just a new predictor that we are assigning a coefficient, multiplying it by that constant coefficient, and adding it to our prediction. And the same happens for interactions. Interaction terms are just the value of two or more predictors multiplied together. And again, it's just feature engineering. You're creating a new column to put into your regression model, and that allows you to understand, when these two things co-occur, how am I adjusting my expectation or my prediction based on the fact that they're occurring together and I think they have some type of relationship.
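Editor's sketch: one way to see that squared and interaction columns still fit inside ordinary linear regression is to build those columns by hand and hand them to a plain least-squares solver. Everything below is an illustrative assumption: the data are made up and noise-free so the fit recovers the generating coefficients, and `solve` and `ols` are hypothetical helper names, not functions from any library mentioned in the episode.

```python
def solve(A, b):
    """Solve the linear system A x = b by Gauss-Jordan elimination with pivoting."""
    n = len(b)
    M = [row[:] + [rhs] for row, rhs in zip(A, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(n):
            if r != col:
                factor = M[r][col] / M[col][col]
                M[r] = [a - factor * p for a, p in zip(M[r], M[col])]
    return [M[i][n] / M[i][i] for i in range(n)]


def ols(rows, y):
    """Least squares via the normal equations; an intercept column is added."""
    X = [[1.0] + list(r) for r in rows]
    k = len(X[0])
    XtX = [[sum(x[i] * x[j] for x in X) for j in range(k)] for i in range(k)]
    Xty = [sum(x[i] * t for x, t in zip(X, y)) for i in range(k)]
    return solve(XtX, Xty)


# 1) Diminishing returns: add a "spend squared" column next to "spend".
spend = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
revenue = [2.0 + 3.0 * s - 0.25 * s ** 2 for s in spend]
curve = ols([[s, s ** 2] for s in spend], revenue)

# 2) Interaction: add a column that is two predictors multiplied together.
channels = [(f, t) for f in (0.0, 1.0, 2.0) for t in (0.0, 1.0, 2.0)]
sales = [1.0 + 2.0 * f + 3.0 * t + 0.5 * f * t for f, t in channels]
joint = ols([[f, t, f * t] for f, t in channels], sales)

print("intercept, spend, spend^2:", [round(c, 4) for c in curve])
print("intercept, f, t, f*t:", [round(c, 4) for c in joint])
```

The solver never knows that one column is the square of another. It only sees columns to multiply by constants and add, which is what "linear in the parameters" means here.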
Tim Wilson
So there are a couple ways. The feature engineering thing is kind of fascinating, and I have, I think, two questions. One, choosing the features feels like there's a lot of kind of art in that. But there's also, as I understand it, a risk that if you chose two features that are strongly correlated with each other, it could cause problems. And why am I now blanking on what that is?
Chelsea Pelaretti
Multicollinearity.
Tim Wilson
Multicollinearity. And it was there. I was like, it's, yeah, totally there.
Mo Kiss
It was on the tip of my tongue.
Tim Wilson
It was. But I guess that's because if we look at it and say, I'll just throw it at the data, the data is just going to have x. It's not going to have x squared. It's not going to have the square root of x. It's not going to have x1 times x2. So what's kind of the approach, and what are the risks if you try to get too fancy with that?
Chelsea Pelaretti
Well, one of the risks is that you'll be just too fancy and no one will want to talk to you. But the major risk there is that you're going to be misspecifying your model and/or overfitting your model. So this is a super common thing if you're not really thinking through what it is that you want to include in your model. And let me give you an example. When we add x squared, x cubed, blah blah blah to the predictors of our model, we usually call that polynomial regression. And if you have a polynomial of degree 75, I don't know if that means anything to you, but what that means is that the line that you can fit to your data is incredibly wiggly. It can literally hit every single point, probably, or a lot of the points in your data set. And that's going to lead to overfitting. So one of the risks that you take when you do things like this feature engineering is that you might be overfitting. Now there's ways around that. I don't know if you want to get into things like regularization, which can help you pull back the impact of terms that are...
Tim Wilson
K-fold cross validation would fit in? No, damn it.
Chelsea Pelaretti
Not quite. Okay. Very related though. Yeah. Too soon. I think you're jumping the gun a little bit. Okay, but the other point I wanted to make there: you have tools like regularization. The real world way to define regularization is it's any method that makes your model a little bit simpler. In the practice of regularizing coefficient estimates, usually what that means is pulling them closer to zero, unless there's evidence that having a non-zero coefficient really improves the fit of your model. So if you're familiar out there with lasso or ridge regression, that's what they're doing. Those are methods of regularization that basically encode the idea that in the real world most effects are exactly zero or close to zero, unless we have pretty strong evidence to the contrary. So that's one thing you can do. But my favorite way of approaching this is always leveraging the subject matter expertise that we have. You know, I have a really technical background, but I can't build good models unless I understand the context of what we're building and what we're building for. And so honestly, my favorite way to approach this is often to get the opinion of subject matter experts who are able to give insight, at least to some degree, about what we should include. For instance, we were talking about interaction terms, which basically say when two predictors co-occur, how does that change our prediction? A lot of times subject matter experts will have a good idea about which ones should be included in the model, and which ones are just so ridiculous we would never even want to try them. So while there are statistical techniques to handle the risks there, my favorite way is with expertise, if it's available.
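Editor's sketch: the "pull coefficients toward zero unless the data argue otherwise" idea has a closed form in the simplest possible case, a single centered predictor with no intercept. The code below is an illustration under that assumption, with made-up data and hypothetical function names. Ridge divides by a larger number as the penalty grows, so the slope shrinks smoothly. Lasso subtracts the penalty and can land on exactly zero.

```python
def ridge_slope(xs, ys, lam):
    """Minimizes 0.5 * sum((y - b*x)^2) + 0.5 * lam * b^2 (centered data, no intercept)."""
    sxy = sum(x * y for x, y in zip(xs, ys))
    sxx = sum(x * x for x in xs)
    return sxy / (sxx + lam)


def lasso_slope(xs, ys, lam):
    """Minimizes 0.5 * sum((y - b*x)^2) + lam * |b| (centered data, no intercept)."""
    sxy = sum(x * y for x, y in zip(xs, ys))
    sxx = sum(x * x for x in xs)
    shrunk = max(abs(sxy) - lam, 0.0)  # soft-thresholding step
    sign = 1.0 if sxy >= 0 else -1.0
    return sign * shrunk / sxx


# Made-up, already-centered predictor and outcome.
xs = [-2.0, -1.0, 0.0, 1.0, 2.0]
ys = [-3.9, -2.1, 0.2, 1.8, 4.0]

for lam in (0.0, 10.0, 90.0):
    print(f"ridge, penalty {lam:5.1f}: slope {ridge_slope(xs, ys, lam):.3f}")
for lam in (0.0, 5.0, 25.0):
    print(f"lasso, penalty {lam:5.1f}: slope {lasso_slope(xs, ys, lam):.3f}")
```

With a penalty of zero both functions return the ordinary least-squares slope, so the penalty is the only thing doing the shrinking. Real models with many correlated predictors need an iterative or matrix solver, and the penalty strength is usually picked by validation rather than by hand.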
Mo Kiss
How do you go selling that to especially, like clients and whatnot? Because I feel like sometimes they think it's like disconnected. Like you go away and build the best model and sometimes convincing them that you need the business context or I guess, guidance on assumptions. I don't know, I feel like there could be resistance or that there could be pushback, that they're bringing bias in. Like, what's your take on that?
Chelsea Pelaretti
Oh, my gosh, that is a huge concern. Especially since I work a lot with Bayesian models. So you not only have kind of this prior information coming through the settings you're choosing for your model and what type of model you run, but also through the priors that enter into the analysis. And I think the way that I talk about it has to do with leveraging different sources of data. Right? So we actually have your data, the data that we're going to use to train your model, to fit your model. But we also have sources of information that come through your years or decades of expertise. One of the examples I liked to give in my classes, even before I was in the marketing space, is that if I'm doing an experiment and I'm looking at the click rate of an email that I'm going to send out, maybe I'm A/B testing it, it would be silly of me if I didn't leverage the expertise of the marketing people in the room who know that click rates are probably going to be, I'm making this number up, around 2%. It would be insane if I got an 80% click rate. Like, something went wrong there if I got an 80% click rate. And while I understand the desire to not bias your model, and certainly that can be a concern, it really feels like throwing away information to not include that expertise in your analysis where appropriate, through a Bayesian prior or through some model setting configurations. Another thing that comes up, especially with MMM: these are such complicated models. Not always. I think Tim might have a story about not the most complicated MMM, but basically these are often really complicated models. And there just isn't enough data for us to get really good estimates on the parameters that we would like to know in your model. And so we need that prior information in order to fit a model that makes sense. Right. If you tell us, hey, ROIs are certainly not going to be 200, that's really helpful to the model. And I don't think that's biasing.
But at the same time, we do caution people a lot to make sure that they're not overly specifying the prior information in a way that tells the model exactly what to say. We still want the data to have a say in what the model is learning, but it would be silly to assume that you don't have expertise from years and years in the marketing space that can help us inform our analyses. But I will say there is a lot of pushback. And often what it comes down to is we'll put very loose parameters of say, okay, if you really don't know, here's different things that you can try and we'll show them. When you have information that's reasonable, it actually improves the insights you can make. And that really happened to me a lot when I was a professor. We would often do a lot of consulting with people who weren't super familiar with Bayesian models, for example, and there was so much pushback about, like, I don't know what a prior should be. Shouldn't the data tell me what the prior should be? And what I usually like to tell them, at least in the psychology space, which was like, primarily where I was working, is, okay, if I told you that this intervention that you're testing had an impact where it improved people's IQ by 70 points, what would you come back and tell me? And most of the time they would say, that's insane. You did something wrong. And I say, that's prior information, even if it's very loose. You're giving us a little bit of an idea of what a reasonable value to expect would be. And that can be super helpful in an analysis.
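Editor's sketch: the email click-rate example maps onto the textbook Beta-Binomial update, where a prior belief about a rate is combined with observed clicks. The numbers below are made up for illustration, echoing the "around 2%" figure that the speaker says she is inventing. The choice of a Beta(2, 98) prior is an assumption of this sketch, not a recommendation.

```python
def posterior(prior_a, prior_b, clicks, sends):
    """Beta(prior_a, prior_b) prior plus binomial data gives a Beta posterior."""
    return prior_a + clicks, prior_b + (sends - clicks)


def beta_mean(a, b):
    """Mean of a Beta(a, b) distribution."""
    return a / (a + b)


# A small, noisy test: 3 clicks out of 50 sends (raw rate 6%).
clicks, sends = 3, 50
raw_rate = clicks / sends

# Informed prior: centered on a 2% click rate, worth about 100 prior sends.
informed = posterior(2, 98, clicks, sends)
# Flat prior: every rate from 0% to 100% treated as equally plausible.
flat = posterior(1, 1, clicks, sends)

print(f"raw rate:                 {raw_rate:.3f}")
print(f"posterior mean, informed: {beta_mean(*informed):.3f}")
print(f"posterior mean, flat:     {beta_mean(*flat):.3f}")
```

The informed posterior sits between the 2% prior and the 6% raw rate, so the data still have a say. A prior worth far more than 100 sends would start telling the model what to conclude, which is the over-specification the speaker cautions against.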
Mo Kiss
Tim, I'm drawing so many parallels right now between this and when you talk about bracketing for setting targets. What's your, what's your reaction to that?
Tim Wilson
Yeah, I mean, I was thinking the same thing. I run into the same thing setting targets for KPIs. People say, what are you talking about? I have no idea. And then you say, well, what if your KPI is to give them this drug or this intervention and increase their IQ by 70 points? And they'd be like, well, no, there's no way it's going to do that. I'm like, oh, well, I guess you do have some expectations. But I had not made the link to using that as priors to go into a Bayesian model. That's wild. Does it work if you've got, say, two parties who are both subject matter experts? We'll just go back to Facebook and say one says, I think Facebook is definitely heavily driving sales. And the other person says, I don't think Facebook is doing, you know, anything. Is that still useful to say, yeah, we'll include it in the model? And then the model can, with caveats, come back and say, yeah, it looks like there is a detectable effect. Now, whether it's actually Facebook or something confounding is a separate discussion. But is that still useful? It's like, this could be at play, so include it in the model and then see if it's detectable.
Chelsea Pelaretti
Yeah, I mean, in cases like that, where there's not a lot of overlap in the subject matter experts' opinions, you have a couple of ways that you can approach this. One thing that I would do before I even ran a model, though, is ask them why they think that. Is there internal data? What is the assumption that they're making that made them come to that conclusion? Because I would be super shocked if two people with access to the same information, with the same assumptions, came to such different conclusions. We never want to include vibes as priors. We want to have kind of informed decisions and reasons why we believe this stuff. And so that would be my first line of defense, because I think once you uncovered that, you'd figure out that they're making really different assumptions, they're applying it in really different contexts, and that's why they're coming to different answers. But if magically it was the case that they really had the same information and they were just coming to very different conclusions, I think one of the things that you could do is you could have a very wide prior. Right. When you take the collective expertise of the different experts, then there does seem to be a lot of uncertainty. And one of the things that you said that was interesting is you're kind of talking about the point estimate of what they expect in terms of, let's say, an ROI. One thinks it's really low. One thinks it's really high. But when you extract information that might be useful in an analysis as a prior, you really have to make sure that you're also thinking about uncertainty. So maybe if you ask them for uncertainty around their estimates, you would find a lot more overlap in what they believe, and it might be easier to translate.
That being said, if they don't and they can't agree, then I might do something called a sensitivity analysis, where I run the model with expert one's priors and run the model with expert two's priors and then see how the model performs. Right. If your model can't forecast well, if it has high variance, if it's, you know, adjusting all the time and giving you different insights, that's a bad model. And if one of the priors lends itself more to that poor performance, it's probably not a good prior, or maybe there's something misspecified in our model. But I would test something like that and just see, does it negatively impact the insights we're getting from the analysis?
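Editor's sketch: a stripped-down version of a prior sensitivity check can be written with the normal-normal conjugate update. This is a simplification and an assumption of the sketch: the approach described in the conversation compares how the full model forecasts under each expert's prior, while the code below only compares where the posterior for a single ROI lands. All numbers and names are hypothetical.

```python
def posterior_normal(prior_mean, prior_sd, estimate, std_error):
    """Combine a normal prior with a normally distributed estimate (known std error)."""
    prior_precision = 1.0 / prior_sd ** 2
    data_precision = 1.0 / std_error ** 2
    post_var = 1.0 / (prior_precision + data_precision)
    post_mean = post_var * (prior_precision * prior_mean + data_precision * estimate)
    return post_mean, post_var ** 0.5


# Hypothetical ROI estimate from the data alone, with its standard error.
data_roi, data_se = 1.5, 0.5

# Two experts who disagree about the most likely ROI.
expert_prior_means = {"expert_one": 0.5, "expert_two": 3.0}


def posterior_gap(prior_sd):
    """How far apart the two experts' posterior means end up for a given prior width."""
    means = [
        posterior_normal(m, prior_sd, data_roi, data_se)[0]
        for m in expert_prior_means.values()
    ]
    return max(means) - min(means)


confident_gap = posterior_gap(0.5)  # both experts claim to be very sure
humble_gap = posterior_gap(2.0)     # both experts admit wide uncertainty

print(f"gap between posteriors, narrow priors: {confident_gap:.3f}")
print(f"gap between posteriors, wide priors:   {humble_gap:.3f}")
```

When the gap is large, the conclusion is being driven by whose prior was used rather than by the data, which is the signal a sensitivity analysis is looking for. Asking the experts for their uncertainty, and widening the priors accordingly, shrinks that gap.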
Julie Hoyer
Can I take us back a little bit, to when you talked about using regression for prediction versus inferentially? Because when you're talking about this feature engineering, do you have to go about it differently when you are using it to create a model to predict something, compared to when you're trying to infer something? I think I've used regressions more when I'm looking at historical data, we have a business question they're asking, and we're trying to infer if a relationship exists. And I feel like I've always, you know, read things and had the experience talking to colleagues where you have to be really careful about not looking at the results and then tweaking. Right. You don't want to bring in bias through your features to make it look good at the end. And so we always had these discussions about choosing your features and doing a lot of other work to determine, you know, what relationships you need to represent or not before running it and kind of being like, that's the answer. Whereas it sounds like when you're using it for prediction, you're trying to find the model that has the best fit on what you're training it on, so then you can use it moving forward with higher confidence. Right? But am I misinformed there? Do you do feature engineering differently for those two scenarios?
Chelsea Pelaretti
Yeah, that's a great question. I think in general, yes, because your goals are so different, right? With prediction, all you care about is that the output of the model, the thing that it's predicting, is as close as possible to what the real value is in the real world. Whereas when you're doing inference, what you care most about is whether the parameters of your model, say your regression coefficients, are accurate to the real life relationships. And often that kind of veers into causal inference territory as well. But basically I would say you would approach it differently. There's a lot of overlap. But for instance, in the predictive space, yeah, we might have a little more freedom to play with our model and try and get the best prediction, but we still have to be really careful about overfitting to the sample of data that we have, which is, to Tim's earlier point, why we do things like cross validation or any type of model validation that basically says, if we hold out some data from being used to fit the model, can the model still make good predictions? Because if it can't, then it means that we've probably done what you were describing, which is we've kind of over-engineered our model to fit too specifically to the sample of data that we have, and it doesn't generalize well. And so that's, you know, the bane of every data scientist's life ever, overfitting. And so we do still have to be careful about that in a predictive sense. However, when we're doing inference, sometimes we have to be a little bit extra careful. An example that I like to give is, say you're a bank and you're trying to predict if someone is going to default on a loan that you've given them. In a predictive sense, if knowing whether that person has a yellow car helps you make good out-of-sample predictions for whether that person is going to default on their loan, I don't care. I don't care if that's a real relationship, if that's causal.
I just want to know if this person is going to pay me back or not. Whereas if you're doing an inferential model, I think we might want to put some more thought into that. Right? We want to do a DAG, right, to look at the causal relationships and see if maybe, I don't know, there's some confound there between having a yellow car and defaulting on your loan. And so you might want to put a little bit more thought into that.
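Editor's sketch: the "hold out some data and see if the model can still predict" check can be illustrated with a tiny k-fold cross-validation. The setup is an assumption made for illustration: the data are made up (a straight line plus alternating noise), and a nearest-neighbor "memorizer" stands in for an overly flexible model such as a very high-degree polynomial. Folds are contiguous blocks to keep the example deterministic, whereas in practice the rows are usually shuffled first.

```python
def fit_line(xs, ys):
    """Least-squares line; returns a function that predicts y from x."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    m = sxy / sxx
    b = mean_y - m * mean_x
    return lambda x: m * x + b


def fit_memorizer(xs, ys):
    """Overly flexible stand-in: predicts the y of the closest training x."""
    pairs = list(zip(xs, ys))
    return lambda x: min(pairs, key=lambda p: abs(p[0] - x))[1]


def mse(model, xs, ys):
    """Mean squared error of a fitted model on the given points."""
    return sum((model(x) - y) ** 2 for x, y in zip(xs, ys)) / len(xs)


def k_fold_mse(fit, xs, ys, k):
    """Average squared error on points the model did not see while fitting."""
    fold_size = len(xs) // k
    squared_errors = []
    for fold in range(k):
        held_out = range(fold * fold_size, (fold + 1) * fold_size)
        train_x = [x for i, x in enumerate(xs) if i not in held_out]
        train_y = [y for i, y in enumerate(ys) if i not in held_out]
        model = fit(train_x, train_y)
        squared_errors += [(model(xs[i]) - ys[i]) ** 2 for i in held_out]
    return sum(squared_errors) / len(squared_errors)


# Made-up data: a true slope of 2 plus noise that alternates +1, -1.
xs = [float(i) for i in range(10)]
ys = [2.0 * x + (1.0 if i % 2 == 0 else -1.0) for i, x in enumerate(xs)]

line_train = mse(fit_line(xs, ys), xs, ys)
memo_train = mse(fit_memorizer(xs, ys), xs, ys)
line_cv = k_fold_mse(fit_line, xs, ys, k=5)
memo_cv = k_fold_mse(fit_memorizer, xs, ys, k=5)

print(f"line:      train MSE {line_train:.2f}, cross-validated MSE {line_cv:.2f}")
print(f"memorizer: train MSE {memo_train:.2f}, cross-validated MSE {memo_cv:.2f}")
```

The memorizer looks perfect on the data it was fitted to and much worse on held-out points, while the line's error barely moves. That gap between in-sample and out-of-sample error is the overfitting signature described above.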
Mo Kiss
So wait, have I got this right? So in the case where you're trying to predict, you don't care about a causal link with the yellow car. Have I got that right?
Chelsea Pelaretti
It might not. It depends.
Mo Kiss
Because the prediction is the primary thing you care about, the accuracy of the prediction is the most important. And then take me through the second piece.
Chelsea Pelaretti
Yeah. So in the predictive context, first of all, if having a yellow car doesn't help me predict values that the model's never seen before, so if the out-of-sample accuracy is bad, I do not want to include it. But if it is, say, tangentially related to a different construct we can't measure, maybe wealth or eccentricity or something like that that we don't have a good measure for, but having a yellow car is a proxy for, then if that helps me make a prediction, great. It's giving me information about something else, even though having a yellow car itself is not what is making someone default on a loan. Well, maybe, I don't know the power that yellow cars have. In an inferential case, what we really care about is whether the relationships that we're modeling are accurate to the real world. For instance, let's go the frequentist route: if I have a significant p-value on my regression coefficient for having a yellow car, then is that a real relationship? Is that a causal relationship? And when we say causal, what we really mean is, if we change the color of your car, if I got in my camo garb and re-spray-painted your car, is that suddenly going to change how likely it is that you're going to default on your loan payment? That's what we really care about. And so in that case it does matter. Really, I guess if I'm thinking about it deeply, I don't think the color of your car is causally related. I think there might be some other process, like your wealth or how, you know, chaotic and wild you like your car colors, that is causing both loan default probability and causing you to have a yellow car. In that case, then I might really care that I'm not truly estimating a causal impact there. And so that insight isn't going to help me. Right. If I am a bank and I want more people to pay me back, I am not going to then go out and spray paint their cars yellow, because that's not a causal factor in paying back your loan.
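Editor's sketch: the yellow-car story can be reproduced with a small made-up table in which a hidden trait drives both car color and loan default. The counts, and the idea that the trait ("eccentric") is recorded at all, are assumptions for illustration only. In the scenario described above, the confounder is something the bank cannot measure directly.

```python
# Each row: (eccentric, yellow_car, number_of_people, number_who_default).
# Within each "eccentric" group the default rate is the same for both car
# colors, so car color has no effect of its own in this made-up population.
population = [
    (True, True, 80, 24),
    (True, False, 20, 6),
    (False, True, 10, 1),
    (False, False, 90, 9),
]


def default_rate(rows):
    """Share of people in the selected rows who default."""
    people = sum(r[2] for r in rows)
    defaults = sum(r[3] for r in rows)
    return defaults / people


def select(yellow, eccentric=None):
    """Rows for one car color, optionally restricted to one eccentricity group."""
    return [
        r for r in population
        if r[1] == yellow and (eccentric is None or r[0] == eccentric)
    ]


overall_yellow = default_rate(select(True))
overall_other = default_rate(select(False))
print(f"all customers: yellow {overall_yellow:.3f} vs other {overall_other:.3f}")

for group in (True, False):
    yellow = default_rate(select(True, eccentric=group))
    other = default_rate(select(False, eccentric=group))
    print(f"eccentric={group}: yellow {yellow:.3f} vs other {other:.3f}")
```

Ignoring the hidden trait, yellow-car owners default about twice as often, which is useful for prediction. Holding the trait fixed, the difference disappears, so repainting cars would change nothing, which is what matters for a causal question.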
Tim Wilson
But, I mean, you mentioned kind of a DAG in passing. It goes back to talking to a subject matter expert who may be into the psychology, and if you were diagramming it out, asking why a yellow car would matter, and they said, oh, that may actually be related to flightiness or eccentricity or something, you can capture all of that as sort of assumptions or likely relationships that could then guide you. Right, because that could tell you, well, if yellow car is kind of a proxy for something else, there might be a better proxy. So maybe you should look at that other proxy that we can measure and use that instead. Which just gets you back to picking parameters, while working with a subject matter expert, that are.
Julie Hoyer
Yeah.
Tim Wilson
The best.
Julie Hoyer
Yeah. You can't build very many good models without a subject matter expert. And I think it really comes down to: are you asking a causal question? And often in the space of marketing, we are. Right? When we want to know the effectiveness of, we keep saying Facebook, there are other marketing channels. But if you want to know the effectiveness of Facebook, what you're really asking is, if I change my Facebook spend, is that going to have an impact on whatever it is that I'm measuring? And so we really are asking a causal question in a lot of these scenarios. If you just want to know what things are associated, then we might not care. Right? So if you have a model that's predicting the LTV of a customer, again, it might not matter. If you know that when they have a yellow car, they're going to spend huge amounts of money with you, great. That helps me forecast what my customer LTV is and plan accordingly for whatever server space or whatever it is you need to serve customers like that. And yet I would not want to go spray paint my customers' cars yellow, because that's not actually going to have an impact. So often in marketing, we are asking these causal questions because we want to be able to take an action and understand what impact that action would have on whatever it is we're measuring.
Mo Kiss
Okay.
Julie Hoyer
Weird question.
Mo Kiss
I feel like I'm always doing this and I'm like. But about the business, I'm just curious. So, like, I feel like I'm with you on the yellow car analogy, which I love, by the way. Although I do want to see you in, like, some kind of garb spray painting Tim's car at some point.
Julie Hoyer
So can do.
Tim Wilson
My first car was a yellow '78 Chevy Monza. So, you know. Yeah. And I've never defaulted on a loan, so.
Julie Hoyer
Well, there you go. Proof.
Mo Kiss
Okay. So the bit that I. I'm. Yeah. Obviously much more on the business side. Those two scenarios make sense to me. One is about getting the best prediction. One is about establishing causality. If a stakeholder saw a little bit under the hood of this yellow car situation, though, they probably wouldn't understand why, in case A, it's important, but in case B, it isn't. So, I mean, how much are you showing under the hood? Or how important do you think it is for them to understand? Tim is pointing at his book, being like, Mo, go back and read it for the fifth time.
Tim Wilson
No. Buy it for your. Get it for your business partners. Yeah, no, sorry.
Julie Hoyer
I would love Tim's take on this since he pointed out his book. What do you think?
Mo Kiss
Yeah, Tim, hot seat time.
Tim Wilson
I just remember, and it was Joe Sutherland who kind of made that point, I think, in the book. I think you can have that discussion: what are we really trying to do here? I feel like the way you were framing it, Chelsea, if the analyst has a really good understanding of the distinction, then in that specific case, like with the yellow car, do you care more about just making the best prediction of whether they are going to repay or default on their loan? Or are you really trying to understand the relationship as to what's causing it? And I think in business a lot of times they would say it's more about prediction. I mean, that is a fundamental concept, and I wish business users could make that distinction. I don't think we try to educate them. And it's not that hard to do. Right?
Chelsea Pelaretti
And I wonder too, like, if you ask them, is the data point of the prediction, their likelihood to default, going to help you make business decisions, or is the data point that having a yellow car, like, has a causal relationship with you defaulting? Like, which one's going to help you make a business decision? I would argue it's the first one.
Mo Kiss
Right. Like, depends where you are in the business.
Chelsea Pelaretti
True. But if I know you have a yellow car, like, I can't change your yellow car. I mean, I guess you might, but in marketing, treat them differently.
Julie Hoyer
Yeah, but in marketing, you could target yellow car havers.
Tim Wilson
Yeah.
Julie Hoyer
Okay.
Chelsea Pelaretti
There you go.
Mo Kiss
That's why I think in marketing you would be more interested in the causal relationship. In finance, you would care more about the predictive quality because you're responsible for like a company forecast and you care about accuracy. So even where you are in the business might make a really big difference.
Chelsea Pelaretti
That's a good point. And actually I'm. I have an example that I have. I selfishly want to know if a regression then was a good choice here because I feel like it's maybe, maybe it's. I think it's fitting more in the inference side. But we were working with a hotel chain and they.
Tim Wilson
And by the way, if you're listening to this and you're at a hotel chain who worked with Julie, it was the Other hotel chain she was working with. This was totally not you.
Chelsea Pelaretti
Yeah, totally. I mean, it was an interesting question. I'm more questioning if I chose the right way to analyze it. So it's on me. They had this idea that, to help them personalize search results online, they should be using the distance between the person searching and where they wanted to go as a feature. They were obviously trying to increase profit, have people stay places longer, spend more with them, whatever. So we broke it down and said, okay, well, let's start by asking: is there even a relationship between profitability and the distance between where they're searching from and where they want to book? And we broke down profitability into, like, three ways to look at it. But we used a regression, and we ended up finding that, sure, it says a relationship exists, but it's not actually impactful. The statistical significance, the p-value, was there, but given the actual coefficient, we pretty much were able to tell them that no, these are not great variables to use as eventual predictors. And again, it was kind of a weird one. We worked on it a lot with some co-workers, and it felt sticky. But we were pretty much trying to just say, does a good relationship exist here or not, to use this almost as a feature in a more complex model or prediction.
Julie Hoyer
Well, I think the good news is, I don't want to set this up as a dichotomy. Inference and prediction are these two concepts, and they're both worthy goals, but they're not completely separate. Right? We often want to make sure that we have a good model, and that looks the same in both contexts. But in your specific case, you're actually bringing up a really interesting thing: are things impactful versus are things statistically significant? And I think that's a really important distinction because, for instance, especially in these kind of big data scenarios, I might be so confident that when someone is further away, let's say for every hundred miles further away they are from their destination, it's going to increase my profit by a hundredth of a cent. And I'm so sure that's true. But that might not be something that you want to action on because of the cost of implementing whatever algorithm or whatever promotion or whatever it is that you're going to do in response to this. It's just not worth a hundredth of a cent per hundred miles. And so I think that's a really important distinction, because when we are doing statistical inference, one of the methods we can do that with is frequentist statistics. And often what you're looking at is a p-value, which just tells you how confident am I. Well, let me rephrase this to be more specific: a p-value basically tells you how compatible what we're observing is with a world where this effect truly is zero. So if there really is no effect between the distance that someone is from their destination and whatever profit metric you're using, how compatible is the data that I observed with that world? And often the answer is going to be: not compatible. This would be a ridiculous thing to observe in a world where there is no relationship. And that's what the p-value really tells you: how compatible is what I observed with this idea of not having a relationship?
But that being said, even if I am pretty certain that my data is incompatible with a world where there's no relationship, it doesn't mean that the relationship is practically significant, which is often the term people use. And so I think you have to distinguish between those. Now, that being said, we're sort of veering into the territory of using p-values to do variable selection in regression models, which is a little bit iffy and kind of a can of worms to get into. But I do think that distinction between a statistical test and the practical significance of the result is so important, and it comes up a lot. If you're A/B testing something and you get a non-significant result, oftentimes people just throw those tests away. They'll say, ah, not significant, can't use any of the insights here. But that's the wrong way to think about it, because if you think about the way that frequentist testing works, which is often what is reported with these tests, you might have a non-significant p-value because the null is true: you are actually living in a world where your A/B test had no effect, whatever variant or intervention you're testing. Or it could be that there's so much uncertainty about the effect that we can't exclude zero, but there might be some evidence that the effect is positive or negative. And by themselves, null hypothesis significance tests don't distinguish between those scenarios. And so it's one of those cases where it's super important to not just use what is typically outputted. So alongside a null hypothesis significance test, maybe you pair that with an equivalence test, which tells you if an effect is practically equivalent to zero. Maybe you look at effect sizes to see what that effect is or how precise your measurement is. Are we very certain it's zero, or are we so uncertain it could be zero? Because those are very different things, very different results that you can get from a test.
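The distinction between "statistically significant," "practically significant," and "too uncertain to say" can be sketched in a few lines. The numbers below are invented for illustration (a $0.01 effect per hundred miles, the noise levels, the sample sizes, and the practical margins are all assumptions), the p-value uses a normal approximation, and `practically_zero` is a hypothetical helper in the spirit of a two one-sided tests (TOST) equivalence check, not a specific library's API.

```python
import math
import numpy as np

rng = np.random.default_rng(1)


def slope_summary(x, y):
    """OLS slope, its standard error, and a two-sided p-value (normal approx.)."""
    x_c = x - x.mean()
    y_c = y - y.mean()
    slope = (x_c @ y_c) / (x_c @ x_c)
    resid = y_c - slope * x_c
    se = math.sqrt((resid @ resid) / (len(x) - 2) / (x_c @ x_c))
    p_value = math.erfc(abs(slope / se) / math.sqrt(2))
    return slope, se, p_value


def practically_zero(slope, se, margin, z=1.645):
    """Equivalence-style check: is the 90% interval inside (-margin, +margin)?"""
    return bool(slope - z * se > -margin and slope + z * se < margin)


# Scenario A (assumed): huge sample, tiny true effect of $0.01 per 100 miles.
n_big = 1_000_000
distance = rng.uniform(0, 20, size=n_big)  # hundreds of miles
profit = 100 + 0.01 * distance + rng.normal(scale=5.0, size=n_big)
slope_a, se_a, p_a = slope_summary(distance, profit)

# Scenario B (assumed): small, noisy A/B test with a true lift of 1.0.
n_small = 200
variant = rng.integers(0, 2, size=n_small).astype(float)
outcome = 50 + 1.0 * variant + rng.normal(scale=10.0, size=n_small)
slope_b, se_b, p_b = slope_summary(variant, outcome)

print(f"A: slope={slope_a:.4f}, p={p_a:.2g}, "
      f"practically zero (margin 0.05)? {practically_zero(slope_a, se_a, 0.05)}")
print(f"B: slope={slope_b:.2f}, p={p_b:.2g}, "
      f"practically zero (margin 1.0)? {practically_zero(slope_b, se_b, 1.0)}")
```

In scenario A the effect is precisely estimated and statistically distinguishable from zero, yet it sits well inside the margin someone might consider worth acting on. In scenario B the interval is so wide that the test can neither confirm an effect nor declare it practically zero, which is a different conclusion from "there is no effect."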
Tim Wilson
Oh, this is killing me. Because we are not going to get to logistic regression. We're not going to get to talking more about squares and how OLS is least squares and why is it squares? We're not going to talk about time and why time is uniquely. And if I stop now, Mo's going to say, I have one more question.
Mo Kiss
I do.
Tim Wilson
And that's why I left time for Moe's one more question.
Mo Kiss
Stupid question. And it is something that has been top of mind since we first started speaking. And dear listeners, I promise we will put up a picture of this in the show notes because I can't talk about it without the picture. And I realize we're on a podcast, so. Okay, so I'm gonna do my best to explain what we're looking at. Basically, on the left, we have a line kind of going up, like what you would see in a typical linear regression. As you increase spend, revenue goes up, profit goes up, et cetera. Right. And I feel like I spent so much time looking at this particular graph, and the reason I think this is really important is because I sometimes wonder if people's familiarity with linear regression means that often we, like, interpret this relationship. You increase spend, revenue increases linearly. And I sometimes get, like, concerned that we're. We're always trying to untrain this out of our stakeholders. And so the graph that we have on the right shows a diminishing return curve. So basically, for every extra dollar you spend, you have less and less revenue. Right. And, like, I feel like, fundamentally, for a business that uses MMMs, we're constantly trying to, like, unpick this with our stakeholders. And I'm really curious to get Chelsea's perspective. Like, we constantly are talking about linear regression. So easy. It's so simple. It's great. Everyone should know how to use it. And I feel like I've got the opposite problem where I'm like, it's so common. I feel like I'm trying to get people to unlearn it. And is that a fair observation?
Julie Hoyer
Yeah, I mean, I think the problem is that we teach such a fixed set of tools. When people learn something like linear regression, it's often not taught in a way that would allow them to plug and play some more complex methods on top of it. When really what we should be teaching is those base core concepts of regression that then allow you to plug things in. Oh, before we plug spend into our regression model, we're going to saturate that spend, because you can't spend a million dollars in 10 minutes and have the last dollar be as effective as the first dollar. And I think that's really important to distinguish, because it's not that regression is wrong here, it's that we also need something like that saturated spend on top of it, and that makes it complicated. People have learned linear regression in such a fixed way, the way that results in the graph on the left, which, again, for people listening at home, is a straight line: no matter how much you spend, every dollar is going to bring you the same amount of revenue as the previous dollar. Because we've taught regression in such a fixed way, people aren't able to make that generalization that, hey, what if I actually plugged in not spend, but saturated spend? That says, okay, when I am spending something, I am not necessarily going to get a dollar's worth of effect from my millionth dollar. Maybe I'll get $0.01's worth of effect from my millionth dollar. And so if you think about it in those terms, we can still use the ideas behind linear regression, but it does have that added complexity on top. And because we teach linear regression sort of as this thing that comes out of the box and you can't really alter it, it's hard for people to understand that those two things are still incredibly related. And it makes the real-world scenario with saturated spend feel much more complicated than it actually has to be.
Because when I explain it like that of like, our predictor is not just spend, it's saturated spend, that's something that can maybe make a little bit more sense. Although you know, the actual complexity of implementing that is a bit more difficult, at least the insight is still really comprehensible. But that's not what they're thinking about when they're thinking about linear regression, which is probably causing a lot of the problems that you're, that you're describing.
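The "predictor is saturated spend, not spend" idea can be sketched as an ordinary linear regression with one transformed input. The saturation curve used here, `spend / (spend + half_saturation)`, is just one common choice and is an assumption, as are the simulated numbers. The half-saturation point is treated as known to keep the sketch short; in a real marketing mix model it would have to be estimated, which is where the implementation difficulty mentioned above comes in.

```python
import numpy as np

rng = np.random.default_rng(2)


def saturate(spend, half_saturation):
    """Saturation curve: rises quickly at first, then flattens toward 1."""
    return spend / (spend + half_saturation)


def fit_line(x, y):
    """Plain linear regression y ~ a + b * x; returns (a, b) and the SSE."""
    X = np.column_stack([np.ones_like(x), x])
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ coef
    return coef, float(resid @ resid)


# Simulated (made-up) data with diminishing returns built in.
half_sat = 200_000.0
spend = rng.uniform(0, 1_000_000, size=500)
revenue = (50_000 + 2_000_000 * saturate(spend, half_sat)
           + rng.normal(scale=50_000, size=500))

# Left-hand graph: revenue regressed on raw spend (a straight line).
coef_raw, sse_raw = fit_line(spend, revenue)

# Right-hand graph: the same regression, but on saturated spend.
coef_sat, sse_sat = fit_line(saturate(spend, half_sat), revenue)


def marginal_revenue(at_spend):
    """Extra revenue per extra dollar implied by the saturated-spend model."""
    return coef_sat[1] * half_sat / (at_spend + half_sat) ** 2


first_dollar = marginal_revenue(0.0)
millionth_dollar = marginal_revenue(1_000_000.0)

print(f"SSE with raw spend:       {sse_raw:.3e}")
print(f"SSE with saturated spend: {sse_sat:.3e}")
print(f"return on first dollar:     {first_dollar:.2f}")
print(f"return on millionth dollar: {millionth_dollar:.2f}")
```

The model is still linear in its coefficients; only the input changed. The straight-line fit assigns every dollar the same return (`coef_raw[1]`), while the saturated version lets the return on the millionth dollar be a fraction of the return on the first.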
Tim Wilson
But if you're talking to a subject matter expert, I feel like you just described it in a really, really good way. Like, how are you gonna spend the millionth dollar? Well, you're gonna be casting your net broader than you targeted initially, you know? So, I don't know, that feels like another one where you're acknowledging the importance of the marketer being a subject matter expert, and that what they actually know does kind of inform what the analyst or the statistician is doing with the data. Like, they can come together. It can be a kumbaya.
Julie Hoyer
It should. Yeah. I'd argue you're gonna do much better work that way.
Mo Kiss
I guess my thing is, I feel like the marketers who are, like, on the tools get that, right? Because they understand saturation. I think it's more when you're trying to explain it to your leadership that it gets really tricky.
Julie Hoyer
Well, you can give them some examples. You know, I always liked the example of an influencer channel. At first, you're going to be able to scale up really well, right? You're going to be able to find influencers that are targeting your niche audience or people who are really likely to align with your brand. But if you keep spending up, you're eventually going to get to that random influencer who has 100 followers and all they do is review toe socks. Is that going to be an effective spend for you? No. And I think that's really important, that you are really limited. If you wanted to spend a billion dollars on influencers, you're really not going to be able to scale up in that way. And hopefully that's an example that makes them both laugh but also go, oh yeah, my last dollar is not going to be effective because, you know, I can only find so many influencers that are really aligned with my brand.
Mo Kiss
All right, I'm gonna use the toe sock example.
Chelsea Pelaretti
Toe socks and yellow cars.
Tim Wilson
That was toe socks and yellow cars. Good. I saved buffer for Mo's last question. It created chaos that you, dear listeners, did not have to experience because of the magic of editing. With that, we are going to have to wrap, and the last thing we like to do is go around and do a last call, something that each of us found kind of interesting, share-worthy, related to the show or not. And Chelsea, you are our guest. Would you like to go first with the last call?
Julie Hoyer
I really would. You said this didn't have to align perfectly, but I did want to choose a recommendation that I think listeners who would enjoy this episode could also use. So actually I'm going to recommend a YouTube channel. I hope I'm pronouncing this correctly because I've never heard them say it on the YouTube channel, but it's ritvikmath, and it is an excellent resource for data science and statistics, if you're just starting out or even if you're a little bit more advanced, as a way to get really intuitive, simple explanations of data science concepts. I definitely used this as a resource when I was a professor, and honestly I still use it now as a resource when my job is explaining statistical models to people who are not building statistical models every day. So, highly recommend. I will maybe link some of my favorites for you to put in the show notes.
Tim Wilson
That would be awesome. Wow. More fun videos to watch. So we know that's where Michael Helbling's gonna go, video guy that he is. All right, Mo, what about you? What's your last call?
Mo Kiss
I have a twofer, but one's kind of cheating a little bit. Firstly, I do want to do a shout out to the Recast blog. It is absolutely phenomenal. There was a recent post that I just saw that is hitting very close to home, and there's a lot of learnings about, like, MMMs, incrementality, all that sort of stuff, that I just find are written so simply, and normally using very similar ways of explaining things to what I would adopt, but probably more eloquent and refined and available for you in a blog. Simple. So definitely check it out at getrecast.com. My second one is, I am in the midst of our performance cycle, where we do all our performance reviews and growth and impact goals and role changes and all that sort of stuff. And unsurprisingly I've been using a lot of AI, but I also wrote a prompt to look for bias in feedback, which led me down a complete rabbit hole. And then I found this article that was really interesting, which was called Unfair but Valid Feedback: The Seeming Contradiction. And the thing I really took away from it is that sometimes feedback feels really unfair or potentially biased, but it could still be valid too. And so, how do you take away the valid component while potentially challenging the bias or the way it was delivered or something like that? So I just, yeah, I found that one super, super interesting, and it made me think really differently about the feedback that is being shared very widely with both myself and across the team.
Tim Wilson
Very cool.
Mo Kiss
So yeah, those are mine.
Tim Wilson
Nice. Julie, what about you?
Mo Kiss
Totally different spectrums.
Chelsea Pelaretti
Well, mine is going to take us for another turn, hard turn here. But we were talking about our, you know, our pets with fun names at the beginning of the episode and I had recently read that an animal microchip company Save this Life abruptly shut down recently, and there could be tens of thousands, hundreds of thousands of pets that use these chips, and they just aren't a business anymore. So. And why I kind of chose this one was one, the pet component, but two we recently have talked about on episodes too, like what happens when a company, you know, gets bought and what happens to their whole database. Well, in this case, what happens when a company shuts down and they have a whole database of data people are expecting to be able to use if they, you know, lose their pet, whatever. So I wanted to call that out more as a PSA to everyone who loves their pet and might have them chipped. You can get it checked. They included that chips starting with 991 or 900164 are from Save this Life and you can actually re register them. And there are lots of other companies you can re register with. And they even included that only 6 in 10 microchips are actually registered. So I guess overall, maybe just check that that chip is registered so you can find your. Your beloved pet if you ever need to.
Tim Wilson
Going. Going hard on psa.
Mo Kiss
What a fuzzy way to. What a fuzzy way to fit.
Chelsea Pelaretti
Tim, what about you?
Tim Wilson
Well, just because I think it was before, we might have been recording it maybe in the outtakes. So there are stories behind everybody's pets names, but we. Chelsea, what is your. What is your dog's name?
Julie Hoyer
My dog's name is Nova, which is short for ANOVA, or Analysis of Variance. She's named after that because I got my start in psychology statistics, where ANOVAs were really big. So a little nod to my past.
Tim Wilson
Yeah. So additional cred. If the content from the episode wasn't already like, wow, she knows her stuff: she's got a dog named ANOVA. So my last call is a blog post by our longtime friend of the show, Matt Gershoff, called Adjusted Power Multi-Armed Bandit. And what I really liked about it is that it's literally just working with, like, a basic understanding of power and confidence in looking at test results and how to think about test design under different scenarios. Anybody who's dealt with Matt Gershoff much has sometimes been left very, very confused. And I almost followed it all the way through, but I liked it just because it was like, if you think about what you're really trying to do, even if you've got 10 variations that you're trying to test, that doesn't mean you definitionally need some crazy sample size, if you're really clear on what problem you're trying to solve. So maybe that even gets to our inference slash prediction discussion earlier. So it was a good, fun read. So with that, Chelsea, thanks so much for coming back on. I don't know how long we'll have to wait to ask you to come do a part two on regression or hit one of the others. I mean, ANOVA could be the next one up.
Julie Hoyer
It's basically regression. We've essentially covered it.
Tim Wilson
Wow. Okay. Boom, we got a twofer. So, your activity on the socials: where are you hanging out the most these days?
Julie Hoyer
I am hanging out mostly still on Twitter and also Bluesky, which, I'm not gonna lie, in my head is pronounced blue ski. So I was really scared I was gonna say that out loud. So I'm there. You can also find me on TikTok, though I don't post a lot. Anytime you search for Chelsea Parlett, I should probably come up. But yeah, I think those are the two places that I post the most. And you'll get the most meme for your energy in those places.
Tim Wilson
I'm finding out how little I am on Twitter now and that I have had the longest, slowest direct messaging going back and forth with Evan Lapointe, like two weeks. Because I'm like, oh, shit, I'm really not checking Twitter. But I am, I am. I feel like I'm almost fully converted to Blueski. Skeeting on the Blueski.
Julie Hoyer
Don't reinforce that. It's gonna come out one day.
Tim Wilson
So no show would be complete without thanking Josh Crowhurst. We've not given him too, too much special work to do on this episode, but there's a couple special little challenges for him on this one. We love to hear from our listeners, so you can find us on LinkedIn or on the Measure Slack. We would love a review on whatever podcast platform you listen to us on. If you want to reach out and tell us what you thought of this, our first attempt at hitting a statistical method, if you were one of the listeners who requested that, or if you weren't but thought, hey, that was a terrible idea, I mean, that was a great idea, we would love to hear what you think. So regardless of whether you are regressing logistically or linearly, or whether you're inferencing or predicting, no matter what you're doing, for Mo and Julie, I know they join me in saying you should keep analyzing. Thanks for listening. Let's keep the conversation going with your.
Julie Hoyer
Comments, suggestions and questions on Twitter at @analyticshour, on the web at analyticshour.io, our.
Tim Wilson
LinkedIn group, and the Measure Chat Slack group.
Julie Hoyer
Music for the podcast by Josh Crowhurst.
Tim Wilson
Troll Smart guys want to fit in, so they made up a term called analytics. Analytics don't work.
Julie Hoyer
Do the analytics say go for it no matter who's going for it. So if you and I were on the field, the analytics say go for it. It's the stupidest, laziest, lamest thing I've ever heard. For reasoning in competition.
Tim Wilson
And then there are people who just have a hot sauce they like and name their dog after it.
Chelsea Pelaretti
Yep, yep, we have a Cholula.
Julie Hoyer
Yeah, good hot sauce.
Chelsea Pelaretti
Great job.
Tim Wilson
I can't remember Mo's dog's name. I can.
Mo Kiss
My dog had. My dog had her name when I got her. But I decided to reach, like, change the spelling. So that was Kaylee from Firefly instead of, like, Kaylee. That's a very obscure TV show.
Tim Wilson
That's Mother Nerd. That's some nerd thread that people.
Mo Kiss
That is literally the only nerdy thing I ever have been able to say ever.
Julie Hoyer
Ever.
Mo Kiss
That is so not normally my thing that I could just throw in there. But anyway, I was going through a five minute.
Tim Wilson
Oh, I got two or three ways I want to go with this. So unless foul or literally the person who's not here.
Mo Kiss
Wow, you spend way too much time with that lady.
Tim Wilson
I. Yep, I. I do. I should not have rolled off a LinkedIn Live earlier today onto a podcast. Okay, so, yeah, like I said, we'll stop.
Julie Hoyer
Rock flag.
Chelsea Pelaretti
And significant doesn't mean impactful.
The Analytics Power Hour: Episode #267 Summary
Title: Regression? It Can be Extraordinary! (OLS FTW. IYKYK.)
Guest: Chelsea Parlett-Pelleriti
Release Date: March 18, 2025
Hosts: Michael Helbling, Moe Kiss, Tim Wilson, Val Kroll, and Julie Hoyer
In episode #267 of The Analytics Power Hour, hosts Tim Wilson, Julie Hoyer, and Mo Kiss delve deep into the world of regression analysis, exploring its foundational role in digital analytics and its evolving applications. The episode features returning guest Chelsea Parlett-Pelleriti, a seasoned statistician and data scientist, who brings her expertise to the discussion.
The conversation begins with an exploration of linear regression, its definitions, and applications.
Chelsea elaborates on linear regression's fundamental equation, drawing parallels to the basic line equation learned in school.
Julie reinforces the idea that even advanced models often build upon the principles of linear regression.
The hosts discuss scenarios where linear regression is appropriate and situations where more complex models might be necessary.
Julie emphasizes the importance of understanding the problem context before selecting a modeling approach.
A crucial part of the discussion centers on the assumptions underpinning linear regression and their implications.
This linearity implies that each predictor has a constant effect on the outcome, a concept that can sometimes be limiting in real-world scenarios.
The conversation shifts to feature engineering—creating new predictor variables to capture complex relationships—and the challenges it poses, such as multicollinearity.
Mo Kiss (15:00):
"What's the approach and what are the risks if you try to get too fancy with feature engineering?"
Julie Hoyer (15:00):
"One of the major risks is misspecifying your model or overfitting it. For example, a polynomial of 75 degrees can make your model overly flexible, fitting every data point but losing generalizability."
The hosts differentiate between using regression for prediction and for inferential purposes, highlighting how goals influence modeling choices.
This distinction is essential for analysts to determine the appropriate use of regression in their projects.
Integrating expertise from SMEs is emphasized as a critical factor in building robust regression models.
Effective communication of regression findings to non-technical stakeholders is discussed, stressing the importance of clarity and relevance.
Julie offers strategies to bridge this gap, such as using relatable examples and focusing on actionable insights.
A memorable part of the episode is the "yellow car" analogy, used to illustrate the difference between predictive power and causal inference.
Mo Kiss (44:06):
"I'm concerned that stakeholders interpret a linear relationship—like spending on Facebook always increases revenue—as a flat, unchanging effect, ignoring diminishing returns."
Julie Hoyer (48:10):
"We teach regression as a fixed tool, but in reality, you can incorporate complexities like saturated spend to better reflect diminishing returns."
This analogy underscores the necessity of understanding the underlying assumptions and real-world implications of regression models.
The discussion touches on techniques to prevent overfitting, ensuring models generalize well to new data.
The hosts briefly explore Bayesian regression models, highlighting the integration of prior knowledge into the modeling process.
As the episode wraps up, each host shares valuable resources and final insights.
Julie Hoyer (50:42):
"I recommend the YouTube channel ritvikmath for intuitive explanations of data science concepts."
Mo Kiss (53:25):
"Check out the Recast blog for accessible insights on MMMs and incrementality."
Chelsea Pelaretti (54:31):
"Ensure your pet's microchip is registered, especially with recent company shutdowns affecting data accessibility."
Linear Regression Fundamentals: Understanding the basic equation and its applications in both prediction and inference.
Assumptions Matter: Linear relationships, additivity, and the importance of checking model assumptions to ensure validity.
Feature Engineering: Creating new predictors can enhance models but risks include multicollinearity and overfitting.
Prediction vs. Inference: Clear distinction between building models for accurate predictions and for understanding causal relationships.
Collaboration with SMEs: Leveraging subject matter expertise is crucial for effective model building and variable selection.
Communication is Key: Translating complex statistical concepts into actionable business insights for stakeholders.
Tim Wilson (03:50):
"When we have questions and a podcast, we get to find someone to answer them."
Mo Kiss (15:00):
"What’s the approach and what are the risks if you try to get too fancy with feature engineering?"
Julie Hoyer (48:10):
"We teach regression as a fixed tool, but in reality, you can incorporate complexities like saturated spend to better reflect diminishing returns."
Chelsea Pelaretti (29:54):
"Sometimes a non-significant p-value doesn't mean there's no effect; it could mean there's too much uncertainty to rule anything out."
For listeners eager to deepen their understanding of regression and its applications in analytics, this episode offers a comprehensive exploration of foundational concepts, practical challenges, and strategic insights. Whether you're a seasoned analyst or new to the field, Chelsea Parlett-Pelleriti's expertise, combined with the hosts' engaging dialogue, provides valuable takeaways to enhance your analytical toolkit.