Transcript
A (0:00)
Welcome to the Analytics Power Hour. Analytics topics covered conversationally and sometimes with explicit language. Hey, everybody, welcome.
B (0:16)
It's the Analytics Power Hour. This is episode 285. You know, for some reason, I have always been Bayesian when it comes to statistics. I didn't arrive there on purpose. It was just sort of intuitive. I also wasn't paying super close attention to my priors along the way. So who knows really how I ended up this way. And on this episode, we're going to do something incredibly brave and slightly reckless. We're going to talk about Bayesian statistics without scaring anyone who doesn't have a PhD. And if you do have a PhD, keep a stress ball handy. It might get turbulent. But before we get into it, let me introduce my co-hosts. Moe Kiss, how are you?
C (0:58)
I'm good, thanks for asking.
B (1:00)
Nice. I'm hoping you'll be my buddy on this episode. And Tim Wilson. How you doing?
D (1:10)
I'm doing great.
B (1:13)
I've never come right out and asked you, Tim, but are you or have you ever been a frequentist?
D (1:21)
Beats the hell out of me. I live my life for p-values that are less than 0.05.
B (1:27)
So that sounds like something a frequentist would say. Anyways, I don't know.
A (1:31)
Okay.
B (1:31)
I'm Michael Helbling. Well, we needed a guest. Someone whose work we've been appreciating for years now in the area of media mix modeling. Michael Kaminsky is the co-CEO of Recast. He's also one of the organizers of the Locally Optimistic Slack community, and he was previously a guest on episode 232, and now he is back once again. Welcome back to the show, Michael.
A (1:54)
So glad to be back. Thanks for having me. Really excited for this one.
B (1:57)
So you heard Tim say it. He is a huge fan of p-values and hacking them to get the answers he wants. Why is he wrong? No, I'm just kidding. We're going to talk about Bayes. So maybe just start in on sort of a little bit of an explainer into Bayesian statistics.
A (2:19)
Yeah, so this is a great question. So there's a lot of different directions that we can take the conversation. I'll try to do a little bit of an overview of how I think about Bayesian statistics and how this fits into the sort of universe of different types of analytical strategies that people might take. So maybe we'll start with a little bit of history. So people, I think, today hear Bayesian statistics, and if you came up through college and maybe graduate degrees like I did, Bayesian statistics sounds like a new thing. You maybe didn't learn about this in university. When I took statistics classes and econometrics classes in university, we never talked about Bayes, never really talked about Bayes' theorem. And so when I first started hearing about it, I was like, oh, this is some new thing that people invented. But that's not really true. If we think back about the history of probability and statistics, Bayesian statistics is sort of the original type of statistics. It's a very simple mathematical approach to thinking about how do we calculate probabilities, how do we estimate the probability of something happening in the future. And really, it's the frequentist approach, largely spearheaded by this guy, R.A. Fisher, around the turn of the 20th century, that was sort of the new statistics at the time. And because this frequentist approach was very convenient for a lot of important questions that were relevant at the time, it really gained a whole ton of popularity. But Bayesian statistics has been around for a very, very long time. It's the original type of statistics, if we want to call it that, but it has been resurging in popularity in recent years for some very specific technical reasons, which we might get into later. So what is the idea behind Bayesian statistics? The way that I like to think about it is that Bayesians tend to think from simulations.
What we want to do in general, as a Bayesian, is build a model that describes the world. And here, model, you can generally think about it as simulation. I want to build a simulation that I think describes the world. What are the rules of the natural universe of the thing that I am trying to model? And then I want to compare the implications of that simulated world with actual data that we observe, and then try to learn about some parameters of interest from comparing the simulated world, which I generally code up in some programming language, with the data. Again, a lot of people, when they're thinking about statistics, are thinking about an A/B test, they're thinking about regressions. But there's all kinds of statistical models that we can imagine that are way far outside of that. If we think back to the COVID pandemic, a lot of people did really interesting work trying to model the spread of disease. You might have some biological model where you're trying to think about, okay, there's some coefficient of spread and there's some amount of people being treated. And then what we want to do is we want to see, well, how well does this model fit the data? What would the coefficient of spread have to be in order to explain the pattern of data that we see? That is a very natural Bayesian approach to statistics. We're going to start with some model informed by science, informed by our understanding of the world, and we're going to compare the implications of that model with data that we actually observe and then use that to infer things about some parameter of interest. I'm going to pause there. Hopefully that was a reasonable summary.
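The workflow described above (write down a model of the world, compare its implications with observed data, and infer a parameter of interest) can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not anything presented in the episode: the case counts are made up, the "coefficient of spread" is modeled as a simple daily growth factor, the Poisson observation model and the Normal(1.0, 0.5) prior are arbitrary choices, and the posterior is approximated on a grid rather than with the samplers real tools use.

```python
import math

# Made-up daily case counts, for illustration only (not real data).
observed = [5, 7, 8, 12, 15, 19, 27, 33]

def expected_cases(spread, day, initial=5.0):
    """Toy model of the world: expected cases grow by a factor `spread` each day.
    The starting level `initial` is an assumed constant."""
    return initial * spread ** day

def log_likelihood(spread, data):
    """How well does this spread value explain the data?
    Assumes each day's count is Poisson around the model's expected value."""
    total = 0.0
    for day, count in enumerate(data):
        mean = expected_cases(spread, day)
        total += count * math.log(mean) - mean - math.lgamma(count + 1)
    return total

def log_prior(spread):
    """Assumed prior belief: spread is somewhere near 1.0 (Normal, sd 0.5)."""
    return -0.5 * ((spread - 1.0) / 0.5) ** 2

# Candidate spread values from 0.5 to 2.0 in steps of 0.001.
grid = [0.5 + 0.001 * i for i in range(1501)]

# Bayes' theorem on a grid: posterior is proportional to prior times likelihood.
log_post = [log_prior(s) + log_likelihood(s, observed) for s in grid]
peak = max(log_post)  # subtract the peak before exponentiating, for stability
weights = [math.exp(lp - peak) for lp in log_post]
total_weight = sum(weights)
posterior = [w / total_weight for w in weights]

posterior_mean = sum(s * p for s, p in zip(grid, posterior))
print(f"Posterior mean of the spread coefficient: {posterior_mean:.3f}")
```

The grid makes the logic visible: every candidate value of the spread coefficient is scored by how plausible it was beforehand and by how well the simulated world it implies matches the observed counts. With more than a couple of parameters a grid becomes impractical, which is where sampling-based tools come in.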
