
Liberty Vittert
Hello and welcome to the Harvard Data Science Review Podcast. I'm Liberty Vittert, the feature editor of the Harvard Data Science Review, and along with me is my co-host and editor-in-chief, Shao Ming. As September marks the start of the NFL regular season, this month's episode will explore how data science is used in professional football to make the sport more entertaining, competitive and safer. The NFL constantly evolves the methods by which data is collected and utilized to enhance the game. In 2017, the league partnered with Amazon Web Services to launch Next Gen Stats, massively increasing the complexity and the amount of data collected each game. With AI in the picture, the intersection between sports analytics and data science is an even more fascinating topic of discussion. Joining us today is Mike Lopez, the senior director of football data and analytics at the National Football League. His work centers on using data to enhance and better understand the game of football. So Mike, thank you so much for being here with us today and talking about this, especially as football season is gearing up and things are moving. I think we want to sort of start at the beginning of this. What was the first use of data analysis by the NFL, and how has data been used to enhance the game so far? And sorry, this is a multi-part question: what's the future? You know, if we're sort of talking about what the analysis is right now, where do you see the future of data analysis really going?
Mike Lopez
Sure. I grew up outside Boston and my dad was a high school football coach. When I was 4 or 5 years old, I would make a table of where the defensive linemen on the team that he was going to go up against were aligned. This is coaching in the late 80s and 90s. He's been using data his whole life. Football coaches have always been using data, right? They're looking at the opponents, getting a sense of what the tendencies are, identifying potential weaknesses, clocking how fast players are running. Every football coach that's coached this game has been doing that type of thing, you know, far before I was born. I think what's really changed is that the granularity of data has massively improved in the last two decades. Instead of that coach or that four-year-old son tabulating the defense and figuring out where players were aligning and where, you know, a potential weak spot might be, all that's now done automatically, right? You could use the chips that are in the players' pads to figure out where they were aligned. You could take the video and extract how fast a player in college is moving, even if you weren't even at that game. What's really picked up in the last 5, 10, 15 years has been that the access to data is greatly improved. And so instead of the elements of a football game living in tables or in small files, instantaneously right after the game they're available to teams, fans and players. And that starts with the play-by-play data. It extends to the GPS locational data that's in the pads. And for us at the league office, that's an untapped resource, right? That is information, that is trends, that is plays, that's schemes, that's things that maybe we should be knowing about. And teams are no different. So that's really where the game is now. As for where it's going to go, we're getting more data, we're getting optical tracking data. So instead of having just the GPS data, which is sort of like a little dot moving around, you know, at some point you might have the sort of pose of a player and be able to learn about what is the quarterback's rotation, what is the spin on the ball, some of the fancier questions. And all this is exciting because by and large, no one up until now has had the ability to look at those data sets.
Liberty Vittert
And just to follow up on that, give us a summary of where the data comes from.
Mike Lopez
So I would say for the better part of several decades, football was more or less reduced to a box score or maybe 150 rows of play-by-play. So you play fantasy football, you get your wide receiver's catches and their yards, your quarterback's rating, your average net punt, right? That's the type of thing that you would get from that data. That brought us maybe until roughly 2013, 14. Back around that point, the NFL entered a partnership and created sort of what's now called the Next Gen Stats group. That is the team that put in the GPS chips. They're from a company called Zebra, and they go in each of the players' shoulder pads and emit, at a rate of 10 frames per second, the location of all the players and the ball, on the field, wherever they go. And so in some sense, once a player is on the field, their tracking device lights up and that will give their XY location. And then from the XY you can get their speed, their acceleration, their orientation, things like that. So those are, I would say, the two primary data sets that folks would work with: sort of the box score data and then the Next Gen Stats. There's at this point a bunch of scouting companies too that would be adding maybe some football terminology to play-level data or to player-level data. So did a player blitz on this play? What coverage were they in? Did the quarterback throw to his first read? Those are, I would say, what kind of exist. Now, in some areas there is video data, so taking what exists in a still frame or in a video and extracting variables that you can't get out of just the GPS data. So it's coming to the NFL. In some areas it exists already in the college space, but it's much messier than some of the data sets that we've dealt with so far. So, you know, we'll have to temper expectations in terms of how and why both the league and teams will be able to use that.
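[Editor's sketch] The step from XY locations to speed and acceleration described above can be illustrated with simple finite differences. This is only a minimal sketch, not the league's method: the positions are invented, the units are assumed to be yards, and the function names are hypothetical. The only figure taken from the conversation is the 10 frames per second rate.

```python
import math

FRAME_RATE_HZ = 10          # tracking rate quoted in the conversation
DT = 1.0 / FRAME_RATE_HZ    # seconds between consecutive frames


def speeds(positions, dt=DT):
    """Speed between consecutive (x, y) samples, in distance units per second."""
    return [
        math.hypot(x1 - x0, y1 - y0) / dt
        for (x0, y0), (x1, y1) in zip(positions, positions[1:])
    ]


def accelerations(speed_series, dt=DT):
    """Change in speed between consecutive speed estimates, per second."""
    return [(s1 - s0) / dt for s0, s1 in zip(speed_series, speed_series[1:])]


# Invented positions for one player over four frames (assumed to be yards).
track = [(0.0, 0.0), (0.3, 0.4), (0.63, 0.84), (0.99, 1.32)]

player_speeds = speeds(track)                 # one value per pair of frames
player_accels = accelerations(player_speeds)  # one value per pair of speeds

print(player_speeds)
print(player_accels)
```

Four frames give three speed estimates and two acceleration estimates. Real tracking data is noisy, so some smoothing would likely be needed before differencing; that step is left out here.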
Liberty Vittert
What's proprietary? What data is proprietary to the NFL and what data is proprietary to each team? Or is there any, can any fan, can any person just access this data?
Mike Lopez
There's different layers to that. In some sense there's sources right now you can go on. There's open source packages in both R and Python where you can get every play in the last 20 years in the NFL. Some public maintainers have done an excellent job of putting that in the hands of future data scientists. Our league office runs a competition called the Big Data Bowl. It's something that we started when I got the job in 2018. And our goal with that is basically to take what was previously proprietary data, put it out there and really sort of rev up the future of football data and analytics. Because at that point, you know, like I mentioned earlier, we kind of went from 155 rows in a game, that's our plays. Now, with our Next Gen Stats, I mean, you're talking about 60,000, 100,000 rows per game. So it changes the skill set that you need to be able to use to analyze that pretty quickly. So in some sense the sort of live in-game and maybe post-game data will take some level of a subscription or something like that to get our Next Gen Stats. But at the same time right now, if you go on Kaggle, which is a data science competition site, you can get the last five years of Big Data Bowl data, which is our player tracking data. It's what the clubs use when they're analyzing players and it's really state of the art for what a data scientist at this point would want to do if they were interested in this career.
Shao Ming
Well, thank you, Mike. And that's a lot of data, which is great. But before I ask you my question, I need to clarify for our international listeners. We're talking about American football. This is not the football that most people are thinking about. That's why you may hear we're using hands all the time. The serious question is, you have been working there for six years. Can you share maybe one or two examples where the data analytics produced by your team made some significant impact, such as improving the game's entertainment value, driving growth, or enhancing player safety?
Mike Lopez
Sure. So I'll point to one that I think is a pretty good use case of where our team could come in. The league, in American football at least, sets its rules each March. And when I say we, actually the league office doesn't really play a role. We are supposed to be an objective observer in the sense that we provide data to the teams and the teams then go vote on it. So whatever Mike Lopez wants or whatever the league office wants doesn't really matter because the teams get to vote on it. And it's not like, well, if seven teams want it and 15 teams don't, it works. You've got to get to 24 votes. So back in 2019, there were a lot of teams that put several submissions into potentially reviewing certain types of fouls. You know, no different than right now, our replay room will look to figure out if a player caught the ball. They might look to figure out, should we have called defensive holding, or should we have called pass interference, or should we have called offensive holding. So there were several rules that were submitted back then, and there were lots of opinions from teams about what was the most impactful foul in the game. Now, in our job as impartial observers, we have lots of different ways we could judge impact. At that point at the league office, there was not a win probability model. Win probability is something that has been around for several years in football. Our team, and in fact this was myself and one of our other data scientists at the time, we didn't even use our own, right? We took ones that we knew existed publicly. We got them to work with our data, updated them a little bit to the sort of game at that point. But at that point, there was a lot of, I would say, anecdotal evidence flying around about what the league should do. Our team sat back, listened, took it all in. We had several charts on win probability. 
We explained how it works, why it matters, and, you know, when the league was deciding how and where to potentially start trying to review penalties, they were using our win probability charts. Right. So we were able to say, here are the most impactful fouls. Here are the most impactful ones that we call incorrectly. Here are the most impactful ones that we miss. And then it's sort of a combination of both, sort of video, but then also, you know, our charts. Right. So, you know, we're putting up a bar chart and then we're following it up with, here are the plays that you might be thinking of. And then by the same token, we're able to say, you know, here are the four or five fouls that other folks have mentioned as possibly impactful. Here's just maybe why they're not as impactful. And so I think that was kind of the first example where our team, we kind of brought some of the modern-day football analytics to the league office and the decision makers and at least supplemented the conversation.
Liberty Vittert
Just to follow up on that, we always tend to ask people about the successes with data, but as all of us who work with data know, there are big failures too, like oops moments. Has there ever been some sort of new data or new analysis where you were like, oh, this is going to be so cool, we're going to be able to do this, and it's just totally useless?
Mike Lopez
Yeah. I mean, I would say for every 10 graphs we make, one of them gets seen by somebody that's important. Right. And a lot of that is, you know, our job is to tell people when we think there's something they should change, not just to say like, hey, we did an interesting analysis. And we do a lot of interesting analyses. And sometimes we find things and we're like, let's put a pin in this. And if somebody ever tells us that they think it matters again, we have it ready. But we don't need to sort of interrupt the working nature, the weekly timeline of the NFL season, unless we think that there's something like, hey, this is actually worth stopping for. I mentioned kind of the timeline of the league. There's very little the league does in-season to change the rules. Right. That entire conversation waits until after the season. In fact, everything we're doing now, it's almost the end of week three, everything we're doing now is for a presentation that we'll make in January, February or March. And so, yeah, there's a boatload of things where it's like, hey, let's look into it. Oh, no, maybe not that interesting. Occasionally, hey, let's look into it. And then you share it and you want them to do something about it. And then it's like, you know, maybe they decided it wasn't actually that important. So we certainly have our false positives and our false negatives of what we do. You sort of have to take the wins. And then also we try to do a pretty good job of listening and being in the hearts and heads of the folks that are making the decisions, just to get a sense of the cadence of where we think the conversation will be going in the off season.
Liberty Vittert
In that sense of being in the hearts and heads of people that are making the decisions, I mean, even though people were using data long before any of us were working in data science, there's still that sort of human instinct element that I imagine coaches and players must have. So how do you think data, especially all the data-driven decision making, how does it complement the human instinct? And have you seen cases where the human instinct was right and the data analysis was wrong?
Mike Lopez
Yeah, oh, absolutely. I mean, I would say a good number of times. You know, we're data scientists, right? We're supposed to facet graphs. A good number of times we see a trend and then we facet it, and then, you know, suddenly that trend is actually the opposite of what we initially set out to find. And a lot of times that context comes from the football experts. Our data science team, we're now sort of up to almost a dozen of us. You know, we're watching almost every game. We're trying to track as many things as we can, but we don't always know that context. We try to be purposeful about, you know, hearing complaints or hearing arguments and saying, hey, listen, the data is pretty robust, this actually isn't something we need to worry about. One of the areas our team has done a lot with is dealing with the NFL calendar, the schedule, and how they set the league schedule. And every club in May will get their 17-game schedule. There are 32 clubs, there are 32 complaints, right? Every club has something where they think there's some egregious error the league has made, that sort of says, hey, this is too difficult, why are we doing this? And so we have a pretty robust pipeline of research that says, hey, listen, there are five or 10 things that pushback will come from the clubs on, and almost all of it doesn't matter. We have 10, 15, 20 years of data that says teams have been in this situation before, they'll be in it again, and at the end of the day, the better team will usually win. So we're trying to sort of fight where we feel like there's a lot of data, and then maybe when we're less confident in some of our findings, maybe we're a little bit more open to certainly being wrong. But ultimately we're statisticians. Our job is to understand error, and some of that error is probably on our end too at times.
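[Editor's sketch] The point about a trend reversing once a graph is faceted can be shown with a toy example of what is often called Simpson's paradox. Everything here is invented for illustration: the data, the group names, and the helper function are hypothetical and do not come from NFL data. The sketch fits a least-squares slope to all points pooled together, then to each group separately.

```python
def slope(points):
    """Ordinary least-squares slope of y on x for a list of (x, y) pairs."""
    n = len(points)
    mean_x = sum(x for x, _ in points) / n
    mean_y = sum(y for _, y in points) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in points)
    var = sum((x - mean_x) ** 2 for x, _ in points)
    return cov / var


# Invented data: within each group y falls as x rises,
# but group_b sits at higher x and higher y overall.
data = {
    "group_a": [(1, 10), (2, 9), (3, 8)],
    "group_b": [(6, 15), (7, 14), (8, 13)],
}

pooled = [point for points in data.values() for point in points]
pooled_slope = slope(pooled)
facet_slopes = {name: slope(points) for name, points in data.items()}

print("pooled:", pooled_slope)
print("by facet:", facet_slopes)
```

The pooled slope is positive while the slope inside every group is negative, which is the kind of reversal that the grouping context, often supplied by subject matter experts, can reveal.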
Shao Ming
Now, from what you just discussed, I can't help but ask you the question about causal inference. I know you have done quite a bit of causal inference yourself. As you mentioned, a lot of the things you do, in the end, you want to convince the decision makers to maybe change their mind, change their practice. So whatever you tell them, I assume there's some kind of understanding that if they make a change, the change will come, whereas we statisticians tend to be very careful and say we only provide association, not causation. But in the end, when you take actions, the causal part kind of comes in nevertheless. The question I have for you is that, as you know, the gold standard for causal inference is experimental design, right? Do A/B testing, which itself has problems, but still it's better. In your work, do you get to do this kind of experimental design type of thing to drive the causal inference, or do you do a lot more of these observational studies, eliminating confounding factors? How does it work?
Mike Lopez
I've been here six plus years, and I have yet to do any experiment or see any experimental data. You know, not all sports data is observational, right? And I think if we were to move to the business side of things, you know, we could do A/B testing with marketing emails or, you know, which game to show in a certain market or something along those lines. Ultimately, unless we are able to acquire a new minor league, right, and maybe have certain teams play under certain rules and then compare other teams playing under certain rules, and you can do some randomization there, almost every data set we're getting is observational. And so I think that goes back to the question that Liberty asked about some of the context around what we're doing. And there's always context, right? There's always going to be some type of either observed or unobserved variable that's out there that we might be missing. And not only that, we don't have a playing field where we get to test things. So not only do you have some of the things that maybe you can't measure, you also have potentially a boatload of unintended consequences that come with any decision you make, because those types of policies haven't been set before. So trying to think about sort of the long-term manifestation of some of the things that we do, whether it's slightly tweaking this rule or slightly tweaking that rule, you know, it doesn't just affect the player or the play affected. It can often have a more dramatic impact on the game.
Liberty Vittert
Do you have an example of that, where sort of there were unintended consequences from a decision or a rule that was made?
Mike Lopez
Oh, gosh, I have so many. In general, we're dealing with unintended consequences all the time. One of the major ones in football right now is how defenses are playing in terms of how they're facing offenses. And for a long time, every NFL analyst would tell you that teams should pass the ball more. And what has happened is that defenses have said, well, listen, if teams should be passing the ball more, I should maybe be playing a slightly different style of defense. And so they're taking a more passive approach to playing defense, which means, you know, in football terms, you're sort of putting more players further away from the line of scrimmage. The unintended consequence of that is that teams are now running the ball more often. They're not passing it, right? So we had this area that you should be passing the ball more, and now teams have all the space to run, so now they're running it. And the real crazy unintended consequence is that our games are going much faster. In other words, because you get rid of the pass play, teams are now running the ball more. Their quarterback is holding onto the ball and scrambling. They're not throwing incomplete passes. Incomplete passes stop the clock. And so we don't have this whole boatload of stoppages that we used to have. And so we now have fewer plays in a game. You know, for the last decade or so, we were at 150, 155 plays per game. This year, we're a little over 145. So we've taken, you know, not necessarily meaning we, but the strategy of the game, we've dropped almost 10 plays per game. That's, you know, somewhere in the range of like 8%. And so scoring's down a little bit. If you look at some of the league trends this year, scoring is down, passing yards are down. There's a lot of other trends that are down. But the really weird thing is we just cut out a bunch of plays. And most of that was sort of the unintended consequence of some of the offensive and defensive strategy. So that's one that maybe we didn't play a role in, but it's one whose results we are paying pretty close attention to, just because it can certainly impact where the game is going in the future.
Liberty Vittert
Does that then mean, and forgive my total lack of knowledge of football, the fact that you have fewer plays, would that mean the game lasts less time, so that you have less time for commercials or you have, you know, less airtime as planned by the channels? Or what would be sort of the consequence of that besides just that the game has changed?
Mike Lopez
Yep, it's been a couple minutes faster this year. Yeah, we're a tick over three hours typically, or maybe in the 3:05 range. I think this year it's around 3:01. So the games are slightly shorter. Right. You're probably not sitting at home noticing much difference, but we certainly are. You know, if you're playing fantasy football, I'm sure your quarterback's fantasy stats are much worse than maybe the projections set out. I'm sure that's something lots of folks that are interested in projecting both player and team performance are paying attention to. Fortunately, the games are still really close. We have a lot of exciting comebacks. We have a lot of games decided by one possession still. So that I think is probably most important. But just the nature of the clock always running has sort of changed a little bit about what we can expect from a game.
Shao Ming
So Mike, you have worked with baseball data, football data, hockey data. Can you share with our listeners about that? Obviously there's similarity, but there must be some differences, and also anything particularly contrasting between these different kinds of sports data.
Mike Lopez
I mean, the really nice thing about paying attention to other sports is that there's almost always lessons to be learned and, you know, familiarity with baseball. I mean, my thesis in undergraduate at Bates College in 2004 was comparing Nomar Garciaparra and Derek Jeter. And the whole point of that thesis was trying to figure out, like, what predicts from one year to the next, which was the really cool thing at the time and still is. Like, how do you take a set of metrics on a player or a team and understand what's signal and what's noise? And you know, fundamentally, 20 years later, that's still what a lot of our analysis is about. We're measuring things and ideally we understand: is this skill repeatable? Is it luck? Is it based on the play caller? Is it based on the defense they were facing? And I think whether you're looking at baseball, whether you're looking at hockey, whether you're looking at football or basketball, right, any of those sports, we're almost always after the same sort of golden egg in the sense that, like, there's some new stat, there's some new context you can account for, there's some new strategy, right, that is better, that is more repeatable than maybe what was done before. I mean, a really good example that all the sports are getting into now is how to judge the athleticism of a player. Right. The NFL can do that now because we have Next Gen Stats chips that tell us how fast players can go. Well, in baseball, you can go on MLB's site and get every player's sprint speed. So, you know, there's a group that's done that before, that's put in the work to figure out how fast players are. The NFL wants to do change of direction to figure out how wide receivers get open. Well, basketball and hockey have some sense of maybe some of that change of direction too. Right. So a lot of the skills that you would take from one sport will often port over to the other. And in fact, I mentioned our data science competition. The winning algorithm our first year was code that was copied and pasted from soccer, or some version of it. Right. So you're able to not only take learnings and lessons and questions, but quite literally you're able to take the code that was used in one sport and make it work in another. So I think that's kind of the really cool thing about paying attention to other sports: you get some of those similarities.
Liberty Vittert
You know, I know that CTE, and I'm not going to use the full terminology because I'm going to pronounce it wrong, but head injury, chronic traumatic head injury, has been so discussed, and there was obviously a movie on it, and it's been sort of really much more in the sphere of public discussion. Is there any work you all are doing to reduce injuries? I know you talked about fouls and sort of how you determine a foul. Is that the kind of stuff that's being used to try to decrease injuries or are there other types of analysis that you guys are doing?
Mike Lopez
Yeah. So, I mean, I don't want to speak on behalf of our, you know, we have a whole health and safety arm that looks into this, and they've been looking at that type of stuff with head and neck experts and biomechanical engineers, you know, at this point almost for a decade. I think where I'm most comfortable discussing and sort of talking through is maybe how we would interact with those folks.
Liberty Vittert
Yeah, totally.
Mike Lopez
Yeah. This year, the NFL has a new kickoff play. So if you watch the first play of an NFL game, you'll notice that instead of the ball and the players starting on one end and then, you know, kicking the ball off and running fast down together, the players on both the kickoff team and receiving team start five yards apart, much closer to where the ball lands. That play was the result of collaboration between several groups, including ours, and including folks from the health and safety team, which, you know, by and large was driven by some of the injury rates. On the kickoff play, they identified that it was a play that had an elevated risk of injury. And, you know, working together with both the special teams coaches and the, you know, folks at the league office, we were able to say, hey, you know, the injury experts say this is going to be a safer play. We have reduced speed, reduced space, right, with which to have collisions. And so, you know, that's where our group would come in and say, hey, listen, if we do this, this is what we expect for scoring, field position, unintended consequences, how it would affect officiating, what will the strategy be for teams? You know, through the first couple of weeks, we haven't necessarily had a monstrous increase in returns, but we've had more kickoff returns than we had last year. And the expectation is we'll also have a safer play. So, yeah, I mean, I think the goal is to try and think of rules and ways to play the game that, you know, are easy to officiate, that are competitive, that are exciting for fans, and then of course also, in terms of the health and safety aspect, make sure that they are thinking of both the current and the future player.
Shao Ming
Mike, one of the students, a Cowboys fan, wanted us to ask you this question. Is there any correlation between having the highest paid NFL player on a team and winning the Super Bowl within the next three years?
Mike Lopez
One of the really interesting elements of the NFL that is unique is that it has a hard salary cap. And so what the teams do with that money that they can allocate to players has changed. Right? And if you compare football now to where it was 10, 15 years ago, you know, there are different positions that are more highly valued, there are different positions that are more lowly valued. So it's probably less about maybe a specific player that you're devoting it to, and I know the question is specific to one specific player on that team, but maybe more about the overall distribution. And if you're able to find value at other positions, maybe you can afford more than the quarterback. If you're not, then maybe you can afford less. It's one of those things that, certainly at the league office, we try and pay less attention to. In fact, even sometimes when we're reviewing film, we'll just use, like, player numbers in the team. So we try to be team agnostic and sort of supportive of all the players. And realistically, I think that probably helps us do our job the best. But it's probably too tough of a question to answer without acknowledging that there probably are multiple ways to put together a good team.
Shao Ming
Now, here's another question from students. Michael, you're doing something very cool. As a student, if I want to do what you do, how do I get your job? What should I study in school?
Mike Lopez
So I would say, I mean, if you were to make a scatter plot, and your X axis is, I would say, coding and data science skills, and your Y axis is loving and knowing football and the ins and outs of it, you know, you want to be in the top right, right? And so if you're really, really good at coding and you have a little bit of knowledge, you can absolutely help NFL teams. If you love football, it is your passion, and you want to do anything, but you can't code that much, you can probably find a role. It just maybe wouldn't necessarily be as sort of a football data scientist. Our team, the folks that we hire at the league office, and the ones that fit in best, like I said, they're in that top right. We are fixated on finding people who can code. And when I say code, it can really be several skills that fit there. It could be maybe more on the data engineering side, where you're able to take some of the algorithms that we're doing or some of the models we're fitting and then help us produce those at scale. It could even be maybe more on the statistics side, right, where your job is fitting new models and trying to work with new training data and new testing data. And then even on the applied side, we have, you know, folks in our team that work with our officials, our subject matter experts. You know, I would put them in the top 50 people in the country in terms of how well they know the NFL rulebook. So there are various skill sets that will help, you know, someone that's on a team or at our league office. And it's really trying to combine those two things.
Liberty Vittert
Well, I have 800 more questions, so I could talk to you forever, but I don't want to take all your time. So we will wrap up with our final question. We always ask everybody this. We give everyone a magic wand question. So our question to you is, if you could wave your magic wand, what would be the data you want that you don't have that could predict for the draft who's going to be a superstar? Like, what is the data you'd want to collect? You know, you have all these players you're looking at, you're drafting. How do you figure out who's going to be the superstar? Who's the next Tom Brady?
Mike Lopez
The funny thing about the NFL draft is you don't even necessarily need the next Tom Brady. The NFL, you know, relative to the other leagues, is a very difficult sport to contextualize and predict, from college to the pros. You know, in other words, analysts in every sport make what they call draft curves, which take the sort of relative pick and then some value of performance as sort of your Y axis. Right? You're trying to guess, if I have a pick here, what is it worth? Well, the NFL's draft curve, so to speak, is more horizontal, right? It almost doesn't look like a straight line, but it's more horizontal. Whereas if you look at the NBA's, the NHL's, or Major League Baseball's, it's a really steep curve. So to answer your question, we are pretty far off from whatever that ideal data is, because if we had it, we would have a steeper draft curve right now. I mean, the major context that you need to account for is the thing that I think every analyst would love to be able to have. I mean, quarterback is the most important position in football, it's probably the most important position in all of sport. It'd be great to be able, in our analysis of college quarterbacks, to understand what they were asked to be doing on that play. Because when I watch a video of that play, I don't know what they were asked to do. I can look at their eyes to figure out what I think they were asked to do. But not only what were they asked to do, what was the play call? Did they change the play at the line of scrimmage? What did the coach tell them to do? All those types of things are really difficult to account for. And that's just one position, right? And now if I want to be able to have a magic wand and impact that across all the draft, that's a pretty tough data set to get. But I think that's ultimately where and why football has fallen short, maybe, of the other sports: it's hard to account for what players in college were asked to do. And it makes it harder then to transition to guess what they're going to do when they get to the pros.
Liberty Vittert
So would that be fair to say that it's almost a mixture between decision making and what they're told to do? Like that's the sort of data that you want is to understand the player's decision making process versus what they're told to do.
Mike Lopez
Yeah. And really the whole team decision making process, like what was the team asked to do? Because the hard part is, you know, you're a college quarterback and you get one offensive coordinator, you've got a new one coming in the next year, they're going to change every single one of your words. Right? It's a new play call. Instead of this formation, it's that formation. Instead of this route, it's that route. And so it can be really hard to do that at scale across several teams, across several years because the language is always changing and things like that. So that's why it's such a hard problem: you don't always know the quarterback's intent, you don't always know the team's intent, and then once you even set out to do it, you're going to have a hard time necessarily even putting that type of data together.
Shao Ming
Well, thank you, Mike, for sharing with us how data science works at the NFL, but also all these lessons and the challenges. And I particularly appreciate your advice to the students that in order to do anything, you need two things, right? Passion and the skill. Now, clearly I cannot do what you do. I do have a passion. I do have skills, but I have a passion for the wrong football. So I'm not even going to play. But with that, thank you, Mike, for this really informative and wonderful conversation. Appreciate it.
Mike Lopez
Yeah, thank you so much for having me. It was a lot of fun.
Liberty Vittert
Thank you for listening to this week's episode of the Harvard Data Science Review podcast. To stay updated with all things HDSR, you can visit our website at hdsr.mitpress.mit.edu or follow us on X and Instagram @thehdsr. A special thanks to our executive producer Rebecca McLeod and producers Tina, Toby Mack and Arianwin Frank, as well as our assistant producers Gavin Yang and Bell Riley. If you liked this episode, please leave us a review on Spotify, Apple or wherever you get your podcasts. This has been the Harvard Data Science Review. Everything data science and data science for everyone.
Release Date: September 27, 2024
Host: Liberty Vittert
Co-Host: Shao Ming
Guest: Mike Lopez, Senior Director of Football Data and Analytics at the NFL
As the NFL regular season commences each September, the Harvard Data Science Review Podcast delves into the pivotal role of data science in shaping professional football. Hosted by Liberty Vittert and Shao Ming, this episode features Mike Lopez, the NFL’s Senior Director of Football Data and Analytics. Lopez discusses how data science enhances the game's entertainment value, competitiveness, and safety, while also exploring the evolving landscape of sports analytics with advancements in AI and big data.
Mike Lopez traces the history of data analysis in the NFL back to the coaching strategies of the late 20th century. He explains, “Football coaches have always been using data, right? They’re looking at the opponents, getting a sense of what the tendencies are, identifying potential weaknesses...” (Lopez, 01:39). While early data usage involved manual tracking of player positions and tendencies, technological advancements have exponentially increased data granularity and accessibility.
Since the NFL's partnership with Amazon Web Services in 2017 to launch Next Gen Stats, data collection has become more sophisticated. The integration of GPS chips in players' shoulder pads and optical tracking systems provides real-time, high-resolution data on player movements, speed, and orientation. Lopez highlights, “You could use the chips that are in the player’s pads to figure out where they were aligned. You could take the video and extract how fast a player in college is moving...” (Lopez, 01:39).
The NFL's data ecosystem comprises several layers, including play-by-play data, Next Gen Stats, and scouting reports. Lopez outlines, “The primary data sets are the play-by-play data and the Next Gen Stats from Zebra’s GPS chips...” (Lopez, 03:53). While much of this data remains proprietary, the NFL fosters broader analytics engagement through initiatives like the Big Data Bowl, allowing data scientists worldwide to access and analyze player tracking data via platforms like Kaggle.
Mike Lopez shares specific instances where data analytics have significantly influenced the NFL:
Rule Changes and Foul Analysis (07:54): In 2019, the NFL used win probability models to assess the impact of various fouls. Lopez explains, “We were able to say, here are the most impactful fouls that we call incorrectly...” (Lopez, 07:54). This data-driven approach helped refine officiating standards by highlighting which penalties most affected game outcomes.
Kickoff Play Modification (23:00): Collaborating with health and safety teams, the NFL revised kickoff formations to reduce injury risks. Lopez states, “We reduced speed, reduced space,... leading to fewer collisions...” (Lopez, 23:00). This change not only enhanced player safety but also influenced game dynamics by increasing kickoff returns and slightly shortening game durations.
While data science has driven numerous successes, Lopez acknowledges the inevitability of failures and "oopsy moments":
False Positives and Negatives (10:50): Lopez remarks, “For every 10 graphs we make, one of them sees somebody that’s important...” (Lopez, 10:50). Not all data-driven insights lead to actionable outcomes, and the NFL often navigates these inaccuracies by prioritizing robust, evidence-based findings.
Unintended Consequences (17:00): Changes based on data can lead to unexpected outcomes. For example, promoting a pass-heavy strategy inadvertently encouraged more running plays and faster game tempos, resulting in fewer overall plays per game (Lopez, 17:00).
The interplay between data-driven decisions and human instinct is a recurring theme. Lopez emphasizes the balance required: “We try to listen and be in the hearts and heads of the folks that are making the decisions...” (Lopez, 12:17). While data provides invaluable insights, coaches and players often rely on instinctual judgments that may sometimes counteract statistical evidence.
Addressing the complexities of causal inference in observational sports data, Lopez admits, “Almost every data set we're getting is observational...” (Lopez, 15:31). Without experimental designs like A/B testing, establishing causality remains challenging. This limitation necessitates cautious interpretation of data and an appreciation for potential unmeasured variables that could influence outcomes.
A significant focus of NFL data analytics is player safety, particularly concerning head injuries. Lopez discusses the collaborative efforts behind rule changes aimed at reducing injury risks: “The NFL has a new kickoff play... the injury experts say this is going to be a safer play...” (Lopez, 23:00). By integrating data with biomechanical research, the NFL continuously seeks to create a safer playing environment without compromising the game's integrity.
Lopez highlights the transferable nature of data analytics across different sports. Knowledge and methodologies from baseball, basketball, and hockey inform NFL data strategies. For instance, metrics like player sprint speed and change of direction are universally applicable, enabling cross-disciplinary innovations: “The winning algorithm our first year was code that was copy and pasted from soccer or some version of it...” (Lopez, 20:12).
For students and aspiring data scientists aiming to work in the NFL, Lopez offers crucial advice: “You want to be in the top [right]. If you’re really, really good at coding and you have a little bit of knowledge, you can absolutely help...” (Lopez, 26:29). He underscores the importance of combining robust technical skills with a deep understanding of football to excel in this specialized field.
In closing, Lopez answers the hosts' "magic wand" question about which data would best predict who in the draft will become a superstar. He notes that the NFL's draft curve is flatter than those of the NBA, the NHL, or Major League Baseball: “We are pretty far off from whatever that ideal data is, because if we had it, we would have a steeper draft curve right now” (Lopez, 28:39). The missing context, he argues, is what players and teams were actually asked to do on each play, which is hard to capture at scale because play calls and terminology change across teams and years.
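In his answer, Lopez describes a draft curve as pick position set against some measure of later performance, with the NFL's curve flatter than those of other leagues. A minimal sketch of that idea follows, assuming a generic "career value" number per player. The function names, the bucket size, and every number below are hypothetical and chosen only for illustration; none of it is league data or the NFL's own method.

```python
from collections import defaultdict


def draft_curve(picks, values, bucket_size=10):
    """Average performance value per bucket of draft picks.

    picks: overall pick numbers (1 = first pick).
    values: a career-value metric per player (the choice of metric is an assumption).
    Returns {first pick in bucket: average value}.
    """
    totals = defaultdict(lambda: [0.0, 0])  # bucket -> [sum of values, count]
    for pick, value in zip(picks, values):
        bucket = (pick - 1) // bucket_size
        totals[bucket][0] += value
        totals[bucket][1] += 1
    return {b * bucket_size + 1: s / n for b, (s, n) in sorted(totals.items())}


def slope(curve):
    """Least-squares slope of average value against pick (needs 2+ buckets)."""
    xs = list(curve.keys())
    ys = list(curve.values())
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    return sxy / sxx


# Made-up values for two imaginary leagues (not real data).
picks = [1, 2, 11, 12, 21, 22]
steep_league = [10.0, 8.0, 5.0, 3.0, 2.0, 0.0]  # early picks clearly better
flat_league = [6.0, 4.0, 5.0, 3.0, 4.0, 2.0]    # pick position says less

steep_curve = draft_curve(picks, steep_league)
flat_curve = draft_curve(picks, flat_league)

# A more negative slope means a steeper curve, i.e. pick position predicts value better.
print(slope(steep_curve), slope(flat_curve))
```

In this toy setup, the flatter league is the one where knowing a player's pick number tells you less about later value, which is the sense in which Lopez says better pre-draft data would show up as a steeper curve.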
This episode of the Harvard Data Science Review Podcast offers a comprehensive exploration of how data science intersects with professional football. Mike Lopez provides invaluable insights into the current applications, challenges, and future directions of data analytics in the NFL. From enhancing game strategies and player safety to navigating the intricacies of causal inference, the conversation underscores the transformative power of data in modern sports.
Notable Quotes:
Mike Lopez (01:39): “Football coaches have always been using data, right? They’re looking at the opponents, getting a sense of what the tendencies are, identifying potential weaknesses...”
Mike Lopez (07:54): “We were able to say, here are the most impactful fouls that we call incorrectly...”
Mike Lopez (10:50): “For every 10 graphs we make, one of them sees somebody that’s important...”
Mike Lopez (15:31): “Almost every data set we're getting is observational...”
Mike Lopez (23:00): “We reduced speed, reduced space,... leading to fewer collisions...”
Mike Lopez (26:29): “You want to be in the top [right]. If you’re really, really good at coding and you have a little bit of knowledge, you can absolutely help...”
Mike Lopez (28:39): “We are pretty far off from whatever that ideal data is, because if we had it, we would have a steeper draft curve right now.”
Harvard Data Science Review Podcast is produced by the award-winning Harvard Data Science Review journal. The podcast offers in-depth "case studies" on how data science influences news, policy, and business decisions, featuring expert guests who discuss the nuances of data-driven insights in various fields.
For more episodes and updates, visit the Harvard Data Science Review website or follow them on X and Instagram @thehdsr.