Harvard Data Science Review Podcast: "If You Want to Be a Data Scientist (or a Player) for the NFL, This Is for You…"
Release Date: September 27, 2024
Host: Liberty Vittert
Co-Host: Xiao-Li Meng
Guest: Mike Lopez, Senior Director of Football Data and Analytics at the NFL
Introduction
As the NFL regular season commences each September, the Harvard Data Science Review Podcast delves into the pivotal role of data science in shaping professional football. Hosted by Liberty Vittert and Xiao-Li Meng, this episode features Mike Lopez, the NFL’s Senior Director of Football Data and Analytics. Lopez discusses how data science enhances the game's entertainment value, competitiveness, and safety, while also exploring the evolving landscape of sports analytics with advancements in AI and big data.
Evolution of Data Usage in the NFL
Mike Lopez traces the history of data analysis in the NFL back to the coaching strategies of the late 20th century. He explains, “Football coaches have always been using data, right? They’re looking at the opponents, getting a sense of what the tendencies are, identifying potential weaknesses...” (Lopez, 01:39). Early data usage relied on manual tracking of player positions and tendencies; technological advances have since greatly increased the granularity and accessibility of the data.
Modern Data Collection Methods
Since the NFL's partnership with Amazon Web Services in 2017 to launch Next Gen Stats, data collection has become more sophisticated. RFID tracking chips in players' shoulder pads, combined with optical tracking systems, provide real-time, high-resolution data on player movements, speed, and orientation. Lopez highlights, “We could use the chips that are in the player’s pads to figure out where they were aligned. You could take the video and extract how fast a player in college is moving...” (Lopez, 01:39).
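To make the idea of deriving speed from tracking data concrete, here is a minimal sketch. It assumes positions arrive as (x, y) coordinates in yards at a fixed sampling rate; the 10 frames-per-second default and the function name are illustrative assumptions, not details given in the episode.

```python
import math

def speeds_from_positions(frames, hz=10.0):
    """Return speed in yards/second between consecutive (x, y) frames.

    Assumes frames are evenly spaced in time at `hz` samples per second
    (an assumption for this sketch, not a documented specification).
    """
    dt = 1.0 / hz
    return [
        math.hypot(x1 - x0, y1 - y0) / dt
        for (x0, y0), (x1, y1) in zip(frames, frames[1:])
    ]

# Three hypothetical frames for one player (coordinates in yards).
frames = [(0.0, 0.0), (0.3, 0.4), (0.9, 1.2)]
speeds = speeds_from_positions(frames)
print(speeds)
```

Real tracking feeds would typically need smoothing before differencing, since frame-to-frame position noise is amplified when divided by a small time step.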
Data Sources and Accessibility
The NFL's data ecosystem comprises several layers, including play-by-play data, Next Gen Stats, and scouting reports. Lopez outlines, “The primary data sets are the play-by-play data and the Next Gen Stats from Zebra’s GPS chips...” (Lopez, 03:53). While much of this data remains proprietary, the NFL fosters broader analytics engagement through initiatives like the Big Data Bowl, allowing data scientists worldwide to access and analyze player tracking data via platforms like Kaggle.
Impactful Use Cases of Data Analytics
Mike Lopez shares specific instances where data analytics have significantly influenced the NFL:
- Rule Changes and Foul Analysis (07:54): In 2019, the NFL used win probability models to assess the impact of various fouls. Lopez explains, “We were able to say, here are the most impactful fouls that we call incorrectly...” (Lopez, 07:54). This data-driven approach helped refine officiating standards by highlighting which penalties most affected game outcomes.
- Kickoff Play Modification (23:00): Collaborating with health and safety teams, the NFL revised kickoff formations to reduce injury risks. Lopez states, “We reduced speed, reduced space,... leading to fewer collisions...” (Lopez, 23:00). This change not only enhanced player safety but also influenced game dynamics by increasing kickoff returns and slightly shortening game durations.
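The foul analysis described above can be sketched roughly as follows: score a game state with a win probability model, apply the penalty yardage, and measure the change. This is only an illustration. The logistic model, its coefficients, and the function names are invented for this sketch and are not the NFL's actual model.

```python
import math

def win_prob(score_diff, yards_to_goal, seconds_left):
    """Toy win probability for the team on offense.

    The coefficients are made up for illustration; a real model would be
    fit to historical play-by-play data.
    """
    time_frac = seconds_left / 3600.0
    z = 0.18 * score_diff / math.sqrt(time_frac + 0.05)
    z -= 0.01 * (yards_to_goal - 50)
    return 1.0 / (1.0 + math.exp(-z))

def foul_swing(state, penalty_yards):
    """Change in offensive win probability if a defensive foul is enforced."""
    before = win_prob(**state)
    moved = max(1, state["yards_to_goal"] - penalty_yards)
    after = win_prob(**dict(state, yards_to_goal=moved))
    return after - before

# Hypothetical late-game situation: tied, 40 yards from the goal line.
state = {"score_diff": 0, "yards_to_goal": 40, "seconds_left": 120}
fouls = {"defensive offside": 5, "roughing the passer": 15}

# Rank fouls by how much an enforced call moves win probability.
ranked = sorted(fouls.items(), key=lambda kv: foul_swing(state, kv[1]), reverse=True)
for name, yards in ranked:
    print(f"{name}: {foul_swing(state, yards):+.3f}")
```

Averaging such swings over many real game states, and restricting to calls later judged incorrect, is one plausible way to arrive at a list of "most impactful" officiating errors.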
Challenges and Failures in Data Science
While data science has driven numerous successes, Lopez acknowledges the inevitability of failures and "oopsy moments":
- False Positives and Negatives (10:50): Lopez remarks, “For every 10 graphs we make, one of them sees somebody that’s important...” (Lopez, 10:50). Not all data-driven insights lead to actionable outcomes, and the NFL often navigates these inaccuracies by prioritizing robust, evidence-based findings.
- Unintended Consequences (17:00): Changes based on data can lead to unexpected outcomes. For example, promoting a pass-heavy strategy inadvertently encouraged more running plays and faster game tempos, resulting in fewer overall plays per game (Lopez, 17:00).
Data vs. Human Intuition
The interplay between data-driven decisions and human instinct is a recurring theme. Lopez emphasizes the balance required: “We try to listen and be in the hearts and heads of the folks that are making the decisions...” (Lopez, 12:17). While data provides invaluable insights, coaches and players often rely on instinctual judgments that may sometimes counteract statistical evidence.
Causal Inference and Analytical Limitations
Addressing the complexities of causal inference in observational sports data, Lopez admits, “Almost every data set we're getting is observational...” (Lopez, 15:31). Without experimental designs like A/B testing, establishing causality remains challenging. This limitation necessitates cautious interpretation of data and an appreciation for potential unmeasured variables that could influence outcomes.
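A small worked example shows why observational comparisons call for caution. The counts below are invented: a hypothetical strategy is used more often by strong teams, so a naive comparison of success rates overstates its effect, while comparing within each team-strength group and averaging gives a smaller estimate. Stratification is used here only as one simple adjustment technique; the episode does not name a specific method.

```python
# Each row: (team_strength, used_strategy, n_plays, n_successes).
# All numbers are fabricated for illustration.
rows = [
    ("strong", True, 80, 48),
    ("strong", False, 20, 10),
    ("weak", True, 20, 6),
    ("weak", False, 80, 16),
]

def rate(rows, used, stratum=None):
    """Success rate for plays with/without the strategy, optionally in one stratum."""
    sel = [r for r in rows if r[1] == used and (stratum is None or r[0] == stratum)]
    return sum(r[3] for r in sel) / sum(r[2] for r in sel)

# Naive estimate: ignore team strength entirely.
naive = rate(rows, True) - rate(rows, False)

# Stratified estimate: difference within each stratum, weighted by stratum size.
total = sum(r[2] for r in rows)
adjusted = 0.0
for s in sorted({r[0] for r in rows}):
    weight = sum(r[2] for r in rows if r[0] == s) / total
    adjusted += weight * (rate(rows, True, s) - rate(rows, False, s))

print(f"naive difference:    {naive:.2f}")
print(f"adjusted difference: {adjusted:.2f}")
```

The adjustment only helps for confounders that are actually measured; unmeasured variables can still bias the estimate.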
Enhancing Player Safety through Data
A significant focus of NFL data analytics is player safety, particularly concerning head injuries. Lopez discusses the collaborative efforts behind rule changes aimed at reducing injury risks: “The NFL has a new kickoff play... the injury experts say this is going to be a safer play...” (Lopez, 23:00). By integrating data with biomechanical research, the NFL continuously seeks to create a safer playing environment without compromising the game's integrity.
Cross-Sport Analytics and Data Sharing
Lopez highlights the transferable nature of data analytics across different sports. Knowledge and methodologies from baseball, basketball, and hockey inform NFL data strategies. For instance, metrics like player sprint speed and change of direction are universally applicable, enabling cross-disciplinary innovations: “The winning algorithm our first year was code that was copy and pasted from soccer or some version of it...” (Lopez, 20:12).
Advice for Aspiring Data Scientists in the NFL
For students and aspiring data scientists aiming to work in the NFL, Lopez offers crucial advice: “You want to be in the top [right]. If you’re really, really good at coding and you have a little bit of knowledge, you can absolutely help...” (Lopez, 26:29). He underscores the importance of combining robust technical skills with a deep understanding of football to excel in this specialized field.
Final Thoughts: The Quest for Predictive Data
In a visionary closing, Lopez shares his "magic wand" wish for data science in the NFL: “What is the data you want to collect to see you have all these players... to predict for the draft who's going to be a superstar...” (Lopez, 28:39). He identifies the challenge of accounting for a player’s decision-making and team dynamics, emphasizing the complexity of predicting player success based solely on available data.
Conclusion
This episode of the Harvard Data Science Review Podcast offers a comprehensive exploration of how data science intersects with professional football. Mike Lopez provides invaluable insights into the current applications, challenges, and future directions of data analytics in the NFL. From enhancing game strategies and player safety to navigating the intricacies of causal inference, the conversation underscores the transformative power of data in modern sports.
Notable Quotes:
- Mike Lopez (01:39): “Football coaches have always been using data, right? They’re looking at the opponents, getting a sense of what the tendencies are, identifying potential weaknesses...”
- Mike Lopez (07:54): “We were able to say, here are the most impactful fouls that we call incorrectly...”
- Mike Lopez (10:50): “For every 10 graphs we make, one of them sees somebody that’s important...”
- Mike Lopez (15:31): “Almost every data set we're getting is observational...”
- Mike Lopez (23:00): “We reduced speed, reduced space,... leading to fewer collisions...”
- Mike Lopez (26:29): “You want to be in the top [right]. If you’re really, really good at coding and you have a little bit of knowledge, you can absolutely help...”
- Mike Lopez (28:39): “What is the data you want to collect to see you have all these players... to predict for the draft who's going to be a superstar...”
About the Podcast
Harvard Data Science Review Podcast is produced by the award-winning Harvard Data Science Review journal. The podcast offers in-depth "case studies" on how data science influences news, policy, and business decisions, featuring expert guests who discuss the nuances of data-driven insights in various fields.
For more episodes and updates, visit the Harvard Data Science Review website or follow them on X and Instagram @thehdsr.
