The Most Data-Driven Formula for Success: Formula 1 - Harvard Data Science Review Podcast

Harvard Data Science Review Podcast Summary

Episode: The Most Data-Driven Formula for Success: Formula 1
Release Date: March 31, 2025
Host: Liberty Vitter
Guest: Rob Smedley, Renowned F1 Race Engineer and Strategist

1. Introduction to Rob Smedley and Formula One's Data-Driven Nature

The episode opens with Liberty Vitter introducing Rob Smedley, a seasoned Formula One (F1) race engineer and strategist with a rich history at Ferrari, Williams, and the Formula One Group. Currently leading the Smedley Group, Rob emphasizes the pivotal role of data in driving innovation both on and off the track.

Key Quote:
Liberty Vitter [00:01]: "Formula One accelerated into an exciting fusion of speed and data analytics."

2. Rob Smedley's Role in Formula One

Shali Meng engages Rob in discussing his extensive experience within F1, highlighting how integral data science has become in the sport. Rob traces his journey back to the late '90s, noting the exponential growth of data usage in F1 compared to other sports and industries.

Key Quote:
Rob Smedley [01:28]: "Data is centric to everything that we do in Formula One."

3. The Importance of Data in Formula One

Rob elaborates on how data underpins every decision in F1, from strategic race decisions to car design optimizations. He underscores that data-driven approaches make the sport highly objective while simultaneously demanding greater efficiency and effectiveness from teams.

Key Quote:
Rob Smedley [01:28]: "Data is more prevalent and more advanced than other sectors... it's very, very intensely focused on data science."

4. The Origin of the Term "Formula One"

In a lighter exchange, Shali inquires about the origin of the term "Formula One." Rob candidly admits his lack of historical knowledge, adding a touch of humor to the conversation.

Key Quote:
Rob Smedley [03:19]: "I've got no idea why it's called Formula One."

5. Key Performance Metrics in Car Performance

Rob delves into the primary performance metrics crucial for F1 cars: engine power, aerodynamics, and tyre performance. He explains how these elements interact to optimize speed, downforce, and grip, respectively. The discussion highlights the complexity of balancing these factors to achieve peak performance.

Key Quote:
Rob Smedley [04:30]: "Engine power controls principally how fast you go in a straight line... aerodynamics is about generating downforce... tyre performance or grip maximizes the car's adherence to the track."
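
The link between downforce and grip in this section can be made concrete with a few lines of arithmetic. The following is a minimal sketch, not a team's actual model: it assumes a simple friction-limited tyre (lateral force capped at a grip coefficient times normal load) and treats downforce as a fixed number, although in reality it varies with speed. The function names and all numbers are hypothetical.

```python
G = 9.81  # gravitational acceleration in m/s^2


def max_lateral_accel(mass_kg, downforce_n, mu):
    """Peak lateral acceleration (m/s^2) under a simple friction limit.

    Assumption: lateral force <= mu * normal load, where the normal load
    is the car's weight plus aerodynamic downforce.
    """
    normal_load_n = mass_kg * G + downforce_n
    return mu * normal_load_n / mass_kg


def max_corner_speed(radius_m, mass_kg, downforce_n, mu):
    """Highest steady speed (m/s) through a corner of the given radius."""
    # Circular motion: a = v^2 / r, so v = sqrt(a * r).
    return (max_lateral_accel(mass_kg, downforce_n, mu) * radius_m) ** 0.5


if __name__ == "__main__":
    mass = 800.0  # kg, hypothetical car mass
    mu = 1.5      # hypothetical tyre grip coefficient
    for downforce in (0.0, 8000.0, 16000.0):
        speed_kph = max_corner_speed(100.0, mass, downforce, mu) * 3.6
        print(f"downforce {downforce:7.0f} N -> {speed_kph:5.1f} km/h")
```

In this simplified picture, more normal load raises the lateral force the tyres can generate without adding mass, which is why downforce is worth pursuing even at the cost of some drag.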

6. Data-Driven Decision Making and Real-World Examples

Shali probes Rob for specific instances where data-driven insights led to significant changes during his tenure with Ferrari or Williams. Rob emphasizes that data informs virtually every decision in F1, making it challenging to single out individual instances. However, he acknowledges that simulations and aerodynamic optimizations frequently result in impactful design changes.

Key Quote:
Rob Smedley [09:42]: "Every decision that we make... is driven by data."

7. Communicating Data Insights Within the Team

Rob discusses the challenges of conveying complex data insights across different team levels, from engineers to board members. He stresses the importance of tailored communication strategies to ensure that data-driven insights are effectively utilized at all organizational levels.

Key Quote:
Rob Smedley [11:16]: "Data is a much maligned term... the real value from data comes not from the data itself, but from the insight that you drive."

8. Data Sanitization and Quality

Addressing the critical aspect of data quality, Rob explains the processes involved in sanitizing data—cleaning, normalizing, and synchronizing various data sources. He underscores that high-quality, sanitized data is foundational to generating meaningful insights and optimizing performance.

Key Quote:
Rob Smedley [15:09]: "You need to synchronize all of the data because if they're not synchronized... it's very difficult to understand cause and effect."

9. Personalized Data for Driver Performance

Rob highlights how personalized data analyses aid in tailoring car setups to individual drivers' styles and preferences. By understanding unique driving behaviors, teams can optimize both the vehicle and the driver to enhance overall performance.

Key Quote:
Rob Smedley [19:15]: "The team has to optimize for the individuals rather than trying to have this theoretical optimum."

10. Evolution of Data Analytics and AI in Formula One

The conversation shifts to the advancements in data analytics and artificial intelligence (AI) within F1. Rob shares both the successes and setbacks teams have encountered, such as the 2022 regulatory changes introducing new aerodynamic concepts that led to unforeseen challenges like porpoising.

Key Quote:
Rob Smedley [23:16]: "There's definitely... areas where we've had big misses, like the aerodynamic porpoising phenomenon in 2022."

11. Future Opportunities and Innovations in AI and Data Science

Looking ahead, Rob expresses excitement about the potential of AI, particularly neural networks and scientific machine learning, to revolutionize F1. He envisions these technologies enabling the exploration of billions of optimization combinations, far beyond human capability, to enhance car and driver performance synergistically.

Key Quote:
Rob Smedley [26:57]: "Neural networks... the Formula One problem is almost perfect for machine learning or an artificial intelligence problem."

12. Contribution to the Broader Machine Learning Community

Rob discusses how the stringent data quality and advanced data science practices in F1 can benefit the broader machine learning community. The high-fidelity, physics-driven data generated in F1 serves as an exemplary dataset for training and refining machine learning models.

Key Quote:
Rob Smedley [30:06]: "We're helpful... because of the more traditional simulation software that we've generated that does get us to within, you know, half a percent accuracy."

13. The Magical Wand Question: Ideal Driver Model

In the final segment, Rob shares his ideal solution for F1 data analytics—a high-fidelity neural network driver model. Such a model would seamlessly integrate driver behavior with vehicle dynamics, allowing for unparalleled optimization of both elements in tandem.

Key Quote:
Rob Smedley [33:30]: "I would build a neural network of the driver model... to optimize for the combination of both driver and vehicle together."

14. Conclusion and Final Thoughts

Shali and Liberty commend Rob for his insightful, data-centric discussion, emphasizing how F1 serves as a pinnacle of data-driven decision-making. The episode concludes with gratitude towards Rob for sharing his expertise and highlighting the profound role of data in shaping the future of Formula One.

Key Quote:
Shali Meng [34:13]: "Formula One is simply the most data-driven formula. It really drives insight, decisions, actions, everything."

Key Takeaways

Data Centrality: In Formula One, data is the lifeblood that informs every decision, from race strategies to car design optimizations.
Performance Metrics: Engine power, aerodynamics, and tyre performance are the cornerstone metrics that teams focus on to enhance car performance.
Data Quality: Sanitizing data—cleaning, normalizing, and synchronizing—is crucial for deriving meaningful insights and making informed decisions.
Personalization: Tailoring car setups to individual drivers' styles through detailed data analysis can significantly boost performance.
AI and Machine Learning: The integration of advanced AI techniques like neural networks holds immense potential for further optimizing both car and driver performance in Formula One.
Broader Impact: The high-quality, physics-driven data from F1 can serve as a valuable resource for advancing machine learning models beyond the realm of motorsports.

This episode provides a comprehensive exploration of the symbiotic relationship between data science and Formula One racing. Rob Smedley's expertise illuminates how meticulous data management and innovative analytics propel F1 to the forefront of technological advancement in sports.

Transcript

[00:02]
Liberty Vitter
Hello and welcome to the Harvard Data Science Review Podcast. I'm Liberty Vitter, the feature editor of the Harvard Data Science Review, and I'm joined by our editor in chief, Shali Meng. Today we're talking about the world of Formula One racing. Formula One accelerated into an exciting fusion of speed and data analytics. Our guest, Rob Smedley, says that it is the data that drives innovation on and off the track. Rob is a renowned F1 race engineer and strategist with extensive experience at Ferrari, Williams and the Formula One Group. Now he leads the Smedley Group, where he continues to lead advancements and expand the possibilities in motorsport analytics. We'll discuss Rob's journey into this fascinating sport, how data shapes race strategies and driver development, and the biggest transformations AI has brought to Formula One. Whether you're a motorsports fan, data scientist, or simply curious about cutting-edge technology, join us to explore how data fuels Formula One's future.
[01:08]
Shali Meng
Well, thank you, Robert, for doing this podcast with us and we really appreciate it. Now, most people in data science or broadly have heard about Formula One, but they probably don't really know what it's like to work for Formula One. What's behind the scene? Could you just tell us a little bit about what you do and what it's like behind the scene?
[01:29]
Rob Smedley
So my role within Formula One and Formula One teams for pretty much since I started was very centric to data. So back in the early days when I started in Formula One, which was in the late 90s, data was not as prevalent or as prolific as it is now. I think if you take today's state in Formula One, then data is probably more prevalent. It's absolutely more prevalent than any other sport, number one. And number two, I think it's more prevalent and more advanced than other sectors as well, not just sporting sectors. So it's very, very intensely focused on data science and using data to leverage and optimize one's position, whether or not that's from an in event strategic position, or whether you're trying to leverage a better optimization of the overall car design. Data is centric to everything that we do in Formula One. In some ways it makes Formula One very, very objective, of course, because it's very data driven, but in other ways it actually increases the requirements upon the teams to be more effective and more efficient with the use of their resources to drive performance. So, like I said, I mean, data is very, very centric to everything that we do in Formula One.
[03:03]
Shali Meng
As a data scientist, I'm so glad to hear that, but I want to mention that although data scientists do know a lot about formulas, very few of us know why it is called Formula One. Can you tell us where the name comes from?
[03:19]
Rob Smedley
I've got no idea.
[03:23]
Shali Meng
It's interesting.
[03:26]
Rob Smedley
70 odd years old, so it precedes me. I've got no idea why it's called Formula One. Literally no idea.
[03:32]
Shali Meng
All right, so that sounds like deep learning. It's a mystery.
[03:37]
Rob Smedley
Yeah, I know it's called Grand Prix racing because Grand Prix in French is big prize. So it's like big prize racing. But why it's called Formula One and the step below is called Formula two. And I don't know, one of your highly intelligent listeners will know, we should point out. Actually, just as an aside, I'm not a very good Formula One historian. I don't spend a lot of time looking backwards. I'm much more interested in what's happening in front of me.
[04:06]
Liberty Vitter
You know, that brings me right to this question of what are these key performance metrics that you analyze? You've been doing this for such a long time as a race engineer, as a strategist, what are these performance metrics and which ones do you prioritize when evaluating specifically car performance data? Because I know we're going to get to ask you about drivers as well.
[04:30]
Rob Smedley
So I think when you're looking at the key performance metrics for car performance, then you're trying to optimize around a few macroscopic pillars, which are important. So the first one, if you like, is engine power. So if you think about it, there's three major areas where Formula One engineers, technicians, data scientists will concentrate on to try to have the most performant package. So like I said, the first one is engine performance, so power, if you like, because that controls principally how fast you go in a straight line. The second one is aerodynamics, so how much downforce the car can generate. So downforce is the force that is pushing you into the ground. So if you think about a Formula One car like an upside-down aircraft: an aircraft generates lift, so it generates a lower pressure above the aircraft than below, so the aircraft lifts or rises. The opposite is true of a Formula One car. So you're trying to generate low pressure underneath the car on the lower surface of the wing, so it gets pushed down into the ground. And the reason why you're trying to push it down into the ground is quite simply, if you think about Newtonian physics, very, very simplified, the more normal force you can push down into the ground, the more you can generate lateral forces and lateral acceleration and longitudinal forces. So you're just trying to force the car into the ground as much as possible, which is downforce, but you're also trying to minimize the aerodynamic effect of drag. So drag is the longitudinal force on the car. So if you imagine, as the car passes down the straight, it's got power that is pushing it down the straight, that's the thrust, and the force acting against that is drag. So you're trying to minimize the drag in order to maximize the speed. And the third element which we concentrate on a lot is tyre performance or grip.
So if you can maximize the grip of the car through the vehicle dynamics, principally the suspension and the attitude of the tyre with the road, and how you optimize for internal temperatures within the tyre to optimize for grip, this will give you again more performance. So that's what you're trying to optimize around, very, very macroscopically. Obviously it becomes significantly more detailed. You're trying to gather as much data from the car as possible that will help you to optimize for principally the aerodynamic forces on the car and the tyre performance as well. As a chassis engineer, as a power unit engineer, you're trying to constantly optimize for power. So there are, you know, something like 300 sensors on a Formula One car, which we use to gather data in real time. And then that data is used for two purposes really. One is to optimize within an event. So if you imagine, you're not going to make fundamental design changes within an event, you're just going to optimize the vehicle that you've got, the vehicle and driver package, but you're also generating all of that data so that long term design and development decisions can be made as well. So we not only use data to drive our decisions for both the immediate and the short term, we also use simulation as well. So in Formula One, you know, when I first started, one of the first things that I did, actually back in the late 90s, was I came across to NASA to discuss simulation with some of the chief scientists there. NASA at that point was very advanced in terms of simulation, much more than Formula One. And then when I came back, I wrote some quasi steady state simulations for the team that I was working for at the time. So it's a desktop simulation. They then became more complex as time went on to include dynamic or transient elements within the simulation.
And now you fast forward to today where we have driver-in-the-loop simulators, transient simulation with a driver driving a simulated vehicle to try to give us feedback on performance. So the job of the data scientist in Formula One is to meld, to fuse together both the simulation and the actual data. And you're constantly looking to find excellent correlation between the simulation and the real time data or the data that comes off the car. And if you can do that, obviously then you can then go away and use your simulation tools with high degree of confidence to be able to generate more performance and more optimization and even new design paths as well of your vehicle.
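To illustrate the kind of simulation-to-track correlation check Rob describes, here is a minimal sketch that compares a simulated speed trace with a measured one using Pearson correlation and root-mean-square error. The choice of these two metrics, the function names, and the sample values are all assumptions made for illustration; the episode does not say which measures teams use.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length traces."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("traces must have the same length (at least 2)")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant trace")
    return cov / math.sqrt(var_x * var_y)


def rmse(xs, ys):
    """Root-mean-square error between two equal-length traces."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(xs, ys)) / len(xs))


# Hypothetical speed samples (km/h) at the same points around a lap.
simulated_kph = [310.0, 295.0, 180.0, 120.0, 165.0, 240.0]
measured_kph = [308.0, 296.5, 183.0, 118.0, 166.0, 237.5]

if __name__ == "__main__":
    print(f"correlation: {pearson_r(simulated_kph, measured_kph):.4f}")
    print(f"RMSE: {rmse(simulated_kph, measured_kph):.2f} km/h")
```

A high correlation alone can hide a constant offset between the two traces, which is why the sketch reports an error magnitude alongside it.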
[09:27]
Shali Meng
Wow, that's talking about data driven. This is truly data driven. Speaking of that, can you share an example of when data driven insights led to changes during your time with either Ferrari or Williams?
[09:42]
Rob Smedley
Oh, wow. There's a ton of them. I think that, you know, it's probably very difficult to cite one just because, like I talked about at the start, there's an intensity to the reliance and the utilization of data in Formula One. It's within the lifeblood of everything that we do. Every decision that we make, every new direction that we want to try in terms of car design or in terms of race optimization will all be driven by data. There's very few subjective or purely subjective decisions that are taken in Formula One now, such is the advancement of data science. If I had to go back, you would start with simulating something in the aerodynamic space or simulating something in the tyre space, which then led to design changes and positive results in terms of performance. But quite honestly, there's not one single thing that I could cite. We just use data so much to make every single decision that is made about car performance, car optimization that it's difficult to cite just one.
[11:01]
Liberty Vitter
Is there a time, Rob, when it's been hard to communicate necessarily the data to the whole team? I imagine there's different things that different people need to know and different levels of expertise. How do you manage that? How do you communicate between the team?
[11:16]
Rob Smedley
Well, I think you've got to, again, have a very good communication structure with your data because, you know, like in my position I would deal at, you know, right down in the weeds looking at detailed problems and trying to work out the physics of detailed problems, to reporting to the board. So, you know, there's a broad spectrum of communication strategies that need to be deployed. You know, I can't go and have a conversation with the board at the same level of physics and data science as I would with some highly specialized engineers who are working on a problem, and vice versa, in fact. And whatever I work on now, I always think about how we're going to gather the data to be able to make decisions. Because data is a much maligned term in my opinion. Data itself is utterly useless. First of all, there's an assumption here that what we're getting is clean and sanitized and normalized and synchronized data. So that's always a big assumption. And I would say there's many, many companies on the face of the planet that rely on data, but don't have truly sanitized data. But the real value from data comes not from the data itself, but from the insight that you drive. And that's when I talk about data as a communication tool. So how do you generate and fuse together numerous different data sources in order to come up with some elegant insights that help the engineers and the technicians, whoever it is, to drive value and therefore performance in Formula One terms, into the vehicle? So we very much rely on very, very low-level data tools that go into an inordinate amount of detail, down to milliseconds of, you know, how the vehicle is performing. And the data scientists will be trawling through that data. They will be using simulation to produce synthesized data which is going to help them to understand a problem either directionally or at a great level of detail.
And then you can ladder up to, like I said, a higher level bunch of insights, a higher level group of KPIs that will denote and track the accumulated performance throughout a period, whether it's a month or a year or whatever. You know, a year is a long time in Formula One and, you know, day by day we're adding performance to the car through the various key performance metrics. So if you're able to track those and you're able to then communicate those across your organization and especially, like I said, you know, if you're managing upwards to board level reports or something like that, that is the opposite end of data usage, to drive insight. So insight will either drive somebody who is making a decision for themselves or insight is used to communicate as well. But either way, it means that you need a highly sophisticated data setup and data architecture to be able to test and measure, test and measure and then report.
[14:41]
Shali Meng
Well, Robert, as a data scientist, I really can't agree more with what you said about data quality particularly. I've been working on trying to measure data quality myself. Now you mentioned quite a few things about how to clean up data. You used the word sanitizing. And I'm just curious, what do you mean by that? Because in much of the data science community that means trying to keep the data private, which I don't think is what you mean.
[15:10]
Rob Smedley
No, you assume privacy. So like all the data architecture that we would have in a formula one team, we would assume security and privacy is already highly optimized. When I talk about cleaning up the data to sanitize the data, don't forget like a lot of data that is collected, you know you're collecting it from a non optimal environment. So you're collecting it as the car is going around the track. Some of that data will have a high degree of pollution. Some of that data will need some normalization or some movement, some post processing because the sensor is slightly moved. Then you've got to think about that. You may have 10 different data sources. So if I just give you three very quick data sources, I'm collecting in real time car telemetry data, I'm collecting in real time timing data. So the circuit timing data and I'm collecting in real time weather data. So then I need to be able to synchronize all of those data sets together because if they're not synchronized and I can't look at them with a synchronized view, then it's very difficult to understand cause and effect. So you need to synchronize all of the data. And then finally, like in any kind of efficient data architecture, you need to be able to normalize the data sources as well. So whatever the tools are that I'm going to use to get to the data, you know, I would be normalizing different data packet types, download rates, all of that type of stuff, data types themselves so that a single tool or a particular tool can access like all these various different data sources that when they come in are in various different formats and are not readable across a range of different platforms. So that's when I talk about sanitizing the data. There's already a huge amount of effort that goes into that. I always talk about this within my organizations. That's the least sexy part of data, right? It's the least. I know, I know it's the least value add, but it's the most important part. 
And I think that there's so many people like now in my businesses, we go and help other businesses set up with data strategies and AI strategies and all of this type of stuff. But usually there's six months, one year, two years work just to build the data architecture before we ever get to being able to get the value added out of it.
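The synchronization step Rob describes (telemetry, timing, and weather arriving at different rates) can be sketched as an "as-of" join: for each telemetry timestamp, take the most recent sample from every other stream. This is a minimal illustration using only the Python standard library. It assumes all streams already share one clock and are sorted by time, and the stream contents and function names are hypothetical.

```python
from bisect import bisect_right


def as_of(timestamps, values, t):
    """Return the latest value recorded at or before time t, else None."""
    i = bisect_right(timestamps, t)
    return values[i - 1] if i > 0 else None


def synchronize(base, *others):
    """Align every stream in `others` onto the timestamps of `base`.

    Each stream is a list of (timestamp_s, value) pairs sorted by time.
    """
    split = [([ts for ts, _ in s], [v for _, v in s]) for s in others]
    return [
        (t, value, *[as_of(ts, vs, t) for ts, vs in split])
        for t, value in base
    ]


# Hypothetical streams: (seconds, value)
telemetry = [(0.00, 212.0), (0.05, 214.5), (0.10, 217.0), (0.15, 219.0)]  # speed, km/h
timing = [(0.08, "sector 1 start")]                                       # circuit timing events
weather = [(-1.0, 31.5), (0.12, 31.4)]                                    # track temperature, deg C

if __name__ == "__main__":
    for row in synchronize(telemetry, timing, weather):
        print(row)
```

Once every row carries all three sources at a common timestamp, a change in lap time can be examined against the car and weather state at that moment, which is the cause-and-effect view Rob refers to. Real pipelines would also need to correct clock offsets between sources before a join like this is meaningful.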
[17:50]
Shali Meng
I think Robert, you're absolutely correct. I mean that has been a really big issue, I think, across the broad data science community because that part, data cleaning or sanitization, sometimes is up to like 80, 90% of the effort, but that's the one least appreciated. And every time I teach courses on this, I ask the students, and whoever has done it really appreciates it. The rest of them probably just don't know how hard it is to clean data. And data is really everything. So thank you so much for talking about that. I feel like you really should give lectures to the broad data science community because there's just so much to be learned. Let me say one more thing here before I turn to Liberty. You know, there's a lot of talk in data science, particularly with the so-called big data, right? We get more and more data. People think about doing more and more of what they call personalized data science. Right now there's personalized medicine, personalized education, all kinds of things. Now I'm thinking here that, for you, it's incredibly personalized in terms of helping a particular driver. For example, we want to ask you, during your years working closely with drivers such as Felipe Massa at Ferrari, how do these data analytics specifically help you to support and develop performance for an individual driver over the course, say, of a season?
[19:15]
Rob Smedley
Well, I think when you use the term personalized data, I think that's probably the area of a Formula One team where the data is personalized. Because if you think about it, the machine, the vehicle itself is an inanimate object. And you know, in simulation world you can optimize this to the theoretical optimum, if you like, to the quasi static or the dynamic optimum. Whereas when you put the driver in, the driver is an athlete. Like all athletes, they're human beings, they have a different way of trying to optimize themselves for their sport. It's the same in Formula One as it is in any other form of athletics. So interestingly, what you find with Formula One drivers is they have a slightly different way of driving the car. So it's not like entirely uniform. So they may, you know, one driver, driver A compared to driver B, might brake a little bit later and brake a little bit harder, but they might then turn in much later to the corner, turn the steering wheel much later to the corner, they might have a different downshift pattern. You know, when they downshift through the gearbox and go from eighth gear to second gear in a hairpin or something, and they're very, very nuanced. It's the same as if you look at any sporting discipline, you know, like in football, no two linebackers will be the same. No two running backs will be the same, no two quarterbacks will be the same. There's a very, very nuanced way in which they play. And therefore the team has to optimize for the individuals rather than, you know, trying to have this theoretical optimum. It's exactly the same in Formula One. So we quite often spend time, you know, the performance engineers who are very close to the particular driver will spend time trying to optimize the vehicle for the driver's driving style. I'll give you some examples. Like Fernando Alonso, he likes to turn in to the corner at very, very high speed and very, very early.
And in that case, he demands a very, very stable rear end of the vehicle. So he doesn't want to feel the vehicle, which is unstable because it doesn't allow him to treat the vehicle as he wants. If you take Felipe Massa, who you talked about, Felipe Massa would suffer greatly in a Formula one car with very, very low speed understeer. So that's when you turn the steering wheel, but the car doesn't turn as much as you want it to, so you end up washing out of the corner, as it's called. So, you know, the engineers would have to set up the car, not so much for that stable rear that we talked about with Fernando Alonso, but with a very, very grippy front so that the front at low speed would work very well and it would rotate the car and turn it. So we use all sorts of personalized data analysis, simulation setup in order to optimize the car and vehicle. And I think that's one of the really nice things of Formula one is it is a technical endeavor, but it's also a human endeavor and you have to optimize for both. And you need to use data to optimize for both of those things. You can't just optimize for the vehicle, nor can you just optimize for the driver because you will end up in a non optimal position. So using really advanced methods of data science is where we find ourselves to be able to optimize for both positions.
[22:50]
Liberty Vitter
Rob, it makes me think when you talk about using data to optimize for both positions, that's tricky, especially when you all are trying new and innovative things all the time. Sometimes there's such a rapid evolution of data analytics and AI. Have you witnessed those in your career with F1? Either really big changes for good or really big mistakes that have been made with these sorts of data analytics?
[23:16]
Rob Smedley
Yeah, I mean, you're always learning, right? And don't forget, we're only a bunch of scientists who are, like all scientists, most of the time fumbling about in the dark until you discover something. So, you know, I think that there's certainly... In terms of the big misses, yes. I mean, I think, just to cite quite a recent one, is when we changed the design of the cars in 2022. So that was a regulation change in 2022, where we went to a different aerodynamic concept. So the regulations brought about a different aerodynamic concept, which was all about creating more ground effect, if you like. Ground effect, in very, very basic terms, is when the whole vehicle is the shape of a wing and you're trying to generate a very low pressure underneath the vehicle so that it gets pulled down into the ground and that generates that downforce that I talked about before, you know. So all of the teams developed their cars in the wind tunnel and in computational fluid dynamics with this new aerodynamic concept and simulated, you know, with what they thought at the time was really high-fidelity correlation and high-fidelity simulations. And basically what happened was when we actually went and ran the cars on the track, there were some teams that ended up with an aerodynamic phenomenon called porpoising. It's a very simple aerodynamic phenomenon where the car was getting sucked down into the ground. And as the car got sucked down into the ground at a certain ride height, that is a certain gap between the road surface and the car floor, which was generated in this suction, it had what was called an aerodynamic stall because the gap wasn't big enough for the flow rate of air that needed to pass through the small gap. So it stalled, it lost all of the downforce, if you like, it lost all of the suction and it popped back up.
And then the whole phenomenon started over again because now the flow reattached to the bottom of the floor, it started to suck the floor down again. It got to a certain ride height, it stalled the floor and it popped back up again. And all of this was happening at quite a high frequency. You know, all of this was happening at 1 Hz or something like that. So you can imagine a straight which is 10, 15 seconds long, maybe like, you know, 2, 3 hertz, depending on the speed. It was causing this really violent reaction in the car. And that just showed that, you know, as good as we are with simulation and data generation, we'd completely missed, or as a sport, some teams had completely missed this phenomenon and they really suffered with that as well. There were lots of different compromises that they had to make in trying to optimize and trying to generate as much downforce and suction from the underside of the floor under the track. This was a big miss. Other things, like, I'll just stay on floors and go back a few more years before that, to 2009, 2010, when we introduced what was called the double diffuser. So we found a loophole in the regulations, we simulated it within the wind tunnel, and Formula One came up with a whole new development area. But again, that was all driven by numbers and data and simulation and testing and the constant search for performance.
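The stall-and-reattach cycle Rob describes for porpoising can be mimicked with a toy model: a mass on a spring and damper, with an aerodynamic load that switches between a high "attached" value and a low "stalled" value depending on ride height. This is only an illustrative sketch under heavy assumptions. Every number and name is hypothetical, real floor aerodynamics are far more complex, and the model is not taken from the episode.

```python
def simulate_porpoising(duration_s=5.0, dt=0.001):
    """Count floor stall events in a toy ride-height model.

    Ride height is measured upwards from the track, and the spring is
    at rest at `rest_height` when there is no aerodynamic load.
    """
    mass = 800.0             # kg, hypothetical
    stiffness = 200_000.0    # N/m, hypothetical suspension stiffness
    damping = 5_000.0        # N*s/m, hypothetical
    rest_height = 0.050      # m, ride height with no aero load
    load_attached = 6_000.0  # N of suction while the floor flow is attached
    load_stalled = 1_000.0   # N of suction once the floor has stalled
    stall_height = 0.025     # m, floor stalls below this gap
    reattach_height = 0.035  # m, flow reattaches above this gap

    height, velocity = rest_height, 0.0
    stalled = False
    stall_count = 0
    for _ in range(int(duration_s / dt)):
        # Switch aerodynamic state with hysteresis between the two heights.
        if not stalled and height < stall_height:
            stalled = True
            stall_count += 1
        elif stalled and height > reattach_height:
            stalled = False
        aero_load = load_stalled if stalled else load_attached
        # Spring pushes back towards rest height; suction pulls the car down.
        accel = (stiffness * (rest_height - height)
                 - aero_load - damping * velocity) / mass
        # Semi-implicit Euler step: update velocity first, then position.
        velocity += accel * dt
        height += velocity * dt
    return stall_count


if __name__ == "__main__":
    print("stall events in 5 s:", simulate_porpoising())
```

With these numbers the attached-flow equilibrium sits below the stall height and the stalled equilibrium sits above the reattach height, so the car can never settle in either state and keeps bouncing. That self-sustaining cycle is the qualitative behaviour described above.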
[26:49]
Shali Meng
Well, Rob, looking ahead, what new opportunities or innovations in AI and data science excite you most in Formula 1?
[26:57]
Rob Smedley
I think the thing that excites me personally most is the use of artificial intelligence, scientific machine learning, surrogate models, all of that type of stuff. I think that there's a massive opportunity in Formula One. Some of the teams are already starting to look in this direction for what concerns the development, but we're really only scratching the surface. And with how fast Formula One can evolve and how fast technology, especially in the artificial intelligence space, is evolving, I think there are lots of opportunities here where we can lean much more heavily into this area of science and data science. Neural networks, for me, have always been an area of fascination ever since I came into Formula One. If you think about it, the Formula One problem is almost a perfect machine learning or artificial intelligence problem, because there are more than a thousand different parameters that you can change on a Formula One car in order to optimize for performance. And in some cases you've got over 7 billion combinations. It's like a 7 billion combinatorial effect of how you can optimize the car. So as much as we think, as humans, and especially as Formula One engineers, that we're good, the reality is that we can't do 7 billion combinations in our head to try to find the best performance. So this is a perfect problem for neural networks. This is what machines are good at. I've been interested in this for literally 20-odd years now. But the advancement in neural networks, and even in high-performance compute as well, was never anywhere near enough to be able to solve these problems in the timescales that we have in Formula One events. But I think now we're reaching the point where the models, you know, surrogate models and scientific machine learning, are significantly more advanced, and we should be able to get ourselves to a position where this can start to become an area of performance as well.
So this is the most exciting thing on the horizon for me right now.
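The surrogate-model idea Rob mentions can be sketched in a few lines: run an expensive simulation at a handful of setup points, fit a cheap model to those results, then search the cheap model over far more combinations than you could afford to simulate. Everything in this sketch is an assumption made for illustration: the two setup parameters, the `simulated_lap_time` stand-in (deliberately a smooth bowl so that a quadratic surrogate can capture it), and the choice of a least-squares quadratic rather than a neural network.

```python
import itertools


def simulated_lap_time(wing_deg, ride_offset_mm):
    """Hypothetical stand-in for an expensive simulation run (seconds)."""
    dw, dr = wing_deg - 7.0, ride_offset_mm - 3.0
    return 90.0 + 0.02 * dw ** 2 + 0.05 * dr ** 2 + 0.01 * dw * dr


def features(w, r):
    """Quadratic basis: 1, w, r, w^2, w*r, r^2."""
    return [1.0, w, r, w * w, w * r, r * r]


def solve_linear(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda i: abs(m[i][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for row in range(col + 1, n):
            factor = m[row][col] / m[col][col]
            for k in range(col, n + 1):
                m[row][k] -= factor * m[col][k]
    x = [0.0] * n
    for row in reversed(range(n)):
        known = sum(m[row][k] * x[k] for k in range(row + 1, n))
        x[row] = (m[row][n] - known) / m[row][row]
    return x


def fit_surrogate(samples):
    """Least-squares quadratic fit via the normal equations."""
    rows = [features(w, r) for w, r, _ in samples]
    targets = [t for _, _, t in samples]
    n = len(rows[0])
    ata = [[sum(row[i] * row[j] for row in rows) for j in range(n)] for i in range(n)]
    atb = [sum(row[i] * t for row, t in zip(rows, targets)) for i in range(n)]
    coeffs = solve_linear(ata, atb)
    return lambda w, r: sum(c * f for c, f in zip(coeffs, features(w, r)))


# 25 "expensive" runs on a coarse 5 x 5 grid of setups.
sample_wings = [0.0, 2.5, 5.0, 7.5, 10.0]
sample_rides = [0.0, 1.25, 2.5, 3.75, 5.0]
samples = [(w, r, simulated_lap_time(w, r))
           for w, r in itertools.product(sample_wings, sample_rides)]
surrogate = fit_surrogate(samples)

# Search 101 x 101 = 10,201 candidate setups using only the cheap surrogate.
wing_grid = [i * 0.1 for i in range(101)]
ride_grid = [i * 0.05 for i in range(101)]
best_wing, best_ride = min(itertools.product(wing_grid, ride_grid),
                           key=lambda p: surrogate(*p))
print(f"surrogate suggests wing={best_wing:.1f} deg, ride offset={best_ride:.2f} mm")
```

In practice the suggested setup would be confirmed with one more full simulation run before anyone trusted it. A real aero map is far less well-behaved than this bowl, which is part of why the more flexible models Rob talks about are of interest.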
[29:29]
Shali Meng
Well, Robert, I want to mention that when you're talking about data, I just realized, holy cow, you have so much data. Number one, you can repeat those data. You have high quality data because of the work you put in. So is there also any angle where the data you guys have can really help to improve these algorithms? Because machine learning is mostly about training, and having great data is incredibly important. There is a lot of great data. So is there a kind of reversed angle, which is using your kind of data to help the whole machine learning community?
[30:06]
Rob Smedley
Yeah, absolutely. I mean, I'm already in contact with quite a few of the academic institutions about how we use Formula One data. We do generate a lot of data, but you've also got to consider that we're not like the banks, right? We're not like the financial institutions. We're not open 24 hours a day, seven days a week generating data. Really, we're a sporting endeavor. And that on-track sporting endeavor only really generates data once every week, for two days a week, for a few hours within those two days, or every two weeks, depending on the schedule. But where we do have a huge amount of data is in the development data: in the design data, in the aerodynamic data, which is generated both in the wind tunnel and in computational fluid dynamics. We have a lot of synthetic data that we generate through simulation, which is what I was talking about before. So there's a huge amount of data, and I think that can definitely be helpful for pushing along developments in the machine learning or artificial intelligence community. But what I'm also finding is that where we're helpful is how advanced we are in data science in general. This is really helpful for the academic community within machine learning and scientific machine learning, because of the precision that we're trying to generate. In surrogate models, we need to be down to less than 1% error in areas of extrapolation, and even less than that. Once you get to a mature part of the design cycle, you need to be down to much less than 1% with your models.
So I think not only are we helping in the case of the amount of clean and sanitized data, for that matter, that we have to throw at these problems, but also because of the more traditional tool sets that we have generated, the more traditional simulation software that does get us to within half a percent accuracy or something like that, so 99.5% correlation. If you go to environments outside of Formula One, I would say correlation with traditional simulation and modeling is probably much less than what I've just described there at 0.5% accuracy. So we're helpful in a few different ways to the scientific community.
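One way to read the "within half a percent" figure is as a relative error between what the model predicts and what is then measured. That reading is an assumption, since Rob uses "accuracy" and "correlation" loosely here. A minimal sketch of the check, with invented downforce numbers rather than real Formula One data:

```python
def relative_error_pct(predicted, measured):
    """Absolute error as a percentage of the measured value."""
    return abs(predicted - measured) / abs(measured) * 100.0


# Hypothetical (predicted, measured) downforce pairs in newtons.
pairs = [(9960.0, 10000.0), (12040.0, 12000.0), (7980.0, 8000.0)]

TOLERANCE_PCT = 0.5  # the "half a percent" target mentioned in the episode

errors = [relative_error_pct(p, m) for p, m in pairs]
worst_error = max(errors)
within_tolerance = worst_error <= TOLERANCE_PCT

for (p, m), e in zip(pairs, errors):
    print(f"predicted {p:.0f} N, measured {m:.0f} N, error {e:.2f}%")
print("all within tolerance:", within_tolerance)
```

Checking the worst case rather than the average reflects the point that a model must hold its accuracy everywhere it is used, including when extrapolating.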
[32:51]
Shali Meng
Well, absolutely. I mean, you know, with lots of data out there, these correlations are usually very weak. And also, most of your data, I think, is really truly science driven, physics driven, right? So they're really, as you emphasized at the very beginning, truly objective. And those things are hard to get. So let me get to the last question of each episode, which is the magical wand question. We ask: if you had a magical wand and could instantly solve one data or analytics challenge in Formula One, what would it be and why?
[33:30]
Rob Smedley
I would build a neural network of the driver model for everything that I've talked about before. You're not just trying to optimize for the driver or the vehicle. You're trying to optimize for the combination of the two. But it's very hard, even in a transient simulation, to model the driver behavior. It's very hard to model the athlete behavior. So if I could wave a magic wand, I would like a really high-fidelity, high-correlation neural network driver model, which I could then plug into my overall simulation tool and start to optimize for the combination of both driver and vehicle together.
[34:14]
Shali Meng
Well, Robert, if you can do that, that'd be truly the Formula One, because modeling human behavior is so hard indeed. As I said at the start, I know nothing about Formula One. I did not know where the term came from. I think by now I know. I think that Formula One is simply the most data-driven formula. It really drives insight, decisions, actions, everything. And it makes you go full speed. And now, before I make too many puns here, I want to thank you again, Robert. Truly, this is probably the most data-driven conversation on this data science podcast, because I think every question is really about the data. And I'm truly grateful for your time. And I think from this point on, I will actually look more into what you guys do, because I think it gives me lots of examples, a lot of insights for my teachers. I think there are somewhere you can have things are. Let me just repeat the phrase data driven. I can't get that out of my mind now. But thank you again, Robert. We really, really appreciate it.
[35:18]
Rob Smedley
You're welcome. Thanks for having me.
[35:24]
Liberty Vitter
Thank you for listening to this month's episode of the Harvard Data Science Review podcast. Before we reach the finish line, let's revisit a question brought up during the episode. The term formula refers to the strict rules governing car design, engines and components, while the number one signifies the sport's status as the premier racing category. Once again, special thanks to our guest, Rob Smedley, for his valuable insights, and to Devon, our Formula One enthusiast, for assisting with today's questions. To stay updated with all things HDSR, you can visit our website at hdsr.mitpress.mit.edu, or follow us on X and Instagram at the HDSR.