Strategic growth isn't just about where you're going, it's about where you build. Global business leaders are choosing Ohio for its pro-business climate, rapid innovation, and tailored incentive packages. With JobsOhio, you'll find a partner that moves on your timeline, helping you scale with confidence. Make your smartest move yet. Get started at jobsohio.com.
Hannah Bates
Before we begin, we have a couple of questions. What do you love about HBR On Strategy? What do you want less of? What would make HBR On Strategy even better? Tell us. Head over to hbr.org/podcastsurvey to share your thoughts. We want to make the show even better, but we need your help to do that. So head to hbr.org/podcastsurvey. Thank you.
Welcome to HBR On Strategy: case studies and conversations with the world's top business and management experts, hand-selected to help you unlock new ways of doing business. How did it go the last time you started an artificial intelligence project at your company? Chances are some of your colleagues expressed confusion or apprehension, and they never engaged with what you built. Or maybe the whole initiative went sideways after launch because the AI didn't work the way you thought it would. If any of that sounds familiar, you're not alone. Harvard Business School assistant professor and former data scientist Yavor Bozhinov says around 80% of AI projects fail. He talked with host Kurt Nickish on HBR IdeaCast in 2023 about why that is and the best practices leaders should follow to ensure their projects stay on track.
Kurt Nickish
I want to start with that failure rate. You would think that with all the excitement around AI, there's so much motivation to succeed. Somehow, though, the failure rate is much higher than past IT projects. Why is that? What's different here?
Yavor Bozhinov
I think it begins with the fundamental difference that AI projects are not deterministic like IT projects, right? With an IT project, you know pretty much the end state, and you know that if you run it once, twice, it will always give you the same answer. And that's not true with AI. So you have all of the challenges that you have with IT projects, but you have this random, this probabilistic nature, which makes things even harder. With algorithms, you may give it the same input and get different predictions. So think of something like ChatGPT: you and I can write the exact same prompt and it would actually give us two different answers. So this adds this layer of complexity and this uncertainty. And it also means that when you start a project, you don't actually know how good it's going to be. So when you look at that 80% failure rate, there's a number of reasons why these projects fail. Maybe they fail in the beginning, where you just pick a project that is never going to add any value, so it just fizzles out. But you could actually go ahead and you could build this. You could spend months getting the right data, building the algorithms, and then the accuracy could be extremely low. So, for example, if you're trying to pick which of your customers are going to leave you so you can contact them, maybe the algorithm you build is really not able to find people who are going to leave you at a good enough rate. So that's another reason why these projects could fail. Or another algorithm could do a really good job, but then it could be unfair and it could have some sort of biases. So the number of failure points is just so much greater when it comes to AI compared to traditional IT projects.
And I suppose there's also that possibility where you have a very successful product, but if the users don't trust it, they just don't use it. And that defeats the whole purpose.
Yeah, exactly. And I mean, this is actually one of the things that motivated me to leave LinkedIn and join HBS: the fact that I built what I thought was a really nice AI product for doing some really complicated data analysis. Essentially, when we tested it, it cut down analysis time that used to take weeks into maybe a day or two days. And then when we launched it, we had this really nice launch event. It was really exciting. There were all these announcements, and a week or two after it, no one was using it.
Even though it would save them a lot of time.
Massive amounts of time. And we tried to communicate that, and people still weren't using it. And it just came back to trust. People didn't trust the product we had built. So this is one of those things that's really interesting, which is if you build it, they will not come. And this is a story that I've heard, not just at LinkedIn in my own experience, but time and time again. And I've written several cases with large companies where one of the big challenges is they build this amazing AI, they show it's doing a really, really good job, and then no one uses it. So it's not really transforming the organization, it's not really adding any value. If anything, it's just frustrating. Maybe there's this new tool that now they have to find a way to avoid using and find reasons why they don't want to use it.
So through some of those painful experiences yourself in practice, through some of the consulting work you do, through the research you do now, you have some ideas about how to get a project to succeed. The first step seems obvious, but it is really important: selecting the right thing, selecting the right project or use case. Where do people go wrong with that?
Okay. They go wrong in so many different places. It sounds like a really obvious no-brainer, right? Every manager, every leader is constantly prioritizing projects, they're constantly sequencing projects. But when it comes to AI, there's a couple of unique aspects that need to be considered.
Yeah. In the article you called them idiosyncrasies, which is not something business leaders like to hear.
Exactly. But I think as we sort of transition into this more AI-driven world, these will become the standard things that people consider. And what I do in the article is I break them down into feasibility and impact. And I always encourage people to start with the impact first. The first piece, which everyone will say is a no-brainer, is really strategic alignment. And you might be thinking, okay, that's straightforward, I know what my company wants to do. But typically when it comes to AI projects, it's the data science team that's actually picking what to work on. And in my experience, data scientists don't always understand the business, they don't understand the strategy, and they just want to use sort of the latest and best technology. So very often there's this misalignment between the most impactful projects for the business and a project that the data scientist just wants to do because it lets them use the latest and best technology. The reality is with most AI projects, you don't need to be using the latest and the cutting edge. That's not necessarily where the value is for most organizations, especially for ones that are just starting their AI journey. The second portion of it is really the feasibility. And of course you have things like do we have the data, do we have the infrastructure? But the one other piece that I want to call out here is what are the ethical implications? So there's this whole area of responsible AI and ethical AI, which again, you don't really have with IT projects, right? Here you have to think about privacy, you have to think about fairness, you have to think about transparency. And these are things you have to consider before you start the project. Because if you try to do it halfway through the build and try to do it as a bolt-on, the reality is it will be really costly and it could almost require you to restart the whole thing, which greatly increases the costs and frustration of everyone involved.
So the easy way ahead is to tackle the hard stuff first. It gets back to the trust that's necessary.
Right, exactly. And you should have thought about trust at the beginning and all the way through, because in reality there's several different layers to trust. You have trust in the algorithm itself, which is: is it free from bias, is it fair, is it transparent? And that's really, really important. But in some sense, what's more important is do I trust the developers, right? The people who actually build the algorithm? If I'm an intended user, I want to know that this algorithm was designed to work for me, to solve the problems that I care about. And in some sense that the people designing the algorithm actually listen to me. That's why it's really important. When you're beginning, you need to know who is going to be your intended user, so you can bring them into the loop.
Who is the you in this situation if you need to know who the users are? Is this the leader of the company? Is this the person leading the developer team? Where's the direction coming from here?
So there's basically two types of AI projects. You have external-facing projects where the AI is going to be deployed to your customers. So think like the Netflix ranking algorithm, right? That's not really for the Netflix employees, it's for their customers. Or Google's ranking algorithm or ChatGPT, right? These things are deployed to their customers. So those are external-facing projects. Internal-facing projects, on the other hand, are deployed to the employees. So the intended users are the company's employees. So for example, this would be like a sales prioritization tool that basically tells you, okay, call this person instead of this person. Or it could be an internal chatbot to help your customer support team. Those are all internal-facing products. So the first step is to really just figure out who is the intended audience, who is going to be the customer of this. Is it going to be the employees, or is it going to be your actual customers? So very often for most organizations, internal-facing projects are called sort of data science and they fall under the purview of a data science team. Whereas external-facing projects tend to fall under the purview of an AI or a machine learning team. Once you sort of figure out whether this is going to be internal or external, you know who's going to be building this and very often you know the amount of interaction you can have with the intended customers. Because if it's your internal employees, you probably want to bring those people in the room as much as possible, even at the beginning, even at the inception, to make sure you're solving the right problem and that it's really designed to help them do their job. Whereas with your customers, of course, you're going to have focus groups to figure out if this really is the right thing, but you're probably going to rely more on experimentation to tweak that and make sure your customers are really benefiting from this product.
One place where difficulty arises for big companies is this tension between speed and effectiveness. Right. They want to experiment quickly, they want to fail faster and get to successes sooner, but they also want to be careful about ethics. They're very careful about their brand. They want to be able to use the tech in the most helpful places for their business. What's your recommendation for companies that are kind of struggling between being nimble and being most effective?
The reality is you need to keep trying different things so that you can improve the algorithm. So, for example, in one study that I did with LinkedIn, we basically showed that when you leverage experimentation, you can improve your final product by about 20% when it comes to key business indicators. So that notion of we tried something, we used that to learn, and we incorporated the learnings can have substantial boosts on the final product that's actually delivered. So really, for me, it's about figuring out what is the infrastructure you need to be able to do that type of experimentation really, really rapidly, but also figuring out how can you do that in a really safe way. One way of doing that in a safe way is basically having people sort of opt into these more experimental versions of whatever it is you're offering. So a lot of companies have ways for you to sign up to be like an alpha tester or a beta tester, and then you sort of get the latest versions, but you realize that maybe it'll be a little bit buggy, it's not going to be the best thing, but maybe you're a big fan, and that doesn't really matter. You just want to try the new thing. So that's one thing you can do, is sort of create a pool of people who you can experiment on. You can try new things without really risking that brand image.
So once this experiment is up and running, how do you recognize when it's failing or when it's subpar, when you've learned things, when it's time to change course with so many variables, it sounds like a lot of judgment calls as you're going along.
Yeah. The thing I always advocate here is to really think about the hypothesis you're testing in your study. There's a really nice example, and this is from Etsy.
And Etsy is an online marketplace for a lot of independent or small creators.
Exactly. So a few years back, folks at Etsy had this idea that maybe they should build this infinite scroll feature. Basically think of your Instagram feed or Facebook feed where you can keep scrolling and it's just going to keep loading new things, and you're never going to have to click next page. And what they did was they spent a lot of time, because that actually required kind of rearchitecting the user interface. And it took them a few months to work this out. They built infinite scroll, then they started running the experiment and they saw that there was no effect. And then the question was, well, what did they learn from this? It cost them, let's say, six months to build this. If you look at this, this is actually two hypotheses that are being tested at the same time. The first hypothesis is what if I showed more answers on the same page? If I sort of showed more products on the same page, and maybe instead of showing you 20 I showed you 50, then you might be more likely to buy things. That's the first hypothesis. The second hypothesis that this is also testing is what if I was able to show you the results quicker, right? Because why do I not like multiple pages? Well, it's because I have to click next page and it takes a few seconds for that second page to load. At a high level, those are sort of the two hypotheses. Now there actually was a much easier way to test these hypotheses. Instead of having 20 results on one page, they could have just displayed 50 results. And they could have done that in, I don't know, like a minute. Because this is just a parameter. So that required no extra engineering. The showing-results-quicker hypothesis, that's a little bit trickier because it's hard to speed up a website. But you could do the reverse, which is you could just slow things down artificially, where you just make things load a little bit slower.
So those are the two hypotheses, and if you understood those two hypotheses, you would know whether or not you would need to do this infinite scroll and whether it was worth making that investment. So what they did in the follow-up study is they basically ran those two experiments and they basically showed that there was very little effect of showing 20 versus 50 results on the page. And then the other thing, which was actually counterintuitive to what most other companies have seen, but because of the description you gave actually makes sense, is that adding a small delay isn't a huge deal for Etsy, because Etsy is a bunch of independent producers of unique products. So it's not that surprising if you have to wait a second or two seconds to see the results. So sort of the high-level thing is whenever you're running these experiments and developing these AI products, you want to think about not just sort of the minimum viable product, but really what are the hypotheses that underlie the success of this? And are you effectively testing those?
That gets us into evaluation. That's an example of where it didn't work and you found out why. How do you know that it is working or working well enough?
Yeah, absolutely. I think it's worth answering first the question of why do evaluation in the first place, right? You've developed this algorithm, you've tested it and you've shown it has good predictive accuracy. Why do you still need to evaluate it on real people? Well, the answer is most products have either a neutral or a negative impact on the very same metrics that they were designed to improve. And this is very consistent across many organizations. And there's a number of reasons why this is true for AI products. The first one is AI doesn't live in isolation. It usually lives in a whole ecosystem. So when you make a change or you deploy a new AI algorithm, it can interact with everything else that the company does. So for example, let's say you have a new recommendation system. That recommendation system could move your customers away from, say, high-value activities to low-value activities for you, whilst increasing, say, engagement. And here you basically realize that there are all these different trade-offs, so you don't really know what's going to happen until you deploy this algorithm.
So after you've evaluated this, what do you need to pay attention to when this product or these services are adopted, whether they're externally facing or internal to the organization, what do you need to be paying attention to?
Once you've successfully shown in your evaluation that this product does add enough value for it to be widely deployed and you've got people actually using the product, then you sort of move to that final management stage, which is all about monitoring and improving the algorithm. And in addition to monitoring and improving, you need to actually audit these algorithms and check for unintended consequences.
Yeah. So what's an example of an audit? An audit can sound scary.
Yeah. Audits can absolutely sound scary. And I think firms are very scared of their audits, right? But they all have to do it. And you sort of need this independent body to come look at it. And that's essentially what we did with LinkedIn. So one of the most important algorithms at LinkedIn is this People You May Know algorithm, which basically recommends which people you should connect with. And what that algorithm is trying to do is it's trying to increase the probability or the likelihood that if I show you this person as a potential connection, you invite them to connect and they accept that. So that's all that algorithm is trying to do. So the metric, right, the way you measure the success of this algorithm, is by basically looking at the number of people who were invited to connect and what percentage of those actually accepted.
Some sort of conversion metric there.
Exactly. And you want that number to be as high as possible. Now what we showed, which is really interesting and very surprising in this study that was published in Science, and I have a number of co-authors on it, is that a year down the line this was actually impacting what jobs people were getting. And in the short term it was also impacting sort of how many jobs people were applying to, which is really interesting because that's not what this algorithm was designed to do. That's an unintended consequence. And if you sort of scratch at this, you can figure out why this is happening. There's this whole theory of weak ties that comes from this person called Granovetter. And what this theory says is that the people who are most useful for getting new jobs are arm's-length connections. So people who maybe are in the same industry as you and maybe they're, say, five, six years ahead of you in a different company. So people you don't know very well, but you have something in common with them. So this is exactly what was happening: some of these algorithms were increasing the proportion of weak ties that a person was suggested to connect with. They were seeing more information, they were applying to more jobs and they were getting more jobs.
Makes sense. Still kind of amazing.
Exactly. And this is what I mean by these ecosystems. It's like you're doing something to try to get people to connect to more people, but at the same time you're having this long-term knock-on effect on how many jobs people are applying to and how many jobs people are getting. This is just one example in one company. If you scale this up and you just think about how we live in this really interconnected world, it's not like algorithms live in isolation. They have these types of knock-on effects and most people are not really studying them. They're not looking at these long-term effects. And I think it was a great example that LinkedIn opened the door. They were transparent about this, they let us publish this research, and then they actually changed their internal practices where, in addition to looking at those sort of short-term metrics about who's connecting to whom and how many people are accepting, they started to look at those more long-term effects on how many jobs people are applying to, et cetera. And I think that's sort of testimony to how powerful these types of audits can be, because they just give you a better sense of how your organization works.
A lot of what you've outlined, and of course the article is very detailed for each of these steps, is just how, I don't know, cyclical almost this process is. It's almost like you get to the end and you're starting over again because you're reassessing and then potentially seeing new opportunities for new tweaks or new products. So to underscore all this, what's the main takeaway then for leaders?
I think the main takeaway is to realize that AI projects are much harder than pretty much any other project that a company does. But also the payoff and the value that this could add is tremendous. So it's worth investing the time to work on these projects. It's not all hopeless. And realizing that there's sort of multiple stages and putting in infrastructure around how to navigate each of those stages can really reduce the likelihood of failure and really make it so that whatever project you're working on turns into a product that gets adopted and actually adds tremendous value.
Yavor, thanks so much for coming on the show to talk about these insights.
Thank you so much for having me.
Hannah Bates
That was HBS Assistant Professor Yavor Bozhinov in conversation with Kurt Nickish on HBR IdeaCast. Bozhinov is the author of the HBR article "Keep Your AI Projects on Track." We'll be back next Wednesday with another handpicked conversation about business strategy from Harvard Business Review. If you found this episode helpful, share it with your friends and colleagues and follow our show on Apple Podcasts, Spotify, or wherever you get your podcasts. While you're there, be sure to leave us a review. And when you're ready for more podcasts, articles, case studies, books, and videos with the world's top business and management experts, you'll find it all at hbr.org. This episode was produced by Mary Dew and me, Hannah Bates. Kurt Nickish is our editor. And special thanks to Ian Fox, Maureen Hoch, Erica Troxler, Ramsey Gabaz, Nicole Smith, Ann Bartholomew and you, our listener. See you next week.
Release Date: May 7, 2025
Host: Kurt Nickish, Harvard Business Review
Guest: Yavor Bozhinov, Assistant Professor at Harvard Business School and Former Data Scientist
Description: In this episode, Yavor Bozhinov delves into the complexities of launching successful AI initiatives within organizations. Drawing from his extensive experience and research, Bozhinov provides actionable insights to help leaders navigate the high failure rates of AI projects and ensure their AI strategies deliver tangible value.
Yavor Bozhinov opens the discussion by addressing a startling statistic: approximately 80% of AI projects fail (01:52).
Bozhinov: "I think it begins with the fundamental difference that AI projects are not deterministic like IT projects... This adds this layer of complexity and this uncertainty." (02:09)
Unlike traditional IT projects, which have predictable outcomes, AI projects involve probabilistic algorithms that can yield varying results even with identical inputs. This inherent unpredictability significantly contributes to the higher failure rates.
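The contrast between deterministic and probabilistic behaviour can be shown with a toy sketch. Everything here is illustrative: `deterministic_system`, `sample_response`, the candidate replies, and their weights are made-up stand-ins, not how any real model or product works.

```python
import random

# Hypothetical candidate replies and sampling weights (assumptions for illustration).
CANDIDATES = ["Answer A", "Answer B", "Answer C"]
WEIGHTS = [0.6, 0.3, 0.1]


def deterministic_system(prompt: str) -> str:
    """A traditional IT-style function: same input, same output, every time."""
    return prompt.upper()


def sample_response(prompt: str, rng: random.Random) -> str:
    """A toy probabilistic system: the reply is drawn at random.

    The prompt is ignored in this toy; a real model would condition on it.
    """
    return rng.choices(CANDIDATES, weights=WEIGHTS, k=1)[0]


if __name__ == "__main__":
    rng = random.Random()
    print(deterministic_system("same prompt"), deterministic_system("same prompt"))
    # Two calls with the identical prompt may or may not agree.
    print(sample_response("same prompt", rng), sample_response("same prompt", rng))
```

Because the second function samples its output, repeated runs on an identical prompt can disagree, which is why the quality of such a system has to be measured statistically rather than checked once.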
Bozhinov identifies several factors that lead to the downfall of AI initiatives:
Project Selection: Choosing projects that lack potential value or relevance can lead to early failures. Without strategic alignment, AI projects may fizzle out despite substantial investments.
Data and Algorithmic Accuracy: Building algorithms with insufficient data or low accuracy undermines their effectiveness. For instance, an AI designed to predict customer churn may fail if it cannot accurately identify at-risk customers.
Bias and Fairness: Even successful algorithms can falter if they embed biases, leading to unfair outcomes that erode trust among users.
Lack of User Trust: As Bozhinov recounts his experience with an AI tool at LinkedIn, even highly efficient products can suffer from low adoption rates if users do not trust them.
Bozhinov: "If you build it, they will not come." (04:11)
This highlights the critical importance of not only building effective AI but also ensuring it gains user trust and acceptance.
The selection phase is crucial and often mishandled. Bozhinov emphasizes the need for aligning AI projects with business impact rather than the allure of cutting-edge technology.
Bozhinov: "I always encourage people to start with the impact first... data scientists don't always understand the business, they don't understand the strategy." (06:02)
Key considerations include:
Strategic Alignment: Ensuring the AI initiative aligns with the organization's broader strategic goals.
Impact vs. Technology: Prioritizing projects based on their potential business impact rather than the novelty of the technology involved.
Once a project is selected based on impact, assessing its feasibility involves evaluating data availability, infrastructure, and ethical considerations.
Bozhinov: "You have to think about privacy, you have to think about fairness, you have to think about transparency." (06:23)
Ethical AI practices are paramount. Addressing these factors from the outset prevents costly adjustments later and fosters trust among stakeholders.
Trust is multifaceted, encompassing both the algorithm's reliability and the users' confidence in its developers.
Bozhinov: "Do I trust the developers... that the people designing the algorithm actually listen to me." (08:37)
Understanding the intended users—whether internal employees or external customers—is essential. Internal projects benefit from close collaboration with employee users, while external projects may rely more on customer feedback and experimentation.
Organizations often grapple with the tension between rapid experimentation and ensuring ethical, effective outcomes.
Bozhinov: "It's about figuring out what is the infrastructure you need to be able to do that type of experimentation really, really rapidly, but also figuring out how can you do that in a really safe way." (12:08)
Strategies include:
Rapid Experimentation: Implementing infrastructure that supports quick iterations and learning.
Safe Testing Environments: Utilizing alpha or beta testers to trial new features without jeopardizing the brand.
Evaluation is critical to determine whether an AI initiative meets its intended goals. Bozhinov uses Etsy’s infinite scroll feature as a case study to illustrate common pitfalls in hypothesis testing.
Bozhinov: "Whenever you're running these experiments and developing these AI products, you want to think about not just about the minimum viable product, but really what are the hypotheses that underlie the success of this?" (17:15)
Key lessons include:
Clear Hypotheses: Defining specific hypotheses to test ensures that experiments are purposeful and insights are actionable.
Efficient Testing: Leveraging simpler, parameter-based experiments can validate hypotheses without extensive resource expenditure.
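A parameter-based experiment of the kind described above (for example, 20 versus 50 results per page) could be analysed with a standard two-proportion z-test. The sketch below is a minimal illustration under stated assumptions: the function name, the choice of test, and all counts are hypothetical and are not Etsy's method or data.

```python
import math


def two_proportion_z_test(conv_a: int, n_a: int, conv_b: int, n_b: int):
    """Return (z, two_sided_p) for the difference in conversion rate B minus A.

    conv_* are the number of converting users, n_* the users in each arm.
    Assumes the pooled rate is strictly between 0 and 1.
    """
    p_a = conv_a / n_a
    p_b = conv_b / n_b
    pooled = (conv_a + conv_b) / (n_a + n_b)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    z = (p_b - p_a) / se
    # Two-sided p-value under the standard normal: 2 * (1 - Phi(|z|)).
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return z, p_value


if __name__ == "__main__":
    # Hypothetical counts: arm A shows 20 results per page, arm B shows 50.
    z, p = two_proportion_z_test(conv_a=500, n_a=10_000, conv_b=510, n_b=10_000)
    print(f"z = {z:.2f}, p = {p:.3f}")
```

A large p-value here would be read as "no detectable effect of page size", which is the kind of cheap signal that can be gathered before committing months of engineering to a larger feature.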
Post-deployment, ongoing monitoring and auditing are essential to identify and address unintended effects of AI systems.
Bozhinov: "These types of audits... just give you a better sense of how your organization works." (19:33)
Using LinkedIn’s "People You May Know" algorithm as an example, Bozhinov explains how audits revealed long-term impacts on job applications and placements—effects that were not initially anticipated.
Bozhinov: "It's not like algorithms live in isolation. They have these types of knock-on effects..." (22:06)
Audits help organizations understand the broader ecosystem in which their AI operates, ensuring sustained value and ethical integrity.
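One way to picture such an audit is to report the metric an algorithm was designed to move next to a downstream metric it was never designed to move. The sketch below is only an illustration: the `VariantStats` fields, the `audit` function, and every number are hypothetical and do not come from LinkedIn or the published study.

```python
from dataclasses import dataclass


@dataclass
class VariantStats:
    """Aggregated outcomes for one experiment arm (all fields hypothetical)."""
    invites_sent: int
    invites_accepted: int
    users: int
    job_applications: int  # downstream outcome, measured later


def acceptance_rate(s: VariantStats) -> float:
    """Short-term metric the algorithm optimizes: accepted / sent invitations."""
    return s.invites_accepted / s.invites_sent


def applications_per_user(s: VariantStats) -> float:
    """Long-term metric the algorithm was not designed to affect."""
    return s.job_applications / s.users


def audit(control: VariantStats, treatment: VariantStats) -> dict:
    """Relative lift of treatment over control on both metrics."""
    return {
        "acceptance_rate_lift":
            acceptance_rate(treatment) / acceptance_rate(control) - 1,
        "applications_per_user_lift":
            applications_per_user(treatment) / applications_per_user(control) - 1,
    }


if __name__ == "__main__":
    control = VariantStats(1000, 300, 500, 1000)
    treatment = VariantStats(1000, 330, 500, 1200)
    for name, lift in audit(control, treatment).items():
        print(f"{name}: {lift:+.1%}")
```

Reporting both lifts side by side makes an unintended downstream effect visible even when the short-term metric alone looks like a plain success.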
Concluding the discussion, Bozhinov offers strategic insights for leaders embarking on AI initiatives:
Bozhinov: "AI projects are much harder than pretty much any other project that a company does. But also the payoff and the value that this could add is tremendous." (23:41)
Key takeaways include:
Investment in Infrastructure: Building robust infrastructure supports effective experimentation and iteration.
Strategic Planning: Thorough planning across multiple stages reduces failure likelihood and enhances project success.
Continuous Improvement: Viewing AI initiatives as cyclical processes allows organizations to adapt and evolve, maximizing long-term benefits.
Yavor Bozhinov’s insights underscore the intricate challenges of launching AI initiatives while highlighting the substantial rewards they offer. By meticulously selecting projects, emphasizing ethical considerations, fostering user trust, and implementing rigorous evaluation and auditing processes, organizations can navigate the complexities of AI strategy to achieve sustainable success.
For more insights and strategies on business and management, visit hbr.org.