
Narrator
Welcome to Humanitarian Frontiers in AI, the podcast series where innovation meets impact. In each episode, we dive deep into how artificial intelligence is reshaping the future of humanitarian work. From enhancing crisis response to making aid delivery smarter and more effective, AI is opening new doors in the way we support communities in need. In this series, hosts Chris Hoffman and Nassim Motelabi bring you thought leaders from academia and the tech industry to discuss not only the vast opportunities AI offers, but also the ethical considerations and risks we all must navigate. Join them on this journey as they explore AI's potential to transform lives and address humanity's most pressing challenges.
Chris Hoffman
Hi and welcome back everyone. It's Chris Hoffman and Nassim Motelabi. We're here with Humanitarian Frontiers in AI, a podcast that's brought to you by Innovation Norway through their generous funding to really help get the word out on what AI means for humanitarian action around the world. We're a 10-episode, short-form podcast that will be coming out about once a week for the next few months, and today we've got a great group of panelists for you. We've got Jeffrey Wag and Matt Harris that are going to be here. I would call them doctors, but we can just say that their names all end with PhD. So we've got three doctors and a Chris today on the podcast. Nassim, welcome back. It's nice to have you. I hope your trips have gone well. I know you've been super busy.
Nassim Motelabi
Hi Chris, I'm super happy to be back and kickstarting a more technical discussion around AI with two brilliant data scientists.
Chris Hoffman
Yeah, it's going to be exciting, it really is. We had such a great build-up to this just talking off mic, and I'm super excited to hear from these guys and learn more about the data piece. Right. This really is about data: what data means and what data is within the AI space, and what we need to know as sometimes non-technologists when we are having to make decisions, and to really demystify this data conversation. Because data is probably the biggest word that's used, either before another word like protection or after something like use. Data is always in the conversation when we talk about AI, and it can be misconstrued or misunderstood, and we want to try to help everybody out there that's in the humanitarian space understand it better. So Nassim, over to you. I'll let you kick us off and we'll get to meet everybody.
Nassim Motelabi
Thanks Chris. So Jeff, Matt, I wanted to just have you introduce yourselves and tell us a little bit about your background. I know both of you have extensive experience working with humanitarian organizations, so maybe tell us a little bit about your experience working with humanitarian data, some of the use cases you've been working with recently, and generally what excites you about AI or not. So over to you, maybe Jeff, and then we can go to Matt.
Jeffrey Wag
Sure. Hi, Nassim, Chris, Matt, it's great to be here today. It's really a pleasure to be on the show. So I've been working most recently with CNRS in France. I've also been working as an AI consultant for the international NGO Relief International. They do work in some of the world's most fragile settings to deliver humanitarian aid to vulnerable populations, along with healthcare, livelihoods and education. And I think we're all aware that AI is really just beginning to have a major impact in the humanitarian sector, and I hope we'll have a chance to dive into that a little more today.
Nassim Motelabi
Great. Matt, over to you.
Matthew Harris
Thank you very much for having me on the podcast. It's a real pleasure to meet you all. My name is Matthew Harris. I come from a background of physics and the commercial world, and I worked as a software developer, project manager and product manager before transitioning into managing data science and AI practices. Most recently I was Head of Data Science at DataKind, so I was working with lots of humanitarian organizations to help them leverage data and AI in a safe and equitable way, developing tools for data quality as well as data analysis and, more recently, generative AI. I'm currently on Nassim's team at the World Food Programme, which has been amazing, working with the Office of Evaluation on Gen AI knowledge retrieval products. There it's more a question of how we scale them for enterprise-grade implementation. I'm also working as Director of AI and Innovation at the Program Integrity Alliance.
Chris Hoffman
That's amazing. You know, you both bring such a wealth of knowledge and experience. Well, all three of you do, to be fair, in this space. And there are two separate conversations that we've been having in all the podcasts. Right. One is what you've started to allude to, Matt, which is the internal stuff: what do we do with all the stuff that we already have and understand, the reports that have been written in the organization, how do we consolidate things? And then on the other side we have the people-facing side of AI: how is that going to work, and how are those two things going to work together? And so today, trying to keep it as understandable as possible for the great population of listeners that we have, maybe the first question would be: what is data when it comes to an organization? I know that sounds like a simple question, but it's really complex, because organizations don't have all their data in one place. It can be one thing somewhere and something else somewhere else, and then there's where it's stored, how it's stored, in what format, and all those things. So when we start off from the bare bones of this, and maybe, Matt, we'll start with you: what is data? When you're speaking to WFP, right, or you're speaking to a federal agency, and they say, okay, we want AI to do this, and your first question is, okay, where's your data? What do you mean by that when you say, where's your data? What are you saying to them? And what do they need to hear from you?
Matthew Harris
So most recently, the products that I've been working on have been related to knowledge retrieval. We all know that generative AI has patterns such as RAG (retrieval-augmented generation), which seem really powerful on the surface of things but require a lot of curation to make sure that they work accurately and safely. So in terms of the data that I have been involved with recently, it's a lot of unstructured data in the form of documents, as you mentioned, but also combined with structured data. For example, at the Program Integrity Alliance, we have databases of just tables as well as the PDF documents. And there the challenge has been to combine those into products that satisfy the business question, and the same at WFP. Key to both is a rigorous data management strategy to ensure that you have governance in place and don't let it become too chaotic. I know these are really obvious things. So in talking to those organizations, the data has been in those forms, and how we ingest it is where a lot of the work is occurring. Because for AI, if you don't have clean, solid data and a good architecture around it, the AI is not very good, is it? It's rubbish in, rubbish out.
Chris Hoffman
Absolutely. Jeff.
Jeffrey Wag
Yeah, I think what Matt is saying is absolutely spot on. One of the biggest challenges that AI models, and the people who are developing AI models, face is incomplete data sets. The data sets that we deal with, especially in the humanitarian sector, are very heterogeneous. Often we think of data sets as being Excel spreadsheets, numerical data, but of course data could be medical records, and these medical records might not be in PDF format, they might be handwritten. One of the projects which we have just begun at Relief International is the digitization of medical records and the conversion of that into machine-readable form. And that's something that is a challenge. But if you don't have that data in a complete format, then you can't do anything constructive with it, or it's harder to do something constructive with it.
Chris Hoffman
Yeah. Nassim, when we were talking to Lindsay from DevelopMetrics a couple of episodes ago, she was talking about how she was having to go to organizations and try to get them to structure their data. And she mentioned, you know, every organization says we've got data, but they don't really grasp it. And this starting point really makes me start to think. I don't know how you're feeling at WFP, Nassim, because I think they've been going through the data journey a lot longer than a lot of other people. But it feels like organizations still have about two years to go before they can even start playing around. Or is that too far? Is that too much of a long tail? Should they be able to start playing now? You know, where are organizations at? Nassim, maybe share some WFP examples of what you've seen there, and you guys as well, from Relief International and DataKind.
Nassim Motelabi
This is a very interesting conversation to me because data collection and data management are resource intensive. And when we speak to humanitarian organizations of different scales and capacities, we see that their data management services and capacities are very different and varied. Right. And I'm lucky to be in an organization like WFP, which is one of the biggest UN organizations and has been investing in these data management services. We have a huge capacity around technical staff as well, who can contribute their knowledge from industry and the private sector, from areas that have been working on data-driven decision making and evidence generation for a long time, and bring it into organizations such as WFP. We also see UNHCR, UNDP, UNICEF. I'm not going to name them all, but these are big organizations in the UN that have invested tremendous resources into their data management services. But that said, even to this date, we realize that we are still struggling when it comes to qualitative data management. With the rise of LLMs, we see a new wave of opportunity that we had been blind to. Right? And so we need to start thinking about cleaning our data and being able to utilize the knowledge that our organization has. And with Matt working in my team now, we are facing some of these challenges, right, when it comes to using the qualitative data and reports that many of our divisions in WFP work with. But really, I think one of the interesting elements to me is what comes first. Is it the data or is it the algorithm? When we're talking about AI, I think we now recognize that there are certain AI algorithms and models out there that require certain data, and then we start shaping the data around them. But I wonder, Matt and Jeff, how often can we actually use the data that we have and make the best of it, and what are the opportunities around that? 
Or do you think that we always need to build the data capacity to utilize the AI models and algorithms that are out there? Maybe Jeff, over to you. I'm curious to hear your thoughts.
Jeffrey Wag
Yeah, sure. I think this is a great question. So I think that after data protection, which I imagine we'll come back to a little bit later, one of our toughest challenges when trying to use predictive analytics and AI tools is the incompleteness of data sets. And as you mentioned earlier on, very often organizations believe that they have great data sets, but in fact these data sets might be incomplete. One important way to overcome this is to use data from multiple sources. So, for example, rather than relying only on, let's say, satellite imagery, where the imaging might only be available once a day, you can combine it with social media sentiment analysis during a crisis situation. During the Nepal earthquake of 2015, for example, satellite images were used to examine damage, while reports from local agencies and crowdsourcing were used to confirm the damage assessments. And therefore you could deploy the aid more effectively. You also encounter situations of bias and misrepresentation. This is something that I certainly try to emphasize with my colleagues: when they're looking at the output from AI models, whether it be ChatGPT or other predictive analytics, they need to understand that the models are only as good as the data they were trained on.
Matthew Harris
I think that's a really interesting point, and I really like how you phrased that, Nassim, especially with the recent push with generative AI, where people are saying, oh, we've got tons of unstructured data that we can just use immediately. Of course, the reality is that that is not necessarily possible. Simple things like parsing PDFs can still be challenging or expensive, and that creates a whole ton of barriers. You can pay for a service that will pull out superscript references and things like that, but it's amazing how much that is blocking a lot of the work that's going on. That said, there is a balance, in that we have to make sure that perfection isn't the enemy of success. There are many cases where perhaps that parsing isn't exactly perfect, but we can still leverage the techniques that we have to produce something that responds to the requirements, the business case, to help people in the humanitarian scenario. So it's tricky.
Chris Hoffman
So we started with the first question, what is data? Then we moved on to what is possible, the art of the possible. The next question is then: what are the five things that an organization should have if they even want to try something? Maybe it's not five, maybe it's three. But what are those things where you say, okay, you need to have a data scientist, or you need to have this, and then you need to have all of your data go into a data lake, or it needs to go into at least a SharePoint? What are those baseline needs that organizations need to be thinking about that many don't have? I mean, organizations that I work with don't even have a CRM for case management. They're still doing a lot of things in Excel. So these are all new conversations for a lot of the folks, and it can be old hat for you guys, as people that have studied for years and years how to manage data. But for the guy or the girl that knew how to put food on a truck and get it to the people that need it, and they're now the country director, and they've been in the organization for 20 years, what are you saying to them? This is what you need to do with your data? Maybe Matt, I see you nodding your head.
Matthew Harris
There's a whole long list of governance best practices and documents, which are very daunting for people to understand. I really like that Nassim was involved in creating a really wonderful document called Generative AI for Humanitarians. I know this pertains to Gen AI specifically, but in there was a list of 10 best practices that are really easy to digest. And if we were to apply that principle to data, I would say, first, don't try and build it yourself. So many organizations go out and try and build everything themselves. Pay a little bit of money. In some cases it's going to cost you far less than having a team of five data scientists try and invent something from scratch. Cloud vendors such as Azure, AWS and Google all have really solid data management and data curation products. The second point is to only give people access to the data that they need. It sounds like a simple thing to say, but there are many organizations I've seen where, oh, everyone's got access to everything. Start with role-based access permissions from the outset. For non-technical users, that's just sharing a file with a certain small list of people rather than making it shared with everybody by default. I think that's a fundamental principle that is really important. And then, third, have the ability to log and monitor what's happening to data in your environment. There's a much longer list, but if I had to do three things to begin with, I would probably start with those.
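Matt's second and third points (give people access only to the data they need, and log what happens to it) can be illustrated with a very small Python sketch. This is an assumption-laden illustration, not how any particular organization or cloud vendor implements it: the role names, dataset names, and the in-memory `access_log` are all hypothetical, and in practice the access rules and audit log would come from the managed services of whichever platform you use.

```python
from datetime import datetime, timezone

# Hypothetical roles and datasets, for illustration only.
ROLE_PERMISSIONS = {
    "field_officer": {"distribution_plans"},
    "evaluation_analyst": {"distribution_plans", "evaluation_reports"},
}

# Stand-in for a managed audit log; a real deployment would not keep this in memory.
access_log = []


def can_access(role, dataset):
    """Deny by default: a role sees only the datasets explicitly granted to it."""
    return dataset in ROLE_PERMISSIONS.get(role, set())


def request_access(user, role, dataset):
    """Check permission and record every attempt, whether allowed or denied."""
    allowed = can_access(role, dataset)
    access_log.append({
        "time": datetime.now(timezone.utc).isoformat(),
        "user": user,
        "dataset": dataset,
        "allowed": allowed,
    })
    return allowed
```

The design choice worth noticing is the default: an unknown role or an unlisted dataset is refused, and the refusal is still logged, so someone reviewing the log can see who tried to reach what.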
Jeffrey Wag
Those are all excellent points, and in fact one of the issues that I think the sector is facing is that the regulation and the policies, especially the data protection policies, are having trouble keeping up with the technology. I think this is a big risk to deploying any sort of AI or predictive analytics in the sector. One of the things that we did recently at Relief International, before even starting to deploy any AI tools, was to develop a set of AI use policies to ensure that, before our staff and consultants start to use any tools, they consider the implications and potential biases and, more importantly, protect the personal data of the organization and of the communities that we serve. I think this is something that really has to be done before any exploration of models is done within an organization.
Matthew Harris
Just to add to that, something I didn't really mention: when I've been involved in building new data science infrastructure and practices, it can seem daunting and add a bit of friction at the start, but I think it's a really good idea from the outset to pick a framework like GDPR, even if all your data is public, and start with that in mind, because it's so much harder to go back a year later and put it in.
Nassim Motelabi
This is very interesting because we're talking about humanitarian organizations becoming, to some degree, technology solution providers. They also source their own data at points and develop a solution, and they either deploy it or even share it with other stakeholders, from governments to civil society organizations, NGOs and others. So my question, I guess, is now more about the use of private sector resources or other open source tools and data. How much of that did you use, or do you use, day to day to address humanitarian challenges? What are some of the resources that you recommend when it comes to humanitarian response and the use of technology and AI? And maybe also think about data protection when it comes to using our own data, or using external services with our own data. So there are kind of two questions here. One is using external data for our own purposes, and the other is using external tools with our own data, because there are a lot of questions around security and privacy these days. Right. And then we'll shift a little, maybe, toward how our data is being used when we use other solutions, and whether it is really worth it to develop our own. What does it mean in that sense? Jeff, maybe start from your end.
Jeffrey Wag
Sure. I mean, I would say AI and any use of data must abide by the do no harm principle. That means anonymization of data is essential before performing any sort of AI model development. And something we emphasize in the policies that we use at RI is what actually constitutes personal data. We want to make that clear to our staff and those who are working with us, because some of the policies that have been developed in recent work by the UN have highlighted that personal data goes even beyond what is usually assumed. I know Matthew mentioned GDPR, but there are many examples of personal data. These might include the location of an individual, and you can imagine that during a refugee crisis it is particularly important that that data is protected. It's not always easy. I remember we were working on a project with WFP where we developed a model that was able to use household survey data to predict which characteristics of a household were most likely to lead to food insecurity. This is obviously very useful information to have. But the models that we produced also produced politically sensitive predictions, if you will. In this case, we had to stop the analysis. But this is something that organizations need to be aware of when they're making any kind of predictions: what impact those predictions might have politically.
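As a rough illustration of what Jeff describes (treating things like names and an individual's location as personal data and protecting them before any modeling), here is a minimal Python sketch. Everything in it is an assumption for illustration: the field names, the secret key, and the choice to round coordinates are hypothetical, and this is pseudonymization rather than full anonymization, so combinations of the remaining fields may still identify someone and a proper review is still needed.

```python
import hashlib
import hmac

# Hypothetical secret; it must be stored separately from the data set.
SECRET_KEY = b"replace-with-a-secret-kept-outside-the-dataset"

# Hypothetical names of fields that directly identify a person.
DIRECT_IDENTIFIERS = {"name", "phone"}


def pseudonymize(record):
    """Return a copy of a survey record with direct identifiers reduced."""
    # 1. Drop fields that identify a person outright.
    cleaned = {k: v for k, v in record.items() if k not in DIRECT_IDENTIFIERS}
    # 2. Replace the household ID with a keyed hash, so records can still be
    #    linked to each other but not back to the original ID without the key.
    digest = hmac.new(SECRET_KEY, str(record["household_id"]).encode(), hashlib.sha256)
    cleaned["household_id"] = digest.hexdigest()[:16]
    # 3. Coarsen location: one decimal place is roughly 11 km of latitude.
    cleaned["lat"] = round(record["lat"], 1)
    cleaned["lon"] = round(record["lon"], 1)
    return cleaned


example = {
    "household_id": "HH-001",
    "name": "Example Person",
    "phone": "000-0000",
    "lat": 27.7172,
    "lon": 85.3240,
    "household_size": 5,
}
safe_record = pseudonymize(example)
```

How coarse the location should be, and which fields count as identifying, depends on the context; in a sparsely populated area even a rounded coordinate can point to a single household.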
Matthew Harris
I think in terms of privacy, at DataKind we adhered to GDPR, because as and when any PII, personally identifiable information, occurred, there was a whole process around that which we were able to follow, and that was really powerful. But going back to the AI models and external services that you mentioned, Nassim, I think one easy stance to take, and it may not work for every organization, of course, is that if you're using a cloud vendor, you keep all of your data and tools within that cloud vendor. As an organization, you have enterprise agreements with that vendor, such as Azure or AWS, that can be reviewed to ensure that those privacy controls are in place. This doesn't cover everything. Jeff is absolutely right: you also have to do something proactive with the data coming in. But having all of your data and models within your own enterprise infrastructure is a big step forward, as opposed to having your data here, then calling this LLM provider that's outside of your organization, or using that other process, and then everything is wild. If it's all in Azure, AWS or Google, you're off to a good start.
Chris Hoffman
Matt, real quick, just to follow up on that so we don't lose that thread. So would you be saying that either, A, with the big three, as you've mentioned them, you would utilize the LLMs that they have within their service, or, B, what about launching your own? So, for example, hosting Llama in your own cloud environment in Azure.
Matthew Harris
That's exactly it. And by the way, when I say the three large vendors, it's not exhaustive; there are other variants of that. But the point is to pick an infrastructure that the organization owns and can monitor closely. Most of them now offer all of the big models, the LLMs. I mean, you've got Google Model Garden, you've got Azure. I forget what it's called in Azure, and, you know, it changes. The point is that even models like Llama, provided by external entities, can reside within your own cloud infrastructure. And instantly you've headed off a lot of the concerns that people may have about data being shared for training models outside of your organization. All the cloud vendors are now, I think, by default saying we don't do that, and you can review that as an organization. You can review the contracts and exercise due diligence to make sure that's the case.
Nassim Motelabi
I was really intrigued by the examples that Jeff and Matt provided here, because we can see that even if we use external data, like the survey data Jeff mentions, and even if we deploy our own models on top of it, there is a social implication or consequence in terms of how that data is being used. On the other hand, Matt is reflecting on the security and privacy of our own data in the humanitarian space that is being stored in certain cloud environments and that utilizes certain external models or tools. We see that there is a spectrum of concerns that we have in the humanitarian space that is social as well as technical. These examples are very intriguing to me, but I wanted to maybe steer the conversation to those technical elements, in terms of what we should actually be afraid of when it comes to the use of certain tools and technologies. When we talk about certain models, like LLMs and AI, being trained with our own data in an organization, even if we're using a cloud provider and using some LLM that is provided through a third party, is that model being trained with our data or not? Do we need to have a separate agreement with them, et cetera, et cetera? Generally, I'm trying to demystify the use of certain algorithms and tools with a certain set of data, our own data, or any type of data that includes PII, personally identifiable information. So what are your thoughts on that? Maybe Matt, and then we can go to Jeff. Just trying to demystify the risks here, and what are the best practices to make sure that we are safeguarded?
Matthew Harris
I think there are three points that are really important here. Number one is to choose your use cases judiciously. So don't, especially with generative AI, just say we can LLM it. Pick a use case that is safer. There's a spectrum of applications of models and AI, ranging from super scary and dangerous all the way through to super safe. On the safe side, for example, never allow an LLM to answer from just its training data. You always ground it in some data, or you ground it in some fashion. It may not be text or what have you. But the point is to really pay close attention to the use cases. The second thing, more for traditional cases where you are training models and using data: I like the approach of data cards and model cards. It's a Google standard. The idea is that if you create an AI product, you should have a card that a person can go to in order to review the bias analysis that's been done on that model, one that looks at the provenance of the data and actually lists some of the risks. Not so much risks, but some of the caveats that you should be aware of in using that model and data. There's work involved in doing that, but I think it's really important to have that transparency. And then the third aspect comes back to what I was referring to earlier. Choose your technology stack so that you minimize the exposed surface as much as possible. Again, I go back to my earlier point about picking a particular cloud vendor, rather than having lots and lots of different organizations where your data is going, to keep your data safe within your organization, essentially.
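Matt's first point, never letting an LLM answer from its training data alone but always grounding it in some data, can be sketched in a few lines of Python. This is a toy under stated assumptions, not the approach used at WFP or anywhere else: the retrieval here is plain keyword overlap (real systems typically use embeddings and a search index), no model is actually called, and the function names, document ids, and prompt wording are all hypothetical. The one idea it is meant to show is the refusal path: if nothing relevant is retrieved, no prompt is built at all.

```python
import re


def tokenize(text):
    """Lowercase a string and split it into a set of simple word tokens."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def retrieve(question, documents, min_overlap=2):
    """Return ids of documents sharing at least `min_overlap` words with the question."""
    question_words = tokenize(question)
    scored = []
    for doc_id, text in documents.items():
        overlap = len(question_words & tokenize(text))
        if overlap >= min_overlap:
            scored.append((overlap, doc_id))
    scored.sort(reverse=True)  # highest overlap first
    return [doc_id for _, doc_id in scored]


def build_grounded_prompt(question, documents):
    """Build a prompt restricted to retrieved sources, or None if there are none."""
    hits = retrieve(question, documents)
    if not hits:
        return None  # nothing to ground on, so do not ask the model at all
    context = "\n".join(f"[{doc_id}] {documents[doc_id]}" for doc_id in hits)
    return (
        "Answer using only the sources below and cite them by id. "
        "If the sources do not contain the answer, say so.\n\n"
        f"{context}\n\nQuestion: {question}"
    )


# Hypothetical documents, for illustration only.
docs = {
    "eval-2021": "The 2021 evaluation found that cash transfers improved food security in the pilot districts.",
    "hr-policy": "Staff leave requests are approved by line managers.",
}
grounded = build_grounded_prompt("What did the evaluation find about cash transfers?", docs)
ungrounded = build_grounded_prompt("Who won the football match?", docs)
```

In this sketch `grounded` is a prompt that includes only the evaluation text, while `ungrounded` is `None`, which the calling application would turn into a message such as "no relevant source found" instead of letting the model guess.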
Jeffrey Wag
Yeah, these are all great points, Matt. And I guess I would also add that it's essential when taking any data, especially in the humanitarian context, that all the participants in that process agree to their data being obtained and are informed exactly how it will be used. Ultimately, this is not only a huge risk to them and potentially their safety, but it can be a risk to the credibility of an organization if data leak out, or if it's misused, or if models produce erroneous results. That has an impact on the organization and potentially on the organization's ability to continue working in that area. One way to mitigate these risks is through a process called edge processing, where the humanitarian data is analyzed locally on the recipient's device rather than being transported to a centralized server. So that's one way that we can get around some of these risks.
Chris Hoffman
Do you guys have some more of those? Any more of the risk mitigation measures you want to share? Because that's a really big piece. I mean, again, Matt, your points were spot on, especially around the stack. And I think that organizations understood before AI that they wanted to centralize. You know, everything's on Azure, so they only hire people that work in Azure, and they keep everything in Azure. But now AI has come out and there are a lot of new third parties coming in with things. So I think that was a really important point. But do you guys have another example? Edge is one, Jeff, but are there others that people should think about and consider? Because they need to ask the right questions. That's always the problem, right? Organizations go and they talk to OpenAI or to somebody else, and they don't even have the right questions to ask.
Matthew Harris
Yeah, I love that edge aspect, Jeff. That's really cool. I mean, I suppose along the same lines you have federated training of machine learning models, where they're just trained locally on the data. I have to confess I've not actually worked with that myself, but it's another area that people could pursue. Automated monitoring, I think, is another one. I'm sorry, I sound like a broken record, but it's seeing who is accessing what data and when, without writing it yourself. There are some amazing tools for many of these platforms that will sit there quietly and monitor your data and see who's accessing it, and they'll give you a little red, amber, green indicator that you can pay attention to quite easily. I think Azure has Security Center, for example, and it gives a whole set of features, and it's like a game. You want to get your score up and you want to make the thing go green. And in doing that, you find that you're often compliant with data protection standards and so forth. So using automated tools to monitor your data, if they're available, I think that's another.
Chris Hoffman
In RI, and in the other work that you're doing, Matt, there are the teams that are there, right? In general, what organizations tend to do is try to just upskill the teams that they have. And some of these things require new skills, not just upskilling, if you know what I mean. I kind of asked about it a little bit earlier, but if you had some funds to hire some consultants, and both of you have been with humanitarian organizations, where do organizations need to start with the skill set? Right. We've talked in past episodes about the impact of AI on human resources, the changing of human resources within organizations and what that will look like in the future. And again, it's not potentially a reduction in staff, but just a different type of staff, with increases on certain sides of things and decreases on others. But from your perspective, who should an organization hire as a consultant to start talking to them about this?
Jeffrey Wag
Right.
Chris Hoffman
You can say yourselves too, you know, to hide behind the thing, but you know what I mean.
Jeffrey Wag
No, I think this is a great question, and I think my thinking on this has evolved in recent years. I'm starting to come to the conclusion that data engineers, and those who understand databases and proper data management, should be hired before any data scientist. Because ultimately data scientists end up doing that data engineering themselves as part of their role, because the data is not in the right formats. The centralization or the management of the data just needs to be done. But I can understand part of the reason that this happens: when aid agencies get their funding, it's very difficult to get funding for technology development, and so a lot of organizations have to work around this. I'm hoping, anyway, that it will soon get to the stage where someone like a data engineer is as important as someone in communications or central administration. These are really important roles, especially as we become more data driven as organizations.
Matthew Harris
People listening to the podcast couldn't see me jumping up and down and doing thumbs up to that, because I couldn't agree more. I think the days of unicorn data scientists are past us. Recruiting data scientists from the outset, if you have the resources for it, is a great thing, but I don't believe that's the most important thing. The most important thing is exactly what Jeff said: secure your data and make the acquisition of data more streamlined. Organizations are on a continuum. For the larger humanitarian organizations that already have technical capacity, it means having cloud engineers that understand how to keep things secure, having data engineers that understand how to get the data in there, and then moving on to AI specialists, possibly in that last part. But smaller organizations, and even the larger ones, should consider vendors in some cases, especially if you're a small organization that doesn't have the capacity to build data infrastructure. Perhaps the first step is to pay for a product that is secure and can be reviewed legally, for all those reasons that we mentioned earlier. Then you don't have to hire a person; you're able to leverage the professional services of that particular vendor. So that's another interesting option to pursue at the start. And if it's done tactically, and we've seen this at WFP, of course, Nassim, if you pick a vendor with a particular technical stack, that can be an interesting migration pathway along that road.
Nassim Motelabi
Thanks, Matt. This is actually very helpful, because I've seen this shift as well. Sometimes we work with data scientists and they're like, why am I here? Right? Because the bed is not prepared for them, the field is not ready for them to start doing what they're supposed to be doing. And then you see setbacks in the project development. So I totally agree that data engineers, and also cloud engineers, I realize, are very important when you are trying to choose the right services to work with, especially now that we are shifting to this cloud-centric mindset. But even edge computing, or the edge cases that Jeff referred to, requires some setup and management, and I think we need to rethink our infrastructure generally when it comes to AI. With that, I wanted to take two lines of thinking. One is how resource intensive it is for us to even start working with AI. Maybe we can be a little bit pessimistic and say, okay, that's a lot. But what are the small wins for organizations, especially humanitarian organizations, that may not have this capacity or this infrastructure? Do you have any examples of small wins where we can actually benefit from AI analytics, given the humanitarian data that you've seen in the past and the organizations, small and big, that you've worked with? On the other side, I want to take a more optimistic approach and think about a perfect world: when we have the data, what can AI do for us? Right? What are the examples that you can see or foresee where AI analytics and data analytics can really transform our work? I'm putting these two cases in front of each other because sometimes I'm like, is it really worth it for us to invest in AI and data analytics to change how we respond to crises or not? So do you have some examples that you've been inspired by?
Jeffrey Wag
These are all great questions, Nassim. So we recently ran a survey of AI use by staff within Relief International, and one of the use cases which came up most frequently was translation. Now, translation of low resource languages is still a challenge. You need a very large corpus of text to be able to train these models effectively. But most of us working in the sector are working in organizations with many, many different languages. I think I counted more than 18 across RI just the other day. The ability to communicate more effectively across the organization benefits from the use of some of these AI tools, some of the translation tools. I do like to highlight that if people are using these tools, they should indicate it. But we have had managers say that they've noticed the quality of the work that's being done by their staff has improved dramatically because of the use of these tools. And so I think that's a small win. One thing, and sorry, this is going to sound a little bit pessimistic, but one thing we need to remember is that the failure rate of AI projects within the sector is still quite high. It might be over 80%, but this is natural. And I think for those who have worked in the startup space, an 80 to 90% failure rate is okay. But part of the reason for that is a lot of things have to go right for a project to be successful. The data have to be good. We've already touched a lot on that. The model has to be accurate. The user interface has to be something that people actually want to use. And so one of the things I try to do with the organizations that I've worked with is not to overhype the tech. AI is a tool in a toolbox. It's not the tool that's going to solve all of your problems, but if you've got the right problem to solve, then it can be a very effective means of doing so.
Matthew Harris
I think a very operative term there is "if you've got the problem to solve". I think one of the really key things about any AI project, or any software project for that matter, is to have a really, really crystal clear business use case. I use the word business, but I mean a reason for using it, and a measurable outcome or something that can show that it's actually had a positive effect. There is a danger, and I'm a huge nerd and I love noodling around with all these wonderful tools. But at the end of the day, for an organization, you have to make sure that the work you're doing is in response to a very specific requirement. If I were a small organization, or any organization, I'd be putting more effort into that first: how much time will be saved, which responses will be faster, et cetera. Those are glib, very quick examples, but there's a huge spectrum there. I think starting any project with that is absolutely fundamental.
Chris Hoffman
Shifting a little bit, but around this same idea: we've been talking a lot more around the analytical side of things. Right. The decision making piece that AI brings. What about the engagement piece? Right. There are more people displaced than at any time in history. There is less funding per crisis than at any time we've seen in recent history, at the very least. And so there are potential efficiencies for reaching scale in certain activities by using things such as generative AI, and I want to put this in the context of, for example, being able to have a chatbot where people can access information about where to receive assistance.
Jeffrey Wag
Right.
Chris Hoffman
And it could be with agents. You know, I talk a lot and work a lot with AI agents and how to utilize those together with generative AI, so that you can collect structured data while at the same time being able to have a generative conversation. Those types of pieces, again, mean stepping out of organizational-level utilization and moving towards what we would call beneficiary facing or people-affected facing. What are you guys' thoughts on those things? It's probably a whole other podcast, but in general, I mean, Matt, I love all of your insights on the protective nature of how to structure these things and put them together. And it is a very scary thing to think about an AI engaging with a person. So what are your thoughts on that?
Matthew Harris
More recently, the advent of LLMs has made things like chatbots a lot easier. I used to write chatbots about 10 years ago. It was all intents and entities, and you'd spend all this time writing a lovely chatbot and somebody would then ask a side question and the whole thing would fall over in tears. At least there's better handling for that now. But I think grounding and citations are the focus of everything that I've been doing with knowledge retrieval. By that I mean, if the person is asking for advice and they're presented with information, I think every single factual claim in that piece of information should be grounded with a citation. So clicking on this little superscript "1" takes you to the chunk in the document that says the thing, and then the person can self-serve and read around it. I won't release a chatbot that doesn't have that capability. So I think that's really, really key. And in terms of what's coming, this is a little off topic and I won't go into it too much, but in terms of safety and the concerns around the use of LLMs, for example, I think the messaging needs to be: we should be grounding these things in our own data. It should only ever provide answers using our own data. I wonder if it's a difficult message to get out, but for the people I've spoken to in the sector, that has allayed some concerns. I actually think there are dangers where we may not be looking in the right direction. One of which is that in every single organization I know, all of the developers are using generative AI right now, including myself. It's a very useful tool for day to day. So when people say it's all hype, go and look at the statistics of how many developers are now using generative AI. That's not hype, that's people using it. And that's going to have a profound effect, I believe, in the next five or ten years, on software development at least.
And so with chatbots, I actually think that's one of the more common chatbot uses at the moment. Whether it's Copilot or o1, it's incredible what they can do with software development. So yeah, those are my points on chatbots.
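The grounding-with-citations idea Matthew describes can be sketched in a few lines. This is only an illustration under stated assumptions, not the tooling he uses: the `Chunk` type, the `manual.pdf` file name, the stopword list and the word-overlap ranking are all hypothetical stand-ins (a real system would use a proper retriever and an LLM to phrase the answer). What it shows is the contract: every sentence returned carries a numbered marker that maps back to a source chunk, and if nothing in the organization's own documents matches, the bot declines rather than improvising.

```python
import re
from dataclasses import dataclass

# Hypothetical stopword list; a real system would use a proper retriever instead.
STOPWORDS = {"the", "is", "a", "an", "to", "at", "on", "of",
             "where", "what", "how", "i", "my", "your"}


@dataclass
class Chunk:
    doc: str        # source document name (illustrative)
    chunk_id: int   # position of the chunk within that document
    text: str


def tokenize(text):
    """Lowercase the text and return its set of non-stopword tokens."""
    return set(re.findall(r"[a-z0-9]+", text.lower())) - STOPWORDS


def retrieve(question, chunks, min_overlap=2):
    """Rank chunks by word overlap with the question (stand-in for real retrieval)."""
    q = tokenize(question)
    scored = [(len(q & tokenize(c.text)), c) for c in chunks]
    scored = [(s, c) for s, c in scored if s >= min_overlap]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [c for _, c in scored]


def answer_with_citations(question, chunks, top_k=2):
    """Answer only from retrieved chunks, attaching a numbered citation to each."""
    hits = retrieve(question, chunks)[:top_k]
    if not hits:
        # Nothing in our own data supports an answer, so decline.
        return {"answer": "I can't answer that from our documents.", "citations": []}
    parts, citations = [], []
    for marker, chunk in enumerate(hits, start=1):
        parts.append(f"{chunk.text} [{marker}]")
        citations.append({"marker": marker, "doc": chunk.doc, "chunk_id": chunk.chunk_id})
    return {"answer": " ".join(parts), "citations": citations}


manual = [
    Chunk("manual.pdf", 0, "Cash assistance is distributed at the community centre on Mondays."),
    Chunk("manual.pdf", 1, "Bring your registration card to receive assistance."),
]
print(answer_with_citations("Where is cash assistance distributed?", manual))
print(answer_with_citations("Can you diagnose my illness?", manual))
```

The citation list is what a front end would turn into the clickable superscripts, so a reader can open the cited chunk and read around it.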
Jeffrey Wag
I think those are all excellent points, Matt. I guess the only caveat I might add is that I've seen some uses of chatbots in the mental health space. I remember during the pandemic, when we were all locked down, the organization I was working for at the time decided to share a chatbot that would help us with our loneliness. Right. I have to admit I didn't try it, but I was concerned that the use of such chatbots might potentially make a situation worse, and therefore would obviously be a liability to the company that produces it, but moreover might potentially harm the end user. So I think this is something, and Matt touched on this already: having some kind of caveat or some kind of guardrail in place when you're using these chatbots, and being aware of either how they were trained or what the limitations might be.
Matthew Harris
I think that's a fantastic point. The answer is to apply risk analysis to any application of AI, including chatbots, by which you ascertain the probability and then the impact of that particular use case. For a mental health chatbot, I would say the impact is absolutely off the scale. So risk analysis, in my opinion, would preclude you from developing that at all. Other uses of chatbots have less of that impact. So performing a risk analysis on any potential solution is, I think, a really key thing to do.
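The probability-and-impact assessment Matthew mentions can be written down as a very small scoring rule. The sketch below is an assumption-laden illustration: the 1 to 5 scales, the thresholds, the example scores and the `UseCase` and `assess` names are all hypothetical, not a framework named in the episode. It encodes one point from the conversation: when the impact is at the top of the scale, the use case is ruled out regardless of how unlikely the harm is.

```python
from dataclasses import dataclass


@dataclass
class UseCase:
    name: str
    probability: int  # 1 (rare) to 5 (almost certain) that harm occurs; illustrative scale
    impact: int       # 1 (negligible) to 5 (severe) if it does; illustrative scale


def assess(use_case, impact_ceiling=5, score_threshold=12):
    """Return (risk score, recommendation) for a proposed AI use case."""
    score = use_case.probability * use_case.impact
    if use_case.impact >= impact_ceiling:
        # Off-the-scale impact precludes development, whatever the probability.
        return score, "do not build"
    if score >= score_threshold:
        return score, "needs mitigation and review"
    return score, "proceed with monitoring"


# Hypothetical scores, chosen only to show the three outcomes.
for case in [
    UseCase("mental health chatbot", probability=3, impact=5),
    UseCase("chatbot on where to receive assistance", probability=3, impact=4),
    UseCase("internal document tagging", probability=2, impact=2),
]:
    print(case.name, assess(case))
```

The numbers matter less than the habit: scoring every proposed use case the same way makes it easier to explain why one chatbot is acceptable and another is not.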
Chris Hoffman
A more technical question on this. So we're talking about garden fencing, right? So training it on our own data. That data can be the training materials that we have on financial inclusion. Okay. And people want to be able to query or be trained on how to be more financially literate, so to say. Right. So you've got a training manual. You put that training manual in a RAG and then you can allow people to interact with it over WhatsApp, as an example. Right. So is that a practical thing that could be used, or is that still a step too far? What kind of considerations around use cases should organizations be thinking about for that beneficiary facing opportunity? Because that's obviously what I do, but it's also what organizations ask me about constantly. And we talk about how far is too far in this, beyond the personas that we have to define, beyond the different cultural contexts, beyond those other extremely important nuances that are there. But just the general use cases.
Jeffrey Wag
I think we're seeing some really interesting applications being rolled out now, or at least being prototyped within the sector. We've seen some really very useful WASH-related AI tools. We've also seen some potentially very useful tools for farmers, so farmers in the field who are trying to better understand how they might increase their crop yields. And so I think these tools are now advancing really rapidly. Matthew's already highlighted this, but of course there needs to be some sort of risk awareness, or at least some sort of guardrail in place, before deploying these with 100% confidence.
Matthew Harris
I guess an extension to that is: start with your evaluation tests. Start with a framework that automatically allows you to build out a gamut of tests that test safety, hallucination, all of that good stuff, so that if things change, you see it instantly. I see a lot of vibing out there in the field, where people are just sort of putting in a sentence and going, oh, that's about right. Oh, this is great, let's launch it. You have to have evaluation tests, and actually real-time ones, if that's possible and if you have the capacity. I mean, LangSmith is a great go-to for that, open source, but be careful because LangSmith is outside of your organization. There are analogous tools within each of the cloud vendors that have similar things for evaluation and testing. So I think that's super key. And not everything has to be a chatbot. There are some amazing opportunities with this technology that have nothing to do with chatbots, whether it's data enrichment or tagging. I can't think of all the use cases, but because of ChatGPT, I think everyone's going to chatbots. And yes, there are some amazing capabilities there. But as an organization, I would start with non-chatbot uses first. They're safer, they're less wild. You don't have to deal with all of the fringe cases that occur with the human natural language interface. So that's another thing to consider.
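The "start with your evaluation tests" advice can be illustrated with a tiny harness that is written before the chatbot itself. This is a minimal sketch, not LangSmith or any cloud vendor's evaluation tool: `run_evals`, `stub_bot` and the cases in `CASES` are hypothetical names invented here, and a real suite would hold many more prompts covering safety and hallucination. The point is that the checks run automatically on every change instead of relying on someone eyeballing a single reply.

```python
def run_evals(bot, cases):
    """Run every case through `bot` and return the names of the failing cases."""
    failures = []
    for case in cases:
        reply = bot(case["prompt"])
        if not case["check"](reply):
            failures.append(case["name"])
    return failures


def stub_bot(prompt):
    """Hypothetical stand-in for the chatbot under test."""
    if "registration" in prompt.lower():
        return {"answer": "Register at the community centre. [1]",
                "citations": ["manual.pdf#3"]}
    return {"answer": "I can only answer from our documents.", "citations": []}


# Each case pairs a prompt with a check on the reply; all content is illustrative.
CASES = [
    {
        "name": "grounded answer has a citation",
        "prompt": "How do I complete registration?",
        "check": lambda r: len(r["citations"]) > 0,
    },
    {
        "name": "out-of-scope question is refused",
        "prompt": "Which medicine should I take?",
        "check": lambda r: r["citations"] == [] and "only answer" in r["answer"],
    },
]

failed = run_evals(stub_bot, CASES)
print("all checks passed" if not failed else f"failed: {failed}")
```

Running the same cases against every new prompt, model or document set is what turns "that looks about right" into something that can be tracked over time.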
Chris Hoffman
Absolutely. Nassim, we're getting close to the end here, so I always love to give you the final question because it's always very pertinent to what we've been talking about. So I'm gonna leave it over to you.
Matthew Harris
Thank you.
Nassim Motelabi
And this was a great conversation. I think every question we ask could be a separate episode by itself. There's so much to unpack, but it always fascinates me to speak to technical folks who have a vision around the next AI tool that they want to use or the next model they want to explore. I don't know, maybe we can talk about that. I think a lot of our AI conversations go towards LLMs and natural language tools these days, but we didn't get the chance to really talk about some of the other AI models that are out there and are helping us in the humanitarian space. So yeah, maybe we can ask our guests what it is that excites them most about AI, and what they see coming as the next boom, I guess, within our small little space in AI.
Jeffrey Wag
I mean, one thing I would say: Nassim, Matt and I have all worked at the WFP, where they're using AI models to predict food insecurity. So for example, the teams in East Africa recently used machine learning models with data from multiple sources like satellite imaging, climate models and social media to predict where the next famine hotspots might occur before they started. Another example is disease management and epidemic prediction. So, for example, during the Ebola outbreak in West Africa, AI surveillance was used to analyze sources such as news media, social media, and health data to predict where new cases were emerging. This kind of predictive analysis allows the health teams on the ground to get ahead of an outbreak. So I think the healthcare sector in particular is an area where we're going to see a lot of really exciting and positive things happening with AI in the years ahead.
Matthew Harris
Yeah, absolutely. And just leading on from Jeff, I think there are lots of analogous solutions using geospatial data. I work with organizations like Medic Mobile who are doing predictive analytics on malaria outbreaks and things like that, and I just think that's so tremendously exciting. Getting back to LLMs, Jacaranda Health out in Kenya is doing amazing things with training, capturing that local context and better support of Swahili. So they have a Swahili LLM. I think there are really great opportunities there. In the wider field of things, personally, I feel the revolution is coming with software development. I think the capabilities now with some of these chatbots are insane, with o1 and so forth. And behind or related to that is the whole agentic thing. It's not as easy as the hype would say. There's a lot of work behind it, but it does have some potential and I think we may see a lot more in that regard this coming year. So, for example, Google released Deep Research, and that's a really interesting example of a vendor offering for doing deep research on unstructured data. So, yeah, it's exciting times.
Nassim Motelabi
Thanks, Matt. I was laughing at you saying "the agentic thing", right? It's like sometimes we don't even know what these agents do these days. But super great conversations. Thanks for being here with us. And Chris, over to you. I don't have any further questions.
Chris Hoffman
Awesome. Well, Matt and Jeff, truly it has been a unique pleasure to have you both here, knowing also that we started the call off talking about astronomy, because both of you are astronomy buffs. And Jeff, everybody was kind of jealous of your new position there in France, looking at some of the skies and the stars well beyond here. Sometimes we think about everything on the planet, and you get a chance to think about everything that's outside of it, which is super neat to think about. But, you know, just to close off the call, and beyond our thanks: it's just so important. There's just not enough understanding. It's such a new thing that everybody's trying to learn about. So you guys offering your time to share your knowledge and your experience, all three of you, your knowledge and experience in the use of data, and now the application of data and utilization of it within AI, is so important for us to start getting into and upskilling humanitarians, to start to be able to have the lexicon, to be able to pull from when they speak about it and be able to understand these things. So I can't thank you all enough. It's been a great pleasure. Thanks a lot for joining Humanitarian Frontiers in AI. We both really appreciate it, and thank you again to Innovation Norway.
Jeffrey Wag
It's been a lot of fun. Thanks, Chris.
Matthew Harris
Thanks, Nassim.
Jeffrey Wag
Thanks, Matt. It's been really great talking to you and hopefully we'll have a chance to meet in person sometime soon.
Matthew Harris
Absolutely. Thank you so much. I've learned so much from you all today. It's been a real pleasure.
Chris Hoffman
Well, thanks, Nassim. Until the next podcast. I will see you soon.
Nassim Motelabi
See you soon, Chris. And yeah, until the next episode.
Chris Hoffman
Bye, everybody.
Narrator
Thank you for joining us on Humanitarian Frontiers in AI. We hope today's conversation gave you new insights into how AI is transforming humanitarian efforts and the steps we need to take to ensure it's done ethically and effectively. If you enjoyed this episode, be sure to subscribe and stay tuned for more discussions with leaders and innovators at the intersection of technology and humanitarian work. Together we're exploring how AI can bring real change to communities in need. Keep pushing the frontiers of possibility.
Date: February 10, 2025
Host: Chris Hoffman
Guests: Jeffrey Wag (PhD, CNRS & Relief International), Matthew Harris (PhD, formerly DataKind & WFP), Nassim Motelabi (co-host, WFP)
This episode of Humanitarian Frontiers in AI explores the critical role of data in the adoption of AI within humanitarian organizations. Hosts Chris Hoffman and Nassim Motelabi are joined by renowned data scientists Jeffrey Wag and Matthew Harris, who candidly address the complexities, pitfalls, and opportunities that come with leveraging data for AI-driven solutions in challenging, resource-constrained environments. The panel moves from foundational questions (“What is data in the humanitarian context?”) to issues of ethics, governance, technical best practices, risk mitigation, and tangible examples of AI's promise and limitations in global aid.
On AI Data Foundations:
"For AI, if you don't have clean, solid data and a good architecture around it, the AI is not very good, isn't it? It's rubbish in, rubbish out."
— Matthew Harris [05:48]
On Vendor vs. In-House:
"Don't try and build it yourself. So many organizations try to build everything themselves. Pay a little bit of money for secure cloud or vendor solutions—often it's far cheaper and safer."
— Matthew Harris [14:07]
On Data Before Algorithm:
"After data protection...one of our toughest challenges is the incompleteness of datasets...they might be incomplete."
— Jeffrey Wag [10:50]
On Chatbot Precautions:
"Every factual claim in a chatbot should be grounded with a citation. Click it and see the source. I won’t release one that doesn’t have that."
— Matthew Harris [37:24]
The Real Revolution:
"The real revolution is coming with software development—generative AI as a copilot for writing software."
— Matthew Harris [45:32]
AI in humanitarian work is neither a panacea nor science fiction—it’s a pragmatic, sometimes tedious journey that starts with disciplined data foundations, realistic project scoping, and a commitment to ethical safeguards. Data engineering, strong partnerships, and incremental wins pave the way for transformative impact, while constant vigilance is required against risks both social and technical.
As Chris summarized:
"It’s so important...to start getting into and upskilling humanitarians, to start to be able to have the lexicon, to be able to pull from when they speak about it and be able to understand these things." [46:57]