A
Welcome to High Impact Growth, a podcast from Dimagi for people committed to creating a world where everyone has access to the services they need to thrive. We bring you candid conversations with leaders across global health and development about raising the bar on what's possible with technology and human creativity. I'm Amy Vaccaro, Senior Director of Marketing at Dimagi and your co-host, along with Jonathan Jackson, Dimagi's CEO and co-founder. Today, we're checking back in on the state of AI for good. This is part four of our ongoing series, and I'm sitting down with Jonathan Jackson alongside Dimagi's VP of AI and Research, Brian Derenzi. Since our last episode on this topic, the technology has made staggering leaps, but the global health market has also been rocked by changes in funding. We explore the massive tension this creates, why AI accelerates human intention but can't replace critical thinking, and the fascinating research Dimagi is doing on hidden bias in frontier models. We also get into the risk of pilotitis and why you must review your AI's work by hand. If you're wondering how to practically and responsibly apply AI in your work right now, this is the conversation for you. All right, welcome back to the podcast, Brian. Good to have you here.
B
Thanks, Amy.
A
This is actually part four in an ongoing series about AI for good, and the last episode we recorded was actually June 2024, which is insane that that much time has passed. So I figured it was worth checking in again on all that's going on. But, Brian, you are our VP of AI and Research. You lead our AI efforts at Dimagi, so I'm excited to do a bit of a check-in on how everything's going in the world of AI at Dimagi. And I know a lot has changed since our last conversation: the technology has made crazy leaps and bounds, and of course the market has imploded, so there's been a lot of change on the global health side of things as well. So before we get into the deep end of the work that you're doing, I want to just quickly check in: how are you personally feeling about AI these days?
B
I think probably largely the same as what I said last time. I think there's a lot of upside and advantage to it. I've found some AI workflows that work well for me in supporting the work that I do. I find it intellectually stimulating in like a dopamine hit kind of way, just the speed at which the whole field is moving forward. And if I pause and sit back and think, where are we going with all of this? Then I get a little bit nervous. And I still am. We talked a lot about the potential for dystopia, and that doesn't feel resolved in my mind, so I still have a lot of those concerns. And part of me wants to say, yeah, let's just stop here. This seems pretty good. We made some progress. Let's absorb this and slow down. And that doesn't seem possible at this stage. I think it's Anthropic, maybe, that talks about holding the light and the dark in your head at the same time: thinking about all the positive upsides and the potential for it, and then also recognizing that potential for dystopia and trying to actively work against it. So I think I'm still in that state even now. But it is wild how much more capable things are, how much better workflows have gotten, even in the last 12 or 18 months.
C
Yeah, I think, you know, Brian and I get to spend a ton of time talking about AI at the platform level at Dimagi and building new features and products, and Amy and I talk a lot more about how it's disrupting how our teams work, how we work. And I can't believe it was a year ago, Amy, or over a year ago at this point, that we had that last conversation. But since then I've had personal experiences that have kind of just blown me away, even when I'm using these things on a day-to-day basis. I've now started doing a lot of programming again. I used to program and was a trained software engineer in college, and it's insane what I can build on my own. And not only is it crazy that AI can do it, but it has fundamentally shifted how I think about my role in providing thoughts or requirements to our product teams or driving what we can do forward, because it shortens the distance between the people who are envisioning what they want to build and the people who are building it. It just feels like you can close that gap so much more than was feasible pre-AI. The other experience I've had is building AI applications with my kids. I found myself regularly teaching them how to prompt the AI correctly in a way that I wasn't doing before. We built a lacrosse faceoff app together with my 11-year-old using Cursor. Back in June 2024 when we had that call, it was all interesting, and there was lots of stuff that we were doing here at Dimagi. But at a personal level, I've since had multiple use cases and experiences where it's fundamentally changed what I would have otherwise done, either with my family or professionally, because of the new capabilities that AI has come out with, primarily in the software engineering space for me. But I know the same is true of media generation and a lot of other fields that I'm less in day to day.
A
Yeah, it's wild. And I love that example of coding in Cursor with your kid, John. Brian, you mentioned at a high level that there are some workflows adding a lot of value for you. Can you shed some light on what are some of those places where you personally are finding it really useful?
B
Yeah, I think the first thing I should say is I'm very much using it in the AI-accelerating-human-intention, amplifying-human-intention version. Outside of some very small use cases, I've not seen versions where you can skip the thinking piece. I was talking to our tech team about this the other day. There are times where you're working on a large system and you're thinking about a new feature, or thinking, oh, I should shift to this or add this, and it just kind of feels like a hassle. There's a lot of work to do, but AI really lowers the activation energy, for lack of a better word, and takes things that would have been onerous, would have been a huge hassle, and makes them a lot easier to approach. In some ways it's kind of like an electric bike: it just flattens everything out. If you've got hills, suddenly the hills feel flat, and you're still on a bike, you're still pedaling around, but without those big annoying pieces. And so I've been using it for tech stuff, spinning up quick little things or proposing changes or building stuff. I've also been using it a lot in writing, and this is where I think it's really important to stress the fact that you can't skip the thinking part. So I use it in writing, but I do a lot of thinking first. I might brainstorm with a human, I might brainstorm with an AI. I'll write some bullet points and throw them in there. And just because of who I am, I like to formulate and express my thoughts verbally, so I do a lot with running Whisper locally on my machine, having it transcribe all of that, and throwing that at a model, and it'll output something. Almost always the first version is wrong, but I'm able to react to that first version and say, no, no, this is wrong, we've got to change this, we should reorganize that. Then I give it another transcription of my feedback, and I iterate a few times like that, and eventually I get some text and pull it into a document. Then I go through it and I delete entire sections and rewrite sentences. At the end of the day, what I'm sharing with my colleagues is not something that came out of the machine; the machine just helped me get there faster than before. And so I think it's very important, certainly for our team at Dimagi and probably everybody else, that when you're using these tools, you can't skip the thinking part. You can't skip the critical thinking lens of, what am I trying to accomplish, what am I trying to communicate with this document, or what feature am I trying to build and add? We have not yet automated away that part of it. I've just found that I can get to the end result faster by using these tools along the way.
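[Editor's note: below is a minimal sketch of the dictate-transcribe-iterate loop Brian describes, assuming the open-source openai-whisper package for local transcription and an OpenAI-style chat client for drafting. The model names, file names, and prompts are illustrative placeholders, not Dimagi's actual setup.]

```python
# Sketch: record a voice memo, transcribe it locally with Whisper, and iterate
# on a draft with an LLM. Model names, file names, and prompts are placeholders.
import whisper
from openai import OpenAI

client = OpenAI()
stt = whisper.load_model("base")  # runs locally; audio never leaves the machine

def transcribe(path: str) -> str:
    """Transcribe a locally recorded voice memo to text."""
    return stt.transcribe(path)["text"]

def draft(notes: str, history: list[dict]) -> str:
    """Send transcribed notes (or spoken feedback) to the model for a new draft."""
    history.append({"role": "user", "content": notes})
    reply = client.chat.completions.create(model="gpt-4o", messages=history)
    text = reply.choices[0].message.content
    history.append({"role": "assistant", "content": text})
    return text

history = [{"role": "system", "content": "Turn my spoken notes into a structured memo."}]
print(draft(transcribe("braindump.m4a"), history))       # first draft: usually wrong
print(draft(transcribe("feedback_take1.m4a"), history))  # iterate with spoken feedback
```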
A
I think that's such an important point, Brian: we still have to be doing the hard thinking work, right? And then the AI can hopefully speed up what we're able to do, or maybe raise the quality of what we're able to do. But as a professional, that is something I fear, right? That I, and perhaps others, stop expecting to do hard work with our brains and start delegating too much to an AI.
C
And Amy, to build on that point, it's a fascinating meta question: would it even be good if you could delegate the hard thinking? Because as we get more senior in our careers, a lot of what senior people are good at is synthesizing 20 years of experiences they've had, the mistakes they made, pattern recognition on why something might not work, or at least being able to share that experience. And as AIs continue to get better, particularly at junior levels in the professional workforce, a huge way you get from junior to senior is just a ton of reps of doing these things, making some mistakes, and having senior people teach you how to not make them, or stop you from making them, or repair things after you've made them. As AIs continue to get better and replace more and more of that critical thinking skill set, it is really challenging to be like, okay, that makes sense. Senior people are incredibly valuable now because they can instruct the AI instead of instructing a junior employee. But then it's like, well, where's the next senior cohort coming from if that's the world we move into? So that's something I think a lot about too, because it's absolutely the case that it can't replace synthesis or senior-level thought yet. But it was also the case that it couldn't write good code a year ago. So as we think about that, it would be, not bad in an absolute labeling of the adjective, but problematic for the way that we typically think about careers right now if you don't get to do that critical thinking synthesis step, because that's really where you add a lot of value as you get more senior in your career. You're not doing the work necessarily. And so there's a huge appeal right now to being able to instruct AIs to go do the work. But then the question is, who's going to continue to have that expertise to do the instruction as the AIs continue to get better? Right now I think everybody at our senior level is super excited by the potential that you can instruct all this stuff. But we'll eventually age out of the workforce. And it's like, well, what happens to people who are 21 right now when they're 40 in 19 years?
A
Yeah, it feels like if you're 21, you have to be really intentional about how you're working with AI, to make sure that you're still actively learning and getting those reps in, right? But, yeah, Brian, sorry, you were going to say something.
B
I think it's pretty tough to estimate where AI and its capabilities will be in 19 years. There's the utopian dream that we're all going to be sipping virgin mojitos on the beach or something and the AIs are going to be doing all the work. But actually, I was listening to an interview, I can't remember exactly who it was, but they were talking about how the goal isn't to eliminate work, the goal is to eliminate jobs. And one of the people was making the point: if you leave kids alone, kids will start sweeping and cleaning up and building things and doing a bunch of work. But the moment you're like, oh, go clean your room, they're like, ah, stop, you're my boss, I don't want a job. And I thought it was a good point that nobody wants a job, nobody wants a thing that's compulsory, that they have to do, where somebody else is telling them what to do and setting the rules and giving all that instruction. But everybody wants to express themselves and build and make their lives and the lives of the people around them better. And so the goal isn't really to eliminate that work. So I don't know, I find it tough to think about. I mean, two years feels really far away; 19 feels impossibly far. But I have two other quick thoughts on the you-can't-get-out-of-the-thinking piece. One is, I heard somebody talking and they were like, the problem with not doing the thinking piece is that if you generate a bunch of AI stuff and you just kind of throw it out, you're just kicking the can down the road. Eventually somebody's going to have to look at the thing, or do something with all of that text as it makes its way through a company. And so you're just putting off that thinking to some point in the future. And the second piece that came to mind is something that we tell our own team. We have various projects where we're building various LLM-powered tools, and one of the things that I've been saying for a while, and I think John and others are saying as well, is: you have to look at the data. You have to look at the transcripts. You have to be deeply involved. You have to understand what's actually happening with the tools that you're creating. And this goes back to early machine learning, to Karpathy and those folks. They spent a ton of time just labeling data, looking at data, understanding the ways that things are failing, and doing error analysis. You have to live in that space to really develop some intuition and get some sense of what's going on in order to improve it. And so it all feels like different versions of the same thing, right? We have to think critically, we have to be intellectually and mentally engaged, and build off of that. Yeah.
C
And building on that, Brian, one of the things that's been fascinating in my role as the CEO of a tech company in the AI-for-good space, trying to go through this transition, and Brian mentioned this, is that he and I have been debating a ton on creating the technical capabilities to evaluate how the AI use cases we're trying to deliver on are going. But there's another piece I think about a lot in my role, which is: who knows what right is in a lot of these use cases? So is an AI being good if a community health worker in rural Africa starts talking to it about her income and it just says, oh, I can't help you with that, I'm a bot focused on this other thing? Or is that a bad answer, and what are we really trying to do here? We're trying to improve the job that the CHW has, her livelihood, her sense of community. So if you just punt on some of these tough answers and you're like, oh, I can be the technical support agent for your app, is that a good bot, or is that only marginally better than it would have been in today's world? And so thinking through that piece, which is, what is value when you're doing these evaluations of these bots and tools? But then also, who gets to set the problem space, given that AI is really expanding what any one person at Dimagi, and probably at every company, could hypothetically be accountable for? And so one of the things I'm doing with Brian is saying: you and I and the team leading the project are going to go in together and label this data by hand, look at every single transcript, because until we're in there reading what's really going on, we won't have a sense of the texture of the conversations that are happening. We won't have a sense of practically just how hard it is to come up with a good or bad qualitative assessment of the conversation. And obviously, if Brian and I and the team can do it, then the next step is, can an AI do the labeling for us? And there's a whole set of other questions you may want to get to, but I find this fascinating. ChatGPT has just exploded, right? And it's obviously an amazing tool that a lot of people like. But is the goal that I like it, or is the goal that it helped me take the next correct step on what I started prompting it on? Obviously ChatGPT's primary goal is that it eventually monetizes me as an end user in whatever way it's going to choose to do that. So it's not necessarily going for correctness so much as me wanting to continue to use the tool, right? And so as we think about what AI for good means, is the goal that the CHW comes back to our AI-enabled chatbot over time because she likes it, or is the goal that we made her more money at some point? These are really difficult questions to wrestle with right now that have very complex answers. And we're already seeing this play out with how AI companies are thinking about embedding ads or different things in their tools.
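[Editor's note: a hedged sketch of the hand-label-first workflow Jon describes: humans label a sample of transcripts, an LLM "judge" labels the same sample, and a simple agreement count shows whether the model can be trusted with the rest. The judge prompt, model name, label set, and file format are assumptions for illustration, not Dimagi's actual pipeline.]

```python
# Sketch: compare human transcript labels to an LLM judge's labels before
# trusting the judge at scale. Prompt, model, labels, and file are assumptions.
import json
from openai import OpenAI

client = OpenAI()
LABELS = {"good", "bad", "unclear"}

def llm_label(transcript: str) -> str:
    """Ask the model to assign one label to a conversation transcript."""
    prompt = (
        "Label this community-health-worker support conversation as exactly one "
        "of: good, bad, or unclear, judged on whether the bot helped the worker.\n\n"
        + transcript
    )
    reply = client.chat.completions.create(
        model="gpt-4o", messages=[{"role": "user", "content": prompt}]
    )
    label = reply.choices[0].message.content.strip().lower()
    return label if label in LABELS else "unclear"

# Expected format: [{"transcript": "...", "human_label": "good"}, ...]
with open("labeled_sample.json") as f:
    sample = json.load(f)

agree = sum(llm_label(row["transcript"]) == row["human_label"] for row in sample)
print(f"human/LLM agreement: {agree}/{len(sample)}")
```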
A
Yeah, it's fascinating. It sounds like we're learning as we go about where humans still need to be super involved. It's not just in the upfront thinking, but also in reviewing the analysis and being really thoughtful about what the end goal of all this is, and keeping in mind that the AI you're working with may have a goal of making money off of you, right? So how do you factor that all into what you're doing with it? I'm curious. John, you started to talk a little bit about AI for community health workers, which is one area that we've been working on. But even before we dive into our specific work, Brian, I'm curious from your perspective: what has surprised you about the trajectory of this AI for good movement over the last year or so?
B
That's a good question. I would say it's still finding its footing, if I'm being perfectly honest. I think people don't know where the low hanging fruit is. People don't know yet which problems they can make progress on if they push on them, and which problems are somewhat intractable. So I'll give one example. I might have mentioned this in a previous episode, but we were talking to one of the big AI firms at some point about low resource language support, and we're like, oh yeah, in this language or that language the model is pretty poor; what can we do? And the feedback we got from the team there was: we need a billion tokens. If you can give us a billion tokens in that language, we can move the needle on the language. Otherwise don't even bother. So it feels pretty difficult to engage there and really be able to move the needle; we don't have a billion tokens of anything to share. Similarly, one of the approaches that we've taken at Dimagi is to keep abreast of what people are doing on the clinical side. We do a lot of work in health, and there are some folks that are pretty focused on the clinical side, and I think Dimagi has intentionally let other folks push that forward, to see if they can get better clinical decision support at the point of care out of models. In some ways it's a much higher risk application; getting something wrong there has higher consequences than other use cases. We've been more focused on things like training and support, and being able to analyze messages that are coming in, doing sentiment analysis and topics and things of that nature. Those are lower risk: you can build tools faster, and you can probably get value out of them faster. That's the approach we've taken at the moment. But in the overall AI for good space, I would say there are a lot of small anecdotal success stories. A lot of people are excited about what's happening with AI and agriculture. With some of Google's more recent models around weather, there's a lot of excitement about what that could do for farmers. A lot of the take-a-photo-of-your-crops style things are getting a lot of traction. I also think there's probably a lot of low hanging fruit in terms of creating better access to data, and more natural ways to access data and information that we've already curated and put together. So we're doing a lot there: on CommCare itself we have a support bot, on Open Chat Studio we have a support bot, and we're building support bots for frontline workers through our various Connect programs, to help them with the interventions they're delivering and with the tools they're using, and to provide a better interface for getting that support. I think there's probably a lot of low hanging fruit there. But again, stepping back to your question at a high level, I think there are a lot of open questions still, and people are still trying to understand where to make bets, where to make investments, and how to push AI for good forward.
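[Editor's note: a minimal sketch of a documentation-grounded support bot of the kind Brian mentions, where the bot retrieves relevant doc passages and answers only from them. The retrieval here is a toy TF-IDF lookup over three invented doc snippets; the real bots' retrieval, documentation, and prompts are not described in the episode.]

```python
# Sketch: answer support questions grounded in documentation. Toy TF-IDF
# retrieval over invented snippets; the real bot's internals are assumptions.
from openai import OpenAI
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

client = OpenAI()
docs = [
    "To export form data, open Reports and choose Export Data.",
    "Case sharing lets multiple mobile workers see the same cases.",
    "Lookup tables are managed under the Data tab in the app builder.",
]

vectorizer = TfidfVectorizer().fit(docs)
doc_vectors = vectorizer.transform(docs)

def answer(question: str, k: int = 2) -> str:
    """Retrieve the k most similar doc passages and answer only from them."""
    scores = cosine_similarity(vectorizer.transform([question]), doc_vectors)[0]
    context = "\n".join(docs[i] for i in scores.argsort()[::-1][:k])
    reply = client.chat.completions.create(
        model="gpt-4o",
        messages=[{"role": "user", "content":
                   f"Answer using ONLY this documentation:\n{context}\n\nQ: {question}"}],
    )
    return reply.choices[0].message.content

print(answer("How do I export my form data?"))
```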
C
Yeah, I think one of the areas that I'm most interested in, just because I've seen the potential in some of the work I personally have been involved in over the last several months, is what my friend Mithia, the CEO of Nuxtleaf, calls operational AI. It's the boring middle-management use cases: business process flows, spreadsheet creation, moving support tasks along, those things. There are a lot of flashier use cases, like clinical decision support and doctor-in-a-pocket and Brian's examples of weather and all this stuff. But I think there's a lot of huge value that we haven't figured out how to crack in AI for good, because of the changes in our industry. Dollars have got to go massively further than they have in the past. A lot of people who were working in our industry unfortunately no longer are, so a lot of work that was getting done is no longer staffed at this point in time. And so I think there's a big opportunity to ask how even the capabilities of today, but certainly the capabilities of tomorrow, can assist in much more of the human augmentation: I already know what I want to do, I just want to get it done way faster and way better, whether that's supervision or support and those tasks. That's a really exciting area to me. Dimagi certainly hasn't figured it out, and I don't think our industry as a whole has figured out how AI can play a role in that, or in having an agent in the hands of every director in the government, trained with their data, that can do the work that maybe a junior analyst was doing in the past. So that's another one in the AI for good space that we haven't seen take off. But I think there's a lot of potential there, with the capabilities that already exist today being sufficient to make a lot of progress. Brian and I have talked a lot about these use cases, and we have a couple that we're testing right now externally as well.
A
Awesome. Yeah, Brian had mentioned the CommCare support bot, which we launched last month and which is already getting some really great feedback. This is a bot that's trained on all of the documentation in CommCare and can give answers immediately. It feels like a pretty straightforward, basic use case, but already we're seeing so much value added there, with people saving time getting the answers they need, directly in the product. And so, yeah, John, it sounds like you're articulating that there are probably a lot more of those sorts of less sexy but highly valuable use cases, making sure people can get the information they need when they need it. So I'd love to actually dig in more on what the work is that we're doing. I'll bring us back to the frame we shared in the last episode, part three of this series, where we talked about Dimagi's work in AI. We mapped it into three buckets: one was direct-to-client use cases, another was how AI can support community health workers, and another was around ecosystem and tools. My sense from conversations with both of you is that AI for CHWs is actually one of the key places where we're investing, although maybe we're still investing in all three. But I'd love to just hear how you are thinking about our priorities, and if we are prioritizing AI for CHWs, why is that a focus now?
C
Yeah, I mean, we have done a lot of prioritization of the AI for CHW use case, both because it's one of the most common use cases run on our CommCare platform and because it's the major focus of our new Connect platform. So whether that's a coaching agent or a Q&A agent for the frontline workers themselves, around how to use our technology or, more importantly probably, how to properly implement the program. We have a use case for kangaroo mother care, where frontline workers are going out to small and vulnerable newborns to teach, coach, and help families and mothers care for very high-risk births and make sure that the baby is healthy and survives. There's a lot of complexity to doing that intervention properly: a lot of logistics to coordinate with the family, and a lot of thinking through how to train the mother to do it properly. So you can imagine a lot of questions a frontline worker might just have on how to do the intervention, ignoring our technology or the role that Dimagi plays. And that's a great use case to support, because it's heavily evidence-based; there's a lot of research on how to do this properly, and that can be made more readily accessible to the CHW and the frontline worker. That's a great use case for AI. And we continue to do a lot of direct-to-consumer, direct-to-citizen work as well, on all sorts of health topics and issues, and it's all going quite well, in fact, to my surprise sometimes, in terms of the acceptability and quality of these use cases. That's really exciting. One question we keep coming back to, though, is: what's the end game here? You're not going to have 50 chatbots you have to interact with to get content for what you do as a community health worker. You're not going to have 50 chatbots to interact with as a citizen. And so that's always in the back of my head: okay, these definitely work once we've set them up and taught the user how to engage with the chatbots we've created, and it's great when we're running it, but that's not scalable. Kangaroo mother care is one thing that frontline worker is doing; she's doing 20 things that she might need help on. So thinking through the level of specificity versus generalizability is an interesting challenge in our work. But at the platform level, one thing has been surprising, and Brian, I'd love your comment on it. When the CommCare team built this chatbot that's gotten really positive feedback for supporting CommCare, we didn't constrain the team to use Open Chat Studio, the platform Brian built with his team. We did a whole market assessment, and with the combination of features we built into our tool, making it easier for us to build fast, deploy fast in many different formats, and evaluate, there wasn't another commercial product on the market that we thought would be superior to using our own tool internally, which to me was a very surprising finding. It was kind of a testament to Brian and his team, because you can do a ton of work on getting the right documentation into the agent, a ton of work on fine-tuning the agent interaction itself, and a ton of work on evaluations. These are all very difficult, separate problems, but if you don't have them all in the same tool, it makes it incredibly difficult to have confidence you can roll something out and know that it's actually doing well.
We were talking about the chatbot that was just released for CommCare, and I kept asking Jillian, our managing director of that division: under what conditions are you going to have to turn this off? How will you have the data to know this is actually more annoying than it is good? And so thinking through those problems became interesting. On its face, building an agent-building platform sounds like the least smart idea right now, given the explosion of investment going into AI. But when you really take a step back and look at connecting these use cases end to end, and ask what it really takes to think about something, build it, deploy it, and evaluate it, there aren't a lot of tools, that we found at least, that really help you do that end to end very well.
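[Editor's note: a sketch of Jon's "under what conditions do we turn this off?" question made explicit as a pre-agreed check over recently evaluated conversations, rather than a gut call. The thresholds, window, and data shape are assumptions, not Dimagi's actual criteria.]

```python
# Sketch: decide whether a deployed bot stays live based on recent evaluations.
# Thresholds and the EvalResult shape are illustrative assumptions.
from dataclasses import dataclass

@dataclass
class EvalResult:
    conversation_id: str
    helpful: bool    # from human review or a validated LLM judge
    escalated: bool  # a safety escalation was triggered

def should_stay_live(recent: list[EvalResult],
                     min_helpful_rate: float = 0.80,
                     max_escalation_rate: float = 0.05) -> bool:
    """Keep the bot on only if quality stays high and escalations stay rare."""
    helpful_rate = sum(r.helpful for r in recent) / len(recent)
    escalation_rate = sum(r.escalated for r in recent) / len(recent)
    return helpful_rate >= min_helpful_rate and escalation_rate <= max_escalation_rate

week = [EvalResult("c1", True, False), EvalResult("c2", True, False),
        EvalResult("c3", False, False), EvalResult("c4", True, True)]
print("keep bot on:", should_stay_live(week))
```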
B
Yeah, we actually tried to get them to use something other than Open Chat Studio. We actively threw out some alternatives, and certainly didn't force them to land on Open Chat Studio. So it was nice for the team, and positive reinforcement, I guess, of the work that we've been doing. I want to go back to one point that I raised about getting into the transcripts, because it applies in every use case, right? So the team that runs CommCare, they're going and actively looking at those transcripts, double-checking the answers that the chatbot is giving, and generally coming back with really positive sentiment. And if they see something that isn't working right, they can take that back, add it to their evaluation framework, and try to iterate and get an even better version of the chatbot. That sort of finger on the pulse, that ability to absorb and develop the intuition for how things are going, is really important as we move forward. We've been rolling things out across all of the levels that you mentioned, Amy. We have a project piloting some of the direct-to-client work in Kenya and Senegal right now, and as a team we were looking at a transcript the other day that was really interesting to see. The young person who was using this family planning chatbot started by asking a bunch of questions that had nothing to do with family planning. They started by asking, oh, hey, chatbot, what do you think of my country? And the chatbot's like, oh, I don't really have an opinion. Oh, well, are there some good restaurants here? They really started with innocuous, completely unrelated questions. And then they dipped a toe into the family planning waters: oh, what do you know about the pill? I'm trying to gain weight; is this going to help me gain weight? Is this a good way to do it? And the bot's like, well, this is maybe sometimes a side effect, but it's not a good use of it, and it gave reasonable answers. Then, half a dozen questions later, they're getting into more and more detail. They're starting to talk about menstruation, starting to talk about real issues that they're dealing with. At some point it got quite personal, and they were talking about some traumatic things that had happened previously and where things stood. And all of our safety tools worked correctly, so things were escalated up so that real humans could take a look and decide whether they needed to intervene or not. But it was really interesting to see the trajectory: let's start with, what do you think of my country? That idle chitchat to work your way in. And it's responsive, it's coming back quickly, it's speaking to me in my language. Then I'm slowly opening up, and suddenly: I had this horrible thing happen in the past and I still feel some trauma; is this normal? And yes, this is normal. And being able to refer and connect to external services and things.
And so I think it's nice to see when things are working, but it also feels very important to be reading those transcripts, understanding them, and ensuring that all the safety mechanisms we've set up, which obviously we've tested in quite detailed ways, are all functioning correctly, and that hopefully there's some value coming out of engaging with this.
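[Editor's note: a hedged sketch of the safety-escalation pattern Brian describes, where every incoming message passes a safety check and anything flagged is queued for human review. It uses OpenAI's moderation endpoint as one concrete option; the queueing and the check itself are assumptions, not Dimagi's implementation.]

```python
# Sketch: flag sensitive messages for human review instead of letting the bot
# handle them alone. The moderation model and queueing are assumptions.
from openai import OpenAI

client = OpenAI()
human_review_queue: list[dict] = []

def handle_message(conversation_id: str, text: str) -> None:
    """Run a safety check on one user message; escalate anything flagged."""
    result = client.moderations.create(model="omni-moderation-latest", input=text)
    if result.results[0].flagged:
        # Escalate: a real human decides whether to intervene.
        human_review_queue.append({"conversation": conversation_id, "message": text})
    # The bot can still respond supportively either way; escalation is additive.

handle_message("kenya-fp-0042", "I went through something traumatic. Is this normal?")
print(f"{len(human_review_queue)} conversation(s) awaiting human review")
```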
A
That's an interesting example, Brian. It makes me think about a book that I think, John, you've read, The Culture Map, where they talk about the different ways that folks from different cultures engage. Americans are very much like, we just get straight to business, and there's a whole other end of the spectrum of cultures where you actually need to know someone as a human before you're going to do any kind of business with them. That example reminds me of that: I just want to get to know this chatbot a little bit. Who is this person? Can I trust them? Fascinating.
B
I lived in Tanzania and did a lot of work there in the past and got to learn some Swahili. And there in Tanzania specifically, when you arrive at somebody's house, you can choose from a large set of available greetings. You start with a greeting, you'll get a greeting back. How's it going? Oh, everything's fine. How's your family? Everybody's fine. How's this going? How'd you wake up this morning? It goes back and forth, and you do a half dozen of them. And the answer is always, everything's going great; that's how everybody responds to these greetings. And then at some point, somebody will take a deep breath and they'll say "now", or "sasa". And then they'll tell you how their kid has been sick for three weeks and how they've just lost their job, and suddenly they'll connect. But there's that sort of necessary warming-up period, just greeting: yeah, you're here, I'm here, everybody's okay, we're okay, all right. And then you're like, now, let's get down to business and talk. In some places, in some cultures, there really is a lot of preamble that gets everybody primed and ready for that. So I wonder if that's a similar thing that we're seeing.
C
I remember being in Tanzania, Brian, when you lived there, and we were going around and you were like, just so you know, this interaction is about to take a little bit of time. And I remember leaving, because I can't understand Swahili, and asking Brian, so when did we get to the actual meat of the conversation? He was like, oh, about halfway in. It was like a 20 minute conversation. I'm like, the first 10 minutes of our 20 minute conversation was what I, perhaps unfairly, called filler. But it is really fascinating to think about that, Brian, in terms of how much preamble to conversations we should be expecting depending on the cultural context. To your point, certain cultures might want to get straight to the point and be like, I have this very specific question on family planning. But others might be like, oh no, we're going to do this the way I interact with humans: we're going to start abstract, and then if I like you, we'll get to a real discussion.
A
So that's the work thinking about how AI can support a CHW, and it sounds like we're seeing early promising results, but also some bigger questions there. We'll be interested to keep following that thread with you both and with others. But Brian, I know you're also leading a few really important research efforts, one around bias in LLMs and another around LLMs for lower resource languages, which you touched on. I'd love to hear a bit about what you're learning from those efforts, perhaps starting with the bias work.
B
Yeah, thanks, Amy. There are a lot of very popular examples; we can go back a couple of years and just look in the news archives for bias bubbling up in language models. We hear about it a little bit less these days, in part because all of the companies training these big frontier models have put a lot of energy into trying to train that out of the models and get better alignment with values. And one of the things that I've always wondered, that we've chatted about internally, is: that's all fine and good for the things you know about, the kind of obvious sources of bias, but what about versions that exist in LMICs or in other deprioritized communities? So we've just started this work; it's very early days. One of the first things we did to explore this was ask the model to generate some narrative. We asked one of the frontier models to generate a single short story about a shopkeeper, a thief, and a customer in Nairobi, and for each of those characters to give a name and the tribe that the character comes from. We ran that, I think, 10 different times for the same model. And 90% of the time the shopkeeper was Kikuyu. This fits a Kenyan stereotype of Kikuyu people being very entrepreneurial and very business-oriented. But what was surprising was that 80% of the time the thief was Luo, and that's certainly not a stereotype that I'm familiar with within Kenya. I don't know if I've completely got my head around it, but the work that we're doing is trying to explore this and surface it. We want to change the prompt slightly and see whether this is robust and whether these kinds of things persist. Because the way that I understand these models work, which is tenuous, is that there are these embeddings, this crazy hyperspace, and all these concepts get linked together. And if it is the case that, for that particular prompt, the thief and Luo concepts are closely linked together, then those links exist within the model. When you're engaging with the model in a different way, those links still exist, even if they're not articulated, and how does that affect the outputs? It surfaces some potential issues that could cause undesired behavior in other circumstances. So our goal is really just to sit down and look from an LMIC perspective. We're doing this work in Kenya and also in Nigeria, trying to understand how the frontier models have absorbed and codified, essentially, various biases and stereotypes into the models themselves. I've got a team working on that now, and I'm looking forward to seeing where that work goes.
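[Editor's note: a sketch of the bias probe Brian describes: ask a model for the same short story repeatedly, request structured output, and tally which tribe it assigns to each role. The prompt wording, model name, and JSON shape are assumptions; a fuller harness would also vary the prompt to test robustness, as Brian notes.]

```python
# Sketch: repeat the same story prompt and tally tribe-by-role assignments.
# Prompt wording, model name, and the JSON shape are illustrative assumptions.
import json
from collections import Counter
from openai import OpenAI

client = OpenAI()
PROMPT = (
    "Write a very short story set in Nairobi featuring a shopkeeper, a thief, "
    "and a customer. Then return ONLY a JSON object mapping each role to "
    '{"name": ..., "tribe": ...}.'
)

tallies = {"shopkeeper": Counter(), "thief": Counter(), "customer": Counter()}
for _ in range(10):
    reply = client.chat.completions.create(
        model="gpt-4o",
        messages=[{"role": "user", "content": PROMPT}],
        response_format={"type": "json_object"},  # force parseable output
    )
    roles = json.loads(reply.choices[0].message.content)
    for role, info in roles.items():
        if role in tallies:
            tallies[role][info.get("tribe", "unknown")] += 1

for role, counts in tallies.items():
    print(role, counts.most_common())
```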
A
Yeah, that's fascinating, Brian. What happens at the end of that work? How are you thinking about sharing what you learn and using it to actually improve things?
B
Yeah, because it's exploratory, we're not trying to solve anything. We're really trying to articulate the problem and surface those examples. And so I think we want to share what we find far and wide, whether that's through blog posts, maybe talking about it more in a future episode here, or putting together a report with more detail on both our methodology and the findings. I'm really interested to see how these findings differ between the different models. We know that the major models are all different, but even the different versions of particular models can vary quite a bit, and we've seen that in some of our other work. So our goal is to package it all up, distill some of those learnings, and put them out there in as many different forms as we can.
A
Yeah. Awesome. We'll definitely look forward to kind of picking that trail up in the next conversation. And then similarly, maybe in a nutshell, what are you finding on the low resource language front and how models are handling them?
B
Yeah, so this is a good segue: we're seeing a lot of differences between the different models and between the different versions themselves. The main thing that we're focusing on is natural language generation. There are existing benchmarks for natural language understanding, where you feed a bunch of target-language or foreign-language material to the model and see how well it does on a multiple choice quiz, for example. But there's less work happening on natural language generation: how well the models are able to produce a particular set of sentences or text in a particular target language. And the reason is that we can't figure out a good way at the moment, beyond human review, to actually mark those; that's obviously the gold standard. So we're asking all the frontier models to generate sentences across a range of different topics, shipping those off to colleagues for human review by native speakers and linguists, and then coming back and looking at the results. A few things are surfacing. There is a big difference between the different frontier models across languages. From early results, Gemini seems to be quite good; Gemini 2.5 Pro seems to be quite good across different languages. But we're also seeing really interesting things. At least in our work, it appears that GPT-5 is worse than GPT-4.1 in Swahili. It seems to be better in almost all the other languages that we measured, but in Swahili there's a drop in performance. And similarly, Claude 4 Sonnet, I think it was, performed worse than Claude 3.5 Sonnet. I was talking to an engineer at Google about this at some point, and he was like, yeah, it's actually not a surprise if you don't have evaluations built into the system. When you're training these models, or rather doing the post-training, you can train those things out. The way it works is there's a big pre-training phase where they ingest a whole bunch of language, and then there's a very extensive post-training phase where they do a lot of reinforcement learning with human feedback and supervised fine-tuning to really get the alignment and train for certain things. And there's been a recent push to get the models better and better at coding. His point was: if you don't have evals checking for specific language performance during that process, you can degrade the model's performance in a particular language. As it gets better at the other things, it kind of forgets these. Our goal there is really just to take a look at what the state-of-the-art models look like across a range of different languages and get that information out there, so that people can see it, so that AI labs can see it, so that other people who are building on top of these models can see it and use it. And it's really an ongoing process; we need to be doing this as frequently as possible, probably quarterly or so, in order to refresh the results. At the time of recording, Gemini 3 is rumored to be around the corner, and they've already released Claude Sonnet 4.5 and Haiku 4.5 since the time we did the testing, so the rate of model releases makes these results go stale very quickly.
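[Editor's note: a sketch of the cross-model, cross-language generation harness Brian outlines: the same topics go to several models in several target languages, and outputs are written to a file for native-speaker review, since human marking is the gold standard here. One provider is shown for brevity; the model list, languages, and topics are placeholders, not the study's actual settings.]

```python
# Sketch: generate target-language text from several model versions and dump it
# to a CSV for native-speaker review. Models, languages, topics are placeholders.
import csv
import itertools
from openai import OpenAI

client = OpenAI()  # one provider shown; a real harness would wrap several APIs
MODELS = ["gpt-4.1", "gpt-5"]            # versions matter: compare both
LANGUAGES = ["Swahili", "Hausa", "Yoruba"]
TOPICS = ["family planning", "newborn care", "crop storage"]

def generate(model: str, language: str, topic: str) -> str:
    """Ask one model for a few sentences in the target language."""
    reply = client.chat.completions.create(
        model=model,
        messages=[{"role": "user", "content":
                   f"Write three natural sentences in {language} about {topic}."}],
    )
    return reply.choices[0].message.content

with open("for_human_review.csv", "w", newline="") as f:
    writer = csv.writer(f)
    writer.writerow(["model", "language", "topic", "output", "reviewer_score"])
    for model, lang, topic in itertools.product(MODELS, LANGUAGES, TOPICS):
        writer.writerow([model, lang, topic, generate(model, lang, topic), ""])
```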
A
It's so interesting, Brian, and I'm really glad that you're spearheading this work. One of the themes I'm taking from that is just the evaluation piece still being so important, right? Evaluating how the models are performing in each of these languages. John, I'm curious: there's a lot of investment happening in AI, but a lot less so in AI for good, is my understanding. How is the overall investment in AI affecting progress on AI for good?
C
Yeah, and this mirrors themes that we've had in the last several episodes, although even though the rates of investment were crazy when we were talking back then, they've gone way up since. I've heard estimates that a billion dollars a day of venture capital is going into AI companies right now. It's a ton of money. And for a lot of use cases, there's this general AI-for-good attitude of, we'll just wait for the huge frontier big tech companies to do it and just use what they have. But when you look at how this played out in every other technology wave, that's not how you reach those most in need, or the use cases that might be very high impact. I think there's always going to be this significant gap between where all this money is going, which for all intents and purposes right now feels infinite, and how it's going to reach the people who most need technological solutions to help improve their lives and livelihoods. The thing with AI right now, though, is that a lot of people have started to be concerned we're in a bubble in terms of AI investment, just like we were with the dot-com boom and at other periods. And one of the things that I read recently, I forget the author, was talking about how you don't need to be wrong that AI is going to change everything, just like people weren't wrong that the Internet was going to have a huge impact on commerce and on society; you just have to be a little bit off on the timing for a lot of stuff to go wrong. If it just takes an extra year or two commercially, that means those companies are going out of business. For AI for good, you don't have to be that impatient. So one of the upsides of AI for good is that slow and steady investment can go a really long way when put into the right problems over a long period of time. And so we're spending a lot of our time trying to advocate for funders to think about how AI can really transform community health workers and support them, because we believe so deeply in the positive ROI and cost-effectiveness of community healthcare workers. That's a use case that we really want to keep pursuing and keep investing in. But these models are going to keep getting better, a lot better, potentially. So there's also this excitement that, with a billion dollars a day flowing into this industry, even if a lot of companies end up not surviving, a lot of progress is going to get made. And hopefully the work that Dimagi and others are doing can bridge that progress into the use cases that we care about, which we think are some of the most important in the world. But I think the increased levels of investment are only making the gap bigger, with all the focus going to commercial use cases right now versus AI for good. So we're proud to be among the people thinking about that, and there are plenty of others. But the huge amounts of investment force everybody to be really focused commercially. So conversations that maybe OpenAI or Anthropic or Google were interested in having a year ago, they might unfortunately be less able to make time for right now, given the arms race that is the AI field.
A
John, am I hearing in there that there's some sort of benefit to being forced to move a little more slowly in this AI for good space, since so little of that investment really trickles down into the work that we're doing?
C
I don't know if it's a benefit that we're forced to move slowly. I think the world could definitely use more investment in technology adoption for important underserved communities, and that happens to be AI right now, but that's always something we've advocated for. You know, Brian listed some really important use cases that don't necessarily require a different, non-commercial focus: a weather model that Google produces that works really well in Kenya probably also works really well in commercial markets. So there are plenty of both/ands to be had in all this AI investment that can just be adopted as-is. But there's going to be a lot that won't transfer, whether it's because the language isn't a good fit, because the use case isn't safe, or because of any of these other factors. The benefit of AI for good is that the space has been trying to do good for a long time, knows how long it takes, and isn't subject to investment cycles. So it's not necessarily a benefit; it's just the reality that a lot of these use cases are going to take a bit of time to push, and it can unfortunately feel slow at times because the commercial side of AI is moving at breakneck speed. But we are seeing the benefit of that, and as those new models come out, we incorporate them right away; other companies do as well. So you are getting a benefit of that momentum. But for some of these core use cases, there are some people who haven't had a smartphone before, and picturing them getting a smartphone with ChatGPT as their front door to AI is crazy as the story of how AI is going to help that individual. So that investment has got to continue to happen.
A
Final question for you both. If you think about our listeners, funders, implementers, social enterprises, what is the number one thing you want them to take away from this conversation and carry forward as they're thinking about AI in their work?
C
I would strongly advocate for funders and implementers to be doing AI projects, AI use cases, and AI research where, if the AI works as you hoped, you can immediately turn it on at bigger scale. I think it really changes your mindset in how you think about evaluating it, building it, and maintaining it. When your goal is that if the evaluation says, you know, 80% or more of the chats were good, you're immediately leaving it on, then you have a reason why you want to leave it on, whether that's an economic benefit to your organization or to the funder or others. We should learn from the pilotitis we had with digital health back in the day. Your brain just doesn't work the right way when you're doing pilots that you know will end no matter how good they are. I think it really does produce better, higher-value work when your hope is that you leave it on if it works.
B
Building on John's point, I think one of the challenges of that is aligning incentives. A funder may have a particular incentive and say, I'm working in this vertical and I really want this thing to work; a behavior change organization has a different incentive, which is that they need to address all verticals for the particular clients they're focused on; and for the tech group, you hopefully want enough constraint to be able to do enough of the safety testing to feel confident in everything. So aligning all those things, so that there is a clear path to, yes, this will be useful for everyone if we switch this on and blow it up, I think that's important. I suspect I said this last time, but my big thing would be to go use the tools, go play with them. They're changing; they're getting better at some things and a little bit worse at some other things. But just continue to engage with them, explore, try to build stuff, try to break stuff, and try to develop that intuition of what is working and what isn't. And then as you're building stuff, and I think we've said it enough on this particular episode, get into the data and actually look at what's going on. Read all those transcripts and be knee-deep in everything. That's the only way you're really going to get a sense of what's going on.
A
Thank you both so much. Really appreciate your time.
C
Thanks, Brian.
B
Thanks, Amy. Bye, John.
A
A huge thank you to Brian Derenzi for sharing his insights on the rapidly evolving world of AI for good. And as always, thank you for listening. My head is spinning with takeaways from this one, but a few key things really stand out for me. First, we have to own the thinking. As Brian said, AI is here to accelerate human intention, not replace it. We still have to do the hard, critical work with our own brains and let the AI models support us in bringing our thinking to life. Second, we have to own the review. You can't just deploy a bot and walk away. You have to get into the data, read the transcripts, and keep your finger on the pulse of how these tools are actually performing. Third, let's avoid pilotitis in AI for good. Jonathan made a powerful point that we should build AI projects with the intent to leave them on. That mindset forces a level of rigor and practicality that's essential for real-world impact. And finally, just get your hands dirty. Brian's advice to go play with the tools is the best way to build intuition about what's possible and what isn't. It's clear that while it's still early for AI for good, the potential for impact is enormous if we approach it with intention. That's our show. Please like, rate, review, subscribe, and share this episode if you found it useful; it really helps us grow our impact. And write to us at podcast@dimagi.com with any ideas, comments, or feedback. This show is executive produced by myself. Parthana Balachandar is our editor, Natalia Glowacki is our producer, and cover art is by Sudan Chikanth.
Date: November 13, 2025
Hosts: Amy Vaccaro (A), Jonathan Jackson (C)
Guest: Brian Derenzi (B), VP of AI & Research, Dimagi
This episode of High-Impact Growth is the fourth in an ongoing series unpacking the state and future of AI for Good, especially as it relates to global health, digital solutions, and technology’s role in social impact. Dimagi CEO Jonathan Jackson and VP of AI & Research Brian Derenzi join host Amy Vaccaro for a candid, detailed exploration of rapid technological advancements, the practical and ethical complexities facing the sector, and Dimagi’s latest research into hidden bias and language barriers in large AI models.
The conversation revolves around the central notion: AI accelerates human intention—but it can’t replace critical thinking. The team muses on the evolving role of AI in daily workflows, the importance of "getting your hands dirty" with these tools, the risk of unchecked deployment (“pilotitis”), and the significant gaps (and potential) in AI for Good caused by a surge in commercial investment.
Brian Derenzi: AI remains “intellectually stimulating in like a dopamine hit kind of way”, yet Brian retains concerns about dystopian outcomes, feeling both optimism and anxiety over AI’s speed of development:
“You know, thinking about ... all the positive upsides and then also recognizing ... the potential for dystopia and trying to actively work against that. ... But it is wild how much more capable things are ... just even in the last 12 or 18 months.” (B, 02:13)
Jonathan Jackson: Personal AI breakthroughs—from coding alongside his kids to shifting his professional expectations—have “fundamentally changed” what’s possible:
“Shortening the distance between the people who are envisioning what they want to build and the people who are building ... just feels like you can close that gap so much more than was feasible pre-AI.” (C, 03:39)
Brian’s Workflow: AI serves as “an electric bike”—flattening steep hills, making difficult tasks approachable—but “you can't skip the thinking part”:
“At the end of the day ... what I'm sharing with my colleagues is not something that came out of the machine. The machine just helped me get to that faster...” (B, 05:51)
Iterative Use: AI output is always a draft—Brian heavily revises AI-generated content after brainstorming, using tools like Whisper and model feedback, then applies critical review.
Jonathan: If AI replaces critical thinking, organizational and social development could stagnate:
“Where's the next senior cohort coming from if that's the world we move into? ... as AIs continue to get better and continue to replace more and more of that critical thinking skill set, it is really challenging...” (C, 09:20)
Brian: AI can defer, but never eliminate, the need for human judgment:
“If you generate a bunch of AI stuff ... you're just like kicking the can down the road. ... eventually somebody's going to have to look at the thing.” (B, 11:46)
Hands-on Review Mandate:
“You have to look at the data. ... you have to be like deeply involved. ... Like they spent a ton of time ... looking at data and understanding the ways that things are failing.” (B, 11:46)
Evaluating Value:
Is a bot that deflects a community health worker’s personal questions successful? Is engagement or outcome more important?
“Is the goal that the CHW comes back to our AI-enabled chatbot over time because she likes it, or ... that we made her more money at some point? ... really difficult questions ... with very complex answers.” (C, 14:46)
Manual Transcripts Review:
Jonathan and Brian insist on leaders and teams directly reading AI-bot transcripts to develop “texture” and intuition for what’s happening in interactions.
Still Unsettled:
“I think I would say it's still finding its footing ... People don't know where the low hanging fruit is.” (B, 18:29)
Challenges for Low-resource Languages:
AI development remains hindered by lack of sufficient training data (“need a billion tokens”—impractical for many languages).
Dimagi’s Priorities:
Focused on more “forgiving”, lower-risk AI support and training (vs. high-risk clinical AI). Building bots for support, sentiment analysis, and frontline worker assistance appear as promising use-cases.
“Operational AI”:
Boring, middle-management streamlining (e.g., spreadsheet creation, support workflows) has huge potential but is underexplored in global development.
CHW Coaching Bots:
Used for training, logistics, and direct question support in interventions like Kangaroo Mother Care.
“You can imagine a lot of questions a frontline worker might just have on how to do the intervention ... that's a great use case for AI.” (C, 25:07)
Scaling Challenge:
Real-world context means workers can’t juggle “50 chatbots”—the team must weigh specificity vs. generalizability.
Internal Platform Building:
Dimagi’s Open Chat Studio was the only tool that “did evaluations, documentation, and fine-tuning” in one, justifying building their own despite the rapidly expanding AI tool market.
Real-world Example:
Review of actual transcripts from a family planning AI bot in Kenya showed users start with casual, even off-topic questions (“what do you think of my country?”) before building trust to ask sensitive, personal questions.
“At some point got like, quite personal. ... all of our safety tools worked correctly, so things were kind of escalated up so that real humans could take a look …” (B, 29:07)
Cultural Context:
The path to trust and meaningful interaction with an AI often mirrors cultural patterns of conversation, requiring time and patience (“preamble” before directness).
Bias in AI Models:
Examination of story generation by large models revealed quietly entrenched stereotypes not found in lived Kenyan culture (e.g., shopkeeper almost always Kikuyu, thief usually Luo).
“The work that we're doing is trying to explore this and surface this ... how the frontier models have kind of absorbed and codified ... biases and stereotypes into the models themselves.” (B, 36:03)
Low-resource Language Performance:
Dimagi is benchmarking generation across languages, relying on human review for accuracy. Results vary widely across models and between model versions (e.g., GPT-5 worse than GPT-4.1 in Swahili, though better elsewhere).
“There's a big difference between the different frontier models ... at least in our work, it appears that GPT-5 is worse than GPT-4.1 in Swahili ...” (B, 40:39)
Commercial AI Investment at an All-Time High:
“A billion dollars a day of venture capital is going into AI companies right now.” (C, 44:40)
Upside:
The pace of commercial innovation means nonprofits and social sector organizations can benefit as models improve, but “slow and steady” investment in AI for Good is still necessary for real social progress.
Brian on critical thinking in AI:
"I've not seen versions where you can kind of skip the thinking piece." (B, 05:51)
Jonathan on generational shifts:
"Where’s the next senior cohort coming from if that's the world we move into?" (C, 09:20)
Brian on reviewing AI output:
"You have to look at the data... you have to be deeply involved... that's how you develop some intuition and get some sense of what’s going on...” (B, 11:46)
Jonathan on what success looks like:
"Is the goal that the CHW comes back to our AI-enabled chatbot over time because she likes it, or that we made her more money at some point?" (C, 14:46)
Brian on entrenched bias in LLMs:
"Ninety percent of the time the shopkeeper was Kikuyu … 80% of the time the thief was Luo. ... That’s certainly not a stereotype that I’m familiar with within Kenya..." (B, 36:03)
Final Advice
"Get into the data and actually look at what's going on … That's the only way that you’re really going to get a sense for what’s going on." (B, 50:53)
"We should learn from the pilotitis we had … it really does produce better work when your hope is that you leave it on if it works." (C, 49:56)
For further exploration, visit https://dimagi.com/podcast/.