Podcast Summary: “Global Progress in the AI Era: Why Investing in Evidence is Key to Translating AI Hype into Impact”
Podcast: This Week in Global Development
Host: Catherine Cheney (Senior Editor, Devex)
Episode Date: March 30, 2026
Overview
This special episode, hosted by Catherine Cheney (Senior Editor, Devex), explores the intersection of artificial intelligence (AI) and global development, focusing on the critical role evidence plays in ensuring that AI’s promise translates into genuine societal impact. The discussion convenes three leading thinkers in development evaluation: Dean Karlan (Northwestern University, former USAID Chief Economist), Iqbal Dhaliwal (Global Executive Director, J-PAL at MIT), and Temina Madon (Co-founder, The Agency Fund).
The episode delves into:
- The necessity for robust evidence in an AI-fueled development landscape
- How to evaluate and scale AI interventions effectively
- Lessons from real-world AI deployments
- Best practices and challenges in moving from pilots to impactful programs
Key Discussion Points & Insights
The Nature of Evidence in the AI Era
[02:21–03:37] Dean Karlan
- AI is evolving rapidly, but “the fact that the tools are rapidly evolving is not an excuse to not test what works.”
- Evaluation approaches must adapt, documenting and assessing changes as they occur. Interventions should be iterated on and learned from, not assumed effective simply because they are novel or hyped.
“We need to still be able to be adaptive in how we go about learning. We need to be fast as well. And we need to document what those changes are... maybe a change that gets made is actually for the worse.” — Dean Karlan [02:40]
The Playbook Approach to Evaluation
[04:26–06:39] Iqbal Dhaliwal
- J-PAL’s new playbook distills two decades of learning from development evaluations, now contextualized for AI.
- Reminds listeners that true development outcomes take time to materialize (e.g., education outcomes over a year, crop yields over months), regardless of the technological intervention.
- Cautions against “silver bullet” thinking—great results in the lab can falter in the field.
"We have a long history of getting enamored with silver bullets which look great in the lab but fail in the field... Is AI a hype? Is there actually a lot of substance in AI? The Playbook tries to distill those lessons and create a roadmap for real impact." — Iqbal Dhaliwal [05:30]
[07:00–10:13] Temina Madon
- The Agency Fund’s playbook focuses on nonprofit and tech builders, emphasizing evaluation as part of product development, not just a final milestone.
- Advocates a four-stage continuous evaluation framework:
  - AI System Evaluation (technical performance)
  - Product Analytics (user engagement)
  - User Evaluations (behavior, trust, adoption)
  - Impact Evaluation (does it achieve the intended social outcome?)
"Let’s build for users and not necessarily for systems. That’s what’s going to shift development... Evaluation is one of the strongest elements of good product development." — Tamina Madden [07:22]
Barriers to Evidence-Based Practice
[10:38–12:54] Iqbal Dhaliwal
- The AI “bandwagon effect” drives organizations to focus on engagement stats (clicks, downloads), while neglecting true impact.
- A fear: governments could embed untested AI into core systems, making it hard to identify harm and even harder to reverse bad implementations.
"Did you give up all these fantastic careers just to worry about how many clicks do you have... or did you give all of this up because you cared about improving health outcomes and learning outcomes?" — Iqbal Dhaliwal [11:29]
[12:54–15:31] Dean Karlan
- It’s easier to measure digital engagement than real-life improvements (income, health, education).
- Seeking short-term proxy outcomes is critical for iterative development, but these proxies must be validated as predictors of long-term goals (a minimal sketch of such a check follows the quote below).
"Of course you want to find something that's a process or immediate outcome along the way so that you don't have to wait 10 years every time you make a tweak to find out did it work or not." — Dean Karlan [14:45]
The Silicon Valley Mindset and Development
[17:03–20:48] Temina Madon
- There is a culture of A/B testing in tech, but frictions exist:
  - AI is new to target users (e.g., rural populations unfamiliar with ChatGPT or WhatsApp bots).
  - Rushing from prototype to impact evaluation skips necessary adaptation; for example, a maternal health bot saw reduced engagement from women after shifting to conversational AI.
  - Good proxies for quickly measuring long-term impact are lacking, especially outside of health.
"Digital Green... unleashed an AI app that answers farmers’ questions. But we are learning that farmers don’t trust it off the bat... They’re asking questions they know the answers to, to validate whether they can trust this new technology." — Tamina Madden [18:08]
AI Labs, Partnerships, and Scaling Well
[21:46–23:05] Temina Madon
- Tech partnerships often fail to transfer robust, ongoing evaluation capability; NGOs need the capacity for continuous iteration, not just one-off studies.
"Unless you have continuous evaluation built into your pipeline and automated, you won’t see NGOs doing this... we’re not seeing that capability transferred as much to the development sector." — Tamina Madden [22:17]
[23:05–27:01] Iqbal Dhaliwal
- Tech firms excel at designing for scale, but nonprofits often mistake “scaling what works” for “what works at scale.”
- Five principles:
  - Minimal, robust tech
  - Cost effectiveness
  - “Real-world messiness” (policy, political, and operational constraints)
  - Early and ongoing capacity building
  - Willingness to fail and iterate
"What works at scale is not the same as scaling what works... The real world is messy; capacity building cannot be an afterthought." — Iqbal Dhaliwal [24:29]
[27:15–28:27] Dean Karlan
- AI offers unique potential for customization at scale, but governments especially must ensure inclusivity—one-size-fits-most is insufficient in public service.
Notable Quotes & Memorable Moments
- On hype vs. impact:
"If we don't understand what works, projects with the best sales pitch will get interest and funding, not necessarily what works." — Host (paraphrasing Dean Karlan) [03:37] - On product design for real users:
"The most successful companies focus on building for people… If we can design for people, we get closer to impact faster." — Tamina Madden [07:25] - On the risk of path dependence:
"In this rush to build into systems, we will get into path dependence... it will be very hard to extract that out of our operating system." — Iqbal Dhaliwal [12:26] - On lesson from Togo’s COVID cash transfer:
"If we only relied on the cell phone data for impact measurement, we would have concluded the cash transfers had no impact. The survey told a very strong, consistent story of positive impact. So... it just adds to this kind of cautionary tale." — Dean Karlan [32:13] - On what policymakers should remember:
"Don't take too much for granted... ask, where’s the evidence? Don’t limit oneself to thinking evidence is only big, large-scale impact evaluations." — Dean Karlan [34:52] - On AI partnerships:
"Tripartite partnerships—government, nimble tech, and academic evaluation—are promising. Academia provides research, but not rapid iteration; nimble NGO partners can ensure constant feedback and improvement." — Tamina Madden [37:08] - On going beyond the tech:
"Let evidence be your starting point. Sometimes the problem is a policy or political constraint, not the technology itself." — Iqbal Dhaliwal [39:08]
Real-World Example: Togo’s COVID-era Cash Transfers
[29:11–34:13]
- Urban targeting leveraged voter registries for fast rollout.
- Rural targeting used cell phone metadata and machine learning to predict poverty (a minimal sketch of this approach follows after this list).
- Cell phone data was excellent for targeting long-term poverty, but poor for measuring short-term impact—only old-fashioned surveys captured true changes post-intervention.
- Key lesson: AI tools can accelerate action but are no substitute for measuring real-life impact through rigorous evaluation.
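For illustration, here is a minimal sketch of the targeting approach described above: train a poverty-prediction model on phone-metadata features for a survey sample, then rank subscribers for transfers. This is a sketch under assumptions, not the actual Togo pipeline; the files, feature names, and the 30% enrollment threshold are hypothetical.

```python
# Minimal sketch: predict poverty from phone-metadata features and rank
# subscribers for cash transfers. All names and thresholds are hypothetical.
import pandas as pd
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.model_selection import train_test_split

# Hypothetical training data: survey-measured consumption for a sample of
# subscribers, joined to features derived from their call records.
df = pd.read_csv("cdr_features.csv")
features = ["calls_per_day", "sms_per_day", "mobile_money_txns", "night_calls"]
X_train, X_test, y_train, y_test = train_test_split(
    df[features], df["consumption"], test_size=0.2, random_state=0
)

model = GradientBoostingRegressor(random_state=0)
model.fit(X_train, y_train)
print(f"R^2 on held-out survey sample: {model.score(X_test, y_test):.2f}")

# Targeting: score every subscriber and enroll the poorest-scoring 30%.
everyone = pd.read_csv("all_subscribers.csv")
everyone["predicted_consumption"] = model.predict(everyone[features])
eligible = everyone.nsmallest(int(0.3 * len(everyone)), "predicted_consumption")
```

Note the asymmetry the episode stresses: a model like this can rank who is likely poor, but it cannot tell you whether the transfers worked; that still takes surveys.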
Practical Advice for Different Stakeholders
- Government leaders: Insist on real evidence before scaling, don’t be swayed by hype, and focus on partnerships that prioritize continuous learning and real-world problem-solving [34:52, 37:08].
- NGOs & implementers: Embed evaluation at every stage; avoid skipping straight to impact evaluation before establishing user fit and operational effectiveness [17:38, 21:46].
- Big tech/AI labs: Contribute not just initial technology but also continuous evaluation and iteration capability; design products for scale, robustness, real-world messiness, and cost effectiveness [23:05–27:01].
Timestamps for Important Segments
- Opening and Introduction to Evidence in AI: [00:00–04:26]
- Playbook Approaches & Frameworks: [04:26–10:13]
- Challenges to Evidence Uptake: [10:13–15:31]
- Product Development vs Evaluation Mindsets: [16:23–20:48]
- AI Labs, Partnerships, and Scaling for Impact: [21:46–28:27]
- Case Study: Togo COVID Cash Transfer Program: [28:27–34:13]
- Advice for Policymakers/Stakeholders: [34:13–41:30]
Closing Thoughts
The episode underscores that while AI possesses game-changing potential for development, translating its hype into impact requires a rigorous, adaptive, and user-centered approach to evidence generation. Rapid iteration, local partnerships, willingness to fail, and attentiveness to political, social, and operational realities are key. As excitement outpaces evidence, recommitting to old fundamentals—while adapting them for a fast-moving tech era—is the surest path to meaningful, measurable progress.
