Humanitarian Frontiers — Peering Through the Data Scope: Uncovering the Unknown

Date: February 10, 2025
Host: Chris Hoffman
Guests: Jeffrey Wag (PhD, CNRS & Relief International), Matthew Harris (PhD, formerly Datakind & WFP), Nassim Motelabi (co-host, WFP)

Episode Overview

This episode of Humanitarian Frontiers in AI explores the critical role of data in the adoption of AI within humanitarian organizations. Hosts Chris Hoffman and Nassim Motelabi are joined by renowned data scientists Jeffrey Wag and Matthew Harris, who candidly address the complexities, pitfalls, and opportunities that come with leveraging data for AI-driven solutions in challenging, resource-constrained environments. The panel moves from foundational questions (“What is data in the humanitarian context?”) to issues of ethics, governance, technical best practices, risk mitigation, and tangible examples of AI's promise and limitations in global aid.

Key Discussion Points and Insights

1. What Counts as Data in the Humanitarian Sector? [05:48–08:39]

Data is More Than Numbers: Organizational data takes many forms, including unstructured documents, PDFs, handwritten medical records, and traditional spreadsheets. Proper digitization and integration are ongoing challenges.
- "Incomplete data sets...are very heterogeneous. Data could be medical records that might be handwritten. One of the projects...is the digitization of these to machine readable form." — Jeffrey Wag [07:01]
Data ‘Cleanliness’ Is Non-Negotiable: The efficacy of AI models hinges on data quality, standardized storage, and architecture. Without a solid foundation, the result is “rubbish in, rubbish out.”
Resource Constraints: Not all organizations have the luxury of dedicated technical staff or robust systems for data management.

2. Data Readiness: Where Are Humanitarian Organizations Now? [08:39–12:02]

Diverse Maturity Levels: Large agencies (e.g., WFP, UNHCR, UNICEF) have invested heavily in data capacities; others are still starting out.
High Hurdles for Qualitative Data: Even mature organizations struggle with unstructured, qualitative information.
- "Even to this date, we still are struggling when it comes to qualitative data management. With the rise of LLMs, we see a new wave of opportunity..." — Nassim Motelabi [08:39]
Does Data or Algorithm Lead the Way? The group debates whether organizations should develop data to match available AI/ML models or vice versa, and whether existing data can be repurposed for new AI solutions.

3. Multisource Data and Guarding Against Bias [10:50–12:02]

Combining Data Types: Lessons from crisis mapping (e.g., Nepal earthquake 2015) show the value of merging satellite data with crowdsourced and local reports, but also the risks of bias and incompleteness in any single source.
- "Rather than relying on just one source...combine with social media sentiment analysis. In Nepal, satellite images and local agency reports were cross-checked to confirm damage and deploy aid more effectively." — Jeffrey Wag [10:50]
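The cross-checking idea can be sketched in a few lines of Python. This is only an illustration, not the workflow used in Nepal: the function name, the report tuple layout, the source labels, and the two-source threshold are all assumptions made for the example.

```python
from collections import defaultdict


def confirmed_locations(reports, min_sources=2):
    """Return locations where at least `min_sources` distinct sources report damage.

    `reports` is an iterable of (location, source, damaged) tuples.
    This layout is hypothetical and chosen only for the sketch.
    """
    sources_by_location = defaultdict(set)
    for location, source, damaged in reports:
        if damaged:
            # Track which independent sources reported damage at this location.
            sources_by_location[location].add(source)
    return sorted(
        location
        for location, sources in sources_by_location.items()
        if len(sources) >= min_sources
    )


# Made-up example data: three sources, two locations.
example_reports = [
    ("district_a", "satellite", True),
    ("district_a", "local_agency", True),
    ("district_b", "social_media", True),
    ("district_b", "satellite", False),
]
confirmed = confirmed_locations(example_reports)
```

Here only `district_a` is confirmed, because `district_b` has a single source claiming damage. Requiring agreement reduces the weight of any one biased or incomplete source, at the cost of missing damage that only one source can see.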

4. Foundational Needs for AI Initiatives [13:02–16:55]

Three Key Steps for Organizations New to Data and AI:
1. Don’t Build Everything In-House: Leverage secure cloud/vendor solutions rather than starting from zero.
2. Enforce Access Controls: Share data only on a need-to-know basis.
3. Monitor and Log All Data Access/Use: Set up robust logging from the outset.
- "If I had to do three things...don’t try to build it yourself, only give access to those who need it, monitor what's happening with your data." — Matthew Harris [14:07]
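Steps 2 and 3 above (need-to-know access and logging every access) can be illustrated with a small Python sketch. In practice a cloud provider's identity and audit services would do this work; the class, function, and role names below are hypothetical and exist only to show the pattern.

```python
from dataclasses import dataclass
from datetime import datetime, timezone


class AccessDenied(Exception):
    """Raised when a role is not on a dataset's need-to-know list."""


@dataclass(frozen=True)
class Dataset:
    name: str
    allowed_roles: frozenset  # roles that genuinely need this data


def read_dataset(user, role, dataset, audit_log):
    """Check the role, record the attempt, then return or refuse.

    Every attempt is logged, including refused ones, so that unusual
    access patterns can be reviewed later.
    """
    allowed = role in dataset.allowed_roles
    audit_log.append(
        {
            "time": datetime.now(timezone.utc).isoformat(),
            "user": user,
            "role": role,
            "dataset": dataset.name,
            "allowed": allowed,
        }
    )
    if not allowed:
        raise AccessDenied(f"{role} may not read {dataset.name}")
    # Placeholder for the real read; the sketch returns a label only.
    return f"contents of {dataset.name}"


audit_log = []
registry = Dataset("beneficiary_registry", frozenset({"case_worker"}))
result = read_dataset("user_1", "case_worker", registry, audit_log)
```

The log entry is written before the permission check raises, which is a deliberate choice: denied attempts are often the most useful signal when monitoring data use.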
Data Protection Must Precede Deployment: Draft internal policies on AI use and personal data well before launching pilot projects.
- "Regulation and policies, especially data protection, are having trouble keeping up with the technology...This is a big risk." — Jeffrey Wag [15:38]
Adopt External Frameworks Early: Even if not legally required, apply established data protection frameworks such as GDPR from the start.

5. The Cloud, Vendors, and Data Security [16:55–22:08]

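Cloud-Centric Data Management: The panel's view is that, for most organizations, using public cloud infrastructure (AWS, Azure, Google) and the LLMs those providers host is safer than building and securing systems from scratch.
Self-Hosted LLMs: Running open models like Llama entirely within your own cloud tenancy keeps data and models inside infrastructure you control, which reduces exposure.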
- "Having all your data and models within your own enterprise infrastructure is a big step forward...as opposed to everything being wild." — Matthew Harris [19:38]

6. Privacy, External Data, and Sociopolitical Implications [18:17–22:08]

Anonymization and Consent: Uphold ‘do no harm’ by scrubbing personal information and ensuring informed consent.
- "Anonymization is essential before performing any sort of AI model development...Personal data goes beyond GDPR—location, for example, is critical in a refugee crisis." — Jeffrey Wag [18:17]
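A minimal Python sketch of scrubbing a record before model development follows. It is an illustration under stated assumptions, not a complete anonymization method: the field names are hypothetical, keyed hashing is strictly pseudonymization rather than anonymization, and rounding coordinates to one decimal place is an arbitrary choice that may still be too precise in a sparse area.

```python
import hashlib
import hmac


def pseudonymize_id(raw_id, secret_key):
    """Replace an identifier with a keyed SHA-256 digest.

    Without the key the original ID cannot be recomputed by simple
    guessing, but anyone holding the key can re-link records.
    """
    return hmac.new(secret_key, raw_id.encode("utf-8"), hashlib.sha256).hexdigest()


def coarsen_coordinate(value, decimals=1):
    """Round a latitude or longitude to reduce location precision."""
    return round(value, decimals)


def scrub_record(record, secret_key):
    """Keep only the fields needed for analysis, in a less identifying form."""
    return {
        "person_ref": pseudonymize_id(record["national_id"], secret_key),
        "lat": coarsen_coordinate(record["lat"]),
        "lon": coarsen_coordinate(record["lon"]),
        "needs": record["needs"],
        # Name and other direct identifiers are deliberately dropped.
    }


# Made-up record and key for illustration only.
example_key = b"replace-with-a-real-secret"
example_record = {
    "name": "Example Person",
    "national_id": "ID-0001",
    "lat": 27.7172,
    "lon": 85.3240,
    "needs": "shelter",
}
scrubbed = scrub_record(example_record, example_key)
```

Building the output from an explicit list of fields, rather than deleting known sensitive ones, means a newly added column is excluded by default.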
Risks in Prediction: The panel cites cases where predictive models produced sensitive or politically charged results, which calls for organizational caution and sometimes for halting the work.

7. Best Practices for Risk Mitigation [23:50–27:59]

Choose Safe Use Cases: Don’t experiment with highly sensitive applications first; pick low-risk, high-value scenarios.
Transparency with Data & Model Cards: Document source, risks, and biases of data/models for every AI deployment.
- "If you create an AI product, you should have a card [data/model card] showing the risks and caveats—it's transparency." — Matthew Harris [23:50]
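One way to picture such a card is as a small structured record that travels with the model. The Python sketch below is an assumption about what the fields might be, not a standard schema; the class name, field names, and example values are invented for illustration.

```python
from dataclasses import asdict, dataclass, field


@dataclass
class ModelCard:
    """A minimal, hypothetical model card for an AI deployment."""

    model_name: str
    intended_use: str
    data_sources: list
    known_risks: list = field(default_factory=list)
    known_biases: list = field(default_factory=list)

    def missing_sections(self):
        """Return the names of sections that are still empty."""
        return [name for name, value in asdict(self).items() if not value]


# Made-up example: the biases section has not been filled in yet.
card = ModelCard(
    model_name="needs-classifier-demo",
    intended_use="Routing internal feedback forms to the right team",
    data_sources=["2023 feedback forms (internal)"],
    known_risks=["Misrouting of urgent protection cases"],
)
incomplete = card.missing_sections()
```

A check like `missing_sections` could gate a release so that no product ships with an empty risks or biases section.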
Edge and Federated Processing: Analyzing data locally can reduce privacy risks.
- "Edge processing means data is analyzed locally, not sent to a centralized server...a mitigation strategy." — Jeffrey Wag [25:29]
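The following Python sketch shows the basic shape of edge processing: records stay on the local device and only aggregate counts are prepared for upload. The function name, record layout, and the small-count suppression threshold are assumptions for the example; real deployments would need a fuller privacy review.

```python
from collections import Counter


def summarize_on_device(records, min_cell_size=2):
    """Aggregate locally and return only counts that are safe to share.

    Categories with fewer than `min_cell_size` records are withheld,
    since very small counts can point to individuals.
    """
    counts = Counter(record["need"] for record in records)
    return {need: n for need, n in counts.items() if n >= min_cell_size}


# Made-up records that never leave the device in this sketch.
local_records = [
    {"person": "p1", "need": "food"},
    {"person": "p2", "need": "food"},
    {"person": "p3", "need": "food"},
    {"person": "p4", "need": "shelter"},
]
payload = summarize_on_device(local_records)
```

Only `payload` would be sent to a central server; the per-person rows remain local, which is the mitigation the quote describes.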
Auto-Monitoring over Manual Checks: Use built-in cloud tools for real-time alerts on data access/security.

8. Staffing and Talent: Who Should Humanitarian Organizations Hire? [27:59–31:22]

Prioritize Data Engineering Over Data Science: Data engineering roles are foundational; a clean, well-managed data environment is a prerequisite for higher-order analytics.
- "Hire data engineers before data scientists...that management of the data is usually not done, and so scientists end up doing it themselves." — Jeffrey Wag [28:58]
Consider Vendor Partnerships: Especially for smaller orgs, SaaS products and managed services may be more strategic than building internal teams.

9. Small Wins and Realistic Expectations [33:33–35:11]

Translation Tools: Real, immediate benefit for cross-cultural organizations. Caution urged regarding low-resource languages and disclosure of AI assistance.
- "Many managers say they notice the quality of work improves due to translation tools. That's a small win." — Jeffrey Wag [33:33]
Failure is Normal: The high failure rate for AI projects (>80%) echoes startup realities.
- "Don't overhype the tech. AI is a tool, not THE tool to solve all problems..." — Jeffrey Wag [35:11]

10. From Internal to Beneficiary-Facing Applications [36:06–43:40]

Chatbots & Generative Agents: More feasible now, especially for structured info access (e.g., aid eligibility, local services). Safety guardrails and citations are mandatory.
- "Every factual claim from a chatbot should have a citation, so users can verify the claim's source." — Matthew Harris [37:24]
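The citation rule can be enforced mechanically before an answer is shown. The Python sketch below assumes the chatbot pipeline already yields a list of claims, each with an optional source; the `Claim` class, the function names, and the `example.org` URL are hypothetical and not taken from any product discussed in the episode.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Claim:
    text: str
    source_url: Optional[str] = None


def uncited_claims(claims):
    """Return the claims that have no source attached."""
    return [claim for claim in claims if not claim.source_url]


def render_answer(claims):
    """Format an answer with numbered citations, or refuse if any are missing."""
    missing = uncited_claims(claims)
    if missing:
        raise ValueError(f"{len(missing)} claim(s) have no citation; refusing to answer")
    body = [f"{claim.text} [{i}]" for i, claim in enumerate(claims, start=1)]
    sources = [f"[{i}] {claim.source_url}" for i, claim in enumerate(claims, start=1)]
    return "\n".join(body + ["Sources:"] + sources)


# Made-up claim with a placeholder URL.
answer = render_answer(
    [Claim("Registration opens on Monday.", "https://example.org/registration")]
)
```

Raising an error instead of silently dropping uncited claims keeps the failure visible, so it can be logged and fixed rather than hidden from the user.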
Risks in Sensitive Fields: Caution (or outright avoidance) is needed for chatbots in mental health and other high-stakes use cases.
- "We were offered a chatbot for loneliness during the pandemic...I was concerned it could make things worse and harm end users." — Jeffrey Wag [39:30]

11. Optimism, Realism, and the Next Big Thing [44:43–46:42]

Predictive AI for Foresight: AI helps anticipate famines, epidemics, and other emergent crises—enabling earlier, smarter intervention.
- "AI models predict food insecurity...combining satellite imaging, climate models, social media to forecast famine hotspots or disease outbreaks." — Jeffrey Wag [44:43]
Local Context in Models: Swahili LLMs and regionally-grounded models can improve frontline impact.
AI in Software Development: The next ‘quiet revolution’ in humanitarian tech may be code-generation AI and autonomous agents automating repetitive and complex technical tasks.

Memorable Quotes & Moments

On AI Data Foundations:
"For AI, if you don't have clean, solid data and a good architecture around it, the AI is not very good, isn't it? It's rubbish in, rubbish out."
— Matthew Harris [05:48]
On Vendor vs. In-House:
"Don't try and build it yourself. So many organizations try to build everything themselves. Pay a little bit of money for secure cloud or vendor solutions—often it's far cheaper and safer."
— Matthew Harris [14:07]
On Incomplete Data:
"After data protection...one of our toughest challenges is the incompleteness of datasets...they might be incomplete."
— Jeffrey Wag [10:50]
On Chatbot Precautions:
"Every factual claim in a chatbot should be grounded with a citation. Click it and see the source. I won’t release one that doesn’t have that."
— Matthew Harris [37:24]
The Real Revolution:
"The real revolution is coming with software development—generative AI as a copilot for writing software."
— Matthew Harris [45:32]

Notable Timestamps

[02:52–04:23] Panelist Introductions (Jeffrey Wag & Matthew Harris backgrounds)
[05:48–08:39] What is data in humanitarian organizations?
[10:50–12:02] Managing incomplete and biased data; lessons from crisis mapping
[13:02–16:55] What are the basic requirements for leveraging AI?
[18:17–19:38] Data privacy—governing principles and real-world risks
[23:50–27:59] Risk mitigation: use cases, transparency, edge and federated processing, automated monitoring
[28:58–31:22] Who should you hire first? The primacy of data engineers
[33:33–35:11] Small but valuable AI wins; managing expectations
[36:06–39:30] Chatbots, beneficiary-facing AI, and the need for guardrails
[44:43–46:42] The future: predictive analytics, local models, and agentic AI

Closing Reflections

AI in humanitarian work is neither a panacea nor science fiction—it’s a pragmatic, sometimes tedious journey that starts with disciplined data foundations, realistic project scoping, and a commitment to ethical safeguards. Data engineering, strong partnerships, and incremental wins pave the way for transformative impact, while constant vigilance is required against risks both social and technical.

As Chris summarized:
"It’s so important...to start getting into and upskilling humanitarians, to start to be able to have the lexicon, to be able to pull from when they speak about it and be able to understand these things." [46:57]

End of Summary.

