Podcast Summary: LSE Public Lectures and Events
Episode: Seeing the Unseen – Combining Data to Better Understand Our Environment
Date: October 29, 2025
Main Theme
This episode brings together perspectives from statistics and economics to explore how combining diverse data sources—ranging from satellites to social media—can help us better understand, monitor, and manage environmental challenges. Professor Claire Miller (University of Glasgow) shares advances in environmental statistics and data fusion. Dr. Sefi Roth (LSE) offers the economic angle, focusing on air pollution, information quality, and policy impact. The discussion underlines the critical importance of integrating multiple data streams, the challenges of data quality, and the implications for research, policy, and individual action.
Key Discussion Points and Insights
1. Introduction & Framing (00:14)
- Host: Milan Vojnović, Head of the Department of Statistics, LSE, introduces the event and presents Professor Claire Miller, highlighting the importance of environmental statistics and the value of cross-disciplinary collaboration.
- Emphasis on the event’s goal: bridging statistical methods and environmental economics to address real-world challenges.
2. Professor Claire Miller: Data Fusion in Environmental Monitoring (04:15)
(a) Global Lake Water Quality and Satellite Data
- Project: The Global Lakes Project (NERC-funded)
- Data Types: Satellite data (MERIS instrument) and processed chlorophyll data serve as proxies for water quality.
- Objective: Understanding global patterns and temporal changes in water quality, analyzing how lakes may "cluster" in their response to environmental change.
- “Lakes themselves are thought to be sentinels of change. So if we can understand the processes and changes at work within lakes, then it can help us to understand more about environmental change...” (05:38, Miller)
- Key Insight: No single data source provides a full picture; integrating satellite and ground-based data strengthens inferences about environmental change.
(b) National River Monitoring & Chemical Mixtures
- Project: MOT for Rivers (England, Scotland, Wales)
- Data Types: Agency data on nutrients and metals at 13,000+ sites.
- Objective: Go beyond single-chemical monitoring by analyzing chemical mixtures and their interactions with landscape and biology.
- “We now have the possibility to think about, is it actually the mixture or the interaction of chemicals in the water that we need to carefully control?” (07:07, Miller)
(c) Urban Environment & Non-Traditional Data
- Project: GALLANT (Glasgow as a Living Lab)
- Data Sources: Classical (official stats), unstructured (photos, social media), citizen science.
- Key Challenge: “The data associated with these challenges can be both diverse and complex… we’re interested in a variety of different data sources.” (09:53, Miller)
(d) Types and Challenges of Environmental Data (13:00)
- In-situ measurements, automatic sensors, remote sensing, and citizen-driven data all offer unique perspectives but need careful integration.
- Key Challenges Identified:
- Heterogeneity: Data recorded at different times, scales, and qualities.
- Missingness & Connectivity: E.g., network effects in river systems, missing points from sensors.
- Bias: Especially acute when using non-traditional/citizen data (self-selection, design).
- Uncertainty: Must be quantified and communicated in predictions.
(e) In-Depth Example: Data Fusion for Water Quality (18:30)
- Case Study: Lake Balaton (Hungary)—combining nine in-situ sampling points with 7,500 satellite pixels.
- Approach: Statistical modeling aligns more accurate but sparse in-situ measurements with broad-coverage but lower-accuracy satellite data.
- Outcome: Better high-resolution predictions with quantified uncertainty, guiding future sampling strategies.
- “Such approaches enable us to investigate the patterns and the relationships… giving us that uncertainty information.” (23:13, Miller)
- Extensions: Methods scale up to higher resolutions, adjust for different variables (wind speed, soil moisture), and incorporate multi-source data even with misaligned space/time stamps.
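The talk describes a sophisticated spatial statistical model; as a minimal sketch of the core idea only (precision-weighted, inverse-variance fusion of a sparse accurate source and a dense noisy source, assuming Gaussian errors), with function name and numbers that are illustrative rather than from the lecture:

```python
def fuse(in_situ, satellite, var_in_situ, var_satellite):
    """Precision-weighted (inverse-variance) combination of two noisy
    estimates of the same quantity, e.g. chlorophyll at one lake pixel.
    The more precise source (smaller variance) receives more weight."""
    w_is = 1.0 / var_in_situ
    w_sat = 1.0 / var_satellite
    fused = (w_is * in_situ + w_sat * satellite) / (w_is + w_sat)
    fused_var = 1.0 / (w_is + w_sat)  # smaller than either input variance
    return fused, fused_var

# Accurate but sparse in-situ reading vs. a noisier satellite retrieval
value, variance = fuse(in_situ=12.0, satellite=15.0,
                       var_in_situ=1.0, var_satellite=4.0)
# The fused estimate (12.6) sits closer to the more trusted in-situ value,
# and its variance (0.8) is below both input variances.
```

This is the simplest instance of the pattern Miller describes: each source contributes according to its reliability, and the fused prediction carries explicit uncertainty.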
(f) In-Depth Example: The GALLANT Project and Non-Traditional Data (28:00)
- Framework: Doughnut Economics—sustainable, socially just urban design.
- Data Innovations:
- Image Analysis: Using Flickr photos with image captioning and sentiment analysis to understand environmental perceptions and events (e.g., storm damage).
- Community Integration: Linking social media, citizen app (Communimap), and survey data.
- Quote: “We’re interested in these variety of different data sources to see, do we get the same messages, do we get different messages, what could be driving that?” (36:28, Miller)
- Event Detection: Potential for near real-time detection of environmental incidents via crowdsourced data.
- Key Takeaway: Statistical and analytical innovation is essential for extracting actionable insights from increasingly complex and non-traditional datasets.
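Near real-time event detection from crowdsourced streams often reduces to spotting anomalies in report volumes. A minimal sketch of that idea (hypothetical counts and a simple rolling z-score rule, not the GALLANT pipeline itself):

```python
import statistics

# Hypothetical daily counts of geotagged posts mentioning flooding
counts = [3, 2, 4, 3, 2, 3, 21, 5]

def detect_spikes(series, window=5, threshold=3.0):
    """Flag indices whose count exceeds the mean of the preceding
    `window` values by `threshold` standard deviations; a crude proxy
    for an environmental incident surfacing in crowdsourced data."""
    spikes = []
    for i in range(window, len(series)):
        past = series[i - window:i]
        mu = statistics.mean(past)
        sd = statistics.stdev(past) or 1.0  # guard against zero variance
        if series[i] > mu + threshold * sd:
            spikes.append(i)
    return spikes

print(detect_spikes(counts))  # flags the day with 21 reports
```

In practice such a rule would sit alongside the bias corrections Miller stresses, since posting volume also reflects who is online, not only what is happening.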
(g) Reflections and Future Challenges
- Need for careful, question-driven integration of data streams due to risks of bias, misrepresentation, and data overload.
- Intersection with AI: AI/foundation models offer new forecasting tools, but require rigorous evaluation and thoughtful combination with established statistical frameworks.
3. Dr. Sefi Roth: The Economist’s Perspective (44:42)
(a) Information as the Foundation for Efficient Policy
- Central Thesis: “Economists deeply care about information as we believe that information is one of the core foundation[s] for efficiency. So without it, markets will allocate resources inefficiently, policies are unlikely to be efficient…” (45:00, Roth)
(b) Air Pollution as a Case Study for Data Integration
- Spatial Scale: Satellite data offers global coverage but often coarse local resolution; station data (e.g., in London) has high temporal resolution but sparse spatial coverage.
- Illustrative Example: “If you just… measure the pollution on Kingsway… then measure… on the other side of the building…, you are very likely to get very, very different results…” (46:50, Roth)
- Temporal Scale: Satellites give momentary snapshots; stations provide continuous time series.
- Placement Bias: Monitors are not always randomly distributed; sometimes placed away from pollution “hot spots” for regulatory reasons.
(c) Exposure vs. Pollution Concentration
- Exposure, not just ambient concentrations, matters most for public health, education, productivity.
- Challenge: Need to combine multiple data sources (outdoor, indoor, mobility) to estimate true exposure.
- London Study Example: Joint Camden Council project with in-home air monitors showed ambient data “predict very badly the indoor environment. And then… the reason is that because there are many, many indoor sources, even in London.” (55:13, Roth)
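The distinction between ambient concentration and personal exposure can be made concrete with a time-weighted average over microenvironments. The labels and figures below are hypothetical, not from the Camden study:

```python
# Hypothetical microenvironment data: (PM2.5 in µg/m³, hours spent per day)
microenvironments = {
    "home_indoor": (18.0, 14.0),  # indoor sources (cooking, candles) raise levels
    "office":      (10.0, 8.0),
    "commute":     (35.0, 2.0),   # short but highly polluted
}

def daily_exposure(envs):
    """Time-weighted mean concentration a person actually breathes,
    as opposed to the ambient reading at the nearest monitor."""
    total_dose = sum(conc * hours for conc, hours in envs.values())
    total_hours = sum(hours for _, hours in envs.values())
    return total_dose / total_hours

exposure = daily_exposure(microenvironments)  # 16.75 µg/m³ for these figures
```

This is why Roth argues that outdoor monitors alone can mislead: the heavily weighted indoor hours dominate the exposure estimate.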
(d) Impact of Better Information on Behavior and Policy (58:18)
- RCT Finding: Simply providing real-time indoor pollution data to households reduced their exposure by over 30%.
- “Just providing the information, not telling them what to do… reduced pollution exposure in the home… by more than 30%.” (59:24, Roth)
- Policy Angle: High-quality, precise data critical to effective, proportionate regulatory design (e.g., setting pollution taxes at the right level).
(e) Caution: More Data Isn’t Always Better
- Importance of data quality: Combining datasets can amplify errors if sources are flawed; must rigorously validate and triangulate whenever possible.
- US Example: High agreement between satellite and ground station data in the US, but poor correlation in many other regions.
- “We need to be very careful and always verify the data source…” (62:54, Roth)
(f) Conclusion
- The future lies in integrated, multi-stream data approaches, executed with robust validation and awareness of their limits.
- “We need to do it well. We need to be careful with the data that we use, which source we use and how we use it.” (64:49, Roth)
Notable Quotes and Memorable Moments
- Miller on Interdisciplinarity: “It’s all about a question… The approach that we take… is very much driven by the important question that we’re trying to answer… What might be the data associated with that? And therefore, what is an appropriate statistical approach?” (11:18)
- Miller on Statistical “Toolbox”: “I always talk about a toolbox of evidence… I want to have that trust in the answers… it’s about that full spectrum into a box of evidence.” (72:01)
- Roth on Data Quality: "More data are better. And this is true most of the time, but not all of the time, because data quality really matters." (61:24)
- Roth on Empowering Individuals: “Just providing the information, not telling them what to do… reduced pollution exposure in the home… by more than 30%.” (59:23)
- Engaging Q&A: Discussion ranged from methodological details about data fusion and model uncertainty, to the difficulties of data access, to the tension between metrics standardization and preserving rich biodiversity information.
Important Q&A Segments with Timestamps
- On Uncertainty in Predictions and Machine Learning Applications (67:03)
- Discussion about propagating uncertainty through spatial models and the potential for image-based, ML-driven environmental predictions in future work.
- On Risks of Unstructured Data Swamping Structured Data (70:02)
- Miller emphasizes the need for a “toolbox of evidence,” careful weighting, and critical appraisal of data streams.
- On When to Stop Collecting Data (73:46)
- Miller notes that data collection frequency should be tailored to the question and the nature of the variable (e.g., water vs. air pollution).
- On Data Access and Data Sharing Barriers (77:06)
- Acknowledgement that even with open-access trends, data often remains unavailable at the granular/raw scale needed.
- On Standardization versus Complexity in Biodiversity Reporting (79:35)
- Panel recognizes the tension between richness and usability: “standardization can make data less informative, yet not enough standardization risks overwhelming users.”
- On Future Impacts of Cheaper Satellite Data and AI (84:59)
- Panel discusses likely growth in data volume and game-changing technical innovations (e.g., the MTG-S1 satellite offering 3D atmospheric data), but underscores the continuing need for validation and human judgment.
Final Thoughts
- Integrating multiple, heterogeneous data streams offers the potential for deeper, more accurate insight into complex environmental systems, but requires rigorous methodology and attention to data quality, bias, and appropriate analysis.
- Advances in technology—satellites, AI, citizen science—are rapidly shifting the field, but foundational statistical principles and careful, question-driven research remain central.
- The economic perspective highlights that better environmental information can directly drive more efficient, effective policy, as well as empower individuals in their daily decisions.
Recommended For:
Researchers, policymakers, data scientists, environmental economists, and anyone interested in the intersection of statistics, technology, and environmental sustainability.
For further reading and details, Professor Miller referenced her collaborators, NERC projects, and data resources; listeners are encouraged to follow up with the references provided during the talk.
