A (4:15)
Many thanks, Milan. I am delighted to be here this evening. Many thanks to the Department of Statistics, to the Global School of Sustainability, and to all the event organisers for making tonight possible. And of course, many thanks to Sephi for joining us later as well. This is joint work with very many colleagues, so I will do my best to acknowledge them throughout the presentation, and I will also say more about my collaborators at the end. So I wanted to start by telling you about some of the projects and questions that are of interest to us and that we've worked on or are working on just now. So, what is the status of lake water quality for 1000 large lakes globally? This was the focus of our GloboLakes project. It was funded by the Natural Environment Research Council and we were specifically interested in satellite data. So we had data from MERIS, the Medium Resolution Imaging Spectrometer, which was an instrument on the European Space Agency's Envisat platform. And so we had satellite retrievals from MERIS, and my colleagues at the University of Stirling and also Plymouth Marine Laboratory processed those satellite retrievals. Initially, it gives us information on color reflectance, but they processed the product so that we were working with chlorophyll. So we had information on chlorophyll for these 1000 lakes that you can see indicated by the dots on the map. Now, chlorophyll is an indicator of water quality and typically we can think of higher levels of chlorophyll as indicating poorer water quality. Lakes themselves are thought to be sentinels of change. So if we can understand the processes and changes at work within lakes, then it can help us to understand more about environmental change and the potential impacts of global changes. So we looked at chlorophyll in each of these lakes. We looked at it across the surface of the lake, but also over time.
And we were specifically interested in something called temporal coherence. And so that's what we're trying to get a handle on in the groupings that you can see down in the bottom right. So we wanted to know what the patterns were in the water quality: how was chlorophyll changing over time? But more importantly, did we see common patterns? Were the lakes clustering together, were they grouping together in some way? So that was a global example. Next, moving to a national picture, this is a project that we're currently working on, MOT4Rivers. This is also funded by NERC, the Natural Environment Research Council. And we're looking at data now for rivers across England, Scotland and Wales. We have data from the Environment Agency, from the Scottish Environment Protection Agency and from Natural Resources Wales. Now, the specific question of interest here is about mixtures. Traditionally, when thinking about the water quality for rivers, you might look at individual chemicals or a small number of the chemicals. But here we now have the possibility to ask whether it is actually the mixture, or the interaction of chemicals in the water, that we need to carefully control. So we're looking at data from over 13,000 water quality sites that you can see in the bottom left. We've got information on nutrients, we've got nitrate, we've got phosphates, we've got metals. We're interested in how they're interacting with what we see in the surrounding catchment, so interactions with the land cover, with flow, with precipitation, with altitude, etc. And in terms of impact, it's all about the biology. So for a healthy ecosystem, we want to carefully manage any impacts on these biology metrics. My third example, for now, takes us to a more local scale. So this is all about Glasgow. Glasgow is where I work, so I'm at the University of Glasgow, and we are working at the moment with Glasgow City Council on this project, GALLANT, that I'll say more about later on.
But here we're interested in the urban environment. So how is Glasgow's living space changing? What effect might interventions have on the future environment? And in this project, we're interested in non-traditional data. So we might have more classical data sources, we might have official statistics, we might have deprivation indices, but equally we might have information from photos or from social media. So data, this is what we're interested in. This is the main focus of our conversation this evening. The data associated with these challenges can be both diverse and complex. For example, we talk about both structured and unstructured data. Structured data is maybe more what you would expect: numbers. So the level of chemical concentration in water or the number of species that we can observe. Unstructured data is data that we might typically not be aware of. It could be contained in photos or contained in text. And so we're interested in a variety of different data sources. These data sources might be recorded at different points in time, they might be recorded at different geographic locations, or they might be an average. They might be telling us about an area of the land, or they might be telling us about a monthly figure or an annual figure, and they could be available or collected in a variety of ways. So the data might be available from a particular designed study, or it might just be automatically streaming. We all know that these days we're surrounded by data in everything that we do. So we're interested in how we can use multiple data sources in combination, potentially to provide additional information. And the reason I've said potentially there is because, of course, there are lots of different considerations when we use multiple data sources, in terms of using them in an appropriate or sensible way. And so our main interest is to think about how we can use statistical modeling or data analytics approaches to enable us to do that.
And for us, it's typically all about a question. So the approach that we take, or what we're interested in doing, is very much driven by the important question that we're trying to answer. And in all of my work, that comes from the real-world challenge: what's the environmental question? What might be the data associated with that? And therefore, what is an appropriate statistical approach? So I thought I would take a step back now and just think about environmental data in general. Where do environmental data come from? How do we collect them? Well, the different sources that we have are continually evolving and expanding. As with many things these days, it's very fast paced in terms of the technology. If I take the water quality example, first of all, just to give some examples, we could have information from in situ. And so that's thinking about a classical approach where you might go to a river or a lake and take a manual sample that's then taken to the lab and analyzed, and that gives you information on the water quality. Or we might have an automatic sensor. We might have some kind of buoy that's recording at a higher temporal frequency, so giving more information over time. We could have remote sensing. There's various different technology there. It could be satellite, it could be drone, it could be aircraft imagery, so giving us that wider picture over the land, that wider geographic representation. And then, as we talked about a little earlier, now we have more non-traditional data sources. So we could have information from citizen science, people inputting information into apps on their phone, taking photos, social media posts or images, public opinion posts. It could be videos, or we might record audio. We could take, say, audio of biodiversity, sounds from animals in the environment. But typically there's no one data source that provides us with a complete picture. Typically we don't have one data source that we think gives us all the information.
Nor is there one that is as accurate as it can be. So if we can combine our data streams, then it can enable us to unlock insights about patterns, changes or interactions and hopefully give us a greater understanding of our environment. So this is my main interest. Where do the statistics or the data analytics come in to all of this? Well, these are various different aspects of the data and the types of data sources that we've looked at so far. And we can use statistical modeling and data analytics in order to account for these different features of the data. So we mentioned earlier that data might be recorded at different time points or different geographic locations. We often refer to that as data recorded over space, or at spatial locations. Through using modeling, we can think about a sensible way to relate those data streams together. How can we combine data streams that are giving us similar information but are not recorded at exactly the same point? We might have missing data over time and over space. So if we take the sensor example, a sensor can go offline in terms of Wi-Fi connectivity. It can get kicked by an animal, it can get blocked by leaves. With satellite, depending on the instrument that we're working with, we might have issues of cloud cover, we might have issues of land masking. It passes over at particular times or on particular orbits. So we can take account of these different features. If we have data that are collected, say, over a river network like the one that we've got here, we've got a lot of points on that river network. It looks like we've got a lot of information, but actually all of those points are connected by the network. So we need to take account of the fact that the data are connected. We don't have as many pieces of independent information as we think that we have. We've talked about data of different structure, structured or unstructured, and the way that that's arising might mean that our data are biased or they're not representative.
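The point about connected data can be illustrated with a small sketch. It is not the modeling used in the work described here: it assumes, purely for illustration, 50 equal-variance sites along a single chain whose correlation decays as `rho ** distance`, and it uses the standard result that the mean of such observations carries as much information as `n**2 / sum(R)` independent ones, where `R` is the correlation matrix. The function name and the values of `n_sites` and `rho` are hypothetical.

```python
import numpy as np

def effective_sample_size(corr: np.ndarray) -> float:
    """Equivalent number of independent observations for estimating a mean,
    given the correlation matrix of equal-variance observations."""
    n = corr.shape[0]
    return n * n / corr.sum()

n_sites = 50   # assumed number of monitoring sites along one chain
rho = 0.8      # assumed correlation between neighbouring sites

# Correlation decays with the number of steps between sites: rho ** |i - j|.
steps = np.abs(np.subtract.outer(np.arange(n_sites), np.arange(n_sites)))
corr_connected = rho ** steps
corr_independent = np.eye(n_sites)

n_eff_independent = effective_sample_size(corr_independent)
n_eff_connected = effective_sample_size(corr_connected)

print(f"independent sites: {n_eff_independent:.1f}")
print(f"connected sites:   {n_eff_connected:.1f}")
```

Under these assumptions the 50 connected sites are worth only about six independent ones. A real river network branches and has flow direction, so its dependence structure is more involved than this chain.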
Historically, statistics was based around designed studies. We knew the properties of our data, but these days, typically we don't. And we don't want to not use the data just because we haven't collected them from a designed study. But we need to be aware of the potential bias or the potential for the data not to be representative. So within our approaches, we can take account of the data quality, the variability or uncertainty. So I wanted to go into two examples in detail, and this is to give you two examples of work that we've been doing and developing in this area. The first example we refer to as data fusion. And when we refer to data fusion, we're thinking about combining data where we're dealing with the same variable of interest. So here we're thinking about combining, for example, data recorded in situ on the ground with satellite data. And we've looked at various examples: water quality, wind speed, soil moisture. And I'll say a little about these. The second project is our GALLANT project, and I mentioned that briefly earlier. Now this is where we're very much looking at non-traditional data sources and thinking about how we can investigate sustainability. So first of all, what are the patterns in water quality over the lake and over time for Lake Balaton, Hungary? So for the data that we've got here, we've got information from nine in situ locations, and you can see those nine locations spread throughout the lake. We've got information from just over 7,500 pixels in terms of satellite data. These data were from MERIS, the same instrument that I talked about earlier. MERIS gives you a spatial resolution of 300 meters, so we essentially have a 300 meter by 300 meter pixel of information each time. We're thinking about chlorophyll, that indicator of water quality, so higher levels typically mean poorer quality. In this example, the in situ data are thought to provide more accurate information at a smaller number of spatial locations.
And some of those locations have gaps over time as well. But we can see that from the satellite we have the full color picture across the lake and we have information over time as well, but it's not thought to be as accurate as the in situ data. If we look at location one here, down the bottom, then we've got a pull-out from location one over on the right, in the time series plot here. So the black points are the in situ data, the manual recordings. The red is from the satellite, and this is a matched pixel location. So we can see the difference in the variability for the two sets of information. Spatially you can see a difference in the color, so stronger, darker red colors in the in situ, in the middle there, for example, than we see in the satellite. So we're interested in how we can combine the in situ and the satellite data to provide a higher resolution prediction of the water quality over space and in time. And the way that we do that is to think about the data as a curve. So these two pictures, the bottom left and the top right, are the ones that you've already seen. And the bottom right, this is showing you the in situ data. So those points again from up the top here. But now we've put a curve through those points, and this is the way that we are going to combine the in situ and the satellite data in order to use our modeling to do our data fusion for these products. Now, an equations warning. Okay, I do have three slides of equations. If you are not an equations person, just ignore them. I thought the statisticians might like some detail. Okay, so first slide of equations. What we're doing is we have a curve over time for our in situ point. We have a curve over time for our matched satellite pixel. So this is our curve over time for our in situ. This is our curve over time for our matched satellite pixel. We then fit a model to describe the relationship between those two curves. And that's what's going on here.
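The idea of treating each data stream as a curve and then relating the two curves can be sketched at a single location as follows. This is a deliberately simplified illustration, not the model from the slides: the data are simulated, the curves are fitted with a small Fourier basis by least squares, the relationship is a single straight line, and all names and numbers (`fourier_basis`, `fit_curve`, the noise levels, the sampling patterns) are assumptions made for the example.

```python
import numpy as np

def fourier_basis(t, n_harmonics=2, period=365.0):
    """Design matrix: intercept plus sine/cosine pairs for an annual cycle."""
    cols = [np.ones_like(t)]
    for k in range(1, n_harmonics + 1):
        cols.append(np.sin(2 * np.pi * k * t / period))
        cols.append(np.cos(2 * np.pi * k * t / period))
    return np.column_stack(cols)

def fit_curve(t_obs, y_obs, t_grid, n_harmonics=2):
    """Least-squares fit to irregular observations, evaluated on a common grid."""
    design = fourier_basis(t_obs, n_harmonics)
    coef, *_ = np.linalg.lstsq(design, y_obs, rcond=None)
    return fourier_basis(t_grid, n_harmonics) @ coef

rng = np.random.default_rng(0)

def truth(t):
    """Simulated 'true' chlorophyll signal with an annual cycle."""
    return 10 + 5 * np.sin(2 * np.pi * t / 365.0)

# In situ: few, irregular, accurate samples over two years (simulated).
t_insitu = np.sort(rng.uniform(0, 730, 30))
y_insitu = truth(t_insitu) + rng.normal(0, 0.3, t_insitu.size)

# Satellite: frequent, but noisier and systematically distorted (simulated).
t_sat = np.arange(0, 730, 3.0)
y_sat = 2.0 + 0.8 * truth(t_sat) + rng.normal(0, 1.5, t_sat.size)

# Step 1: represent each data stream as a smooth curve on a shared time grid.
t_grid = np.linspace(0, 730, 200)
curve_insitu = fit_curve(t_insitu, y_insitu, t_grid)
curve_sat = fit_curve(t_sat, y_sat, t_grid)

# Step 2: model the relationship between the curves: insitu ~ a + b * satellite.
X = np.column_stack([np.ones_like(curve_sat), curve_sat])
(intercept, slope), *_ = np.linalg.lstsq(X, curve_insitu, rcond=None)
calibrated = intercept + slope * curve_sat

rmse_raw = np.sqrt(np.mean((curve_sat - truth(t_grid)) ** 2))
rmse_calibrated = np.sqrt(np.mean((calibrated - truth(t_grid)) ** 2))
print(f"slope={slope:.2f}, RMSE raw={rmse_raw:.2f}, calibrated={rmse_calibrated:.2f}")
```

In a full analysis the intercept and slope would not be single numbers fitted at one site. They would be allowed to vary across the lake, with dependence between locations built in, and the fitted relationship would then be applied at pixels that have satellite data only.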
And in that model, we allow that relationship to change as it moves over the surface of the lake. There's an additional feature that we need to build in. That's the idea that the information across the lake is related. So location one and location two are not independent. So we need to take account of the fact that we have a relationship through the water quality. So this is how we combine our in situ and our satellite data. And when we do that, we get a new set of predictions. So up the top here, we now have a new predicted surface. We've combined together the in situ and the satellite data. We have a more accurate product in terms of our information, and we also have an indication of the uncertainty. So this just gives us a range of values. What we're saying is that at each point we have a prediction, but that prediction will lie in a range of values, so giving an indication of how accurate we believe it to be. And this is just showing you one spatial snapshot. Of course, we have this going through time as well. So what does that enable us to do? Well, such approaches enable us to investigate the patterns and the relationships. It's enabling us to predict anywhere across the lake and over time, but also giving us that uncertainty information. And one of the reasons to do this is to give us greater confidence in areas where we only have the satellite data, because we can't take manual measurements everywhere across the globe. Another reason that we might be interested in this is to think about where you want to take new measurements in the future, or where you want to place new in situ sensors in the future. So this was just one example. We've worked on various developments here, and there are just a couple of aspects that I'll pick up to give you an idea. We've looked at how to scale this up. So I mentioned that for the MERIS data, the spatial resolution was 300 meters by 300 meters.
But these days the technology can take us down to much finer spatial resolution, down to less than 10 meters, for example. We also might want to incorporate additional data, and so not just think about matching the same variable in situ and for satellite. So two more slides of equations. Again, apologies if you're not an equations person. Bear with me. The first one is just about scaling up or speeding up as we move to higher resolution in terms of the spatial data. So with more technological advances, we have much larger data sets that we're dealing with. I mentioned earlier that we need to take account of the fact that information across the lake will be related. Location 1 and Location 2 are not independent. And in the model, we did this through this particular line here. But this becomes very computationally intensive. If we have a lot of data, it can take a long time to fit. And so we've been working with various approximations so that we can estimate this using a smaller set of data and then scale it back up. Two other examples: we've been looking at wind speed and we've been looking at soil moisture. In the wind speed example, we're still thinking about relating in situ data to potentially lower-accuracy satellite data. We use a slightly different approach here, and this is so that we can capture more complex relationships. And we're also taking account of the fact that wind speed by its nature is a slightly different form of variable. So we can have a small number of extremes, we can have high values, but the majority of the data are at the lower or the more moderate values. So this approach allows us to take account of that. In the soil moisture example that we're working with, we want to build in additional information, so we could get more accurate predictions if we don't just look at soil moisture, but also build in, say, rainfall, temperature and elevation.
But again, they are recorded potentially at different times and different geographic locations. And so we need a way in our modeling to account for that. So those are just a couple of the developments that we've been working on. Okay, if you're not an equations person, it's over, you can relax. Going back to something more general. So the second example, GALLANT. GALLANT is Glasgow as a Living Lab Accelerating Novel Transformation. It's a joint project with Glasgow City Council and again funded through the Natural Environment Research Council. The main aim, up the top there, is to use cross-disciplinary expertise to drive systemic transformation, with economic development responsibly considered through ecological and social lenses. What does that mean? Well, we're interested in Glasgow's living space, as I said before: how it's changing, what effect interventions might have. This is a very large project and I'll go into more details in a minute. But one of the ideas and the frameworks that we're working with in the project is something called Donut Economics, and that's related to the idea that we're thinking about a space that is ecologically safe and socially just. So the QR code here gives you a lot more information on what I'm going to refer to as the Glasgow Donut. The Glasgow Donut has not been developed by me, but has been developed by other colleagues within the team, and particularly our systems transformation colleagues. And the Glasgow Donut is thinking about these ideas of Donut Economics. So this is taking us towards a situation where we want everybody in society to be able to thrive, but we don't want to do anything in society that then overshoots our ecological boundaries, that has negative impacts on the environment. So the idea is that we want to be in the ring of the Donut in order for that to happen, so that we can be ecologically safe and socially just. So this is a bit more detail on our GALLANT project. Work Stream 1 I've just referred to.
So this is systems transformation. They're getting us to work in this whole-systems approach, where we're combining together the urban environment and society, very much motivated by these ideas of the Donut. We have another work stream on community collaboration. So my community collaboration colleagues are going out within Glasgow, directly working with communities and citizens in Glasgow, involving them in the research, speaking to them about their perceptions of Glasgow, what they think needs to change, how we might put in interventions. And then I lead Work Stream 3, which is on data and data analytics. We'll come back to that in a little while. Underpinning all of this, we have five work packages. So work package one is broadly about flooding, work package two is broadly about biodiversity, work package three is broadly about vacant and derelict land, four is about active travel, and five is about clean energy at the community scale. And this is just to give you an indication of the size of the project. So a huge number of researchers are working on this right across Glasgow, across all of our different colleges: science and engineering, social science, medical, veterinary and life sciences, and also our arts. So to tell you a little bit more about data. Again, another QR code here. There's lots of information, so please feel free to look if you would like some more. But the type of thing that we're interested in here is a bit different from the first example that we looked at for water quality. The data here are a bit more diverse, non-traditional. We're interested in photos, interview transcripts, deprivation indices, and it could be social media posts, citizen science, etc. So within our work stream, we're interested in cutting across those GALLANT themes, flooding, biodiversity, etc., thinking specifically about aspects of sustainability. And so we're developing data analytics approaches related to that and also developing our GALLANT data hub.
So this is to let you see some of the approaches that my colleague Luigi has been developing. This is thinking about how we can extract information from these non-traditional data sources. So we might be working with photos or social media posts, for example. Luigi's been developing different analytics pipelines. So how could we move, for example, from a photo to get a description of that photo, and then think about what that description is telling us? Is it telling us something positive about the environment, something negative about the environment? Can we detect events? So just to give you one example of the type of thing we've been looking at, we've looked at the social media platform Flickr. From Flickr, Luigi managed, through the process of curating a large vocabulary related to GALLANT, to come up with 30,000 posts between 2019 and 2024, not just for Glasgow, but for Glasgow and the wider central Scotland area, as you'll see in a minute. Those Flickr posts are typically giving us photos. We've then used image captioning models to move from the image to a description, and then used ideas of sentiment analysis to think about what that description is telling us about these aspects of sustainability. And then the next stage is all about linking that potentially to other data sources, so data zones, deprivation indices, etc. Okay, so one example here. This is all of the Flickr posts from 2019 to 2024, each represented by a dot. Glasgow is around about here, where we see the bulk, and we've got a wider representation around central Scotland and a little bit to the north. Each of these dots was an image. The image has been converted into a description and then sentiment has been attached to that description. So the sentiment scale goes from negative down the bottom to positive up the top, with the negative in the blues and the positive up to the red. So we can see one negative blue picked out there.
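The second half of that pipeline, going from a description to a sentiment score and then to a possible event flag, can be sketched as below. This is a toy stand-in rather than the pipeline used in the project: the captions are invented and assumed to have been produced already by an image-captioning model, the word lists are a hypothetical lexicon in place of a trained sentiment model, and `Post`, `sentiment_score`, `flag_events` and the threshold are illustrative names and values.

```python
from dataclasses import dataclass

# Hypothetical toy lexicon; a real pipeline would use a trained sentiment model.
POSITIVE = {"blooming", "sunny", "clean", "green", "beautiful"}
NEGATIVE = {"fallen", "flooded", "litter", "derelict", "broken"}

@dataclass
class Post:
    post_id: str
    caption: str  # assumed output of an image-captioning model

def sentiment_score(caption: str) -> float:
    """Score in [-1, 1]: (positive - negative) / matched words, 0 if none match."""
    words = [w.strip(".,!?").lower() for w in caption.split()]
    pos = sum(w in POSITIVE for w in words)
    neg = sum(w in NEGATIVE for w in words)
    total = pos + neg
    return 0.0 if total == 0 else (pos - neg) / total

def flag_events(posts, threshold=-0.5):
    """Return ids of posts whose caption sentiment is at or below the threshold."""
    return [p.post_id for p in posts if sentiment_score(p.caption) <= threshold]

# Invented captions for illustration only.
posts = [
    Post("a", "A flooded street with a broken fence."),
    Post("b", "A sunny green park with blooming flowers."),
    Post("c", "A street with parked cars."),
]

scores = {p.post_id: sentiment_score(p.caption) for p in posts}
flagged = flag_events(posts)
print(scores)
print(flagged)
```

Keeping the captioning step, the scoring step and the flagging step separate means each can be swapped for a stronger model without changing the others, and the scores can later be joined to other data by location.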
So let's zoom in for a little bit more detail. So this is an example of the image that generated that blue dot, that negative sentiment. In terms of the tags that were attached to the image, they were fallen, storm and tree. When the image was processed into a description, we got "a tree that has fallen down in a yard". And you can see the representation. Now, this image is AI generated. This is not the image from Flickr. And the reason that we've done that is because we want to be really careful and cautious about anonymity. Just because somebody has posted this on Flickr, they didn't necessarily want me to use it in this talk. So this is just to give you an indication of what the original image looked like. This shows you some of that data aggregated up. So the specific areas that you can see that are shaded, these are data zones. And the definition of a data zone is that it's approximately 500 to 1,000 people. Within these data zones we've got the number of Flickr posts that we had. So this one here, for example, 151. This one here, 79. And these particular ones, when we've aggregated it up and looked at the number of posts, have picked out country parks that are around about the area of Glasgow. So for us that's quite nice in terms of validation. It's not a surprise that if we're interested in biodiversity, we might have a lot of posts from people that are around about those parks. Now this is really just a snapshot of the work, it's really just the start of the work, because what we're interested in now is how we link this to other data that we have. The QR code here takes you to CommuniMap. CommuniMap is an app that's been developed by our community consultation colleagues in Work Stream 2. People can go onto the app and they can post about their surroundings, they can post photos, they can post routes that they take as they walk through the city.
So we're interested in how we might combine what we see in social media with information that we get from the app and with information that we get from smaller scale conversations. It's obviously very difficult to get a picture of citizens' views of sustainability. So we're looking at this variety of different data sources to see: do we get the same messages, do we get different messages, what could be driving that? Of course we know that with social media, with Flickr, it's a self-selecting group that have decided to post these images. So we need to be careful about the use. But how does it link with other information and how can we use that? The other thing that we're interested in is things like event detection. We saw in the picture earlier that we had detected that the tree had fallen from the storm. We can see that if we could automate that type of process, it might really help in terms of planning, in terms of sorting out such issues instead of having to go through a reporting system, etc. So there's lots more that needs done here, but this is just the potential of things that we can think about that might feed in to future city plans. So with that I'm going to start to wrap up. Environmental monitoring is a fast moving landscape, with rapid innovation in the potential data streams. There's a lot of information out there that potentially we can use to tell us about the environment. But there are challenges, there are considerations. We need to be careful in the use of the data. So we're interested in how we use statistical modeling and data analytics to combine the sources in an appropriate way, always thinking about answering the particular questions of interest that are really driving it all. But hopefully we might uncover unseen information and get that richer picture to help inform monitoring, planning and management. Just very quickly, about our interest for the future: we want to extend this. So we've talked about combining data streams. We can also think about combining models, combining approaches.
There's a lot of work these days in foundation models from AI for the environment, so using things like LLMs, ChatGPT, to look at environmental forecasting. But how can we combine that with the benefits that we can get from other modeling approaches as well? To finish up, I want to mention collaborators. I mentioned at the start that this is with many colleagues. I have done none of this work on my own and a lot of the experts are in all of these teams. So environmental science at the University of Stirling, environmental engineering at Edinburgh, geography at Dundee, environmental science at Lancaster, meteorology and remote sensing at Reading, the James Hutton Institute, the UK Centre for Ecology and Hydrology, Plymouth Marine Laboratory, the Environment Agency, Glasgow City Council. These are just the people that are involved in the work I presented today. I work with many other people beyond this and I won't read out all the names, but there is a very large number of people involved directly at Glasgow in everything that I've presented today, both in terms of my own school in maths and stats, but also in the wider GALLANT project, and we're linking up very much with the Urban Big Data Centre. I want to acknowledge the Natural Environment Research Council. They have funded all of the work that I've talked about today in these three specific projects. And a very important disclaimer in the blue: I very much apologize to anybody that I have missed. I want to acknowledge everybody, and I'll finish with some references if anybody wants a little bit more of the detail. Thank you very much.