In just a few days, the Zindi community has built a model that can accurately predict air quality in cities and towns across Africa, filling in the blanks where there are no air quality sensors.
In searching for data about air quality in Africa, we came across an image that neatly illustrates the problem of monitoring air pollution on the continent. This map of internet-connected air quality sensors around the world, from the World Air Quality Index (WAQI) project, inadvertently highlights why we need new approaches for tracking air quality in Africa.
The WAQI project maps ground-based air pollution sensors across the world, but coverage in Africa is sparse. Source: WAQI Project.
This is borne out by recent reports from the University of Pretoria and UNICEF, both of which conclude that we need better data on Africa’s air pollution before we can try to improve it.
The case for better air quality data in Africa
Beyond general respiratory health issues, we know that air pollution has a direct effect on COVID-19 mortality rates. This recent study (published on the medRxiv preprint server, which means it has not yet been peer-reviewed) shows a direct statistical link between long-term exposure to air pollution and COVID-19 mortality.
Epidemiological, public health, and economic models all need air quality data to produce accurate outputs. The authors of the above study say that stricter social distancing and increased medical preparedness are required where pollution levels are higher.
While Zindi hackathons are designed to offer a challenge and learning opportunity to African data scientists, we’re also very interested in contributing practical, open-source AI solutions to the fight against COVID-19. So as soon as we got devnikhilmishra’s winning solution to the recent #ZindiWeekendz Urban Air Pollution Challenge (you can find it yourself on GitHub), we put it to the test: we wanted to see how well it could predict air quality in places where there are no ground-based pollution sensors.
Zindi’s community rises to the challenge
The challenge was to build a model that could take in satellite data for a location and predict the air quality on the ground, as measured by ground-based sensors that track fine particulate matter (PM2.5, a common measure of air pollution). This air quality challenge ran from 10 to 12 April and attracted over 200 data scientists from across Africa and around the world.
We plotted the winning model’s predictions against the ‘true’ values from a sensor in a location that was not included in the training set. The prediction (in orange) closely matches the sensor data (in blue) not only for a large city (London) but also for a smaller town in South Africa (Worcester, with a population of less than 100 000). This suggests that the model works well for both large and small urban centres.
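As a rough illustration of the task’s shape, and not of the winning model itself, the sketch below fits a one-feature least-squares line that maps a satellite-derived value to a ground-sensor PM2.5 reading. The `Observation` record and its `aerosol_index` field are hypothetical stand-ins for the challenge’s real satellite inputs; a competitive solution would likely use many more features and a more flexible model.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class Observation:
    city: str
    aerosol_index: float  # hypothetical satellite-derived feature for this place and day
    pm25: float           # ground-sensor PM2.5 reading in µg/m³ (the training target)


def fit_line(obs: List[Observation]) -> Tuple[float, float]:
    """Ordinary least squares for pm25 ≈ slope * aerosol_index + intercept."""
    n = len(obs)
    mean_x = sum(o.aerosol_index for o in obs) / n
    mean_y = sum(o.pm25 for o in obs) / n
    sxx = sum((o.aerosol_index - mean_x) ** 2 for o in obs)
    sxy = sum((o.aerosol_index - mean_x) * (o.pm25 - mean_y) for o in obs)
    slope = sxy / sxx  # needs at least two distinct feature values
    return slope, mean_y - slope * mean_x


def predict(slope: float, intercept: float, aerosol_index: float) -> float:
    """Predict PM2.5 from the satellite feature alone; no ground sensor is needed."""
    return slope * aerosol_index + intercept


# Toy training data (made up): sensor readings follow pm25 = 2 * feature + 5.
train = [
    Observation("City A", 1.0, 7.0),
    Observation("City A", 2.0, 9.0),
    Observation("City B", 3.0, 11.0),
]
slope, intercept = fit_line(train)
print(predict(slope, intercept, 4.0))  # 13.0
```

Because `predict` needs only the satellite feature, a fitted model of this kind can be evaluated for any location that the satellite covers.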
Our air quality predictions accurately match actual data gathered from on-the-ground sensors in London and Worcester, SA. Source: Zindi
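Scoring a model on a location that never appears in training is a leave-one-location-out check. Below is a minimal sketch, assuming rows of `(city, features, pm25)` and a `fit` function that returns a predictor; the mean-only `fit_mean` baseline is a hypothetical placeholder for a real model, and RMSE is just one reasonable error measure.

```python
import math
from typing import Callable, List, Sequence, Tuple

# One row per city and day: (city, satellite-derived features, sensor PM2.5 in µg/m³).
Row = Tuple[str, Sequence[float], float]
Predictor = Callable[[Sequence[float]], float]


def rmse(y_true: List[float], y_pred: List[float]) -> float:
    """Root-mean-square error between sensor readings and predictions."""
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true))


def fit_mean(features: List[Sequence[float]], targets: List[float]) -> Predictor:
    """Hypothetical baseline: ignore the features and predict the training mean."""
    avg = sum(targets) / len(targets)
    return lambda _x: avg


def leave_location_out(
    rows: List[Row],
    held_out: str,
    fit: Callable[[List[Sequence[float]], List[float]], Predictor],
) -> float:
    """Train on every city except `held_out`, then score on `held_out` only."""
    train = [r for r in rows if r[0] != held_out]
    test = [r for r in rows if r[0] == held_out]
    if not train or not test:
        raise ValueError("need rows both inside and outside the held-out city")
    model = fit([r[1] for r in train], [r[2] for r in train])
    return rmse([r[2] for r in test], [model(r[1]) for r in test])


# Made-up rows for illustration.
rows = [
    ("City A", [1.0], 10.0),
    ("City A", [2.0], 20.0),
    ("City B", [3.0], 30.0),
    ("City C", [1.0], 14.0),
]
print(leave_location_out(rows, "City C", fit_mean))  # 6.0
```

Splitting by city rather than by row keeps every reading from the test location out of training, which is what makes the check meaningful for places without sensors.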
A better picture of air quality in Africa
Now that we’ve checked that the model works as expected, we can put it to use! Since the model’s only inputs come from satellite data, we can apply it to any location, even one that doesn’t have any ground-based sensors.
This lets us build up a picture of air quality for places where there was previously no data available. The map below shows our air quality model applied to major cities across Africa, predicting the air quality for a single day.
Predicted air quality for major cities in Africa for a single day (April 2, 2020). Source: Zindi.
We’ve borrowed the WAQI colour-coded air quality indicator system for the image above, but you’ll notice that this map of Africa carries considerably more information about air quality than the WAQI map above. This is useful, usable information for policymakers, public health researchers and epidemiologists modelling the COVID-19 outbreak and its public health impacts.
With access to historical satellite data, we can easily look back at long-term trends to better understand their relationship with health outcomes, or compare the COVID-19 lockdown period to a period of normal activity.
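Turning a predicted PM2.5 concentration into a colour band uses the AQI’s piecewise-linear formula, I = (I_hi - I_lo) / (C_hi - C_lo) × (C - C_lo) + I_lo, where C_lo and C_hi bracket the concentration C. The sketch below assumes the US EPA 24-hour PM2.5 breakpoints as revised in 2012, on which WAQI’s colour scale appears to be based; the EPA revised the lower bands again in 2024, so treat this table as an assumption and check which version a given map uses.

```python
import math
from typing import Tuple

# (C_lo, C_hi, I_lo, I_hi, category) for 24-hour PM2.5 in µg/m³.
# Assumption: US EPA breakpoints as revised in 2012 (later revisions differ).
PM25_BREAKPOINTS = [
    (0.0, 12.0, 0, 50, "Good"),
    (12.1, 35.4, 51, 100, "Moderate"),
    (35.5, 55.4, 101, 150, "Unhealthy for Sensitive Groups"),
    (55.5, 150.4, 151, 200, "Unhealthy"),
    (150.5, 250.4, 201, 300, "Very Unhealthy"),
    (250.5, 350.4, 301, 400, "Hazardous"),
    (350.5, 500.4, 401, 500, "Hazardous"),
]


def pm25_to_aqi(concentration: float) -> Tuple[int, str]:
    """Map a PM2.5 concentration to an (AQI value, category) pair."""
    # Truncate to one decimal place, as the EPA method does; the tiny epsilon
    # guards against floating-point values sitting just below a decimal step.
    c = math.floor(concentration * 10 + 1e-9) / 10
    for c_lo, c_hi, i_lo, i_hi, label in PM25_BREAKPOINTS:
        if c_lo <= c <= c_hi:
            aqi = (i_hi - i_lo) / (c_hi - c_lo) * (c - c_lo) + i_lo
            return math.floor(aqi + 0.5), label  # round half up to an integer
    raise ValueError(f"PM2.5 value {concentration} is outside the 0-500.4 table")


print(pm25_to_aqi(100.0))  # (174, 'Unhealthy')
```

The returned category name can then be looked up in whatever colour palette the map uses.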
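A lockdown-versus-normal comparison can be as simple as averaging a location’s daily predicted PM2.5 over two date windows. The sketch below assumes a dictionary of daily predictions keyed by date; the dates and values in the example are made up for illustration, and real lockdown dates differ by country.

```python
from datetime import date
from statistics import mean
from typing import Dict, Tuple

Window = Tuple[date, date]  # inclusive (start, end)


def window_mean(daily_pm25: Dict[date, float], window: Window) -> float:
    """Average daily PM2.5 over an inclusive date window."""
    start, end = window
    values = [v for d, v in daily_pm25.items() if start <= d <= end]
    if not values:
        raise ValueError(f"no predictions between {start} and {end}")
    return mean(values)


def percent_change(daily_pm25: Dict[date, float], baseline: Window, comparison: Window) -> float:
    """Percentage change in mean PM2.5 from the baseline window to the comparison window."""
    before = window_mean(daily_pm25, baseline)
    after = window_mean(daily_pm25, comparison)
    if before == 0:
        raise ValueError("baseline mean is zero; percentage change is undefined")
    return 100.0 * (after - before) / before


# Illustrative values only, not real predictions.
series = {
    date(2020, 3, 1): 20.0,
    date(2020, 3, 2): 30.0,
    date(2020, 4, 1): 10.0,
    date(2020, 4, 2): 15.0,
}
normal = (date(2020, 3, 1), date(2020, 3, 2))
lockdown = (date(2020, 4, 1), date(2020, 4, 2))
print(percent_change(series, normal, lockdown))  # -50.0
```

Comparing the same location against itself sidesteps differences between cities, though seasonal weather can still shift PM2.5 between the two windows.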
This model does, of course, have its limitations. Pollution levels often vary within a single city: think of Cape Town, where wealthy residential areas near the ocean enjoy good air quality, while in poorer communities and townships inland PM2.5 often reaches ‘Unhealthy’ or ‘Very Unhealthy’ levels. The model could potentially be adapted to look at areas within a city, but the resolution of the satellite data places a limit on how well this could work.
In addition, since the model was trained on city data, extending the predictions to less urban areas will likely make the model less accurate. This is a difficult challenge to overcome, as there are very few air quality sensors outside of urban areas, especially in Africa.
We’re excited to say that this model is freely available for use under a CC BY-SA 4.0 license, so please get in touch (firstname.lastname@example.org) if you’d like to put it to use.
We’d like to say a big thank you to all our Zindians who participated and helped make this project possible, particularly those who placed at the top of the leaderboard. We’re also grateful to Microsoft for sponsoring #ZindiWeekendz. If you’re interested in seeing some other solutions, check out the GitHub repositories below.
Zindi is a data science competition platform that hosts a community of over 12,000 data scientists from across Africa and beyond. Sponsored by Microsoft, Zindi has been running a series of six virtual weekend hackathons throughout the months of April and May, specifically focused on the health, social, and economic impact of COVID-19. All solutions will be shared on GitHub and freely available for government and private sector actors to use in their battle against COVID-19.