From 2010 to 2022
Air pollution is one of the most common environmental problems in cities around the world (Morawska et al. 2021). Atmospheric pollutants harm human health and can also affect the climate. To monitor the level of air pollution, many cities have installed air quality stations that measure pollutant concentrations in the air. These measurements allow us to study current pollution levels and assess atmospheric emission regulations, and they support atmospheric research by helping to evaluate models.
The pollutants that affect human health are known as criteria pollutants, and they are frequently regulated by national and state laws. They include ozone (O3), carbon monoxide (CO), nitrogen dioxide (NO2), and fine and coarse particulate matter (PM2.5 and PM10, respectively) (Bekbulat et al. 2021).
In this project, we used the World Health Organization (WHO) air quality dataset to study the trends in PM2.5 and NO2 concentrations from 2010 to 2022. We focus on PM2.5 and NO2 because, together with tropospheric ozone (O3), they are considered the most harmful to human health (Sicard et al. 2023).
We aim to answer the following questions:
The following code chunk presents the packages we used in this project.
library(tidyverse)
library(ggplot2)
library(gt)
library(sf)
library(rnaturalearth)
library(rnaturalearthdata)
We are using version 6.1 of the WHO air quality dataset. It is an .xlsx file that was read into R using the read_excel function, then cleaned and saved as an .rds file.
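The preparation step described above can be sketched roughly as follows. This is a minimal illustration on toy data, not the project's actual cleaning script: the raw column names, the `clean_who()` helper, and the cleaning rules are all assumptions, and the toy tibble stands in for the result of `read_excel`.

```r
library(dplyr)

# Toy stand-in for the raw sheet. With the real file this object would come
# from readxl::read_excel(); the column names here are assumed for illustration.
raw <- tibble::tibble(
  `Country Name` = c("Peru", "Brazil", "Chile"),
  City = c("Lima", "Sao Paulo", "Santiago"),
  Year = c(2018, 2018, 2019),
  `PM25 Concentration` = c(28.1, 16.4, NA),
  Latitude = c(-12.05, -23.55, NA),
  Longitude = c(-77.04, -46.63, -70.67)
)

# Hypothetical cleaning helper (an assumption, not the authors' code)
clean_who <- function(df) {
  df |>
    # Lower snake_case names, e.g. "PM25 Concentration" -> "pm25_concentration"
    rename_with(\(x) gsub("[^a-z0-9]+", "_", tolower(x))) |>
    # Store the year as an integer
    mutate(year = as.integer(year)) |>
    # Keep only records that can be placed on a map
    filter(!is.na(latitude), !is.na(longitude))
}

ready <- clean_who(raw)

# Save the cleaned object once, then reload it wherever it is needed
rds_path <- tempfile(fileext = ".rds")
saveRDS(ready, rds_path)
reloaded <- readRDS(rds_path)
```

Saving to .rds keeps column types intact, so later chunks can call `readRDS` without repeating the cleaning.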
The dataset contains yearly averages of PM10, PM2.5, and NO2 for cities in different countries and WHO regions. Table 1 shows the number of countries and cities in the data by WHO region.
readRDS(here::here("data/processed/who_air_quality_ready.rds")) |>
group_by(who_region_name) |>
summarise(countries = n_distinct(country_name),
cities = n_distinct(city)) |>
gt() |>
cols_label(
who_region_name = html('WHO region name'),
countries = html('Countries'),
cities = html('Cities')
)
| WHO region name | Countries | Cities |
|---|---|---|
| Africa | 13 | 59 |
| America | 23 | 832 |
| South-East Asia | 9 | 399 |
| Europe | 49 | 4347 |
| Eastern Mediterranean | 14 | 159 |
| Western Pacific | 13 | 1379 |
| Non-member state | 3 | 7 |
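One detail of counting with `n_distinct(city)` is that two cities sharing a name within the same region are counted once. Whether this affects Table 1 depends on the data; the sketch below only illustrates the behaviour on made-up rows and shows that passing both columns to `n_distinct` counts distinct country and city pairs instead.

```r
library(dplyr)

# Toy records (illustrative only): two different cities named "San Jose"
toy <- tibble::tibble(
  who_region_name = c("America", "America", "America", "Europe", "Europe"),
  country_name = c("Costa Rica", "United States", "United States", "Spain", "Spain"),
  city = c("San Jose", "San Jose", "San Jose", "Madrid", "Madrid"),
  year = c(2018, 2018, 2019, 2018, 2019)
)

city_counts <- toy |>
  group_by(who_region_name) |>
  summarise(
    countries = n_distinct(country_name),
    # Counts unique city names only
    cities_by_name = n_distinct(city),
    # Counts unique (country, city) combinations
    cities_by_pair = n_distinct(country_name, city)
  )
```

In this toy example the America row has one city by name but two by country and city pair.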
To evaluate the trend of pollutant concentration, we need to calculate the yearly average by WHO region. We also calculate a global average called World and create a single data frame.
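Before the full computation, it may help to see what averaging over all records implies. In the toy sketch below (made-up values, hypothetical object names), a pooled mean weights each region by its number of records, while a mean of regional means gives every region the same weight. The code that follows in this report uses the pooled form for the World average.

```r
library(dplyr)

# Toy records: three from one region, one from another (illustrative values)
records <- tibble::tibble(
  who_region_name = c("Europe", "Europe", "Europe", "Africa"),
  year = 2018,
  pm25_concentration = c(10, 12, 14, 40)
)

# Pooled mean: every record counts equally
pooled <- records |>
  group_by(year) |>
  summarise(pm25_mean = mean(pm25_concentration, na.rm = TRUE))

# Mean of regional means: every region counts equally
balanced <- records |>
  group_by(who_region_name, year) |>
  summarise(pm25_mean = mean(pm25_concentration, na.rm = TRUE), .groups = "drop") |>
  group_by(year) |>
  summarise(pm25_mean = mean(pm25_mean))
```

Here the pooled mean is 19 and the region-balanced mean is 26, so regions with many records pull the pooled value toward their own level.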
# Read the cleaned data once and reuse it for both summaries
who_air_quality <- readRDS(here::here("data/processed/who_air_quality_ready.rds"))
# Yearly mean of each pollutant by WHO region
who_region_year_avg <- who_air_quality |>
  group_by(who_region_name, year) |>
  summarise(
    pm10_mean = mean(pm10_concentration, na.rm = TRUE),
    pm25_mean = mean(pm25_concentration, na.rm = TRUE),
    no2_mean = mean(no2_concentration, na.rm = TRUE),
    .groups = "drop"
  )
# Yearly mean over all records, labelled as the "World" region
who_world_avg <- who_air_quality |>
  group_by(year) |>
  summarise(
    pm10_mean = mean(pm10_concentration, na.rm = TRUE),
    pm25_mean = mean(pm25_concentration, na.rm = TRUE),
    no2_mean = mean(no2_concentration, na.rm = TRUE)
  ) |>
  mutate(who_region_name = factor('World')) |>
  relocate(who_region_name, .before = year)
who_year_avg <- bind_rows(who_region_year_avg, who_world_avg)
Figure 1 shows the distribution of the air quality stations in the dataset that measured PM2.5 in 2018. The northern hemisphere clearly has more coverage than the southern hemisphere, as also noted by Garland et al. (2024).
world <- ne_countries(scale = "medium", returnclass = "sf")
readRDS(here::here("data/processed/who_air_quality_ready.rds")) |>
# Keep only 2018 records that report a PM2.5 value
filter(year == 2018, !is.na(pm25_concentration)) |>
st_as_sf(coords = c("longitude","latitude")) |>
st_set_crs(4326) |>
ggplot() +
geom_sf(data=world) +
geom_sf(
pch = 21,
aes(size = pm25_concentration, fill=who_region_name),
col = "grey20") +
scale_fill_manual(
values=c(
'Africa'="#79ADE6",
'America' = "#E67879",
'South-East Asia'="#9FE778",
'Europe'="#E179E7",
'Eastern Mediterranean'="#E8E962",
'Western Pacific' ="#7AE6CF",
'Non-member state' ="#FF9933"
)) +
labs(
size = expression('PM'[2.5] * ' (' * mu * 'g/m'^3 * ')'),
fill = 'WHO region'
) +
guides(fill = guide_legend(order = 1, nrow = 4),
size = guide_legend(order = 2, nrow = 3)) +
theme_bw(
base_size = 12
) +
theme(
legend.title.position = "top",
legend.position = 'bottom'
)
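The coverage gap visible in Figure 1 can also be checked numerically. The following is a minimal sketch on toy data (the rows and the `coverage` object are made up for illustration) that counts records with a PM2.5 value by hemisphere, using the sign of the latitude.

```r
library(dplyr)

# Toy records (illustrative values); one has no PM2.5 measurement
stations <- tibble::tibble(
  city = c("A", "B", "C", "D", "E"),
  latitude = c(48.9, 40.4, 35.7, 52.5, -23.6),
  pm25_concentration = c(12, 10, NA, 9, 17)
)

coverage <- stations |>
  # Only records that actually report PM2.5
  filter(!is.na(pm25_concentration)) |>
  # Latitudes at or above the equator are treated as northern
  mutate(hemisphere = if_else(latitude >= 0, "Northern", "Southern")) |>
  count(hemisphere, name = "stations") |>
  mutate(share = stations / sum(stations))
```

With the real data, the same pipeline could start from the 2018 subset used for the map.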
PM2.5 stands for particulate matter with an aerodynamic diameter of less than 2.5 \(\mu\)m, also known as fine particulate matter. These particles are produced by direct emissions (vehicular emissions or mechanical abrasion) or by inorganic and organic heterogeneous reactions that convert gases to particles. PM2.5 is of great interest because it can enter the respiratory system and aggravate cardiovascular and respiratory diseases (Oke et al. 2017).
Figure 2 shows the variation of the yearly PM2.5 concentration in the different WHO regions. Europe and America show the lowest concentrations over the years. On the other hand, during the 2010s the Western Pacific and South-East Asia showed higher values, although the latter shows a continuous reduction of PM2.5 over the years. Since 2020, concentrations have increased in the Europe and Eastern Mediterranean regions.
who_year_avg |>
ggplot(mapping = aes(
x = year,
y = pm25_mean,
group = who_region_name
)) +
geom_line(aes(color = who_region_name), linewidth = 1.25) +
geom_point(aes(color = who_region_name), size = 3) +
scale_x_continuous(breaks = scales::pretty_breaks()) +
scale_colour_manual(
values=c(
'Africa'="#79ADE6",
'America' = "#E67879",
'South-East Asia'="#9FE778",
'Europe'="#E179E7",
'Eastern Mediterranean'="#E8E962",
'Western Pacific' ="#7AE6CF",
'Non-member state' ="#FF9933",
'World' = 'black'
))+
labs(
x = '',
y = expression('PM'[2.5] * ' concentration ('* mu *'g/m' ^3 * ')'),
color = ''
) +
theme_bw(
base_size = 14
) +
theme(legend.position = 'bottom')
NO2 stands for nitrogen dioxide. It is a gas produced mainly by fuel combustion in vehicles or power plants (Oke et al. 2017). It can cause or aggravate respiratory diseases. Together with volatile organic compounds (VOCs), it is one of the precursors of tropospheric ozone (O3).
Figure 3 shows the variation of the yearly NO2 concentration. The Eastern Mediterranean region showed the highest NO2 concentrations, while cities from non-member states showed the lowest. There is a global downward trend in NO2.
who_year_avg |>
ggplot(mapping = aes(
x = year,
y = no2_mean,
group = who_region_name
)) +
geom_line(aes(color = who_region_name), linewidth = 1.25) +
geom_point(aes(color = who_region_name), size = 3 ) +
scale_x_continuous(breaks = scales::pretty_breaks()) +
scale_colour_manual(
values=c(
'Africa'="#79ADE6",
'America' = "#E67879",
'South-East Asia'="#9FE778",
'Europe'="#E179E7",
'Eastern Mediterranean'="#E8E962",
'Western Pacific' ="#7AE6CF",
'Non-member state' ="#FF9933",
'World' = 'black'
))+
labs(
x = '',
y = expression('NO'[2] * ' concentration ('* mu *'g/m' ^3 * ')'),
color = ''
) +
theme_bw(
base_size = 14
) +
theme(legend.position = 'bottom')
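The trends read from the figures could be summarised with a number, for example the slope of a straight line fitted to the yearly means of each region. The sketch below is one possible approach under stated assumptions, not part of the original analysis: it uses made-up values in a data frame shaped like who_year_avg, and an ordinary least squares fit via `lm`.

```r
library(dplyr)

# Toy yearly means (illustrative values), shaped like who_year_avg
toy_avg <- tibble::tibble(
  who_region_name = rep(c("Region A", "World"), each = 4),
  year = rep(2010:2013, times = 2),
  no2_mean = c(30, 28, 26, 24, 20, 19.5, 19, 18.5)
)

trend <- toy_avg |>
  group_by(who_region_name) |>
  summarise(
    # Slope of no2_mean against year, in concentration units per year
    slope_per_year = coef(lm(no2_mean ~ year))[["year"]],
    .groups = "drop"
  )
```

A negative slope indicates a decreasing concentration. A least squares slope is only a simple descriptor; it does not by itself say whether the trend is statistically significant.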