Capstone Project Report

Ambient Air Quality Analysis

Author

Ochea Ikpa

Published

December 12, 2025

Introduction

Urban air pollution remains a major environmental and public-health challenge in Europe, with particulate matter (PM₂.₅, PM₁₀) and nitrogen dioxide (NO₂) posing significant risks in densely populated cities (EEA, 2025; WHO, 2023). Germany has implemented strong emission-control policies over the past decade, yet pollution patterns continue to vary across cities due to differences in traffic volume, industrial activity, and urban design. Nasr (2023) reported that Germany complied with air-quality limit values almost everywhere in 2022, but that this compliance did not guarantee that health was protected. This study analyzes long-term trends in, and interrelationships among, key pollutants in four major German cities (Berlin, Hamburg, Munich, and Stuttgart) from 2010 to 2021, to assess air-quality improvements and identify dominant emission influences. Data were obtained from the WHO Ambient Air Quality Database (WHO, 2024).

Code
# Loading libraries/packages
# (core tidyverse packages such as dplyr, ggplot2, readr and stringr
# are attached by library(tidyverse), so they are not loaded separately)
library(tidyverse)
library(readxl)
library(here)
library(ggthemes)
library(ggridges)
library(gt)
library(knitr)
library(lubridate)
library(broom)
library(scales)
library(sf)
library(janitor)
library(gtsummary)
library(DT)


# importing raw data
ambient_air <- read_excel(here::here("data/raw/who_ambient_air_quality_database_version_2024_(v6.1).xlsx"), sheet = "Update 2024 (V6.1)")


# Processing raw data
air_quality <- ambient_air |>
  clean_names() |> # cleaning and standardizing column names
  rename_with(~ gsub("\\s+", "_", .x)) |> # double-checking by replacing any remaining spaces with "_"
  rename(region = who_region,
         country = country_name,
         pm10 = pm10_concentration,
         pm25 = pm25_concentration,
         no2 = no2_concentration,
         station = type_of_stations) |> # renaming variables
  mutate(region = recode(region,
                         `1_Afr` = "AF",
                         `2_Amr` = "AM",
                         `3_Sear` = "SEA",
                         `4_Eur` = "EU",
                         `5_Emr` = "EM",
                         `6_Wpr` = "WP",
                         `7_NonMS` = "NonMS")) |> # shortening the WHO region codes
  mutate(city = str_remove(city, "/[A-Z]{3}$")) |> # dropping a trailing "/XXX" three-letter suffix from city names
  mutate(across(c(pm10, pm25, no2, population), as.numeric),
         year = as.integer(year),
         city = factor(city)) |>
  filter(!is.na(pm10) &
         !is.na(pm25) &
         !is.na(no2) &
         !is.na(year) &
         !is.na(population)) |> # keeping only records with all three pollutants, year and population
  select(region:year, # selecting relevant columns for analysis
         pm10:no2,
         station,
         population) |>
  arrange(region, country, city, year)


# exporting processed data
write_csv(air_quality,
           here::here("data/processed/who_ambient_air_quality_database_version_2024_(v6.1)_processed.csv"))

Methods

Berlin, Hamburg, Stuttgart, and Munich were selected because they are among Germany’s largest and most diverse metropolitan regions, with distinct emission profiles, geographical settings, and climatic conditions that influence ambient air quality. They also maintain long-term, high-quality air monitoring networks, which makes them well suited to descriptive statistics, trend analysis, and inter-city comparison of PM₂.₅, PM₁₀, and NO₂ levels. The analysis is restricted to records from 2010 to 2021.

Code
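# Filter only German cities of interest
germany_cities <- c("Berlin", "Hamburg", "Munich", "Stuttgart")

air_quality_deu <- air_quality |>
  mutate(
    # city was stored as a factor during processing; convert it to character
    # while recoding "Munchen" to the English spelling "Munich"
    city = case_when(
      city == "Munchen" ~ "Munich",
      TRUE ~ as.character(city)
    )
  ) |>
  filter(country == "Germany",
         city %in% germany_cities,
         between(year, 2010, 2021)) # keep 2010-2021 inclusive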
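
Before comparing cities, it can help to confirm that each city really has one record for every year in the study window. The following is a minimal sketch in base R, not part of the original analysis: the helper names `missing_years()` and `has_duplicates()` and the toy data are hypothetical, and the same checks would be applied to `air_quality_deu` with `expected = 2010:2021`.

```r
# Hypothetical toy data (made-up cities and years, for illustration only):
# city "A" is complete, city "B" has no record for 2012.
toy <- data.frame(
  city = c("A", "A", "A", "B", "B"),
  year = c(2010L, 2011L, 2012L, 2010L, 2011L)
)

# For each city, list the expected years that have no record.
missing_years <- function(df, expected) {
  lapply(split(df$year, df$city), function(y) setdiff(expected, y))
}

# TRUE if any city-year combination occurs more than once.
has_duplicates <- function(df) {
  any(duplicated(df[c("city", "year")]))
}

gaps <- missing_years(toy, 2010:2012)
dups <- has_duplicates(toy)

print(gaps)
print(dups)
```

Gaps or duplicate city-year rows would matter later on, because a line plot drawn through several values per year, or across a missing year, can give a misleading impression of the trend.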

Results and Discussions

1.0 Descriptive Statistics

Code
desc_stats <- air_quality_deu |>
  group_by(city) |>
  summarise(
    mean_pm25 = mean(pm25),
    sd_pm25   = sd(pm25),
    median_pm25 = median(pm25),
    mean_pm10 = mean(pm10),
    sd_pm10   = sd(pm10),
    median_pm10 = median(pm10),
    mean_no2 = mean(no2),
    sd_no2   = sd(no2),
    median_no2 = median(no2))
Code
desc_stats |>
  rename(`City` = city,
         `Mean(PM₂.₅)` = mean_pm25,
         `SD(PM₂.₅)` = sd_pm25,
         `Median(PM₂.₅)` = median_pm25,
         `Mean(PM₁₀)` = mean_pm10,
         `SD(PM₁₀)` = sd_pm10,
         `Median(PM₁₀)` = median_pm10,
         `Mean(NO₂)` = mean_no2,
         `SD(NO₂)` = sd_no2,
         `Median(NO₂)` = median_no2) |>
  kable(digits = 2) # using kable() for table presentation and cross-referencing
Table 1: Descriptive statistics of ambient air quality in four major German cities, 2010–2021 (all values in µg/m³).
| City | Mean(PM₂.₅) | SD(PM₂.₅) | Median(PM₂.₅) | Mean(PM₁₀) | SD(PM₁₀) | Median(PM₁₀) | Mean(NO₂) | SD(NO₂) | Median(NO₂) |
|---|---|---|---|---|---|---|---|---|---|
| Berlin | 15.89 | 2.99 | 15.79 | 23.06 | 3.25 | 23.25 | 31.59 | 3.23 | 30.97 |
| Hamburg | 13.26 | 2.32 | 13.79 | 19.82 | 2.36 | 20.12 | 29.58 | 2.94 | 30.16 |
| Munich | 13.06 | 2.83 | 13.48 | 20.78 | 3.63 | 20.45 | 40.58 | 9.29 | 42.56 |
| Stuttgart | 13.97 | 3.45 | 14.17 | 24.37 | 3.98 | 25.64 | 52.25 | 12.23 | 56.68 |

Table 1 summarises the annual mean concentrations of PM₂.₅, PM₁₀, and NO₂ in the four cities from 2010 to 2021 using three descriptive measures: mean, standard deviation, and median.

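Between 2010 and 2021, Hamburg had the lowest mean PM₁₀ (19.82 µg/m³) and NO₂ (29.58 µg/m³), and its mean PM₂.₅ (13.26 µg/m³) was nearly as low as Munich’s (13.06 µg/m³), the lowest of the four. Stuttgart had the highest PM₁₀ (24.37 µg/m³) and by far the highest NO₂ (52.25 µg/m³), which is consistent with heavy traffic-related pollution and its valley-basin geography. Berlin had the highest PM₂.₅ (15.89 µg/m³), whereas Munich combined moderate particulate levels with high NO₂ (40.58 µg/m³), again plausibly linked to traffic intensity. The standard deviations show that NO₂ varied far more in Stuttgart (12.23) and Munich (9.29) than in Berlin (3.23) and Hamburg (2.94); for particulate matter the differences in variability between cities are much smaller. Because concentrations declined over the period, these standard deviations reflect the long-term trend as well as year-to-year fluctuation. Overall, Table 1 highlights clear spatial contrasts in air quality, likely shaped by urban structure, meteorology, and traffic patterns.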
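
The comparisons drawn from Table 1 can be checked directly from the published means and standard deviations. The sketch below, in base R, re-enters the Table 1 values by hand and derives two things: the city with the lowest and highest mean for each pollutant, and the coefficient of variation (SD divided by mean), which puts variability on a relative scale. The object names `table1`, `lowest`, `highest`, and `cv` are illustrative and are not part of the original analysis.

```r
# Means and standard deviations copied from Table 1 (µg/m³).
table1 <- data.frame(
  city      = c("Berlin", "Hamburg", "Munich", "Stuttgart"),
  mean_pm25 = c(15.89, 13.26, 13.06, 13.97),
  sd_pm25   = c(2.99, 2.32, 2.83, 3.45),
  mean_pm10 = c(23.06, 19.82, 20.78, 24.37),
  sd_pm10   = c(3.25, 2.36, 3.63, 3.98),
  mean_no2  = c(31.59, 29.58, 40.58, 52.25),
  sd_no2    = c(3.23, 2.94, 9.29, 12.23)
)

pollutants <- c("pm25", "pm10", "no2")

# City with the lowest and the highest mean for each pollutant.
lowest <- sapply(pollutants, function(p) {
  table1$city[which.min(table1[[paste0("mean_", p)]])]
})
highest <- sapply(pollutants, function(p) {
  table1$city[which.max(table1[[paste0("mean_", p)]])]
})

# Coefficient of variation in percent: 100 * SD / mean.
cv <- sapply(pollutants, function(p) {
  100 * table1[[paste0("sd_", p)]] / table1[[paste0("mean_", p)]]
})
rownames(cv) <- table1$city

print(lowest)
print(highest)
print(round(cv, 1))
```

On this relative scale, Stuttgart and Munich are the two most variable cities for all three pollutants, and the gap is widest for NO₂. Note that the coefficient of variation here still mixes the long-term trend with year-to-year fluctuation, since it is computed over the whole 2010–2021 period.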

2.0 Time Series Plots

Code
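ggplot(air_quality_deu,
       aes(year,
           pm25,
           color = city)) +
  geom_line(linewidth = 1.1) + # `linewidth` replaces the deprecated `size` for lines (ggplot2 >= 3.4.0)
  geom_point() +
  theme_minimal() +
  labs(title = "PM₂.₅ Trend (2010–2021)", x = "Year", y = "PM₂.₅ (µg/m³)", color = "City")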

Figure 1 shows that PM₂.₅ declined markedly in all four cities, with Munich and Stuttgart showing the largest percentage drops. Stuttgart and Munich begin the period as the most polluted cities but converge with the others by 2021.
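
A percentage drop of this kind can be made explicit by comparing each city's first and last available year. The following is a minimal sketch in base R under stated assumptions: the helper `pct_change()` is hypothetical, the toy values are made up for illustration and are not taken from the WHO data, and each city is assumed to have one value per year.

```r
# Hypothetical toy data (values are illustrative, not real measurements).
toy <- data.frame(
  city = rep(c("A", "B"), each = 3),
  year = rep(c(2010L, 2015L, 2021L), times = 2),
  pm25 = c(20, 15, 10, 16, 14, 12)
)

# Percentage change from the first to the last year, per city.
pct_change <- function(df, value_col) {
  sapply(split(df, df$city), function(d) {
    d <- d[order(d$year), ]           # make sure rows are in time order
    first <- d[[value_col]][1]
    last  <- d[[value_col]][nrow(d)]
    100 * (last - first) / first
  })
}

change <- pct_change(toy, "pm25")
print(change)
```

In the toy data, city A falls from 20 to 10 (a 50% drop) and city B from 16 to 12 (a 25% drop). Applied to the real data, the call would be `pct_change(air_quality_deu, "pm25")`. An endpoint comparison is sensitive to unusual first or last years, so it is best read alongside the full time series.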

Code
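ggplot(air_quality_deu,
       aes(year,
           pm10,
           color = city)) +
  geom_line(linewidth = 1.1) + # `linewidth` replaces the deprecated `size` for lines (ggplot2 >= 3.4.0)
  geom_point() +
  theme_minimal() +
  labs(title = "PM₁₀ Trend (2010–2021)", x = "Year", y = "PM₁₀ (µg/m³)", color = "City")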

In Figure 2, Munich and Stuttgart show the largest improvements in PM₁₀. Stuttgart remains high from 2010 to 2015, likely because of its topography and heavy traffic, and then improves markedly.

Code
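ggplot(air_quality_deu,
       aes(year,
           no2,
           color = city)) +
  geom_line(linewidth = 1.1) + # `linewidth` replaces the deprecated `size` for lines (ggplot2 >= 3.4.0)
  geom_point() +
  theme_minimal() +
  labs(title = "NO₂ Trend (2010–2021)", x = "Year", y = "NO₂ (µg/m³)", color = "City")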

In Figure 3, Stuttgart and Munich start with very high NO₂ levels, and both reduce their concentrations drastically. Berlin and Hamburg also improve, but from lower starting points.
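
The visual trends in the time-series plots can also be summarised as a single number per city: the slope of a straight line fitted through the annual values. The sketch below uses base R's `lm()` for an ordinary least-squares fit. It is an illustration rather than the method used in this report; `trend_slope()` is a hypothetical helper and the toy values are made up.

```r
# Hypothetical toy data (illustrative values only).
toy <- data.frame(
  city = rep(c("A", "B"), each = 4),
  year = rep(2010:2013, times = 2),
  no2  = c(40, 38, 36, 34, 30, 29.5, 29, 28.5)
)

# Slope of a linear fit of the pollutant on year, per city (µg/m³ per year).
trend_slope <- function(df, value_col) {
  sapply(split(df, df$city), function(d) {
    fit <- lm(reformulate("year", response = value_col), data = d)
    unname(coef(fit)["year"])
  })
}

slopes <- trend_slope(toy, "no2")
print(slopes)
```

A negative slope indicates declining concentrations. In the toy data, city A declines by 2 µg/m³ per year and city B by 0.5 µg/m³ per year. With only twelve annual values per city in the real data, such slopes are descriptive and should not be over-interpreted.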

Code
surf <- air_quality_deu |>
  select(city:no2)

3.0 Correlation Matrix

Code
# Computing correlation matrix
corr_data <- air_quality_deu |>
  select(pm25, pm10, no2)

corr_matrix <- cor(corr_data, use = "complete.obs")

# Converting matrix to tidy format for ggplot
corr_air_quality_deu <- corr_matrix |>
  as.data.frame() |>
  rownames_to_column("var1") |>
  pivot_longer(cols = -var1, names_to = "var2", values_to = "value")

# Plotting heatmap
ggplot(corr_air_quality_deu,
       aes(var1, var2, fill = value)) +
  geom_tile() +
  geom_text(aes(label = round(value, 2)), size = 5) +
  scale_fill_gradient2(low = "blue", high = "red", mid = "white", midpoint = 0) +
  theme_minimal() +
  labs(title = "Correlation Heatmap", x = "", y = "")
Figure 4: Correlation heatmap showing relationships among annual mean concentrations of PM₂.₅, PM₁₀, and NO₂.

Figure 4 shows the following correlations between pollutants:

  • PM₂.₅ vs PM₁₀

PM₂.₅ and PM₁₀ show a strong positive correlation, indicating that they rise and fall together. This is expected, since PM₂.₅ is a subset of PM₁₀, and it is also consistent with shared emission sources such as traffic, combustion, and resuspended dust.

  • PM₂.₅ vs NO₂

PM₂.₅ and NO₂ are moderately to strongly correlated, which suggests that both pollutants are influenced by vehicular emissions and urban combustion activities.

  • PM₁₀ vs NO₂

PM₁₀ and NO₂ exhibit a moderate positive correlation, reflecting the partial overlap between coarse particle sources (like dust) and traffic-related NO₂ emissions.
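
The correlations discussed here are computed on data pooled across all cities and years, so they combine differences between cities with co-variation within each city. One way to separate the two is to compute the correlation per city as well. The sketch below, in base R, uses a deliberately extreme, made-up example to show that pooled and within-city correlations can differ; the toy data and object names are hypothetical.

```r
# Hypothetical toy data constructed so that the two cities have
# opposite within-city relationships (illustrative values only).
toy <- data.frame(
  city = rep(c("A", "B"), each = 3),
  pm25 = c(1, 2, 3, 1, 2, 3),
  pm10 = c(2, 4, 6, 6, 4, 2)
)

# Pearson correlation on the pooled data.
pooled <- cor(toy$pm25, toy$pm10)

# Pearson correlation computed separately for each city.
by_city <- sapply(split(toy, toy$city), function(d) cor(d$pm25, d$pm10))

print(pooled)
print(by_city)
```

In this toy case the pooled correlation is zero even though the variables are perfectly correlated within each city (positively in A, negatively in B). Real data will rarely be this extreme, but the same comparison on `air_quality_deu` would show how much of the pooled correlation is shared by the individual cities.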

Conclusions

  • Time-series trends show a clear decline in PM₂.₅, PM₁₀, and NO₂ across Berlin, Hamburg, Munich, and Stuttgart, indicating improving air quality over the 2010–2021 period.

  • Descriptive statistics show Stuttgart as the most polluted city for PM₁₀ and NO₂ and Berlin for PM₂.₅, while Hamburg is the least polluted for PM₁₀ and NO₂.

  • Strong correlations exist, especially between PM₂.₅ and PM₁₀, which is consistent with shared emission sources dominated by traffic and combustion.

  • Overall, the analyses indicate substantial progress in urban air quality, consistent with stricter emission controls and environmental policies.

References

EEA. (2025). How Air Pollution Affects Our Health. In Europa.eu. https://www.eea.europa.eu/en/topics/in-depth/air-pollution/how-it-affects-our-health.
Nasr, J. (2023). Germany Complied with Air Quality Limit Values Nearly Everywhere in 2022. In Umweltbundesamt. https://www.umweltbundesamt.de/en/press/pressinformation/germany-complied-air-quality-limit-values-nearly.
WHO. (2023). Air quality. In Who.int. https://www.who.int/europe/news-room/fact-sheets/item/air-quality.
WHO. (2024). WHO Ambient Air Quality Database (Update Jan 2024). https://www.who.int/publications/m/item/who-ambient-air-quality-database-(update-jan-2024).
