library(tidyverse)
library(here)
library(lubridate)

Understanding climate variability is essential for assessing water security challenges in rapidly growing urban areas such as Kisumu County, Kenya. Daily weather patterns, particularly rainfall, temperature, and humidity, affect both the reliability of water services and the coping strategies of consumers. This project analyzes publicly available daily weather data from 2021 to 2024 to explore seasonal trends and detect periods of extreme weather events, such as droughts and floods, that may contribute to water service disruptions (Arrighi et al., 2017; Borah, 2025). Using established climate indices and anomaly-based methods, the analysis characterizes recent hydro-meteorological variability relevant to water access and operational planning. Although the 2021–2024 period does not constitute a full climatological record, it provides meaningful insight into short-term variability affecting Kisumu’s water
- Describe seasonal rainfall, temperature, and humidity patterns in Kisumu.
- Identify extreme rainfall events using WMO categories and standardized anomalies.
- Summarize year-to-year differences in key climate metrics.
Daily weather data for Kisumu County covering the period 2021–2024 were obtained from the Visual Crossing Weather API, which synthesizes ground-based station observations with satellite-supported interpolation to provide continuous spatial and temporal coverage (Visual Crossing Corporation, 2025). The raw dataset included multiple meteorological variables, from which three were retained for analysis: daily precipitation (mm), mean temperature (°C), and relative humidity (%).
Data processing was conducted in R using the tidyverse framework. Raw records were imported, inspected for missing or duplicate entries, and cleaned by standardizing date formats, renaming variables for consistency, and removing rows with incomplete observations. The cleaned dataset was saved as an analysis-ready file in the project’s “data/processed/” directory.
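The inspection step described above can be sketched in base R. This is a minimal illustration on a small synthetic data frame, not the project's actual cleaning code: `toy` and the helper `check_daily_coverage()` are hypothetical names, and the column names simply mirror the renamed variables used later in this report.

```r
# Hypothetical helper: reports duplicate dates, calendar gaps, and incomplete rows
check_daily_coverage <- function(df, date_col = "date") {
  dates <- df[[date_col]]
  # Every calendar day between the first and last observation
  expected <- seq(min(dates, na.rm = TRUE), max(dates, na.rm = TRUE), by = "day")
  list(
    n_duplicates  = sum(duplicated(dates)),       # repeated dates
    missing_dates = expected[!expected %in% dates], # days with no record at all
    n_incomplete  = sum(!complete.cases(df))      # rows with at least one NA
  )
}

# Synthetic example (assumed values): one duplicate day, one gap, one NA
toy <- data.frame(
  date = as.Date(c("2021-01-01", "2021-01-02", "2021-01-02", "2021-01-04")),
  precip_mm = c(21.2, 4.2, 4.2, NA)
)

report <- check_daily_coverage(toy)
print(report)
```

A check of this kind distinguishes rows that are present but incomplete from days that are absent altogether, which matters because weekly and monthly rainfall totals silently shrink when days are missing.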
Two complementary analytical approaches were used to characterize climate variability. First, daily precipitation values were classified using rainfall intensity categories consistent with World Meteorological Organization (WMO) conventions, distinguishing dry, wet, heavy, very heavy, and severe rainfall days (Ongoma et al., 2018; Ramadhan et al., 2022). Second, standardized anomalies were computed following established methods in climate analysis, expressing deviations from long-term means in units of standard deviation to identify unusually wet or dry periods (Akinsanola & Ogunjobi, 2014; Arya & Rao, 2023; McKee et al., 1993). Weekly aggregations of anomalies were used to capture short-term variability relevant to water-service reliability while maintaining statistical robustness.
Together, these procedures produced a cleaned and reproducible dataset, along with derived rainfall and anomaly indicators, that support exploration of seasonal patterns and the identification of extreme weather events across the 2021–2024 period.
The following code chunks document the workflow used to prepare the Kisumu climate dataset for analysis.
Steps include loading required packages, importing the raw Visual Crossing data, inspecting and cleaning the dataset, and saving an analysis-ready version to the “data/processed/” directory.
library(tidyverse)
library(here)
library(lubridate)

raw_data <- read_csv(
  here::here("data/raw/Weather_data_Kisumu_Kenya_2021_2024.csv")
)

glimpse(raw_data)
Rows: 1,461
Columns: 32
$ name <chr> "kisumu, Kenya", "kisumu, Kenya", "kisumu, Kenya", "k…
$ datetime <date> 2021-01-01, 2021-01-02, 2021-01-03, 2021-01-04, 2021…
$ tempmax <dbl> 27.0, 27.0, 28.0, 28.0, 25.0, 25.0, 28.0, 29.3, 22.3,…
$ tempmin <dbl> 10.0, 18.0, 14.0, 8.0, 10.0, 11.0, 10.0, 10.0, 18.4, …
$ temp <dbl> 21.0, 22.4, 22.8, 20.1, 18.2, 19.2, 19.7, 20.8, 19.7,…
$ feelslikemax <dbl> 27.4, 28.1, 28.8, 28.8, 25.0, 25.0, 28.6, 29.9, 22.3,…
$ feelslikemin <dbl> 9.1, 18.0, 14.0, 6.3, 7.9, 11.0, 7.7, 8.7, 18.4, 18.3…
$ feelslike <dbl> 21.0, 22.5, 22.9, 20.0, 18.0, 19.2, 19.7, 20.9, 19.7,…
$ dew <dbl> 17.2, 17.5, 17.9, 14.2, 8.7, 8.8, 9.7, 13.1, 16.6, 16…
$ humidity <dbl> 78.8, 75.3, 75.5, 70.9, 58.6, 56.0, 57.4, 64.9, 82.4,…
$ precip <dbl> 21.2, 4.2, 2.8, 0.2, 0.0, 0.4, 0.0, 0.0, 17.8, 8.6, 8…
$ precipprob <dbl> 100, 100, 100, 100, 0, 100, 0, 0, 100, 100, 100, 100,…
$ precipcover <dbl> 95.83, 62.50, 25.00, 8.33, 0.00, 16.67, 0.00, 0.00, 4…
$ preciptype <chr> "rain", "rain", "rain", "rain", NA, "rain", NA, NA, "…
$ snow <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,…
$ snowdepth <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,…
$ windgust <dbl> 32.4, 31.7, 27.4, 27.7, 29.9, 23.8, 28.8, 23.0, 39.2,…
$ windspeed <dbl> 20.5, 24.1, 24.1, 25.9, 22.3, 24.1, 29.5, 29.4, 18.4,…
$ winddir <dbl> 219.2, 241.3, 224.4, 186.7, 119.8, 116.7, 114.3, 174.…
$ sealevelpressure <dbl> 1015.6, 1016.4, 1016.1, 1014.1, 1012.2, 1012.9, 1011.…
$ cloudcover <dbl> 78.4, 65.7, 56.9, 60.9, 54.9, 29.2, 11.9, 51.8, 86.8,…
$ visibility <dbl> 10.2, 10.0, 10.0, 11.7, 11.7, 10.0, 11.8, 13.1, NA, N…
$ solarradiation <dbl> 211.1, 270.3, 289.4, 296.5, 288.6, 252.2, 303.0, 257.…
$ solarenergy <dbl> 18.3, 23.4, 24.9, 25.6, 24.8, 21.7, 26.2, 22.2, 15.2,…
$ uvindex <dbl> 8, 10, 10, 10, 10, 9, 10, 9, 8, 7, 9, 7, 9, 7, 9, 8, …
$ severerisk <dbl> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, N…
$ sunrise <dttm> 2021-01-01 06:40:48, 2021-01-02 06:41:16, 2021-01-03…
$ sunset <dttm> 2021-01-01 18:48:22, 2021-01-02 18:48:50, 2021-01-03…
$ moonphase <dbl> 0.58, 0.61, 0.65, 0.69, 0.72, 0.75, 0.79, 0.83, 0.86,…
$ conditions <chr> "Rain, Partially cloudy", "Rain, Partially cloudy", "…
$ description <chr> "Partly cloudy throughout the day with a chance of ra…
$ icon <chr> "rain", "rain", "rain", "rain", "partly-cloudy-day", …
processed_data <- raw_data |>
  select(datetime, temp, precip, humidity) |>  # keep only required variables
  rename(
    date = datetime,
    tmean_c = temp,
    precip_mm = precip,
    rh_pct = humidity
  ) |>
  mutate(date = as.Date(date)) |>              # ensure date format
  drop_na(date, tmean_c, precip_mm, rh_pct)    # remove rows with missing values in key columns only

write_csv(
  processed_data,
  here::here("data/processed/kisumu_weather_2021_2024.csv")
)

summary(processed_data)
      date               tmean_c         precip_mm          rh_pct
 Min.   :2021-01-01   Min.   :17.90   Min.   : 0.000   Min.   :35.50
 1st Qu.:2022-01-01   1st Qu.:21.60   1st Qu.: 0.800   1st Qu.:65.90
 Median :2023-01-01   Median :22.40   Median : 2.900   Median :71.30
 Mean   :2023-01-01   Mean   :22.39   Mean   : 5.611   Mean   :70.27
 3rd Qu.:2024-01-01   3rd Qu.:23.20   3rd Qu.: 7.900   3rd Qu.:75.70
 Max.   :2024-12-31   Max.   :27.20   Max.   :75.300   Max.   :90.00
The cleaned dataset enables systematic examination of climate variability across the 2021-2024 period. The figures and summaries below describe seasonal behavior, highlight atypical wet and dry periods, and identify extreme weather events that may contribute to service disruptions.
To describe overall climate behavior, monthly rainfall totals and monthly means of temperature and humidity were calculated for 2021–2024. Kisumu shows clear seasonal cycles, with distinct wet and relatively drier periods and corresponding shifts in temperature and humidity. These patterns provide a baseline context for interpreting extreme events and anomalies in subsequent analyses.

library(lubridate)  # for year() and month() helpers

monthly_climate <- processed_data |>
  mutate(
    year = year(date),
    month = month(date, label = TRUE, abbr = TRUE)
  ) |>
  group_by(year, month) |>
  summarise(
    precip_mm = sum(precip_mm, na.rm = TRUE),  # monthly total rainfall
    tmean_c = mean(tmean_c, na.rm = TRUE),     # monthly mean temperature
    rh_pct = mean(rh_pct, na.rm = TRUE),       # monthly mean humidity
    .groups = "drop"
  ) |>
  pivot_longer(
    cols = c(precip_mm, tmean_c, rh_pct),
    names_to = "variable",
    values_to = "value"
  )  # long format for faceting

ggplot(
  monthly_climate,
  aes(x = month, y = value, color = factor(year), group = year)
) +
  geom_line(linewidth = 1) +  # linewidth replaces size for lines in ggplot2 >= 3.4.0
  facet_wrap(
    ~ variable,
    scales = "free_y",
    ncol = 1,
    labeller = labeller(variable = c(
      precip_mm = "Monthly rainfall (mm)",
      rh_pct = "Relative humidity (%)",
      tmean_c = "Mean temperature (°C)"
    ))
  ) +
  scale_color_brewer(palette = "Dark2", name = "Year") +
  theme_minimal()

As shown in Figure 1, rainfall, temperature, and humidity follow clear seasonal cycles. Rainfall peaks in March–April and again in October–November, while January–February and July–August are noticeably drier. Humidity closely tracks rainfall, whereas temperatures show a modest dip during the mid-year dry season.
To characterize daily rainfall extremes, each day was classified into intensity categories based on WMO-style thresholds, and annual counts of events were summarized for 2021–2024.
library(dplyr)
library(ggplot2)
library(lubridate)

rain_intensity <- processed_data |>
  mutate(
    year = year(date),
    rain_cat = case_when(
      precip_mm < 1 ~ "Dry (<1 mm)",
      precip_mm >= 1 & precip_mm < 10 ~ "Wet (1–9.9 mm)",
      precip_mm >= 10 & precip_mm < 20 ~ "Heavy (10–19.9 mm)",
      precip_mm >= 20 & precip_mm < 50 ~ "Very heavy (20–49.9 mm)",
      precip_mm >= 50 ~ "Severe (≥50 mm)"
    )
  ) |>
  filter(!is.na(rain_cat)) |>
  count(year, rain_cat, name = "days") |>
  mutate(
    rain_cat = factor(
      rain_cat,
      levels = c(
        "Dry (<1 mm)",
        "Wet (1–9.9 mm)",
        "Heavy (10–19.9 mm)",
        "Very heavy (20–49.9 mm)",
        "Severe (≥50 mm)"
      )
    )
  )

ggplot(rain_intensity, aes(x = factor(year), y = days, fill = rain_cat)) +
  geom_col(position = "stack") +
  labs(
    x = "Year",
    y = "Number of days",
    fill = "Rainfall category"
  ) +
  theme_minimal()

As shown in Figure 2, most days in all four years are either dry or have light rainfall (1–9.9 mm), with relatively few heavy or very heavy rainfall days and only occasional severe events (≥50 mm). The proportion of wet and heavy days varies slightly between years, hinting at interannual differences in the intensity of the rainy seasons.
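The threshold logic behind these categories can also be written with base R's `cut()`, which makes the treatment of boundary values explicit. The sketch below is an illustration under stated assumptions rather than the report's own code: `classify_rain()` is a hypothetical helper, the labels are simplified to plain ASCII, and the example values are invented to sit on or near each threshold.

```r
# Hypothetical helper: assign daily rainfall (mm) to the five intensity classes
classify_rain <- function(precip_mm) {
  cut(
    precip_mm,
    breaks = c(0, 1, 10, 20, 50, Inf),
    labels = c(
      "Dry (<1 mm)",
      "Wet (1-9.9 mm)",
      "Heavy (10-19.9 mm)",
      "Very heavy (20-49.9 mm)",
      "Severe (>=50 mm)"
    ),
    right = FALSE  # intervals are [lower, upper), matching the >= lower bounds
  )
}

# Invented values placed on and around each threshold
example_mm <- c(0, 0.9, 1, 9.9, 10, 20, 49.9, 50, 75.3)
rain_cat <- classify_rain(example_mm)

data.frame(precip_mm = example_mm, rain_cat = rain_cat)
```

With `right = FALSE`, a day with exactly 10 mm falls in the heavy class and a day with exactly 50 mm in the severe class, which is the same boundary behavior as the `>=` comparisons used above.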
rain_binary <- processed_data |>
  mutate(
    year = year(date),
    wet_flag = precip_mm >= 1,
    wet_label = if_else(wet_flag, "Wet (≥1 mm)", "Dry (<1 mm)")
  ) |>
  count(year, wet_label, name = "days") |>
  mutate(
    wet_label = factor(
      wet_label,
      levels = c("Dry (<1 mm)", "Wet (≥1 mm)")
    )
  )

ggplot(rain_binary, aes(x = factor(year), y = days, fill = wet_label)) +
  geom_col(position = "stack") +
  labs(
    x = "Year",
    y = "Number of days",
    fill = "Day type"
  ) +
  theme_minimal()

Figure 3 shows that each year includes a substantial number of dry days, but wet days (≥1 mm) still make up a large share of the annual total. This confirms that Kisumu experiences frequent rainfall events throughout the year, even though only a small subset of days fall into the heavier intensity categories shown in Figure 2.
To visually inspect how rainfall, temperature, and humidity evolve over time, weekly rainfall totals and weekly means of temperature and humidity were calculated for the 2021–2024 period. The time series below highlights the timing of wet and dry spells, along with co-occurring changes in temperature and humidity, providing a bridge between the raw data and the standardized anomaly analysis.

library(lubridate)

# 1. Aggregate to weekly totals (rainfall) and weekly means (temperature, humidity)
weekly_climate <- processed_data |>
  mutate(week = floor_date(date, unit = "week")) |>
  group_by(week) |>
  summarise(
    precip_mm = sum(precip_mm, na.rm = TRUE),  # weekly total rainfall
    tmean_c = mean(tmean_c, na.rm = TRUE),     # weekly mean temperature
    rh_pct = mean(rh_pct, na.rm = TRUE),       # weekly mean humidity
    .groups = "drop"
  ) |>
  # 2. Long format for faceting
  tidyr::pivot_longer(
    cols = c(precip_mm, tmean_c, rh_pct),
    names_to = "variable",
    values_to = "value"
  )

# 3. Plot weekly time series
ggplot(weekly_climate, aes(x = week, y = value)) +
  geom_line(linewidth = 0.6) +
  facet_wrap(
    ~ variable,
    ncol = 1,
    scales = "free_y",
    labeller = labeller(variable = c(
      precip_mm = "Weekly rainfall (mm)",
      tmean_c = "Weekly mean temperature (°C)",
      rh_pct = "Weekly mean relative humidity (%)"
    ))
  ) +
  labs(
    x = "Week",
    y = "Value"
  ) +
  theme_minimal()

As shown in Figure 4, rainfall exhibits sharp weekly peaks corresponding to the wet seasons, while temperature and humidity vary more smoothly. Several clusters of high weekly rainfall coincide with periods of elevated humidity, foreshadowing the flood-prone episodes examined in the anomaly analysis.
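One detail of weekly aggregation is worth checking: weeks at the start and end of the record may contain fewer than seven days, so their rainfall totals are not directly comparable with full weeks. The sketch below reproduces Sunday-based weeks in base R, on the assumption that `floor_date(date, unit = "week")` is used with lubridate's default week start (Sunday, unless the `lubridate.week.start` option has been changed). It uses only the calendar dates of the study period, not the weather data.

```r
# Daily dates spanning the study period described in this report
dates <- seq(as.Date("2021-01-01"), as.Date("2024-12-31"), by = "day")

# Sunday-based week start: "%w" gives the weekday as 0 (Sunday) to 6 (Saturday)
week_start <- dates - as.integer(format(dates, "%w"))

# Number of daily records contributing to each week
days_per_week <- table(week_start)

# Weeks with fewer than 7 days are partial
partial_weeks <- days_per_week[days_per_week < 7]
print(partial_weeks)
```

Under these assumptions the first week (starting 2020-12-27) holds two days and the last (starting 2024-12-29) holds three. Whether to drop such weeks, or to rescale their totals, is a judgment call; the report's own figures keep them.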
To evaluate how rainfall, temperature, and humidity deviate from their typical conditions, this report applies the Standardized Anomaly Index (SAI). The SAI expresses each observed value as the number of standard deviations it lies above or below the long-term mean, allowing different climate variables, and different time periods, to be compared on a common scale.
The general form of the SAI is:
\[ \phi = \frac{x - \bar{x}}{\sigma} \]
where
- \(x\) is the observed value (rainfall, temperature, or humidity),
- \(\bar{x}\) is the long-term mean, and
- \(\sigma\) is the standard deviation of the variable.
For time-series analysis, such as weekly rainfall, temperature, or humidity, the SAI is written in its time-dependent form:
\[ \phi_t = \frac{x_t - \bar{x}}{\sigma}, \]
where
- \(x_t\) is the observed value in week \(t\), and
- \(\phi_t\) is the anomaly index for week \(t\).
This formulation is consistent with implementations of the Standardized Precipitation Index and related anomaly methods used in climate studies (Akinsanola & Ogunjobi, 2014; Arya & Rao, 2023; McKee et al., 1993).
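A small worked example may help make the formula concrete. The weekly totals below are invented for illustration, and `sai()` is a hypothetical helper name; note that base R's `sd()` uses the sample (n − 1) denominator, which is an implementation detail the formula above does not specify.

```r
# Hypothetical helper: standardized anomaly of each value relative to the series
sai <- function(x) {
  (x - mean(x, na.rm = TRUE)) / sd(x, na.rm = TRUE)
}

# Invented weekly rainfall totals (mm): four ordinary weeks and one very wet week
weekly_totals_mm <- c(10, 20, 30, 40, 100)

phi <- sai(weekly_totals_mm)
round(phi, 2)
# -0.85 -0.57 -0.28  0.00  1.70

# Lowest anomaly a week can reach, since rainfall cannot fall below 0 mm
phi_floor <- (0 - mean(weekly_totals_mm)) / sd(weekly_totals_mm)
round(phi_floor, 2)
# -1.13
```

Here the mean is 40 mm and the standard deviation about 35.4 mm, so the 100 mm week sits 1.70 standard deviations above the mean. The last two lines illustrate a property of skewed, zero-bounded variables such as rainfall: for these toy values even a completely dry week could not fall below about −1.13, so strongly negative anomalies are harder to reach than strongly positive ones.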
To highlight weeks that are unusually wet or dry relative to typical conditions, weekly rainfall totals were converted into a Standardized Anomaly Index (SAI). For each week, total precipitation was aggregated and then expressed as deviations from the long-term weekly mean in units of standard deviation.
Table 1 summarizes how different values of \(\phi_t\) are interpreted in this report (Akinsanola & Ogunjobi, 2014; Arya & Rao, 2023; McKee et al., 1993).
| SAI range | Rainfall regime |
|---|---|
| ϕₜ ≥ 2.0 | Extremely wet |
| 1.5 ≤ ϕₜ < 2.0 | Very wet |
| 1.0 ≤ ϕₜ < 1.5 | Moderately wet |
| −1.0 < ϕₜ < 1.0 | Near normal |
| −1.5 < ϕₜ ≤ −1.0 | Moderately dry |
| −2.0 < ϕₜ ≤ −1.5 | Very dry |
| ϕₜ ≤ −2.0 | Extremely dry |
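Because the intervals in Table 1 are closed on the lower side for wet classes and on the upper side for dry classes, values that fall exactly on a boundary deserve an explicit check. The following base R sketch encodes the table as a hypothetical helper, `classify_sai()`, and applies it to a handful of invented boundary values.

```r
# Hypothetical helper: map an anomaly value to the regimes listed in Table 1
classify_sai <- function(phi) {
  ifelse(phi >= 2.0,  "Extremely wet",
  ifelse(phi >= 1.5,  "Very wet",
  ifelse(phi >= 1.0,  "Moderately wet",
  ifelse(phi <= -2.0, "Extremely dry",
  ifelse(phi <= -1.5, "Very dry",
  ifelse(phi <= -1.0, "Moderately dry",
                      "Near normal"))))))
}

# Invented values sitting on, or just inside, each class boundary
boundary_values <- c(2.0, 1.5, 1.0, 0.99, -0.99, -1.0, -1.5, -2.0)
regimes <- classify_sai(boundary_values)

data.frame(phi = boundary_values, regime = regimes)
```

The order of the tests matters: the most extreme thresholds are evaluated first, so a value of 2.0 is labeled extremely wet rather than very wet, and a value of exactly −1.0 is labeled moderately dry rather than near normal, as the table specifies.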
As shown below in Figure 5, most weeks lie within one standard deviation of the long-term mean, indicating near-normal rainfall. However, several clusters of strongly positive anomalies (≥ 1σ) and a smaller number of negative anomalies (≤ −1σ) stand out, corresponding to the sharp peaks and dips already visible in the weekly time series. These episodes represent the most hydrologically significant wet and dry spells in the 2021–2024 period and are candidates for linking with flood and service disruption events in future work. The largest positive anomalies occur in mid-2021, late 2023, and early 2024, while notable dry anomalies appear intermittently during mid-year transitions.
library(lubridate)

# 1. Aggregate daily rainfall to weekly totals
weekly_rain <- processed_data |>
  mutate(week = floor_date(date, unit = "week")) |>  # collapse each date to its week
  group_by(week) |>
  summarise(
    precip_mm = sum(precip_mm, na.rm = TRUE),        # total weekly rainfall
    .groups = "drop"
  )

# 2. Compute long-term weekly mean and standard deviation
mu_week <- mean(weekly_rain$precip_mm, na.rm = TRUE)  # long-term mean
sd_week <- sd(weekly_rain$precip_mm, na.rm = TRUE)    # long-term SD

# 3. Calculate the Standardized Anomaly Index (SAI) and categorize
weekly_rain <- weekly_rain |>
  mutate(
    # Standardized anomaly index (SAI): φ_t = (x_t - x̄) / σ
    sai = (precip_mm - mu_week) / sd_week,
    # Qualitative categories for interpretation (as in Table 1)
    anomaly_cat = dplyr::case_when(
      sai >= 2.0 ~ "Extremely wet (≥ 2.0σ)",
      sai >= 1.5 ~ "Very wet (1.5–2.0σ)",
      sai >= 1.0 ~ "Moderately wet (1.0–1.49σ)",
      sai <= -2.0 ~ "Extremely dry (≤ -2.0σ)",
      sai <= -1.5 ~ "Very dry (-2.0– -1.5σ)",
      sai <= -1.0 ~ "Moderately dry (-1.49– -1.0σ)",
      TRUE ~ "Near normal (|σ| < 1.0)"
    )
  )

# 4. Plot SAI time series with thresholds
ggplot(weekly_rain, aes(x = week, y = sai)) +
  geom_hline(yintercept = 0, linetype = "dashed") +         # zero line (normal conditions)
  geom_hline(yintercept = c(-1, 1), linetype = "dotted") +  # ±1 SD thresholds (moderate anomalies)
  geom_line(linewidth = 0.6, colour = "black") +            # continuous SAI time series
  geom_point(
    data = dplyr::filter(weekly_rain, abs(sai) >= 1),       # highlight weeks with |SAI| ≥ 1
    aes(color = anomaly_cat),
    size = 1.6
  ) +
  scale_color_brewer(palette = "Set1", name = "Anomaly class") +
  labs(
    x = "Week",
    y = "Standardized rainfall anomaly (SAI, σ units)"
  ) +
  theme_minimal()

library(lubridate)
# 1. Aggregate to weekly means for temperature and humidity
weekly_trh <- processed_data |>
  mutate(
    week = floor_date(date, unit = "week")  # assign each day to a calendar week
  ) |>
  group_by(week) |>
  summarise(
    tmean_c = mean(tmean_c, na.rm = TRUE),  # weekly mean temperature (°C)
    rh_pct = mean(rh_pct, na.rm = TRUE),    # weekly mean relative humidity (%)
    .groups = "drop"
  )

# 2. Compute long-term mean and standard deviation for each variable
tmean_mean <- mean(weekly_trh$tmean_c, na.rm = TRUE)
tmean_sd <- sd(weekly_trh$tmean_c, na.rm = TRUE)
rh_mean <- mean(weekly_trh$rh_pct, na.rm = TRUE)
rh_sd <- sd(weekly_trh$rh_pct, na.rm = TRUE)

# 3. Calculate the Standardized Anomaly Index (SAI) for each week
weekly_trh_anom <- weekly_trh |>
  mutate(
    tmean_sai = (tmean_c - tmean_mean) / tmean_sd,  # temperature anomaly in σ units
    rh_sai = (rh_pct - rh_mean) / rh_sd             # humidity anomaly in σ units
  ) |>
  select(week, tmean_sai, rh_sai) |>
  pivot_longer(
    cols = c(tmean_sai, rh_sai),
    names_to = "variable",
    values_to = "sai"
  )

# 4. Plot SAI time series for temperature and humidity
ggplot(weekly_trh_anom, aes(x = week, y = sai)) +
  geom_hline(yintercept = 0, linetype = "dashed", colour = "grey50") +  # long-term mean
  geom_line() +
  facet_wrap(
    ~ variable,
    ncol = 1,
    labeller = labeller(variable = c(
      tmean_sai = "Temperature anomaly (SAI)",
      rh_sai = "Humidity anomaly (SAI)"
    ))
  ) +
  labs(
    x = "Week",
    y = "Standardized anomaly index (φ)"
  ) +
  theme_minimal()

As shown in Figure 6, weekly temperature and humidity anomalies remain within ±1σ for most of the 2021–2024 period, indicating generally near-normal conditions. Several warm episodes (positive temperature SAI) coincide with reduced humidity, hinting at short hot–dry spells, while clusters of positive humidity anomalies tend to overlap with wet periods identified in the rainfall analysis. These patterns suggest that rainfall extremes in Kisumu are often accompanied by modest but coherent shifts in temperature and humidity, reinforcing their relevance as secondary stressors on urban water services.
library(dplyr)
library(lubridate)
library(gt)

monthly_stats <- processed_data |>
  mutate(
    year = year(date),
    month = month(date, label = TRUE, abbr = TRUE)
  ) |>
  group_by(year, month) |>
  summarise(
    total_rain_mm = sum(precip_mm, na.rm = TRUE),
    mean_temp_C = mean(tmean_c, na.rm = TRUE),
    mean_rh_pct = mean(rh_pct, na.rm = TRUE),
    .groups = "drop"
  )

# Wettest month per year
wettest_months <- monthly_stats |>
  group_by(year) |>
  slice_max(total_rain_mm, n = 1) |>
  select(year, wettest_month = month)

# Most humid month per year
most_humid_months <- monthly_stats |>
  group_by(year) |>
  slice_max(mean_rh_pct, n = 1) |>
  select(year, most_humid_month = month)

# Warmest month per year
warmest_months <- monthly_stats |>
  group_by(year) |>
  slice_max(mean_temp_C, n = 1) |>
  select(year, warmest_month = month)

annual_summary <- monthly_stats |>
  group_by(year) |>
  summarise(
    total_rain_mm = sum(total_rain_mm),
    mean_temp_C = mean(mean_temp_C),  # mean of monthly means
    mean_rh_pct = mean(mean_rh_pct),  # mean of monthly means
    # Days with >10 mm, counted from the daily data; first(year) compares against a
    # single value instead of recycling the 12 monthly year entries
    heavy_rain_days = sum(processed_data$precip_mm > 10 & year(processed_data$date) == first(year)),
    .groups = "drop"
  ) |>
  left_join(wettest_months, by = "year") |>
  left_join(most_humid_months, by = "year") |>
  left_join(warmest_months, by = "year")
library(gt)

annual_summary |>
  gt() |>
  tab_header(
    title = "Summary of annual climate indicators for Kisumu (2021–2024)"
  )

Table 2: Summary of annual climate indicators for Kisumu (2021–2024)

| year | total_rain_mm | mean_temp_C | mean_rh_pct | heavy_rain_days | wettest_month | most_humid_month | warmest_month |
|---|---|---|---|---|---|---|---|
| 2021 | 1931.6 | 22.19201 | 70.15643 | 66 | Apr | May | Mar |
| 2022 | 1976.5 | 22.07704 | 70.63223 | 65 | Nov | May | Mar |
| 2023 | 2142.3 | 22.50816 | 69.00737 | 79 | Nov | Nov | Feb |
| 2024 | 2147.5 | 22.77505 | 71.26969 | 75 | Apr | Apr | Mar |
The annual summary in Table 2 shows that total rainfall varies moderately between years, ranging from approximately 1,930 mm in 2021 to more than 2,140 mm in 2023 and 2024. Mean temperature (22.1–22.8 °C) and mean relative humidity (69.0–71.3%) remain broadly stable across the period, although 2024 stands out as both the warmest and most humid year. Heavy-rain days (>10 mm) peak in 2023 and 2024, consistent with the late-year clusters of strong positive rainfall anomalies identified in the SAI analysis. The wettest and most humid months continue to align with the long and short rainy seasons (April and October–November), reinforcing the strong seasonality highlighted earlier in the results.
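The heavy-rain day count depends on whether the 10 mm threshold is treated as strict (>10 mm, as in the annual summary) or inclusive (≥10 mm, as in the lower bound of the heavy intensity category used earlier). The base R sketch below shows the difference on a few invented daily values; `daily` and its contents are hypothetical and are not taken from the Kisumu dataset.

```r
# Invented daily records, including one day with exactly 10.0 mm
daily <- data.frame(
  date = as.Date(c("2021-03-01", "2021-03-02", "2021-11-15",
                   "2022-04-10", "2022-04-11")),
  precip_mm = c(12.5, 10.0, 3.1, 25.0, 10.1)
)

daily$year <- as.integer(format(daily$date, "%Y"))
daily$heavy_strict <- daily$precip_mm > 10      # strict threshold (>10 mm)
daily$heavy_inclusive <- daily$precip_mm >= 10  # inclusive threshold (>=10 mm)

# Count heavy-rain days per year under each definition
heavy_days <- aggregate(
  cbind(heavy_strict, heavy_inclusive) ~ year,
  data = daily,
  FUN = sum
)
print(heavy_days)
```

In this toy example the 2021 count is 1 under the strict rule and 2 under the inclusive rule, because of the single 10.0 mm day. Counts from the two definitions are therefore comparable only if the same rule is applied throughout.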
Kisumu’s climate from 2021–2024 shows strong and predictable seasonality, with rainfall maxima in March–April and October–November, and corresponding shifts in humidity and mild temperature dips during dry periods.
Extreme rainfall events are infrequent, but clusters of heavy and very heavy rainfall align with the most pronounced positive anomalies and coincide with elevated humidity.
Weekly anomaly analysis indicates that temperature and humidity remain mostly within ±1σ of long-term averages, acting as secondary but coherent stressors during major rainfall episodes.
Year-to-year variability is moderate but meaningful: 2023–2024 had higher rainfall totals and more heavy-rain days, while 2024 was the warmest and most humid year.
These climate patterns highlight periods when Kisumu’s water services may face elevated risk due to short-term wet extremes or transitional dry spells.