Code
library(tidyverse)
library(here)
library(readxl)
install.packages("ggthemes")
library(ggthemes)
library(gt)
library(gtsummary)
library(knitr)
library(DT)This report presents an assessment of microbiological drinking water quality testing from the point of abstraction(boreholes) to household use in Kumi, Kaabong, Nakaseke and Luuka districts of Uganda. The study examines changes in contamination levels at three key stages that is to say, borehole water sources, household water transportation containers, and household drinking water storage containers.
The main aim of this analysis is to classify E. coli results into five contamination levels for each sample type and to compare results across the water chain. This approach supports identification of where contamination is most predominant at the source, during collection and transport, or during household storage.
This section describes methods used to collect the data used for this analysis.
Water quality monitoring was conducted monthly from November 2024 to September 2025. Sampling information and microbial results were recorded digitally using the mWater Surveyor application, with data captured in two separate forms.
For each borehole, water samples were collected directly from the source and from three households that relied on that borehole. At the household level, samples were taken from both the jerrycan used for water collection and the container used for drinking water storage.
All samples were transported to accredited local laboratories, where they were analysed for E. coli concentration (per 100 mL) using the membrane filtration method, incubated for 24hrs and results recorded in an Mwater survey.
library(tidyverse)
library(here)
library(readxl)
install.packages("ggthemes")
library(ggthemes)
library(gt)
library(gtsummary)
library(knitr)
library(DT)#Importing the datasets
raw_data1 <- readxl::read_excel(here::here("data", "raw", "Sample_collection.xlsx"))
raw_data2 <- readxl::read_excel(here::here("data", "raw", "Sample_E.coliresults.xlsx"))#Selecting relevant columns
#raw_data1: Sample collection
sample_data <- raw_data1|>
select(
`Enter date and time of sampling`,
`Select the District`,
`Select the region`,
`Select the type of survey being taken`,
`Enter the source name`,
`Ask a community member to make notes on the taste of the water`,
`Ask a community member to make notes about the smell of the water`,
`What is the colour of the sampled water`,
`Select the current type of storage technology/ container for drinking water being used`,
`Please specify the type of storage container for drinking water`,
`What medium/ type of container do they use to collect water from the borehole?`,
`Take a sample from the transportation medium/ container used by the household and enter sample bottle number`,
`Take another sample from the household storage container and enter the sample bottle number.`,
`Enter the sample bottle number`)
#raw_data2: E.coli testing results
sample_result <- raw_data2 |>
select(`Enter the date and time now`,
`Select region`,
`Select district`,
`Enter sample bottle number`,
`Enter sample petri dish number`,
`Enter sample incubation start time`,
`Count the number of colonies and enter the number here. If there are >100 colonies, meaning "too numerous to count", enter 101.`,
`What was the volume of water that was filtered when this sample was analysed?`
)#Renaming columns
#sample_data
samples_data <- sample_data |> rename(date = `Enter date and time of sampling`,
district = `Select the District`,
region = `Select the region`,
survey = `Select the type of survey being taken`,
source = `Enter the source name`,
taste = `Ask a community member to make notes on the taste of the water`,
smell = `Ask a community member to make notes about the smell of the water`,
color = `What is the colour of the sampled water`,
storage_container = `Select the current type of storage technology/ container for drinking water being used`,
other_container = `Please specify the type of storage container for drinking water`,
tp_container = `What medium/ type of container do they use to collect water from the borehole?`,
storage_num = `Take a sample from the transportation medium/ container used by the household and enter sample bottle number`,
tp_num = `Take another sample from the household storage container and enter the sample bottle number.`,
bh_num = `Enter the sample bottle number`)
#sample_result
samples_result <- sample_result |>
rename(result_date = `Enter the date and time now`,
region = `Select region`,
district = `Select district`,
sample_num = `Enter sample bottle number`,
dish_num = `Enter sample petri dish number`,
test_date = `Enter sample incubation start time`,
E.coli_CFUs = `Count the number of colonies and enter the number here. If there are >100 colonies, meaning "too numerous to count", enter 101.`,
volume = `What was the volume of water that was filtered when this sample was analysed?`
)
samples_data$date_column <- as.Date(samples_data$date, format = "%Y-%m-%d")#Transferring all sample names and bottle numbers into one column
#samples names
samples_names_2425 <- samples_data |>
filter(date_column >= as.Date("2024-11-1") & date_column <= as.Date("2025-9-22"))|>
mutate(bh = case_when(
survey == "Borehole" & !is.na(source) ~ source ,
TRUE~ NA_character_)) |>
relocate(bh, .before = storage_container) |>
mutate(storage_container = ifelse(
storage_container == "Other (please specify)",
other_container,
storage_container )) |>
relocate(other_container, .before = bh) |>
relocate(date_column, .before = date)
samples_names_2425|>
count(survey)
#Creating all samples names column
samples_names_final <-samples_names_2425 |>
pivot_longer(
cols = c(bh,storage_container, tp_container),
names_to = "Sample_Type",
values_to = "Sample_Name")
samples_names_final|>
count(survey)
samples_names_final |>
count(Sample_Type)
samples_names_final |>
count(Sample_Name)
#Creating all sample collection bottle numbers column
samples_data_final <- samples_names_final |>
mutate(Sample_Num =
case_when(Sample_Type == "bh" ~ bh_num,
Sample_Type == "storage_container" ~ storage_num,
Sample_Type == "tp_container" ~ tp_num,
.default = NA)) |>
filter(!is.na(Sample_Num))
glimpse(samples_data_final)
samples_data_final |>
count(Sample_Type)
#Final cleaned sample collection data
samples_data_cleaned <-samples_data_final |>
filter(!is.na(Sample_Name)) |>
select(date_column,
district,
region,
survey,
source,
taste,
smell,
color,
Sample_Type,
Sample_Name,
Sample_Num) |>
arrange(desc(region), desc(date_column))|> rename(sample_type = Sample_Type,
sample_name = Sample_Name,
sample_num = Sample_Num,
test_date = date_column) |> mutate(region = case_when(
region == "Busoga Region" ~ "Busoga Region",
region == "Busoga Region, Central Region, Teso Region" ~ "Busoga Region",
region == "Central Region" ~ "Central Region",
region == "Karamoja Region" ~ "Karamoja Region",
region == "Teso Region" ~ "Teso Region",
region == "Teso Region, Central Region, Busoga Region" ~ "Central Region",
region == "Teso Region, Karamoja Region" ~ "Teso Region",
TRUE ~ region
)) |>
mutate(district = case_when(
region == "Teso Region" ~ "Kumi",
region == "Busoga Region" ~ "Kamuli",
region == "Central Region" ~ "Nakaseke",
region == "Karamoja Region" ~ "Kabong",
TRUE ~ district ))
samples_data_cleaned |> count(survey)
samples_data_cleaned |>
count(region)
samples_data_cleaned |>
count(district)
samples_data_cleaned |>
count(sample_type)#Final cleaned sample E.coli result
samples_result$result_date<- as.Date(samples_result$result_date, format = "%Y-%m-%d")
samples_result$test_date<- as.Date(samples_result$test_date, format = "%Y-%m-%d")
#Matching dates as in samples data df
samples_result_2425<- samples_result |> filter(result_date >= as.Date("2024-11-1") & result_date <= as.Date("2025-9-22")) |>
filter(test_date >= as.Date("2024-11-1") & test_date <= as.Date("2025-9-22"))
#Removing redundant columns
samples_result_final <- samples_result_2425|>select(district,region,
sample_num,
dish_num,
E.coli_CFUs,
volume,
test_date,
result_date) |> mutate(region = case_when(
region == "Busoga Region" ~ "Busoga Region",
region == "Busoga Region, Central Region, Teso Region" ~ "Busoga Region",
region == "Central Region" ~ "Central Region",
region == "Karamoja Region" ~ "Karamoja Region",
region == "Teso Region" ~ "Teso Region",
region == "Teso Region, Central Region, Busoga Region" ~ "Central Region",
region == "Teso Region, Karamoja Region" ~ "Teso Region",
TRUE ~ region)) |> mutate(district = case_when(
region == "Teso Region" ~ "Kumi",
region == "Busoga Region" ~ "Kamuli",
region == "Central Region" ~ "Nakaseke",
region == "Karamoja Region" ~ "Kabong",
TRUE ~ district )) |> arrange(desc(region), desc(result_date))
samples_result_final |>
count(region)
samples_result_final |>
count(district)
samples_result_final |>
count(test_date)#Joining the sample data and sample result
samples_joined <- samples_data_cleaned %>%
right_join(samples_result_final, by = c("test_date" = "test_date",
"sample_num" = "sample_num",
"region" = "region",
"district" = "district")) |>
relocate(result_date, .after = test_date)
samples_joined |>
filter(!is.na(survey))
samples_joined |> count(region)
samples_joined |> count(survey)
samples_joined |> count(sample_type)
head(samples_joined)#Correcting typos in rows of the final dataset containing both sample collection and sample result data
samples_joined_regions <- samples_joined |> mutate(region = case_when(
region == "Busoga Region" ~ "Busoga Region",
region == "Busoga Region, Central Region, Teso Region" ~ "Busoga Region",
region == "Central Region" ~ "Central Region",
region == "Karamoja Region" ~ "Karamoja Region",
region == "Teso Region" ~ "Teso Region",
region == "Teso Region, Central Region, Busoga Region" ~ "Central Region",
region == "Teso Region, Karamoja Region" ~ "Teso Region",
TRUE ~ region
))
samples_joined_regions |>count(region)
samples_joined_regions |> count(district)
samples_joined_regions|> count(survey)
samples_joined_district <-samples_joined_regions |>
mutate(district = case_when(
region == "Teso Region" ~ "Kumi",
region == "Busoga Region" ~ "Kamuli",
region == "Central Region" ~ "Nakaseke",
region == "Karamoja Region" ~ "Kabong",
TRUE ~ district ))
samples_joined_district|> count(district)
test_data <- samples_data |>
filter(date_column == "2025-03-18",
storage_num == 5 | tp_num == 5 | bh_num == 5)
rm(test_data)
rm(samples_joined_district)
rm(samples_joined_regions)#Processed data
write_csv(samples_joined,
here::here("data/processed/processed_sample_data.csv"))E. coli results were categorised into five contamination levels based on colony-forming units (CFUs). Samples with 0 CFUs were classified as safe; those with 1–10 CFUs as slightly contaminated; 11–50 CFUs as moderately contaminated; 51–100 CFUs as highly contaminated; and samples with more than 100 CFUs as unsafe. This analysis was conducted for all samples with complete data, including samples from boreholes, jerrycans, and household drinking water storage containers.
#Analysis
#Categorising contamination levels
samples_joined_categories <- samples_joined %>%
mutate(contamination_category = case_when(
E.coli_CFUs == 0 ~ "Safe",
E.coli_CFUs >= 1 & E.coli_CFUs <= 10 ~ "Slightly Contaminated",
E.coli_CFUs >= 11 & E.coli_CFUs <= 50 ~ "Moderately Contaminated",
E.coli_CFUs >= 51 & E.coli_CFUs <= 100 ~ "Highly Contaminated",
E.coli_CFUs > 100 ~ "Unsafe",
TRUE ~ NA_character_
))
levels_categories <- c("Safe", "Slightly Contaminated", "Moderately Contaminated", "Highly Contaminated", "Unsafe")
samples_joined_categories_lvl <- samples_joined_categories |>
mutate(contamination_category = factor(contamination_category, levels = levels_categories ))
samples_joined_categories_lvl |>
count(survey)
survey_counts <- samples_joined_categories_lvl %>%
group_by(survey, region) %>%
summarise(count = n(), .groups = "drop")
rm (samples_joined_categories_surveytype)While contamination levels fluctuated between monitoring months, the majority of boreholes consistently fell within the safe and slightly contaminated categories, indicating generally acceptable water quality. A study by Lapworth et al. (2017) shows that boreholes generally have better microbiological water quality than shallower water sources, since deeper groundwater is less vulnerable to surface contamination than shallow sources. Figure 1 shows water quality results for boreholes tested.
#visualisation
#Boreholes
boreholes_samples <- samples_joined_categories_lvl |>
filter(survey == "Borehole")
ggplot(data = boreholes_samples,
mapping = aes(x = contamination_category,
fill = contamination_category)) +
geom_bar(position = "dodge") +
facet_wrap("district") +
labs(title = "E. Coli Contamination Levels in Boreholes by Region",
x = "Contamination Category",
y = "Number of Boreholes") +
theme(axis.text.x = element_text(angle = 45, hjust = 1))+
scale_fill_manual(values = c("Safe" = "#4CAF50",
"Slightly Contaminated" = "#2196F3",
"Moderately Contaminated" = "#FFC107",
"Highly Contaminated" = "#FF9800",
"Unsafe" = "#F44336"))
#Containers
containers_samples <- samples_joined_categories_lvl |>
filter(survey == "Household") |> mutate(sample_name = case_when(
sample_name == "BUCKET" ~ "Bucket",
sample_name == "Backet" ~ "Bucket",
sample_name == "Big Bucket" ~ "Bucket",
sample_name == "Big bucket" ~ "Bucket",
sample_name == "Cylinder" ~ "Bucket",
sample_name == "Cylinder bucket" ~ "Bucket",
sample_name == "PLASTIC DRUM" ~ "Bucket",
sample_name == "Plastic drum" ~ "Drum",
sample_name == "DRUM" ~ "Drum",
sample_name == "Drum" ~ "Drum",
sample_name == "POT" ~ "Pot",
sample_name == "Pot" ~ "Pot",
sample_name == "Sauce pan" ~ "Sauce pan",
sample_name == "Jerry can" ~ "Jerry can",
TRUE ~ sample_name )) |>
mutate(sample_type = case_when(
sample_type == "tp_container" ~ "Transport container",
sample_type == "storage_container" ~ "Storage container"
))
Figure 2 shows E.coli testing results for samples collected from household drinking water storage containers and transportation containers from water sources.
ggplot(data = containers_samples,
mapping = aes(x = sample_name,
fill = contamination_category)) +
geom_bar(position = "dodge") +
facet_wrap("sample_type",labeller = label_wrap_gen(width = 10),scales = "free_x") +
labs(title = "E. Coli Contamination Levels in Transportation and Storage Containers",
x = "Container Type",
y = "Number of Containers") +
theme(axis.text.x = element_text(angle = 45, hjust = 1, size = 6),
axis.title.x = element_text(size = 10),
axis.title.y = element_text(size = 12),
plot.title = element_text(size = 12),
legend.title = element_text(size = 12),
legend.text = element_text(size = 10)) + scale_fill_manual(values = c("Safe" = "#4CAF50",
"Slightly Contaminated" = "#2196F3",
"Moderately Contaminated" = "#FFC107",
"Highly Contaminated" = "#FF9800",
"Unsafe" = "#F44336"))
Table 1 summarises the total numbers of different types of samples that were collected under each contamination category.
category_table <- samples_joined_categories_lvl |>
group_by(contamination_category, sample_type) |>
summarise(count = n(), .groups = "drop") |>
pivot_wider(names_from = sample_type, values_from = count, values_fill = 0) |>
rename(Borehole = bh,
`Storage container` = storage_container,
`Transport container` = tp_container,
`Contamiation Category` = contamination_category)
category_table |>
gt() |>
tab_header(title = "Counts of Boreholes and Container Types by Contamination Category",
subtitle = "Data from Four districts in Uganda") | Counts of Boreholes and Container Types by Contamination Category | ||||
| Data from Four districts in Uganda | ||||
| Contamiation Category | Borehole | Storage container | Transport container | NA |
|---|---|---|---|---|
| Safe | 112 | 188 | 183 | 106 |
| Slightly Contaminated | 116 | 144 | 97 | 50 |
| Moderately Contaminated | 61 | 114 | 128 | 52 |
| Highly Contaminated | 26 | 62 | 51 | 15 |
| Unsafe | 91 | 465 | 510 | 100 |
Table 2 summarises the most popular type of containers used for both water collection and storage of drinking water in the sampled communities.
container_table <- containers_samples |>
filter(survey == "Household") |>
group_by(district, sample_name) |>
summarise(count = n(), .groups = "drop") |>
group_by(district) |>
mutate(percent = round(count / sum(count) * 100, 1)) |>
select(-count) |>
pivot_wider(
names_from = sample_name,
values_from = percent,
values_fill = list(percent = 0)
) |>
rename(District = district)|>
ungroup()
names(container_table) <- ifelse(
names(container_table) == "district",
"district",
paste0(names(container_table), " (%)")
)
container_table |>
gt() |>
tab_header(title = "Container types percentage for transport and storage of drinking water",
subtitle = "Data from Four districts in Uganda") | Container types percentage for transport and storage of drinking water | |||||||
| Data from Four districts in Uganda | |||||||
| District (%) | Bucket (%) | Drum (%) | Jerry can (%) | Other (%) | Pot (%) | Sauce pan (%) | Small drum (%) |
|---|---|---|---|---|---|---|---|
| Kabong | 13.9 | 8.7 | 52.7 | 1.5 | 22.1 | 1.1 | 0.0 |
| Kamuli | 0.7 | 0.0 | 50.0 | 0.0 | 49.3 | 0.0 | 0.0 |
| Kumi | 0.0 | 0.0 | 49.8 | 0.0 | 50.2 | 0.0 | 0.0 |
| Nakaseke | 1.9 | 1.0 | 50.4 | 0.0 | 42.8 | 3.7 | 0.2 |
Borehole water quality monitoring across the different months revealed that Borehole quality is highly variable and this significantly affects the quality of water received at household levels.
Although drinking water may be safe at the source, contamination can occur during transportation, preventing safe delivery to users(Gärtner et al. 2021).
Poor handling practices and unhygienic transport containers increase the risk of water quality deterioration before storage.
Strengthening water safety measures during transportation and promoting household water treatment methods such as chlorination and boiling can significantly improve the quality of drinking water at the point of use.
Thorough cleaning of drinking water containers with safe cleaning materials to ensure there is no recontamination during drinking water storage (Meierhofer, Wietlisbach, and Matiko 2019).
Regular cleaning of transport containers with disinfectants to prevent contamination of safe water during transport.
According to Kelly et al. (2021), conducting regular sanitary inspections and proper protection of boreholes through sanitary infrastructure such as intact concrete aprons and effective drainage can reduce pathways for contaminants entering the water source, since cracked aprons and poor drainage are statistically linked to increased microbial contamination.
@online{nansiimbi2026,
author = {Nansiimbi, Cecilia},
title = {Evaluation of {Drinking} {Water} {Quality} from {Borehole}
{Sources} to {Household} {Storage}},
date = {2026-01-23},
langid = {en},
abstract = {This study assessed the microbiological quality of
drinking water from borehole sources to household use in Uganda
districts, Kumi, Kaabong, Nakaseke and Luuka. Monthly data
collection involved sampling water at the source, transportation
containers, and storage containers, with E. coli analysed using the
membrane filtration method. Borehole results revealed fluctuations
in E. coli levels over monthly monitoring, with the majority of
samples falling within the safe and slightly contaminated
categories. However, high numbers of unsafe water samples were
observed during transportation and storage, indicating that water
considered safe at the source is at risk of contamination before
reaching the point of use. Strengthening water safety measures
including regular cleaning and disinfection of containers, household
water treatment (e.g., chlorination or boiling), and proper borehole
protection through intact aprons and effective drainage can improve
water quality at the point of use.}
}