library(ggplot2)
library(dplyr)
library(readr)
library(ggthemes)
library(tidyverse)
library(gt)
library(FSA)
library(dunn.test)
library(here)

This project analyzes four years of fecal sludge effluent quality data from 183 fecal sludge treatment plants (FSTPs) operating in the Rohingya refugee camps. These facilities use six different decentralized treatment technologies, with effluent samples collected across eight monitoring rounds.
The primary objective of this study is to evaluate the performance of these treatment technologies over time by analyzing key effluent quality parameters—BOD, COD, and E. coli. The analysis assesses the extent to which the systems comply with national discharge standards and provides a partial understanding of camp-level microbial risks in this densely populated camp setting.
Note: For the Capstone Project, the performance of the FSTPs is assessed based solely on effluent quality. Removal efficiency calculations will be conducted later once influent quality data becomes available. Due to the current lack of influent data, the project goal has been slightly revised to focus primarily on effluent quality.
Raw effluent monitoring data were imported from CSV files and restricted to samples with non‑missing BOD, COD, and E. coli, resulting in 1,163 observations for analysis. Data were summarised by camp and technology to quantify the number of plants and monitoring coverage, and plant‑level geometric means were calculated from log₁₀‑transformed BOD, COD, and E. coli values. Because all three parameters were strongly right‑skewed and non‑normal, subsequent analyses used log‑transformed values, geometric means, and non‑parametric tests (Kruskal–Wallis with Dunn post‑hoc) for comparisons between technologies.
effluent <- read_csv(here::here("data/raw/fstp_effluent_quality.csv"))
influent <- read_csv(here::here("data/raw/fstp_influent_quality_avg.csv")) # not considered for capstone project
design_objectives <- read_csv(here::here("data/raw/design_objectives.csv")) # not considered for capstone project
national_standard <- read_csv(here::here("data/raw/national_standard.csv"))

effluent_clean <- effluent |>
filter(!is.na(bod) & !is.na(cod) & !is.na(ecoli))

head(effluent_clean)
# A tibble: 6 × 9
sample_id camp_id fstp_id fstp_type_short fstp_type_long bod cod ecoli
<chr> <chr> <chr> <chr> <chr> <dbl> <dbl> <dbl>
1 DPHE_FSL_S_… Camp 04 FSTP_0… LSP Lime Stabiliz… 22 47.7 1800
2 DPHE_FSL_S_… Camp 0… FSTP_0… LSP Lime Stabiliz… 138 458 110000
3 DPHE_FSL_S_… Camp 0… FSTP_0… ABR Anaerobic Baf… 195 472 220000
4 DPHE_FSL_S_… Camp 18 FSTP_1… SSU Solid Separat… 148 412 300
5 DPHE_FSL_S_… Camp 12 FSTP_1… DEWATS Decentralised… 311 688 2100
6 DPHE_FSL_S_… Camp 12 FSTP_1… DEWATS Decentralised… 292 750 300
# ℹ 1 more variable: data_round <chr>
summary(effluent_clean)
sample_id camp_id fstp_id fstp_type_short
Length:1163 Length:1163 Length:1163 Length:1163
Class :character Class :character Class :character Class :character
Mode :character Mode :character Mode :character Mode :character
fstp_type_long bod cod ecoli
Length:1163 Min. : 3.0 Min. : 17.4 Min. : 0
Class :character 1st Qu.: 112.0 1st Qu.: 296.0 1st Qu.: 1000
Mode :character Median : 191.0 Median : 502.0 Median : 56000
Mean : 237.9 Mean : 655.7 Mean : 775319
3rd Qu.: 326.0 3rd Qu.: 889.0 3rd Qu.: 250000
Max. :4554.0 Max. :17438.0 Max. :395000000
data_round
Length:1163
Class :character
Mode :character
write_csv(
effluent_clean,
here::here("data/processed/effluent-processed.csv")
)

camp_tech_counts <- effluent_clean |>
distinct(camp_id, fstp_id, fstp_type_short) |>
count(camp_id, fstp_type_short, name = "n_plants") |>
arrange(camp_id, fstp_type_short)

camp_totals <- camp_tech_counts |>
group_by(camp_id) |>
summarise(total_plants = sum(n_plants), .groups = "drop")
#| label: setup-palettes
cbp1 <- c(
"#999999", "#E69F00", "#56B4E9", "#009E73",
"#F0E442", "#0072B2", "#D55E00", "#CC79A7"
)
ggplot(camp_tech_counts,
aes(x = camp_id, y = n_plants, fill = fstp_type_short)) +
geom_col(position = "stack") +
scale_fill_manual(values = cbp1) +
geom_text(data = camp_totals,
aes(x = camp_id, y = total_plants, label = total_plants),
vjust = -0.3,
inherit.aes = FALSE) +
labs(
title = "Number of FSTPs by camp and technology",
x = "Camp",
y = "Number of plants",
fill = "Technology"
) +
theme_bw() +
theme(
axis.text.x = element_text(
angle = 45,
hjust = 1,
vjust = 1
)
)
Figure 1 shows the number of FSTPs by camp and technology. Camps vary widely in both the number and type of FSTPs: some camps have only one or two plants of a single technology, whereas others have more than 10 plants and a mixture of several technologies. Important note: The numbers of plants shown for each camp do not represent all FSTPs present in the camps. This study includes only plants with monitoring data spanning more than one year, or with at least four sampling rounds, because reliable information on the establishment or start-up dates of the plants was not available.
summary_table <- effluent |>
group_by(fstp_type_short) |>
summarise(
Number_of_Plants = n_distinct(fstp_id),
Total_Possible_Samples = Number_of_Plants * 8 * 3,
Actual_Samples_Collected = sum(bod >= 0, na.rm = TRUE) +
sum(cod >= 0, na.rm = TRUE) +
sum(ecoli >= 0, na.rm = TRUE),
Missing_Samples = sum(is.na(bod) + is.na(cod) + is.na(ecoli)),
Coverage_Percent = (Actual_Samples_Collected / Total_Possible_Samples) * 100,
.groups = "drop"
) |>
gt() |>
cols_label(
fstp_type_short = "Technology",
Number_of_Plants = "Number of Plants",
Total_Possible_Samples = "Total Possible Samples",
Actual_Samples_Collected = "Actual Samples Collected",
Missing_Samples = "Missing Samples",
Coverage_Percent = "Coverage (%)"
) |>
fmt_number(
columns = c(Number_of_Plants, Total_Possible_Samples,
Actual_Samples_Collected, Missing_Samples),
decimals = 0
) |>
fmt_number(
columns = c(Coverage_Percent),
decimals = 1
)
summary_table

| Technology | Number of Plants | Total Possible Samples | Actual Samples Collected | Missing Samples | Coverage (%) |
|---|---|---|---|---|---|
| ABR | 45 | 1,080 | 924 | 156 | 85.6 |
| DEWATS | 28 | 672 | 510 | 162 | 75.9 |
| LSP | 21 | 504 | 318 | 186 | 63.1 |
| SSU | 31 | 744 | 549 | 195 | 73.8 |
| UFF | 44 | 1,056 | 894 | 162 | 84.7 |
| WSP | 14 | 336 | 294 | 42 | 87.5 |
Monitoring coverage was fairly high but incomplete for every technology (Table 1), ranging from about 63% for LSP to almost 88% for WSP. LSP (21 plants) and SSU (31 plants, about 74%) had the lowest coverage, indicating more missing sampling rounds for these technologies than for the others. In contrast, ABR and UFF had the largest numbers of plants (45 and 44, respectively) and reasonably good coverage of around 85%, while WSP had only 14 plants but the highest coverage.
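As a quick check of how the coverage figures in Table 1 are derived, the sketch below recomputes the ABR row by hand. It assumes, as the table code does, 8 monitoring rounds and 3 parameters per round, so each "sample" in the table is one parameter result; the variable names are illustrative only.

```r
# Worked arithmetic for the ABR row of Table 1 (values copied from the table).
n_plants <- 45        # ABR plants
n_rounds <- 8         # monitoring rounds
n_parameters <- 3     # BOD, COD, E. coli
actual_results <- 924 # non-missing parameter results reported for ABR

total_possible <- n_plants * n_rounds * n_parameters
missing_results <- total_possible - actual_results
coverage_percent <- 100 * actual_results / total_possible

total_possible
missing_results
round(coverage_percent, 1)
```

The same calculation applies to the other technologies by swapping in their plant counts and result counts.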
round_periods <- tibble::tibble(
round = factor(paste0("R", 1:8), levels = paste0("R", 1:8)),
year = factor(c(2022, 2022, 2022,
2023, 2023, 2023,
2024, 2025)),
period = c(
"Jan–Apr", "May–Aug", "Sep–Dec",
"Jan–Apr", "May–Aug", "Sep–Dec",
"Jan–May", "Jan–Apr"
)
)

ggplot(round_periods,
aes(x = year, y = round, fill = period)) +
geom_tile(color = "white", linewidth = 0.7) +
geom_text(aes(label = period), size = 3) +
scale_fill_brewer(palette = "Blues") +
labs(
title = "Sampling monitoring rounds and calendar periods",
x = "Year",
y = "Monitoring round",
fill = "Calendar period"
) +
theme_bw()
rounds_summary <- effluent |>
group_by(fstp_type_short, fstp_id) |>
summarise(
Total_Rounds = sum(!is.na(bod) & !is.na(cod) & !is.na(ecoli)),
Ecoli_Rounds = sum(!is.na(ecoli)),
Cod_Rounds = sum(!is.na(cod)),
Bod_Rounds = sum(!is.na(bod)),
.groups = "drop"
) |>
group_by(fstp_type_short) |>
summarise(
Plants_with_8_Rounds = sum(Total_Rounds == 8),
Plants_with_7_Rounds = sum(Total_Rounds == 7),
Plants_with_6_Rounds = sum(Total_Rounds == 6),
Plants_with_5_Rounds = sum(Total_Rounds == 5),
Plants_with_4_Rounds = sum(Total_Rounds == 4),
Plants_with_3_Rounds = sum(Total_Rounds == 3),
Mean_Rounds_Per_Plant = mean(Total_Rounds),
.groups = "drop"
) |>
gt()
rounds_summary

| fstp_type_short | Plants_with_8_Rounds | Plants_with_7_Rounds | Plants_with_6_Rounds | Plants_with_5_Rounds | Plants_with_4_Rounds | Plants_with_3_Rounds | Mean_Rounds_Per_Plant |
|---|---|---|---|---|---|---|---|
| ABR | 20 | 9 | 9 | 3 | 4 | 0 | 6.844444 |
| DEWATS | 8 | 6 | 2 | 4 | 8 | 0 | 6.071429 |
| LSP | 1 | 3 | 4 | 1 | 12 | 0 | 5.047619 |
| SSU | 9 | 4 | 4 | 3 | 11 | 0 | 5.903226 |
| UFF | 12 | 21 | 5 | 1 | 5 | 0 | 6.772727 |
| WSP | 6 | 5 | 1 | 1 | 1 | 0 | 7.000000 |
Coverage by rounds was uneven across technologies (Table 2). WSP had the highest average monitoring intensity (mean 7.0 rounds per plant), and most WSP plants had data available for 7–8 sampling rounds. ABR and UFF also had relatively strong coverage, each with a mean of 6.8 rounds per plant, and many plants monitored in 7 or 8 rounds. In contrast, LSP and SSU had the lowest average numbers of available rounds (means of 5.0 and 5.9 per plant, respectively), with many plants monitored in only 4–6 rounds. DEWATS was intermediate, with a mean of 6.1 rounds per plant and a mix of plants monitored in 4–8 rounds. These patterns indicate that time‑series information is richest for WSP, ABR, and UFF, and somewhat thinner for LSP and SSU.
# considering original effluent values
effluent_long <- effluent_clean |>
pivot_longer(
cols = c(bod, cod, ecoli),
names_to = "parameter",
values_to = "value"
)
summary_statistics <- effluent_long |>
group_by(fstp_type_short, parameter) |>
summarise(
n = sum(!is.na(value)),
Mean = mean(value, na.rm = TRUE),
SD = sd(value, na.rm = TRUE),
Median = median(value, na.rm = TRUE),
Q1 = quantile(value, 0.25, na.rm = TRUE),
Q3 = quantile(value, 0.75, na.rm = TRUE),
Min = min(value, na.rm = TRUE),
Max = max(value, na.rm = TRUE),
.groups = "drop"
) |>
mutate(
IQR_text = paste0(round(Q1, 2), " – ", round(Q3, 2))
)
gt_summary_table <- summary_statistics |>
gt() |>
tab_header(
title = "Summary Statistics of Effluent by Technology",
subtitle = "Original-data BOD, COD, and E. coli"
) |>
cols_label(
fstp_type_short = "Technology",
parameter = "Parameter",
n = "n",
Mean = "Mean",
SD = "SD",
Median = "Median",
IQR_text = "IQR (Q1–Q3)",
Min = "Min",
Max = "Max"
) |>
fmt_number(
columns = c(Mean, SD, Median, Min, Max),
decimals = 2
)
gt_summary_table

Summary Statistics of Effluent by Technology
Original-data BOD, COD, and E. coli

| Technology | Parameter | n | Mean | SD | Median | Q1 | Q3 | Min | Max | IQR (Q1–Q3) |
|---|---|---|---|---|---|---|---|---|---|---|
| ABR | bod | 308 | 218.31 | 160.44 | 179.00 | 111.75 | 282.00 | 9.00 | 1,305.00 | 111.75 – 282 |
| ABR | cod | 308 | 600.15 | 417.06 | 485.50 | 289.75 | 801.50 | 40.25 | 2,223.00 | 289.75 – 801.5 |
| ABR | ecoli | 308 | 284,666.56 | 737,700.77 | 56,500.00 | 1000.00 | 272500.00 | 0.00 | 8,300,000.00 | 1000 – 272500 |
| DEWATS | bod | 170 | 224.71 | 135.61 | 202.50 | 116.00 | 311.00 | 15.00 | 753.00 | 116 – 311 |
| DEWATS | cod | 170 | 627.34 | 378.38 | 581.50 | 320.25 | 835.25 | 34.40 | 1,790.00 | 320.25 – 835.25 |
| DEWATS | ecoli | 170 | 151,124.12 | 1,087,772.73 | 2,050.00 | 0.00 | 30000.00 | 0.00 | 13,800,000.00 | 0 – 30000 |
| LSP | bod | 106 | 310.82 | 234.41 | 269.00 | 140.25 | 420.25 | 11.00 | 1,396.00 | 140.25 – 420.25 |
| LSP | cod | 106 | 753.62 | 584.20 | 631.50 | 366.25 | 989.75 | 47.70 | 4,320.00 | 366.25 – 989.75 |
| LSP | ecoli | 106 | 243,599.06 | 385,198.44 | 85,500.00 | 10000.00 | 257500.00 | 0.00 | 2,150,000.00 | 10000 – 257500 |
| SSU | bod | 183 | 251.52 | 160.62 | 206.00 | 128.00 | 359.00 | 3.00 | 998.00 | 128 – 359 |
| SSU | cod | 183 | 727.31 | 506.52 | 553.00 | 339.00 | 1041.50 | 17.40 | 2,730.00 | 339 – 1041.5 |
| SSU | ecoli | 183 | 559,797.81 | 1,457,019.95 | 80,000.00 | 10000.00 | 330000.00 | 0.00 | 11,500,000.00 | 10000 – 330000 |
| UFF | bod | 298 | 255.26 | 304.07 | 192.00 | 108.25 | 360.75 | 13.00 | 4,554.00 | 108.25 – 360.75 |
| UFF | cod | 298 | 724.22 | 1,083.19 | 505.60 | 296.25 | 1063.00 | 42.80 | 17,438.00 | 296.25 – 1063 |
| UFF | ecoli | 298 | 644,952.55 | 3,913,022.90 | 100,000.00 | 10000.00 | 340000.00 | 0.00 | 60,800,000.00 | 10000 – 340000 |
| WSP | bod | 98 | 165.29 | 176.49 | 119.00 | 63.25 | 210.25 | 15.00 | 1,207.00 | 63.25 – 210.25 |
| WSP | cod | 98 | 431.67 | 371.40 | 306.75 | 176.00 | 524.75 | 58.00 | 1,927.00 | 176 – 524.75 |
| WSP | ecoli | 98 | 4,774,150.31 | 40,070,278.72 | 60,000.00 | 6275.00 | 205000.00 | 0.00 | 395,000,000.00 | 6275 – 205000 |
# Simple min/max for each parameter
min(effluent_clean$bod, na.rm = TRUE)
[1] 3
max(effluent_clean$bod, na.rm = TRUE)
[1] 4554
min(effluent_clean$cod, na.rm = TRUE)
[1] 17.4
max(effluent_clean$cod, na.rm = TRUE)
[1] 17438
min(effluent_clean$ecoli, na.rm = TRUE)
[1] 0
max(effluent_clean$ecoli, na.rm = TRUE)
[1] 3.95e+08
ggplot(effluent_long, aes(x = value)) +
geom_histogram(bins = 30, color = "black", fill = "steelblue") +
facet_wrap(~ parameter, scales = "free_x") +
labs(
title = "Distribution of Effluent Parameters",
x = "Value",
y = "Count"
)
Because the distributions are heavily right-skewed and dominated by a few extreme high values, the data clearly deviate from normality. This supports the use of log-transformed values, geometric means, and non-parametric statistical tests in the subsequent analyses, rather than assuming normally distributed residuals or relying on arithmetic means.
shapiro.test(effluent_clean$bod)
Shapiro-Wilk normality test
data: effluent_clean$bod
W = 0.65982, p-value < 2.2e-16
shapiro.test(effluent_clean$cod)
Shapiro-Wilk normality test
data: effluent_clean$cod
W = 0.51361, p-value < 2.2e-16
shapiro.test(effluent_clean$ecoli)
Shapiro-Wilk normality test
data: effluent_clean$ecoli
W = 0.029727, p-value < 2.2e-16
Shapiro–Wilk tests provided strong evidence against normality for all three parameters (BOD: W = 0.66, COD: W = 0.51, E. coli: W = 0.03; p < 0.001 for each).
log_conversion_table <- effluent_clean |>
mutate(
log_bod = log10(bod),
log_cod = log10(cod),
log_ecoli = log10(ecoli + 1)
)

# Long format for log10 values
effluent_long_log <- log_conversion_table |>
pivot_longer(
cols = c(log_bod, log_cod, log_ecoli),
names_to = "parameter",
values_to = "value"
)
ggplot(effluent_long_log, aes(x = value)) +
geom_histogram(bins = 30, color = "black", fill = "steelblue") +
facet_wrap(~ parameter, scales = "free_x") +
labs(
title = "Distribution of log10-transformed effluent values",
x = "log10(value)",
y = "Count"
)
After applying a base-10 logarithmic transformation (log10) to BOD, COD, and E. coli, the histograms of the transformed values (Figure Y) were much more symmetric and bell-shaped than those of the raw data. For BOD and COD, the log-transformed distributions are centred around log10 values of roughly 2–3, with noticeably reduced skewness and more balanced tails. For E. coli, most non-zero observations fall between log10 values of about 4 and 6, with a separate bar at 0 representing samples below 1 CFU/100 mL (recorded as 0), because 1 was added before transformation and log10(0 + 1) = 0.
These patterns suggest that the log10‑transformed data better satisfy the assumptions of many statistical methods and provide more stable measures of central tendency. Consequently, subsequent analyses use log‑transformed values and geometric means, rather than the raw effluent measurements, when comparing technologies and rounds.
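To make the geometric-mean calculation concrete, here is a minimal sketch of the back-transformation used in this report: average the log10 values, then raise 10 to that average. The helper name `geometric_mean` and its `offset` argument are illustrative assumptions, not functions from the project code. With a non-zero offset the result is the geometric mean of `x + offset`, which is how the report treats E. coli.

```r
# Hypothetical helper: geometric mean via log10, with an optional offset for zeros.
geometric_mean <- function(x, offset = 0) {
  x <- x[!is.na(x)]              # drop missing values
  10^mean(log10(x + offset))     # back-transform the mean of the logs
}

# Toy values, not project data.
toy <- c(10, 1000)
mean(toy)            # arithmetic mean, pulled up by the large value
geometric_mean(toy)  # geometric mean

# Zeros need an offset, because log10(0) is -Inf.
toy_with_zero <- c(0, 99)
geometric_mean(toy_with_zero, offset = 1)
```

The geometric mean of 10 and 1000 is 100, far below the arithmetic mean of 505, which illustrates why it is less sensitive to a few extreme high values.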
shapiro.test(log_conversion_table$log_bod)
Shapiro-Wilk normality test
data: log_conversion_table$log_bod
W = 0.9677, p-value = 2.038e-15
shapiro.test(log_conversion_table$log_cod)
Shapiro-Wilk normality test
data: log_conversion_table$log_cod
W = 0.98132, p-value = 4.359e-11
shapiro.test(log_conversion_table$log_ecoli)
Shapiro-Wilk normality test
data: log_conversion_table$log_ecoli
W = 0.81184, p-value < 2.2e-16
Effluent BOD, COD, and E. coli were strongly right‑skewed in the original scale, and although log10 transformation improved symmetry, all three parameters remained statistically non‑normal (Shapiro–Wilk p < 0.001). Therefore, non-parametric tests (Kruskal–Wallis with Dunn post-hoc) were used for inter-technology comparisons.
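The effect of the log transformation on the Shapiro-Wilk statistic can be illustrated with simulated data. The sketch below draws right-skewed values from a log-normal distribution (an assumption chosen purely for illustration, not project data) and compares W before and after taking log10. Note that with large samples such as the 1,163 observations here, the test can reject normality even for modest departures, so a small p-value after transformation does not contradict the visual improvement.

```r
set.seed(42)  # arbitrary seed, for reproducibility only

# Simulated right-skewed values (illustrative; not the effluent data).
x <- rlnorm(200, meanlog = 5, sdlog = 1)

raw_test <- shapiro.test(x)         # raw scale
log_test <- shapiro.test(log10(x))  # log10 scale

# W closer to 1 indicates a distribution closer to normal.
c(W_raw = unname(raw_test$statistic),
  W_log = unname(log_test$statistic))
```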
log_means_table <- log_conversion_table |>
group_by(fstp_id, fstp_type_short, fstp_type_long) |>
summarise(
log_mean_bod = mean(log_bod, na.rm = TRUE),
log_mean_cod = mean(log_cod, na.rm = TRUE),
log_mean_ecoli = mean(log_ecoli, na.rm = TRUE),
.groups = 'drop'
)
geom_means_table <- log_means_table |>
mutate(
geom_mean_bod = 10^log_mean_bod,
geom_mean_cod = 10^log_mean_cod,
geom_mean_ecoli = 10^log_mean_ecoli
)

geom_means_long <- geom_means_table |>
pivot_longer(
cols = c(geom_mean_bod, geom_mean_cod, geom_mean_ecoli),
names_to = "parameter",
values_to = "geom_mean"
)

geom_means_long2 <- geom_means_long |>
mutate(parameter_label = case_when(
parameter == "geom_mean_bod" ~ "A) BOD",
parameter == "geom_mean_cod" ~ "B) COD",
parameter == "geom_mean_ecoli" ~ "C) E. coli",
TRUE ~ parameter
))
ggplot(geom_means_long2,
aes(x = fstp_type_short, y = geom_mean)) +
geom_boxplot(outlier.shape = NA) +
geom_jitter(width = 0.2, alpha = 0.5) +
facet_wrap(~ parameter_label, scales = "free_y") +
labs(
title = "Plant-level geometric mean effluent by technology",
x = "Technology",
y = "Geometric mean (original units)"
)
Figure 3 (A) shows that across technologies most plants have geometric mean BOD between roughly 100 and 300 mg/L, but the central tendency and spread differ. LSP and UFF show the highest medians and wide interquartile ranges, indicating higher and more variable BOD in their effluents. In contrast, WSP has the lowest median BOD and a relatively narrow IQR, showing better and more consistent organic matter removal than the other technologies.
Figure 3 (B) shows a similar pattern. SSU and UFF have higher median geometric mean COD and broader IQR, indicating poorer and more variable performance. ABR and DEWATS lie in the middle, while WSP again has the lowest median COD with a tighter distribution, reinforcing its relatively strong performance for organic parameters.
For E. coli (Figure 3 (C)), ABR and WSP have similar, relatively low medians, and their IQRs are lower than those of LSP, SSU, and UFF. DEWATS has the lowest median of all technologies. LSP and SSU show larger geometric mean E. coli, indicating weaker and more variable microbial reduction.
Overall, these plots indicate that WSP performs best for BOD and COD, while DEWATS and WSP are relatively favourable for E. coli. LSP, SSU, and UFF are generally the least effective and most variable across all three indicators.
kruskal.test(geom_mean_bod ~ fstp_type_short,
data = geom_means_table)
Kruskal-Wallis rank sum test
data: geom_mean_bod by fstp_type_short
Kruskal-Wallis chi-squared = 17.301, df = 5, p-value = 0.003963
kruskal.test(geom_mean_cod ~ fstp_type_short,
data = geom_means_table)
Kruskal-Wallis rank sum test
data: geom_mean_cod by fstp_type_short
Kruskal-Wallis chi-squared = 17.782, df = 5, p-value = 0.003233
kruskal.test(geom_mean_ecoli ~ fstp_type_short,
data = geom_means_table)
Kruskal-Wallis rank sum test
data: geom_mean_ecoli by fstp_type_short
Kruskal-Wallis chi-squared = 42.67, df = 5, p-value = 4.31e-08
Non‑parametric Kruskal–Wallis tests indicated statistically significant differences in plant‑level geometric mean effluent across technologies for BOD, COD, and E. coli (BOD: χ²(5) = 17.3, p = 0.004; COD: χ²(5) = 17.8, p = 0.003; E. coli: χ²(5) = 42.7, p < 0.001). These results confirm that at least one technology differs from the others in its typical effluent quality for each parameter. Pairwise Dunn post‑hoc tests were therefore used to identify which technologies differed significantly from each other.
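For readers unfamiliar with the test, the following sketch runs a Kruskal-Wallis test on a small toy data set using base R's `kruskal.test`. The three groups and their values are invented for illustration; they are deliberately non-overlapping so that the rank-based test detects a clear difference.

```r
# Toy data: three hypothetical technologies with clearly separated values.
toy <- data.frame(
  value = c(1:10, 11:20, 21:30),
  technology = rep(c("A", "B", "C"), each = 10)
)

# The test compares rank distributions across groups; df = number of groups - 1.
kw <- kruskal.test(value ~ technology, data = toy)

kw$statistic  # Kruskal-Wallis chi-squared
kw$parameter  # degrees of freedom
kw$p.value
```

A significant result only says that at least one group differs; it does not say which one, which is why a pairwise post-hoc procedure is needed afterwards.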
dunn_bod <- dunnTest(
geom_mean_bod ~ fstp_type_short,
data = geom_means_table,
method = "bh" # Benjamini-Hochberg for multiple comparisons
)
dunn_bod
Comparison Z P.unadj P.adj
1 ABR - DEWATS -0.7312004 0.4646567431 0.536142396
2 ABR - LSP -2.4237398 0.0153616076 0.046084823
3 DEWATS - LSP -1.6091924 0.1075742717 0.230516296
4 ABR - SSU -1.5809986 0.1138783634 0.213521931
5 DEWATS - SSU -0.7403527 0.4590860162 0.573857520
6 LSP - SSU 0.9606806 0.3367127849 0.459153798
7 ABR - UFF -1.3586064 0.1742713332 0.290452222
8 DEWATS - UFF -0.4634691 0.6430281871 0.688958772
9 LSP - UFF 1.3290096 0.1838447956 0.275767193
10 SSU - UFF 0.3453412 0.7298378442 0.729837844
11 ABR - WSP 2.1413557 0.0322453658 0.080613414
12 DEWATS - WSP 2.5396835 0.0110952819 0.041607307
13 LSP - WSP 3.7557023 0.0001728561 0.002592841
14 SSU - WSP 3.1811014 0.0014671626 0.011003719
15 UFF - WSP 3.0743170 0.0021098511 0.010549256
For plant-level geometric mean BOD, Dunn post-hoc tests showed that WSP plants had significantly lower effluent BOD than DEWATS, LSP, SSU, and UFF (adjusted p < 0.05). ABR was also lower than LSP, but only marginally significantly (adjusted p = 0.046), while the ABR–WSP difference was not significant (adjusted p = 0.081). No other pairs differed significantly.
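The Benjamini-Hochberg adjustment behind the `P.adj` column can be sketched by hand from the unadjusted p-values printed above. The printed adjusted values appear consistent with the simple formula p × m / rank, where m is the number of comparisons; this reading is an inference from the numbers in the output, not something stated by the package documentation quoted here. Base R's `p.adjust` applies the same formula but additionally enforces monotonicity, so a few values can differ slightly.

```r
# Unadjusted p-values copied from the Dunn test output for BOD.
p_unadj <- c(
  "ABR - DEWATS" = 0.4646567431, "ABR - LSP"    = 0.0153616076,
  "DEWATS - LSP" = 0.1075742717, "ABR - SSU"    = 0.1138783634,
  "DEWATS - SSU" = 0.4590860162, "LSP - SSU"    = 0.3367127849,
  "ABR - UFF"    = 0.1742713332, "DEWATS - UFF" = 0.6430281871,
  "LSP - UFF"    = 0.1838447956, "SSU - UFF"    = 0.7298378442,
  "ABR - WSP"    = 0.0322453658, "DEWATS - WSP" = 0.0110952819,
  "LSP - WSP"    = 0.0001728561, "SSU - WSP"    = 0.0014671626,
  "UFF - WSP"    = 0.0021098511
)

m <- length(p_unadj)  # 15 pairwise comparisons among 6 technologies

# Plain Benjamini-Hochberg scaling: p * m / rank (rank 1 = smallest p).
bh_raw <- p_unadj * m / rank(p_unadj)

# Base R version, which also forces adjusted values to be monotone in p.
bh_monotone <- p.adjust(p_unadj, method = "BH")

round(cbind(p_unadj, bh_raw, bh_monotone), 6)
```

In this case both versions lead to the same set of comparisons below 0.05, so the conclusions do not depend on which variant is used.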
dunn_cod <- dunnTest(
geom_mean_cod ~ fstp_type_short,
data = geom_means_table,
method = "bh"
)
dunn_cod
Comparison Z P.unadj P.adj
1 ABR - DEWATS -0.6856366 0.4929423227 0.616177903
2 ABR - LSP -1.5828690 0.1134513271 0.243109987
3 DEWATS - LSP -0.8773875 0.3802762257 0.570414338
4 ABR - SSU -2.0196592 0.0434187550 0.108546888
5 DEWATS - SSU -1.1751378 0.2399395719 0.399899287
6 LSP - SSU -0.1878678 0.8509802868 0.850980287
7 ABR - UFF -1.3664313 0.1718036230 0.322131793
8 DEWATS - UFF -0.5156981 0.6060653042 0.699306120
9 LSP - UFF 0.4849079 0.6277416960 0.672580389
10 SSU - UFF 0.7749077 0.4383941789 0.597810244
11 ABR - WSP 2.4371642 0.0148029575 0.044408873
12 DEWATS - WSP 2.7827351 0.0053902789 0.020213546
13 LSP - WSP 3.3740094 0.0007408185 0.003704092
14 SSU - WSP 3.7802005 0.0001567021 0.002350532
15 UFF - WSP 3.3747376 0.0007388611 0.005541459
For COD, WSP had significantly lower plant‑level geometric mean effluent than all other technologies (ABR, DEWATS, LSP, SSU, UFF; adjusted p < 0.05), whereas no other technology pairs differed significantly.
dunn_ecoli <- dunnTest(
geom_mean_ecoli ~ fstp_type_short,
data = geom_means_table,
method = "bh"
)
dunn_ecoli
Comparison Z P.unadj P.adj
1 ABR - DEWATS 4.1587607 3.199788e-05 1.199921e-04
2 ABR - LSP -1.8112285 7.010549e-02 1.752637e-01
3 DEWATS - LSP -5.1257489 2.963574e-07 2.222680e-06
4 ABR - SSU -1.3576098 1.745875e-01 3.273516e-01
5 DEWATS - SSU -5.0549196 4.305720e-07 2.152860e-06
6 LSP - SSU 0.5724273 5.670325e-01 8.505488e-01
7 ABR - UFF -1.7753939 7.583288e-02 1.624990e-01
8 DEWATS - UFF -5.6977791 1.213782e-08 1.820673e-07
9 LSP - UFF 0.3855384 6.998386e-01 8.747982e-01
10 SSU - UFF -0.2538562 7.996066e-01 9.226231e-01
11 ABR - WSP -1.0118784 3.115962e-01 5.193270e-01
12 DEWATS - WSP -4.0041726 6.223493e-05 1.867048e-04
13 LSP - WSP 0.4898176 6.242630e-01 8.512677e-01
14 SSU - WSP 0.0224240 9.821097e-01 9.821097e-01
15 UFF - WSP 0.2175253 8.277990e-01 8.869275e-01
For E. coli, DEWATS plants had significantly lower plant‑level geometric mean effluent E. coli than all other technologies (ABR, LSP, SSU, UFF, WSP; adjusted p < 0.001). No statistically significant differences were detected among the remaining technologies after multiple‑comparison adjustment.
tech_round_geom <- log_conversion_table |>
group_by(fstp_type_short, data_round) |>
summarise(
log_mean_bod = mean(log_bod, na.rm = TRUE),
log_mean_cod = mean(log_cod, na.rm = TRUE),
log_mean_ecoli = mean(log_ecoli, na.rm = TRUE),
.groups = "drop"
) |>
mutate(
geom_mean_bod = 10^log_mean_bod,
geom_mean_cod = 10^log_mean_cod,
geom_mean_ecoli = 10^log_mean_ecoli,
data_round = factor(data_round, levels = sort(unique(data_round)))
)

ggplot(tech_round_geom,
aes(x = data_round,
y = geom_mean_bod,
color = fstp_type_short,
shape = fstp_type_short,
group = fstp_type_short)) +
geom_line() +
geom_point(size = 2) +
labs(
title = "Technology-level geometric mean BOD by round",
x = "Monitoring round",
y = "Geometric mean BOD (mg/L)",
color = "Technology",
shape = "Technology"
)
Figure 4 shows that WSP consistently had the lowest BOD, starting around 125 mg/L in R1 and remaining below about 150 mg/L from R3 to R8, with only a modest increase in the later rounds. ABR also showed relatively stable and moderate BOD, staying below 200 mg/L in R1–R5 and R8 and close to 220 mg/L in R6–R7, without the pronounced peaks seen for other technologies.
In contrast, LSP, SSU, UFF, and to a lesser extent DEWATS exhibited larger fluctuations and higher peaks. UFF, SSU and DEWATS reached BOD values near or above 300 mg/L in R7. These patterns suggest that WSP, and to some extent ABR, provided more consistently low BOD over the monitoring period, while LSP, SSU, UFF, and DEWATS showed more variable and occasionally much poorer organic matter removal.
ggplot(tech_round_geom,
aes(x = data_round,
y = geom_mean_cod,
color = fstp_type_short,
shape = fstp_type_short,
group = fstp_type_short)) +
geom_line() +
geom_point(size = 2) +
labs(
title = "Technology-level geometric mean COD by round",
x = "Monitoring round",
y = "Geometric mean COD (mg/L)",
color = "Technology",
shape = "Technology"
)
Figure 5 shows that WSP consistently had the lowest COD values, starting around 350–400 mg/L and remaining below about 450 mg/L throughout rounds R1–R8. ABR and DEWATS were intermediate, with COD typically between ~350 and 600 mg/L, while UFF showed a similar pattern but with higher peaks in the later rounds.
LSP and SSU exhibited the highest and most variable COD, particularly towards the end of the monitoring period. LSP increased from around 400 mg/L in R4 to nearly 1,000 mg/L by R8, and SSU rose from about 350 mg/L in R4 to similarly high values in R7–R8. These trends mirror the BOD results, suggesting that WSP performs best and most consistently for organic matter removal, whereas LSP and SSU often have much poorer COD effluent quality, especially in later rounds.
ggplot(tech_round_geom,
aes(x = data_round,
y = geom_mean_ecoli,
color = fstp_type_short,
shape = fstp_type_short,
group = fstp_type_short)) +
geom_line() +
geom_point(size = 2) +
scale_y_log10() +
labs(
title = "Technology-level geometric mean E. coli by round",
x = "Monitoring round",
y = "Geometric mean E. coli (CFU/100 mL, log scale)",
color = "Technology",
shape = "Technology"
)
Figure 6 shows that geometric mean effluent E. coli declined over time for most technologies, particularly after round R5. DEWATS started with relatively low E. coli and showed the steepest decline, reaching the lowest geometric means by R6–R8. ABR, LSP, SSU, and UFF began at higher levels but all showed marked reductions from R5 onwards, converging to geometric means around or below 10²–10³ CFU/100 mL in the last rounds.
WSP displayed somewhat different behaviour: E. coli decreased between R1 and R7 but increased again in R8, indicating less consistent microbial removal late in the monitoring period. Overall, these trends suggest improved E. coli reduction performance over time for most technologies, with DEWATS performing best and WSP showing more variable behaviour despite good organic removal.
Note: a better way to represent the following results is still under consideration.
plant_camp <- effluent_clean |>
dplyr::distinct(fstp_id, camp_id)

geom_means_with_camp <- geom_means_table |>
dplyr::left_join(plant_camp, by = "fstp_id")

plants_flag <- geom_means_with_camp |>
dplyr::mutate(
high_risk_plant = geom_mean_ecoli > 1e4
)

camp_risk_simple <- plants_flag |>
dplyr::group_by(camp_id) |>
dplyr::summarise(
n_plants = dplyr::n(),
n_high_risk = sum(high_risk_plant, na.rm = TRUE),
prop_high_risk = n_high_risk / n_plants,
.groups = "drop"
)

camp_risk_simple <- camp_risk_simple |>
dplyr::mutate(
camp_id = reorder(camp_id, -prop_high_risk)
)
ggplot(camp_risk_simple,
aes(x = 1, y = camp_id, fill = prop_high_risk)) +
geom_tile(color = "grey70") +
geom_text(aes(label = paste0(n_high_risk, "/", n_plants)),
color = "black", size = 3) +
scale_fill_gradient(
low = "#FFE5E5",
high = "#8B0000"
) +
labs(
title = "Camp-wise share of high-E. coli plants (sorted by risk)",
x = NULL,
y = "Camp",
fill = "Proportion of plants\ngeom. mean E. coli > 10^4"
) +
theme_bw() +
theme(
axis.text.x = element_blank(),
axis.ticks.x = element_blank()
)
camp_tech_risk <- plants_flag |>
group_by(camp_id, fstp_type_short) |>
summarise(
n_plants = n(),
n_high_risk = sum(high_risk_plant, na.rm = TRUE),
prop_high_risk = n_high_risk / n_plants,
.groups = "drop"
)

camp_tech_risk <- camp_tech_risk |>
group_by(fstp_type_short) |>
arrange(desc(prop_high_risk), .by_group = TRUE) |>
mutate(
camp_tech_id = factor(
paste(fstp_type_short, camp_id, sep = "__"),
levels = paste(fstp_type_short, camp_id, sep = "__")
)
) |>
ungroup()
ggplot(camp_tech_risk,
aes(x = 1, y = camp_tech_id, fill = prop_high_risk)) +
geom_tile(color = "grey70") +
geom_text(aes(label = paste0(n_high_risk, "/", n_plants)),
color = "black", size = 3) +
scale_y_discrete(
labels = function(x) sub("^[^_]+__", "", x) # strip "TECH__" so only camp name shows
) +
scale_fill_gradient(
low = "#FFE5E5",
high = "#8B0000"
) +
facet_wrap(~ fstp_type_short, scales = "free_y") +
labs(
title = "Camp-wise share of high-E. coli plants by technology",
x = NULL,
y = "Camp (sorted high to low risk within each technology)",
fill = "Proportion of plants\ngeom. mean E. coli > 10^4"
) +
theme_bw() +
theme(
axis.text.x = element_blank(),
axis.ticks.x = element_blank()
)
Sankey diagram (drafted with AI assistance). To do: revise the load_band thresholds in case_when() for E. coli load based on the literature.
library(networkD3)
## 0) CHECK INPUT -----------------------------------------------------------
# geom_means_with_camp should have:
# fstp_id, fstp_type_short, camp_id, geom_mean_ecoli
## 1) Plant-level E. coli risk bands ---------------------------------------
plants_risk_bands <- geom_means_with_camp |>
mutate(
ecoli_band = case_when(
geom_mean_ecoli <= 1e3 ~ "≤ 10^3 (compliant)",
geom_mean_ecoli <= 1e4 ~ "> 10^3–10^4",
geom_mean_ecoli <= 1e5 ~ "> 10^4–10^5",
TRUE ~ "> 10^5"
),
ecoli_band = factor(
ecoli_band,
levels = c("≤ 10^3 (compliant)",
"> 10^3–10^4",
"> 10^4–10^5",
"> 10^5")
)
)
## 2) Camp-level total E. coli load and load bands -------------------------
camp_load <- plants_risk_bands |>
group_by(camp_id) |>
summarise(
total_geom_ecoli = sum(geom_mean_ecoli, na.rm = TRUE),
.groups = "drop"
)
# Choose load bands based on your range (edit thresholds if needed)
camp_load <- camp_load |>
mutate(
load_band = case_when(
total_geom_ecoli <= 1e4 ~ "Low load (≤ 10^4)",
total_geom_ecoli <= 1e5 ~ "Moderate load (10^4–10^5)",
total_geom_ecoli <= 5e5 ~ "High load (10^5–5×10^5)",
TRUE ~ "Very high load (> 5×10^5)"
),
load_band = factor(
load_band,
levels = c("Low load (≤ 10^4)",
"Moderate load (10^4–10^5)",
"High load (10^5–5×10^5)",
"Very high load (> 5×10^5)")
)
)
## 3) Rank camps from low to high risk (for nicer ordering) ----------------
camp_risk_rank <- plants_risk_bands |>
mutate(high_band = ecoli_band == "> 10^5") |>
group_by(camp_id) |>
summarise(
n_plants = n(),
n_high_band = sum(high_band, na.rm = TRUE),
prop_high = n_high_band / n_plants,
.groups = "drop"
) |>
arrange(prop_high, n_high_band) # low → high risk
camp_nodes <- camp_risk_rank$camp_id # ordered camps
## 4) Build link tables for each layer -------------------------------------
# Technology -> plant risk band
tech_to_band <- plants_risk_bands |>
count(fstp_type_short, ecoli_band, name = "value")
# Risk band -> camp
band_to_camp <- plants_risk_bands |>
count(ecoli_band, camp_id, name = "value")
# Camp -> load band
camp_to_load <- camp_load |>
count(camp_id, load_band, name = "value")
## 5) Build node list (technologies, bands, camps, load bands) -------------
tech_nodes <- sort(unique(plants_risk_bands$fstp_type_short))
band_nodes <- levels(plants_risk_bands$ecoli_band)
load_nodes <- levels(camp_load$load_band)
nodes <- data.frame(
name = c(tech_nodes, band_nodes, camp_nodes, load_nodes),
stringsAsFactors = FALSE
)
# helper for index
node_id <- function(x) match(x, nodes$name) - 1
## 6) Create combined links -------------------------------------------------
links_tech_band <- data.frame(
source = node_id(tech_to_band$fstp_type_short),
target = node_id(as.character(tech_to_band$ecoli_band)),
value = tech_to_band$value
)
links_band_camp <- data.frame(
source = node_id(as.character(band_to_camp$ecoli_band)),
target = node_id(band_to_camp$camp_id),
value = band_to_camp$value
)
links_camp_load <- data.frame(
source = node_id(camp_to_load$camp_id),
target = node_id(as.character(camp_to_load$load_band)),
value = camp_to_load$value
)
links <- rbind(links_tech_band, links_band_camp, links_camp_load)
## 7) Node groups for colouring by level -----------------------------------
nodes$group <- c(
rep("Technology", length(tech_nodes)),
rep("Band", length(band_nodes)),
rep("Camp", length(camp_nodes)),
rep("Load", length(load_nodes))
)
## 8) Plot Sankey: Technology → risk band → camp → load band --------------
sankeyNetwork(
Links = links,
Nodes = nodes,
Source = "source",
Target = "target",
Value = "value",
NodeID = "name",
NodeGroup = "group",
fontSize = 11,
nodeWidth = 20
)

compliance_long <- effluent_clean |>
mutate(
bod = as.numeric(bod),
cod = as.numeric(cod),
ecoli = as.numeric(ecoli)
) |>
# make long by parameter
tidyr::pivot_longer(
cols = c(bod, cod, ecoli),
names_to = "parameter",
values_to = "value"
) |>
mutate(
parameter = recode(parameter,
bod = "BOD",
cod = "COD",
ecoli = "E. coli"),
status = case_when(
parameter == "BOD" & value <= 30 ~ "Meets standard",
parameter == "COD" & value <= 125 ~ "Meets standard",
parameter == "E. coli" & value <= 1000 ~ "Meets standard",
TRUE ~ "Exceeds standard"
)
) |>
group_by(parameter, fstp_type_short, status) |>
summarise(n = n(), .groups = "drop") |>
group_by(parameter, fstp_type_short) |>
mutate(pct = 100 * n / sum(n)) |>
ungroup()
ggplot(compliance_long,
aes(x = fstp_type_short, y = pct, fill = status)) +
geom_col(position = "stack") +
geom_text(
aes(label = sprintf("%.1f%%", pct)),
position = position_stack(vjust = 0.5),
size = 3
) +
facet_wrap(~ parameter, nrow = 1) +
scale_y_continuous(labels = scales::percent_format(scale = 1)) +
scale_fill_manual(
values = c("Meets standard" = "#009E73",
"Exceeds standard" = "#D55E00")
) +
labs(
title = "Percentage of samples meeting discharge standards by technology and parameter",
x = "Technology",
y = "Samples (%)",
fill = "Status"
) +
theme_bw()
Decentralized fecal sludge treatment in the Rohingya camps substantially reduces pollutant concentrations but shows high variability over the years across all technologies.
None of the assessed technologies consistently achieves microbiologically safe or standards-compliant effluent, especially for BOD and COD.
WSPs achieved the lowest effluent BOD and COD but only moderate E. coli performance, whereas DEWATS achieved the lowest effluent E. coli.
A subset of camps contributes disproportionately high E. coli loads.
These findings highlight the need for technology‑specific upgrades, strengthened operation and monitoring, and risk‑based prioritization of high‑burden camps to better protect public health in protracted humanitarian crises.