EVALUATING THE PERFORMANCE OF FECAL SLUDGE TREATMENT TECHNOLOGIES IN ROHINGYA REFUGEE CAMPS

Author
Affiliation

Mejbah Uddin Chowdhury

University of Victoria

Published

December 11, 2025

Abstract
add here

Introduction:

Project description

This project analyzes four years of fecal sludge effluent quality data from 183 fecal sludge treatment plants (FSTPs) operating in the Rohingya refugee camps. These facilities use six different decentralized treatment technologies, with effluent samples collected across eight monitoring rounds.

The primary objective of this study is to evaluate the performance of these treatment technologies over time by analyzing key effluent quality parameters—BOD, COD, and E. coli. The analysis assesses the extent to which the systems comply with national discharge standards and provides a partial understanding of camp-level microbial risks in this densely populated camp setting.

Note: For the Capstone Project, the performance of the FSTPs is assessed based solely on effluent quality. Removal efficiency calculations will be conducted later once influent quality data becomes available. Due to the current lack of influent data, the project goal has been slightly revised to focus primarily on effluent quality.

Method:

Raw effluent monitoring data were imported from CSV files and restricted to samples with non‑missing BOD, COD, and E. coli, resulting in 1,163 observations for analysis. Data were summarised by camp and technology to quantify the number of plants and monitoring coverage, and plant‑level geometric means were calculated from log₁₀‑transformed BOD, COD, and E. coli values. Because all three parameters were strongly right‑skewed and non‑normal, subsequent analyses used log‑transformed values, geometric means, and non‑parametric tests (Kruskal–Wallis with Dunn post‑hoc) for comparisons between technologies.

Library

Code
library(ggplot2)
library(dplyr)
library(readr)
library(ggthemes)
library(tidyverse)
library(gt)
library(FSA)
library(dunn.test)
library(here)

(Wickham et al. 2019) (Dinno 2024) (Arnold 2024)

Import

Code
effluent <- read_csv(here::here("data/raw/fstp_effluent_quality.csv"))
influent <- read_csv(here::here("data/raw/fstp_influent_quality_avg.csv")) #not considered for capstone project
design_objectives <- read_csv(here::here("data/raw/design_objectives.csv")) #not considered for capstone project
national_standard <- read_csv(here::here("data/raw/national_standard.csv"))

Data Preprocessing

Remove NAs

Code
effluent_clean <- effluent |> 
  filter(!is.na(bod) & !is.na(cod) & !is.na(ecoli))
Code
head(effluent_clean)
# A tibble: 6 × 9
  sample_id    camp_id fstp_id fstp_type_short fstp_type_long   bod   cod  ecoli
  <chr>        <chr>   <chr>   <chr>           <chr>          <dbl> <dbl>  <dbl>
1 DPHE_FSL_S_… Camp 04 FSTP_0… LSP             Lime Stabiliz…    22  47.7   1800
2 DPHE_FSL_S_… Camp 0… FSTP_0… LSP             Lime Stabiliz…   138 458   110000
3 DPHE_FSL_S_… Camp 0… FSTP_0… ABR             Anaerobic Baf…   195 472   220000
4 DPHE_FSL_S_… Camp 18 FSTP_1… SSU             Solid Separat…   148 412      300
5 DPHE_FSL_S_… Camp 12 FSTP_1… DEWATS          Decentralised…   311 688     2100
6 DPHE_FSL_S_… Camp 12 FSTP_1… DEWATS          Decentralised…   292 750      300
# ℹ 1 more variable: data_round <chr>
Code
summary(effluent_clean)
  sample_id           camp_id            fstp_id          fstp_type_short   
 Length:1163        Length:1163        Length:1163        Length:1163       
 Class :character   Class :character   Class :character   Class :character  
 Mode  :character   Mode  :character   Mode  :character   Mode  :character  
                                                                            
                                                                            
                                                                            
 fstp_type_long          bod              cod              ecoli          
 Length:1163        Min.   :   3.0   Min.   :   17.4   Min.   :        0  
 Class :character   1st Qu.: 112.0   1st Qu.:  296.0   1st Qu.:     1000  
 Mode  :character   Median : 191.0   Median :  502.0   Median :    56000  
                    Mean   : 237.9   Mean   :  655.7   Mean   :   775319  
                    3rd Qu.: 326.0   3rd Qu.:  889.0   3rd Qu.:   250000  
                    Max.   :4554.0   Max.   :17438.0   Max.   :395000000  
  data_round       
 Length:1163       
 Class :character  
 Mode  :character  
                   
                   
                   

Data Processed

Code
write_csv(
  effluent_clean,
  here::here("data/processed/effluent-processed.csv")
)

Number of technologies

Code
camp_tech_counts <- effluent_clean |>
  distinct(camp_id, fstp_id, fstp_type_short) |>
  count(camp_id, fstp_type_short, name = "n_plants") |>
  arrange(camp_id, fstp_type_short)
Code
camp_totals <- camp_tech_counts |>
  group_by(camp_id) |>
  summarise(total_plants = sum(n_plants), .groups = "drop")

#| label: setup-palettes
cbp1 <- c(
  "#999999", "#E69F00", "#56B4E9", "#009E73",
  "#F0E442", "#0072B2", "#D55E00", "#CC79A7"
)

ggplot(camp_tech_counts,
       aes(x = camp_id, y = n_plants, fill = fstp_type_short)) +
  geom_col(position = "stack") +
  scale_fill_manual(values = cbp1) +
  geom_text(data = camp_totals,
            aes(x = camp_id, y = total_plants, label = total_plants),
            vjust = -0.3,
            inherit.aes = FALSE) +
  labs(
    title = "Number of FSTPs by camp and technology",
    x = "Camp",
    y = "Number of plants",
    fill = "Technology"
  ) +
  theme_bw() +
  theme(
    axis.text.x = element_text(
      angle = 45,
      hjust = 1,
      vjust = 1
    )
  )
Figure 1: Number of FSTPs by camp and technology

Figure 1 shows the number of FSTPs by camp and technology. Camps vary widely in both the number and type of FSTPs: some camps have only one or two plants of a single technology, whereas others have more than 10 plants and a mixture of several technologies. Important note: The numbers of plants shown for each camp do not represent all FSTPs present in the camps. This study includes only plants with monitoring data spanning more than one year, or with at least four sampling rounds. Because reliable information on the establishment or start‑up dates of the plants was not available.

Coverage by technology

Code
summary_table <- effluent |>
  group_by(fstp_type_short) |>
  summarise(
    Number_of_Plants = n_distinct(fstp_id),
    Total_Possible_Samples = Number_of_Plants * 8 * 3,
    Actual_Samples_Collected = sum(bod >= 0, na.rm = TRUE) +
      sum(cod >= 0, na.rm = TRUE) +
      sum(ecoli >= 0, na.rm = TRUE),
    Missing_Samples = sum(is.na(bod) + is.na(cod) + is.na(ecoli)),
    Coverage_Percent = (Actual_Samples_Collected / Total_Possible_Samples) * 100,
    .groups = "drop"
  ) |>
  gt() |>
  cols_label(
    fstp_type_short = "Technology",
    Number_of_Plants = "Number of Plants",
    Total_Possible_Samples = "Total Possible Samples",
    Actual_Samples_Collected = "Actual Samples Collected",
    Missing_Samples = "Missing Samples",
    Coverage_Percent = "Coverage (%)"
  ) |>
  fmt_number(
    columns = c(Number_of_Plants, Total_Possible_Samples,
                Actual_Samples_Collected, Missing_Samples),
    decimals = 0
  ) |>
  fmt_number(
    columns = c(Coverage_Percent),
    decimals = 1
  )

summary_table
Table 1: Sampling coverage by FSTP technology
Technology Number of Plants Total Possible Samples Actual Samples Collected Missing Samples Coverage (%)
ABR 45 1,080 924 156 85.6
DEWATS 28 672 510 162 75.9
LSP 21 504 318 186 63.1
SSU 31 744 549 195 73.8
UFF 44 1,056 894 162 84.7
WSP 14 336 294 42 87.5

Overall sample monitoring coverage was fairly high but not complete for any technology (Table 1), ranging from about 63% for LSP to almost 88% for WSP. LSP had 21 plants with the lowest coverage (63%), and SSU had 31 plants with coverage of about 74%, indicating more missing sampling rounds for these technologies than for others. In contrast, ABR and UFF had the largest numbers of plants (45 and 44, respectively) with reasonably good coverage of around 85%, while WSP had only 14 plants but with the highest coverage (about 88%).

Sampling monitoring rounds and calendar periods

Code
round_periods <- tibble::tibble(
  round = factor(paste0("R", 1:8), levels = paste0("R", 1:8)),
  year  = factor(c(2022, 2022, 2022,
                   2023, 2023, 2023,
                   2024, 2025)),
  period = c(
    "Jan–Apr", "May–Aug", "Sep–Dec",
    "Jan–Apr", "May–Aug", "Sep–Dec",
    "Jan–May", "Jan–Apr"
  )
)
Code
ggplot(round_periods,
       aes(x = year, y = round, fill = period)) +
  geom_tile(color = "white", linewidth = 0.7) +
  geom_text(aes(label = period), size = 3) +
  scale_fill_brewer(palette = "Blues") +
  labs(
    title = "Sampling monitoring rounds and calendar periods",
    x = "Year",
    y = "Monitoring round",
    fill = "Calendar period"
  ) +
  theme_bw()
Figure 2: Sampling monitoring rounds and calendar periods

Coverage by sampling rounds per technology

Code
rounds_summary <- effluent |>
  group_by(fstp_type_short, fstp_id) |>
  summarise(
    Total_Rounds = sum(!is.na(bod) & !is.na(cod) & !is.na(ecoli)),
    Ecoli_Rounds = sum(!is.na(ecoli)),
    Cod_Rounds   = sum(!is.na(cod)),
    Bod_Rounds   = sum(!is.na(bod)),
    .groups = "drop"
  ) |>
  group_by(fstp_type_short) |>
  summarise(
    Plants_with_8_Rounds = sum(Total_Rounds == 8),
    Plants_with_7_Rounds = sum(Total_Rounds == 7),
    Plants_with_6_Rounds = sum(Total_Rounds == 6),
    Plants_with_5_Rounds = sum(Total_Rounds == 5),
    Plants_with_4_Rounds = sum(Total_Rounds == 4),
    Plants_with_3_Rounds = sum(Total_Rounds == 3),
    Mean_Rounds_Per_Plant = mean(Total_Rounds),
    .groups = "drop"
  ) |>
  gt()

rounds_summary
Table 2: Number of monitoring rounds per FSTP technology.
fstp_type_short Plants_with_8_Rounds Plants_with_7_Rounds Plants_with_6_Rounds Plants_with_5_Rounds Plants_with_4_Rounds Plants_with_3_Rounds Mean_Rounds_Per_Plant
ABR 20 9 9 3 4 0 6.844444
DEWATS 8 6 2 4 8 0 6.071429
LSP 1 3 4 1 12 0 5.047619
SSU 9 4 4 3 11 0 5.903226
UFF 12 21 5 1 5 0 6.772727
WSP 6 5 1 1 1 0 7.000000

Coverage by rounds was uneven across technologies (Table 2). WSP had the highest average monitoring intensity (mean 7.0 rounds per plant), and most WSP plants had data available for 7–8 sampling rounds. ABR and UFF also had relatively strong coverage, each with a mean of 6.8 rounds per plant, and many plants monitored in 7 or 8 rounds. In contrast, LSP and SSU had the lowest average numbers of available rounds (means of 5.0 and 5.9 per plant, respectively), with many plants monitored in only 4–6 rounds. DEWATS was intermediate, with a mean of 6.1 rounds per plant and a mix of plants monitored in 4–8 rounds. These patterns indicate that time‑series information is richest for WSP, ABR, and UFF, and somewhat thinner for LSP and SSU.

Effluent descriptive statistics (raw data)

Code
# considering original effluent values
effluent_long <- effluent_clean |>
  pivot_longer(
    cols = c(bod, cod, ecoli),
    names_to = "parameter",
    values_to = "value"
  )

summary_statistics <- effluent_long |>
  group_by(fstp_type_short, parameter) |>
  summarise(
    n      = sum(!is.na(value)),
    Mean   = mean(value, na.rm = TRUE),
    SD     = sd(value, na.rm = TRUE),
    Median = median(value, na.rm = TRUE),
    Q1     = quantile(value, 0.25, na.rm = TRUE),
    Q3     = quantile(value, 0.75, na.rm = TRUE),
    Min    = min(value, na.rm = TRUE),
    Max    = max(value, na.rm = TRUE),
    .groups = "drop"
  ) |>
  mutate(
    IQR_text = paste0(round(Q1, 2), " – ", round(Q3, 2))
  )

gt_summary_table <- summary_statistics |>
  gt() |>
  tab_header(
    title = "Summary Statistics of Effluent by Technology",
    subtitle = "Original-data BOD, COD, and E. coli"
  ) |>
  cols_label(
    fstp_type_short = "Technology",
    parameter       = "Parameter",
    n               = "n",
    Mean            = "Mean",
    SD              = "SD",
    Median          = "Median",
    IQR_text        = "IQR (Q1–Q3)",
    Min             = "Min",
    Max             = "Max"
  ) |>
  fmt_number(
    columns = c(Mean, SD, Median, Min, Max),
    decimals = 2
  )

gt_summary_table
Table 3: Summary statistics of effluent parameters by technology (original values)
Summary Statistics of Effluent by Technology
Original-data BOD, COD, and E. coli
Technology Parameter n Mean SD Median Q1 Q3 Min Max IQR (Q1–Q3)
ABR bod 308 218.31 160.44 179.00 111.75 282.00 9.00 1,305.00 111.75 – 282
ABR cod 308 600.15 417.06 485.50 289.75 801.50 40.25 2,223.00 289.75 – 801.5
ABR ecoli 308 284,666.56 737,700.77 56,500.00 1000.00 272500.00 0.00 8,300,000.00 1000 – 272500
DEWATS bod 170 224.71 135.61 202.50 116.00 311.00 15.00 753.00 116 – 311
DEWATS cod 170 627.34 378.38 581.50 320.25 835.25 34.40 1,790.00 320.25 – 835.25
DEWATS ecoli 170 151,124.12 1,087,772.73 2,050.00 0.00 30000.00 0.00 13,800,000.00 0 – 30000
LSP bod 106 310.82 234.41 269.00 140.25 420.25 11.00 1,396.00 140.25 – 420.25
LSP cod 106 753.62 584.20 631.50 366.25 989.75 47.70 4,320.00 366.25 – 989.75
LSP ecoli 106 243,599.06 385,198.44 85,500.00 10000.00 257500.00 0.00 2,150,000.00 10000 – 257500
SSU bod 183 251.52 160.62 206.00 128.00 359.00 3.00 998.00 128 – 359
SSU cod 183 727.31 506.52 553.00 339.00 1041.50 17.40 2,730.00 339 – 1041.5
SSU ecoli 183 559,797.81 1,457,019.95 80,000.00 10000.00 330000.00 0.00 11,500,000.00 10000 – 330000
UFF bod 298 255.26 304.07 192.00 108.25 360.75 13.00 4,554.00 108.25 – 360.75
UFF cod 298 724.22 1,083.19 505.60 296.25 1063.00 42.80 17,438.00 296.25 – 1063
UFF ecoli 298 644,952.55 3,913,022.90 100,000.00 10000.00 340000.00 0.00 60,800,000.00 10000 – 340000
WSP bod 98 165.29 176.49 119.00 63.25 210.25 15.00 1,207.00 63.25 – 210.25
WSP cod 98 431.67 371.40 306.75 176.00 524.75 58.00 1,927.00 176 – 524.75
WSP ecoli 98 4,774,150.31 40,070,278.72 60,000.00 6275.00 205000.00 0.00 395,000,000.00 6275 – 205000

Check data distributions (normal/non-normal)

Data distributions (original values)

Code
# Simple min/max for each parameter
min(effluent_clean$bod,   na.rm = TRUE)
[1] 3
Code
max(effluent_clean$bod,   na.rm = TRUE)
[1] 4554
Code
min(effluent_clean$cod,   na.rm = TRUE)
[1] 17.4
Code
max(effluent_clean$cod,   na.rm = TRUE)
[1] 17438
Code
min(effluent_clean$ecoli, na.rm = TRUE)
[1] 0
Code
max(effluent_clean$ecoli, na.rm = TRUE)
[1] 3.95e+08
Code
effluent_long <- effluent_clean |>
  pivot_longer(
    cols = c(bod, cod, ecoli),
    names_to = "parameter",
    values_to = "value"
  )

ggplot(effluent_long, aes(x = value)) +
  geom_histogram(bins = 30, color = "black", fill = "steelblue") +
  facet_wrap(~ parameter, scales = "free_x") +
  labs(
    title = "Distribution of Effluent Parameters",
    x = "Value",
    y = "Count"
  )

Because the distributions are heavily right-skewed and dominated by a few extreme high values, the data clearly deviate from normality. This supports the use of log‑transformed values, geometric means, and non‑parametric statistical tests in the subsequent analyses, rather than assuming normally distributed residuals or relying on arithmetic mean.

Code
shapiro.test(effluent_clean$bod)

    Shapiro-Wilk normality test

data:  effluent_clean$bod
W = 0.65982, p-value < 2.2e-16
Code
shapiro.test(effluent_clean$cod)

    Shapiro-Wilk normality test

data:  effluent_clean$cod
W = 0.51361, p-value < 2.2e-16
Code
shapiro.test(effluent_clean$ecoli)

    Shapiro-Wilk normality test

data:  effluent_clean$ecoli
W = 0.029727, p-value < 2.2e-16

Shapiro–Wilk tests provided strong evidence against normality for all three parameters (BOD: W = 0.66, COD: W = 0.51, E. coli: W = 0.03; p < 0.001 for each).

Data distributions (Log transformed values)

Code
log_conversion_table <- effluent_clean |> 
  mutate(
    log_bod = log10(bod),
    log_cod = log10(cod),
    log_ecoli = log10(ecoli + 1)
  )
Code
# Long format for log10 values
effluent_long_log <- log_conversion_table |>
  pivot_longer(
    cols = c(log_bod, log_cod, log_ecoli),
    names_to = "parameter",
    values_to = "value"
  )

ggplot(effluent_long_log, aes(x = value)) +
  geom_histogram(bins = 30, color = "black", fill = "steelblue") +
  facet_wrap(~ parameter, scales = "free_x") +
  labs(
    title = "Distribution of log10-transformed effluent values",
    x = "log10(value)",
    y = "Count"
  )

After applying a base‑10 logarithmic transformation (log10) to BOD, COD, and E. coli, the histograms of the transformed values (Figure Y) were much more symmetric and bell‑shaped compared with the raw data. For BOD and COD, the log‑transformed distributions are centred around log10 values of roughly 2–3, with noticeably reduced skewness and more balanced tails. For E. coli, most non‑zero observations fall between log10 values of about 4 and 6, with a separate bar at 0 representing samples below 1 CFU/100 mL after adding 1 prior to transformation.

These patterns suggest that the log10‑transformed data better satisfy the assumptions of many statistical methods and provide more stable measures of central tendency. Consequently, subsequent analyses use log‑transformed values and geometric means, rather than the raw effluent measurements, when comparing technologies and rounds.

Code
shapiro.test(log_conversion_table$log_bod)

    Shapiro-Wilk normality test

data:  log_conversion_table$log_bod
W = 0.9677, p-value = 2.038e-15
Code
shapiro.test(log_conversion_table$log_cod)

    Shapiro-Wilk normality test

data:  log_conversion_table$log_cod
W = 0.98132, p-value = 4.359e-11
Code
shapiro.test(log_conversion_table$log_ecoli)

    Shapiro-Wilk normality test

data:  log_conversion_table$log_ecoli
W = 0.81184, p-value < 2.2e-16

Effluent BOD, COD, and E. coli were strongly right‑skewed in the original scale, and although log10 transformation improved symmetry, all three parameters remained statistically non‑normal (Shapiro–Wilk p < 0.001). Therefore, non-parametric tests (Kruskal–Wallis with Dunn post-hoc) were used for inter-technology comparisons.

Results:

Plant-level geometric means (for technology comparison)

Code
log_means_table <- log_conversion_table |> 
  group_by(fstp_id, fstp_type_short, fstp_type_long) |>
  summarise(
    log_mean_bod   = mean(log_bod,   na.rm = TRUE),
    log_mean_cod   = mean(log_cod,   na.rm = TRUE),
    log_mean_ecoli = mean(log_ecoli, na.rm = TRUE),
    .groups = 'drop'
  )

geom_means_table <- log_means_table |>
  mutate(
    geom_mean_bod   = 10^log_mean_bod,
    geom_mean_cod   = 10^log_mean_cod,
    geom_mean_ecoli = 10^log_mean_ecoli
  )
Code
geom_means_long <- geom_means_table |>
  pivot_longer(
    cols = c(geom_mean_bod, geom_mean_cod, geom_mean_ecoli),
    names_to = "parameter",
    values_to = "geom_mean"
  )
Code
geom_means_long2 <- geom_means_long |>
  mutate(parameter_label = case_when(
    parameter == "geom_mean_bod"   ~ "A) BOD",
    parameter == "geom_mean_cod"   ~ "B) COD",
    parameter == "geom_mean_ecoli" ~ "C) E. coli",
    TRUE                           ~ parameter
  ))

ggplot(geom_means_long2,
       aes(x = fstp_type_short, y = geom_mean)) +
  geom_boxplot(outlier.shape = NA) +
  geom_jitter(width = 0.2, alpha = 0.5) +
  facet_wrap(~ parameter_label, scales = "free_y") +
  labs(
    title = "Plant-level geometric mean effluent by technology",
    x = "Technology",
    y = "Geometric mean (original units)"
  )
Figure 3: Plant-level geometric mean effluent by technology

BOD geometric means

Figure 3 (A) shows that across technologies most plants have geometric mean BOD between roughly 100 and 300 mg/L, but the central tendency and spread differ. LSP and UFF show the highest medians and wide interquartile ranges, indicating higher and more variable BOD in their effluents. In contrast, WSP has the lowest median BOD and a relatively narrow IQR, showing better and more consistent organic matter removal than the other technologies.

COD geometric means

Figure 3 (B) shows a similar pattern. SSU and UFF have higher median geometric mean COD and broader IQR, indicating poorer and more variable performance. ABR and DEWATS lie in the middle, while WSP again has the lowest median COD with a tighter distribution, reinforcing its relatively strong performance for organic parameters.

E. coli geometric means

For E. coli Figure 3 (C), ABR and WSP have similar, relatively low medians, and their IQRs are lower than those of LSP, SSU, and UFF. DEWATS also has the lowest median compared to other technologies. LSP and SSU show largea geometric mean E. coli, indicating weaker and more variable microbial reduction.

Overall, these plots indicate that WSP performs best for BOD and COD, DEWATS and WSP are relatively favourable for E. coli. LSP, SSU and UFF are generally the least effective and most variable across all three indicators.

Non‑parametric test

Code
kruskal.test(geom_mean_bod ~ fstp_type_short,
             data = geom_means_table)

    Kruskal-Wallis rank sum test

data:  geom_mean_bod by fstp_type_short
Kruskal-Wallis chi-squared = 17.301, df = 5, p-value = 0.003963
Code
kruskal.test(geom_mean_cod ~ fstp_type_short,
             data = geom_means_table)

    Kruskal-Wallis rank sum test

data:  geom_mean_cod by fstp_type_short
Kruskal-Wallis chi-squared = 17.782, df = 5, p-value = 0.003233
Code
kruskal.test(geom_mean_ecoli ~ fstp_type_short,
             data = geom_means_table)

    Kruskal-Wallis rank sum test

data:  geom_mean_ecoli by fstp_type_short
Kruskal-Wallis chi-squared = 42.67, df = 5, p-value = 4.31e-08

Non‑parametric Kruskal–Wallis tests indicated statistically significant differences in plant‑level geometric mean effluent across technologies for BOD, COD, and E. coli (BOD: χ²(5) = 17.3, p = 0.004; COD: χ²(5) = 17.8, p = 0.003; E. coli: χ²(5) = 42.7, p < 0.001). These results confirm that at least one technology differs from the others in its typical effluent quality for each parameter. Pairwise Dunn post‑hoc tests were therefore used to identify which technologies differed significantly from each other.

Code
dunn_bod <- dunnTest(
  geom_mean_bod ~ fstp_type_short,
  data = geom_means_table,
  method = "bh" # Benjamini-Hochberg for multiple comparisons
)

dunn_bod
     Comparison          Z      P.unadj       P.adj
1  ABR - DEWATS -0.7312004 0.4646567431 0.536142396
2     ABR - LSP -2.4237398 0.0153616076 0.046084823
3  DEWATS - LSP -1.6091924 0.1075742717 0.230516296
4     ABR - SSU -1.5809986 0.1138783634 0.213521931
5  DEWATS - SSU -0.7403527 0.4590860162 0.573857520
6     LSP - SSU  0.9606806 0.3367127849 0.459153798
7     ABR - UFF -1.3586064 0.1742713332 0.290452222
8  DEWATS - UFF -0.4634691 0.6430281871 0.688958772
9     LSP - UFF  1.3290096 0.1838447956 0.275767193
10    SSU - UFF  0.3453412 0.7298378442 0.729837844
11    ABR - WSP  2.1413557 0.0322453658 0.080613414
12 DEWATS - WSP  2.5396835 0.0110952819 0.041607307
13    LSP - WSP  3.7557023 0.0001728561 0.002592841
14    SSU - WSP  3.1811014 0.0014671626 0.011003719
15    UFF - WSP  3.0743170 0.0021098511 0.010549256

For plant‑level geometric mean BOD, Dunn post‑hoc tests showed that WSP plants had significantly lower effluent BOD than DEWATS, LSP, SSU, and UFF (adjusted p < 0.05). The difference between ABR and LSP was borderline significant. Other pairs do not differ significantly.

Code
dunn_cod <- dunnTest(
  geom_mean_cod ~ fstp_type_short,
  data = geom_means_table,
  method = "bh"
)

dunn_cod
     Comparison          Z      P.unadj       P.adj
1  ABR - DEWATS -0.6856366 0.4929423227 0.616177903
2     ABR - LSP -1.5828690 0.1134513271 0.243109987
3  DEWATS - LSP -0.8773875 0.3802762257 0.570414338
4     ABR - SSU -2.0196592 0.0434187550 0.108546888
5  DEWATS - SSU -1.1751378 0.2399395719 0.399899287
6     LSP - SSU -0.1878678 0.8509802868 0.850980287
7     ABR - UFF -1.3664313 0.1718036230 0.322131793
8  DEWATS - UFF -0.5156981 0.6060653042 0.699306120
9     LSP - UFF  0.4849079 0.6277416960 0.672580389
10    SSU - UFF  0.7749077 0.4383941789 0.597810244
11    ABR - WSP  2.4371642 0.0148029575 0.044408873
12 DEWATS - WSP  2.7827351 0.0053902789 0.020213546
13    LSP - WSP  3.3740094 0.0007408185 0.003704092
14    SSU - WSP  3.7802005 0.0001567021 0.002350532
15    UFF - WSP  3.3747376 0.0007388611 0.005541459

For COD, WSP had significantly lower plant‑level geometric mean effluent than all other technologies (ABR, DEWATS, LSP, SSU, UFF; adjusted p < 0.05), whereas no other technology pairs differed significantly.

Code
dunn_ecoli <- dunnTest(
  geom_mean_ecoli ~ fstp_type_short,
  data = geom_means_table,
  method = "bh"
)

dunn_ecoli
     Comparison          Z      P.unadj        P.adj
1  ABR - DEWATS  4.1587607 3.199788e-05 1.199921e-04
2     ABR - LSP -1.8112285 7.010549e-02 1.752637e-01
3  DEWATS - LSP -5.1257489 2.963574e-07 2.222680e-06
4     ABR - SSU -1.3576098 1.745875e-01 3.273516e-01
5  DEWATS - SSU -5.0549196 4.305720e-07 2.152860e-06
6     LSP - SSU  0.5724273 5.670325e-01 8.505488e-01
7     ABR - UFF -1.7753939 7.583288e-02 1.624990e-01
8  DEWATS - UFF -5.6977791 1.213782e-08 1.820673e-07
9     LSP - UFF  0.3855384 6.998386e-01 8.747982e-01
10    SSU - UFF -0.2538562 7.996066e-01 9.226231e-01
11    ABR - WSP -1.0118784 3.115962e-01 5.193270e-01
12 DEWATS - WSP -4.0041726 6.223493e-05 1.867048e-04
13    LSP - WSP  0.4898176 6.242630e-01 8.512677e-01
14    SSU - WSP  0.0224240 9.821097e-01 9.821097e-01
15    UFF - WSP  0.2175253 8.277990e-01 8.869275e-01

For E. coli, DEWATS plants had significantly lower plant‑level geometric mean effluent E. coli than all other technologies (ABR, LSP, SSU, UFF, WSP; adjusted p < 0.001). No statistically significant differences were detected among the remaining technologies after multiple‑comparison adjustment.

Microbial risk: proportion of fstp with high effluent E. coli only

Still thinking how to represent this in better way

Code
plant_camp <- effluent_clean |>
  dplyr::distinct(fstp_id, camp_id)
Code
geom_means_with_camp <- geom_means_table |>
  dplyr::left_join(plant_camp, by = "fstp_id")
Code
plants_flag <- geom_means_with_camp |>
  dplyr::mutate(
    high_risk_plant = geom_mean_ecoli > 1e4
  )
Code
camp_risk_simple <- plants_flag |>
  dplyr::group_by(camp_id) |>
  dplyr::summarise(
    n_plants       = dplyr::n(),
    n_high_risk    = sum(high_risk_plant, na.rm = TRUE),
    prop_high_risk = n_high_risk / n_plants,
    .groups = "drop"
  )
Code
camp_risk_simple <- camp_risk_simple |>
  dplyr::mutate(
    camp_id = reorder(camp_id, -prop_high_risk)
  )

ggplot(camp_risk_simple,
       aes(x = 1, y = camp_id, fill = prop_high_risk)) +
  geom_tile(color = "grey70") +
  geom_text(aes(label = paste0(n_high_risk, "/", n_plants)),
            color = "black", size = 3) +
  scale_fill_gradient(
    low  = "#FFE5E5",
    high = "#8B0000"
  ) +
  labs(
    title = "Camp-wise share of high-E. coli plants (sorted by risk)",
    x = NULL,
    y = "Camp",
    fill = "Proportion of plants\ngeom. mean E. coli > 10^4"
  ) +
  theme_bw() +
  theme(
    axis.text.x  = element_blank(),
    axis.ticks.x = element_blank()
  )
Figure 7: Camp-wise share of high E. coli risk among FSTPs
Code
camp_tech_risk <- plants_flag |>
  group_by(camp_id, fstp_type_short) |>
  summarise(
    n_plants      = n(),
    n_high_risk   = sum(high_risk_plant, na.rm = TRUE),
    prop_high_risk = n_high_risk / n_plants,
    .groups = "drop"
  )
Code
camp_tech_risk <- camp_tech_risk |>
  group_by(fstp_type_short) |>
  arrange(desc(prop_high_risk), .by_group = TRUE) |>
  mutate(
    camp_tech_id = factor(
      paste(fstp_type_short, camp_id, sep = "__"),
      levels = paste(fstp_type_short, camp_id, sep = "__")
    )
  ) |>
  ungroup()

ggplot(camp_tech_risk,
       aes(x = 1, y = camp_tech_id, fill = prop_high_risk)) +
  geom_tile(color = "grey70") +
  geom_text(aes(label = paste0(n_high_risk, "/", n_plants)),
            color = "black", size = 3) +
  scale_y_discrete(
    labels = function(x) sub("^[^_]+__", "", x)  # strip "TECH__" so only camp name shows
  ) +
  scale_fill_gradient(
    low  = "#FFE5E5",
    high = "#8B0000"
  ) +
  facet_wrap(~ fstp_type_short, scales = "free_y") +
  labs(
    title = "Camp-wise share of high-E. coli plants by technology",
    x = NULL,
    y = "Camp (sorted high to low risk within each technology)",
    fill = "Proportion of plants\ngeom. mean E. coli > 10^4"
  ) +
  theme_bw() +
  theme(
    axis.text.x  = element_blank(),
    axis.ticks.x = element_blank()
  )
Figure 8: Camp-wise share of high-E. coli plants by technology

Sanky (used AI) need to change the load_band case_when for e coli load based on literature

Code
library(networkD3)

## 0) CHECK INPUT -----------------------------------------------------------
# geom_means_with_camp should have:
# fstp_id, fstp_type_short, camp_id, geom_mean_ecoli

## 1) Plant-level E. coli risk bands ---------------------------------------

plants_risk_bands <- geom_means_with_camp |>
  mutate(
    ecoli_band = case_when(
      geom_mean_ecoli <= 1e3  ~ "≤ 10^3 (compliant)",
      geom_mean_ecoli <= 1e4  ~ "> 10^3–10^4",
      geom_mean_ecoli <= 1e5  ~ "> 10^4–10^5",
      TRUE                    ~ "> 10^5"
    ),
    ecoli_band = factor(
      ecoli_band,
      levels = c("≤ 10^3 (compliant)",
                 "> 10^3–10^4",
                 "> 10^4–10^5",
                 "> 10^5")
    )
  )

## 2) Camp-level total E. coli load and load bands -------------------------

camp_load <- plants_risk_bands |>
  group_by(camp_id) |>
  summarise(
    total_geom_ecoli = sum(geom_mean_ecoli, na.rm = TRUE),
    .groups = "drop"
  )

# Choose load bands based on your range (edit thresholds if needed)
camp_load <- camp_load |>
  mutate(
    load_band = case_when(
      total_geom_ecoli <= 1e4        ~ "Low load (≤ 10^4)",
      total_geom_ecoli <= 1e5        ~ "Moderate load (10^4–10^5)",
      total_geom_ecoli <= 5e5        ~ "High load (10^5–5×10^5)",
      TRUE                           ~ "Very high load (> 5×10^5)"
    ),
    load_band = factor(
      load_band,
      levels = c("Low load (≤ 10^4)",
                 "Moderate load (10^4–10^5)",
                 "High load (10^5–5×10^5)",
                 "Very high load (> 5×10^5)")
    )
  )

## 3) Rank camps from low to high risk (for nicer ordering) ----------------

camp_risk_rank <- plants_risk_bands |>
  mutate(high_band = ecoli_band == "> 10^5") |>
  group_by(camp_id) |>
  summarise(
    n_plants    = n(),
    n_high_band = sum(high_band, na.rm = TRUE),
    prop_high   = n_high_band / n_plants,
    .groups = "drop"
  ) |>
  arrange(prop_high, n_high_band)   # low → high risk

camp_nodes <- camp_risk_rank$camp_id   # ordered camps

## 4) Build link tables for each layer -------------------------------------

# Technology -> plant risk band
tech_to_band <- plants_risk_bands |>
  count(fstp_type_short, ecoli_band, name = "value")

# Risk band -> camp
band_to_camp <- plants_risk_bands |>
  count(ecoli_band, camp_id, name = "value")

# Camp -> load band
camp_to_load <- camp_load |>
  count(camp_id, load_band, name = "value")

## 5) Build node list (technologies, bands, camps, load bands) -------------

tech_nodes <- sort(unique(plants_risk_bands$fstp_type_short))
band_nodes <- levels(plants_risk_bands$ecoli_band)
load_nodes <- levels(camp_load$load_band)

nodes <- data.frame(
  name = c(tech_nodes, band_nodes, camp_nodes, load_nodes),
  stringsAsFactors = FALSE
)

# helper for index
node_id <- function(x) match(x, nodes$name) - 1

## 6) Create combined links -------------------------------------------------

links_tech_band <- data.frame(
  source = node_id(tech_to_band$fstp_type_short),
  target = node_id(as.character(tech_to_band$ecoli_band)),
  value  = tech_to_band$value
)

links_band_camp <- data.frame(
  source = node_id(as.character(band_to_camp$ecoli_band)),
  target = node_id(band_to_camp$camp_id),
  value  = band_to_camp$value
)

links_camp_load <- data.frame(
  source = node_id(camp_to_load$camp_id),
  target = node_id(as.character(camp_to_load$load_band)),
  value  = camp_to_load$value
)

links <- rbind(links_tech_band, links_band_camp, links_camp_load)

## 7) Node groups for colouring by level -----------------------------------

nodes$group <- c(
  rep("Technology", length(tech_nodes)),
  rep("Band",       length(band_nodes)),
  rep("Camp",       length(camp_nodes)),
  rep("Load",       length(load_nodes))
)

## 8) Plot Sankey: Technology → risk band → camp → load band --------------

sankeyNetwork(
  Links     = links,
  Nodes     = nodes,
  Source    = "source",
  Target    = "target",
  Value     = "value",
  NodeID    = "name",
  NodeGroup = "group",
  fontSize  = 11,
  nodeWidth = 20
)
Figure 9: Sankey diagram of technology, plant-level E. coli risk bands, camps, and total camp E. coli load bands

Compliance

Code
compliance_long <- effluent_clean |>
  mutate(
    bod   = as.numeric(bod),
    cod   = as.numeric(cod),
    ecoli = as.numeric(ecoli)
  ) |>
  # make long by parameter
  tidyr::pivot_longer(
    cols = c(bod, cod, ecoli),
    names_to = "parameter",
    values_to = "value"
  ) |>
  mutate(
    parameter = recode(parameter,
                       bod   = "BOD",
                       cod   = "COD",
                       ecoli = "E. coli"),
    status = case_when(
      parameter == "BOD"   & value <= 30   ~ "Meets standard",
      parameter == "COD"   & value <= 125  ~ "Meets standard",
      parameter == "E. coli" & value <= 1000 ~ "Meets standard",
      TRUE ~ "Exceeds standard"
    )
  ) |>
  group_by(parameter, fstp_type_short, status) |>
  summarise(n = n(), .groups = "drop") |>
  group_by(parameter, fstp_type_short) |>
  mutate(pct = 100 * n / sum(n)) |>
  ungroup()

ggplot(compliance_long,
       aes(x = fstp_type_short, y = pct, fill = status)) +
  geom_col(position = "stack") +
  geom_text(
    aes(label = sprintf("%.1f%%", pct)),
    position = position_stack(vjust = 0.5),
    size = 3
  ) +
  facet_wrap(~ parameter, nrow = 1) +
  scale_y_continuous(labels = scales::percent_format(scale = 1)) +
  scale_fill_manual(
    values = c("Meets standard" = "#009E73",
               "Exceeds standard" = "#D55E00")
  ) +
  labs(
    title = "Percentage of samples meeting discharge standards by technology and parameter",
    x = "Technology",
    y = "Samples (%)",
    fill = "Status"
  ) +
  theme_bw()
Figure 10: Percentage of samples meeting discharge standards by technology and parameter

Conclusion:

  • Decentralized fecal sludge treatment in the Rohingya camps substantially reduces pollutant concentrations but not showed high variability over the years across all technologies.

  • None of the assessed technologies consistently achieve microbiologically safe or standards‑compliant effluent, especially for BOD and COD.

  • WSPs provided the highest BOD and COD reduction but only moderate E. coli removal, whereas DEWATS achieved the highest microbial reduction.

  • A subset of camps contributes disproportionately high E. coli loads.

  • These findings highlight the need for technology‑specific upgrades, strengthened operation and monitoring, and risk‑based prioritization of high‑burden camps to better protect public health in protracted humanitarian crises.

References:

Arnold, Jeffrey B. 2024. “Ggthemes: Extra Themes, Scales and Geoms for ’Ggplot2’.” https://jrnold.github.io/ggthemes/.
Dinno, Alexis. 2024. “Dunn.test: Dunn’s Test of Multiple Comparisons Using Rank Sums.”
Wickham, Hadley, Mara Averick, Jennifer Bryan, Winston Chang, Lucy D’Agostino McGowan, Romain François, Garrett Grolemund, et al. 2019. “Welcome to the Tidyverse 4: 1686. https://doi.org/10.21105/joss.01686.