EVALUATING THE PERFORMANCE OF FECAL SLUDGE TREATMENT TECHNOLOGIES IN ROHINGYA REFUGEE CAMPS

Author

Affiliation

Mejbah Uddin Chowdhury

University of Victoria

Published

December 11, 2025

Abstract

add here

Introduction:

Project description

This project analyzes four years of fecal sludge effluent quality data from 183 fecal sludge treatment plants (FSTPs) operating in the Rohingya refugee camps. These facilities use six different decentralized treatment technologies, with effluent samples collected across eight monitoring rounds.

The primary objective of this study is to evaluate the performance of these treatment technologies over time by analyzing key effluent quality parameters—BOD, COD, and E. coli. The analysis assesses the extent to which the systems comply with national discharge standards and provides a partial understanding of camp-level microbial risks in this densely populated camp setting.

Note: For the Capstone Project, the performance of the FSTPs is assessed based solely on effluent quality. Removal efficiency calculations will be conducted later once influent quality data becomes available. Due to the current lack of influent data, the project goal has been slightly revised to focus primarily on effluent quality.

Method:

Raw effluent monitoring data were imported from CSV files and restricted to samples with non‑missing BOD, COD, and E. coli, resulting in 1,163 observations for analysis. Data were summarised by camp and technology to quantify the number of plants and monitoring coverage, and plant‑level geometric means were calculated from log₁₀‑transformed BOD, COD, and E. coli values. Because all three parameters were strongly right‑skewed and non‑normal, subsequent analyses used log‑transformed values, geometric means, and non‑parametric tests (Kruskal–Wallis with Dunn post‑hoc) for comparisons between technologies.

Library

Code

library(ggplot2)
library(dplyr)
library(readr)
library(ggthemes)
library(tidyverse)
library(gt)
library(FSA)
library(dunn.test)
library(here)

(Wickham et al. 2019) (Dinno 2024) (Arnold 2024)

Import

Code

effluent <- read_csv(here::here("data/raw/fstp_effluent_quality.csv"))
influent <- read_csv(here::here("data/raw/fstp_influent_quality_avg.csv")) #not considered for capstone project
design_objectives <- read_csv(here::here("data/raw/design_objectives.csv")) #not considered for capstone project
national_standard <- read_csv(here::here("data/raw/national_standard.csv"))

Data Preprocessing

Remove NAs

Code

effluent_clean <- effluent |> 
  filter(!is.na(bod) & !is.na(cod) & !is.na(ecoli))

Code

head(effluent_clean)

# A tibble: 6 × 9
  sample_id    camp_id fstp_id fstp_type_short fstp_type_long   bod   cod  ecoli
  <chr>        <chr>   <chr>   <chr>           <chr>          <dbl> <dbl>  <dbl>
1 DPHE_FSL_S_… Camp 04 FSTP_0… LSP             Lime Stabiliz…    22  47.7   1800
2 DPHE_FSL_S_… Camp 0… FSTP_0… LSP             Lime Stabiliz…   138 458   110000
3 DPHE_FSL_S_… Camp 0… FSTP_0… ABR             Anaerobic Baf…   195 472   220000
4 DPHE_FSL_S_… Camp 18 FSTP_1… SSU             Solid Separat…   148 412      300
5 DPHE_FSL_S_… Camp 12 FSTP_1… DEWATS          Decentralised…   311 688     2100
6 DPHE_FSL_S_… Camp 12 FSTP_1… DEWATS          Decentralised…   292 750      300
# ℹ 1 more variable: data_round <chr>

Code

summary(effluent_clean)

  sample_id           camp_id            fstp_id          fstp_type_short   
 Length:1163        Length:1163        Length:1163        Length:1163       
 Class :character   Class :character   Class :character   Class :character  
 Mode  :character   Mode  :character   Mode  :character   Mode  :character  
                                                                            
                                                                            
                                                                            
 fstp_type_long          bod              cod              ecoli          
 Length:1163        Min.   :   3.0   Min.   :   17.4   Min.   :        0  
 Class :character   1st Qu.: 112.0   1st Qu.:  296.0   1st Qu.:     1000  
 Mode  :character   Median : 191.0   Median :  502.0   Median :    56000  
                    Mean   : 237.9   Mean   :  655.7   Mean   :   775319  
                    3rd Qu.: 326.0   3rd Qu.:  889.0   3rd Qu.:   250000  
                    Max.   :4554.0   Max.   :17438.0   Max.   :395000000  
  data_round       
 Length:1163       
 Class :character  
 Mode  :character

Data Processed

Code

write_csv(
  effluent_clean,
  here::here("data/processed/effluent-processed.csv")
)

Number of technologies

Code

camp_tech_counts <- effluent_clean |>
  distinct(camp_id, fstp_id, fstp_type_short) |>
  count(camp_id, fstp_type_short, name = "n_plants") |>
  arrange(camp_id, fstp_type_short)

Code

camp_totals <- camp_tech_counts |>
  group_by(camp_id) |>
  summarise(total_plants = sum(n_plants), .groups = "drop")

#| label: setup-palettes
cbp1 <- c(
  "#999999", "#E69F00", "#56B4E9", "#009E73",
  "#F0E442", "#0072B2", "#D55E00", "#CC79A7"
)

ggplot(camp_tech_counts,
       aes(x = camp_id, y = n_plants, fill = fstp_type_short)) +
  geom_col(position = "stack") +
  scale_fill_manual(values = cbp1) +
  geom_text(data = camp_totals,
            aes(x = camp_id, y = total_plants, label = total_plants),
            vjust = -0.3,
            inherit.aes = FALSE) +
  labs(
    title = "Number of FSTPs by camp and technology",
    x = "Camp",
    y = "Number of plants",
    fill = "Technology"
  ) +
  theme_bw() +
  theme(
    axis.text.x = element_text(
      angle = 45,
      hjust = 1,
      vjust = 1
    )
  )

Figure 1: Number of FSTPs by camp and technology

Figure 1 shows the number of FSTPs by camp and technology. Camps vary widely in both the number and type of FSTPs: some camps have only one or two plants of a single technology, whereas others have more than 10 plants and a mixture of several technologies. Important note: The numbers of plants shown for each camp do not represent all FSTPs present in the camps. This study includes only plants with monitoring data spanning more than one year, or with at least four sampling rounds. Because reliable information on the establishment or start‑up dates of the plants was not available.

Coverage by technology

Code

summary_table <- effluent |>
  group_by(fstp_type_short) |>
  summarise(
    Number_of_Plants = n_distinct(fstp_id),
    Total_Possible_Samples = Number_of_Plants * 8 * 3,
    Actual_Samples_Collected = sum(bod >= 0, na.rm = TRUE) +
      sum(cod >= 0, na.rm = TRUE) +
      sum(ecoli >= 0, na.rm = TRUE),
    Missing_Samples = sum(is.na(bod) + is.na(cod) + is.na(ecoli)),
    Coverage_Percent = (Actual_Samples_Collected / Total_Possible_Samples) * 100,
    .groups = "drop"
  ) |>
  gt() |>
  cols_label(
    fstp_type_short = "Technology",
    Number_of_Plants = "Number of Plants",
    Total_Possible_Samples = "Total Possible Samples",
    Actual_Samples_Collected = "Actual Samples Collected",
    Missing_Samples = "Missing Samples",
    Coverage_Percent = "Coverage (%)"
  ) |>
  fmt_number(
    columns = c(Number_of_Plants, Total_Possible_Samples,
                Actual_Samples_Collected, Missing_Samples),
    decimals = 0
  ) |>
  fmt_number(
    columns = c(Coverage_Percent),
    decimals = 1
  )

summary_table

Table 1: Sampling coverage by FSTP technology

Technology	Number of Plants	Total Possible Samples	Actual Samples Collected	Missing Samples	Coverage (%)
ABR	45	1,080	924	156	85.6
DEWATS	28	672	510	162	75.9
LSP	21	504	318	186	63.1
SSU	31	744	549	195	73.8
UFF	44	1,056	894	162	84.7
WSP	14	336	294	42	87.5

Overall sample monitoring coverage was fairly high but not complete for any technology (Table 1), ranging from about 63% for LSP to almost 88% for WSP. LSP had 21 plants with the lowest coverage (63%), and SSU had 31 plants with coverage of about 74%, indicating more missing sampling rounds for these technologies than for others. In contrast, ABR and UFF had the largest numbers of plants (45 and 44, respectively) with reasonably good coverage of around 85%, while WSP had only 14 plants but with the highest coverage (about 88%).

Sampling monitoring rounds and calendar periods

Code

round_periods <- tibble::tibble(
  round = factor(paste0("R", 1:8), levels = paste0("R", 1:8)),
  year  = factor(c(2022, 2022, 2022,
                   2023, 2023, 2023,
                   2024, 2025)),
  period = c(
    "Jan–Apr", "May–Aug", "Sep–Dec",
    "Jan–Apr", "May–Aug", "Sep–Dec",
    "Jan–May", "Jan–Apr"
  )
)

Code

ggplot(round_periods,
       aes(x = year, y = round, fill = period)) +
  geom_tile(color = "white", linewidth = 0.7) +
  geom_text(aes(label = period), size = 3) +
  scale_fill_brewer(palette = "Blues") +
  labs(
    title = "Sampling monitoring rounds and calendar periods",
    x = "Year",
    y = "Monitoring round",
    fill = "Calendar period"
  ) +
  theme_bw()

Figure 2: Sampling monitoring rounds and calendar periods

Coverage by sampling rounds per technology

Code

rounds_summary <- effluent |>
  group_by(fstp_type_short, fstp_id) |>
  summarise(
    Total_Rounds = sum(!is.na(bod) & !is.na(cod) & !is.na(ecoli)),
    Ecoli_Rounds = sum(!is.na(ecoli)),
    Cod_Rounds   = sum(!is.na(cod)),
    Bod_Rounds   = sum(!is.na(bod)),
    .groups = "drop"
  ) |>
  group_by(fstp_type_short) |>
  summarise(
    Plants_with_8_Rounds = sum(Total_Rounds == 8),
    Plants_with_7_Rounds = sum(Total_Rounds == 7),
    Plants_with_6_Rounds = sum(Total_Rounds == 6),
    Plants_with_5_Rounds = sum(Total_Rounds == 5),
    Plants_with_4_Rounds = sum(Total_Rounds == 4),
    Plants_with_3_Rounds = sum(Total_Rounds == 3),
    Mean_Rounds_Per_Plant = mean(Total_Rounds),
    .groups = "drop"
  ) |>
  gt()

rounds_summary

Table 2: Number of monitoring rounds per FSTP technology.

fstp_type_short	Plants_with_8_Rounds	Plants_with_7_Rounds	Plants_with_6_Rounds	Plants_with_5_Rounds	Plants_with_4_Rounds	Mean_Rounds_Per_Plant
ABR	20	9	9	3	4	6.844444
DEWATS	8	6	2	4	8	6.071429
LSP	1	3	4	1	12	5.047619
SSU	9	4	4	3	11	5.903226
UFF	12	21	5	1	5	6.772727
WSP	6	5	1	1	1	7.000000

Coverage by rounds was uneven across technologies (Table 2). WSP had the highest average monitoring intensity (mean 7.0 rounds per plant), and most WSP plants had data available for 7–8 sampling rounds. ABR and UFF also had relatively strong coverage, each with a mean of 6.8 rounds per plant, and many plants monitored in 7 or 8 rounds. In contrast, LSP and SSU had the lowest average numbers of available rounds (means of 5.0 and 5.9 per plant, respectively), with many plants monitored in only 4–6 rounds. DEWATS was intermediate, with a mean of 6.1 rounds per plant and a mix of plants monitored in 4–8 rounds. These patterns indicate that time‑series information is richest for WSP, ABR, and UFF, and somewhat thinner for LSP and SSU.

Effluent descriptive statistics (raw data)

Code

# considering original effluent values
effluent_long <- effluent_clean |>
  pivot_longer(
    cols = c(bod, cod, ecoli),
    names_to = "parameter",
    values_to = "value"
  )

summary_statistics <- effluent_long |>
  group_by(fstp_type_short, parameter) |>
  summarise(
    n      = sum(!is.na(value)),
    Mean   = mean(value, na.rm = TRUE),
    SD     = sd(value, na.rm = TRUE),
    Median = median(value, na.rm = TRUE),
    Q1     = quantile(value, 0.25, na.rm = TRUE),
    Q3     = quantile(value, 0.75, na.rm = TRUE),
    Min    = min(value, na.rm = TRUE),
    Max    = max(value, na.rm = TRUE),
    .groups = "drop"
  ) |>
  mutate(
    IQR_text = paste0(round(Q1, 2), " – ", round(Q3, 2))
  )

gt_summary_table <- summary_statistics |>
  gt() |>
  tab_header(
    title = "Summary Statistics of Effluent by Technology",
    subtitle = "Original-data BOD, COD, and E. coli"
  ) |>
  cols_label(
    fstp_type_short = "Technology",
    parameter       = "Parameter",
    n               = "n",
    Mean            = "Mean",
    SD              = "SD",
    Median          = "Median",
    IQR_text        = "IQR (Q1–Q3)",
    Min             = "Min",
    Max             = "Max"
  ) |>
  fmt_number(
    columns = c(Mean, SD, Median, Min, Max),
    decimals = 2
  )

gt_summary_table

Table 3: Summary statistics of effluent parameters by technology (original values)

Technology	Parameter	n	Mean	SD	Median	Q1	Q3	Min	Max	IQR (Q1–Q3)
Summary Statistics of Effluent by Technology
Original-data BOD, COD, and E. coli
ABR	bod	308	218.31	160.44	179.00	111.75	282.00	9.00	1,305.00	111.75 – 282
ABR	cod	308	600.15	417.06	485.50	289.75	801.50	40.25	2,223.00	289.75 – 801.5
ABR	ecoli	308	284,666.56	737,700.77	56,500.00	1000.00	272500.00	0.00	8,300,000.00	1000 – 272500
DEWATS	bod	170	224.71	135.61	202.50	116.00	311.00	15.00	753.00	116 – 311
DEWATS	cod	170	627.34	378.38	581.50	320.25	835.25	34.40	1,790.00	320.25 – 835.25
DEWATS	ecoli	170	151,124.12	1,087,772.73	2,050.00	0.00	30000.00	0.00	13,800,000.00	0 – 30000
LSP	bod	106	310.82	234.41	269.00	140.25	420.25	11.00	1,396.00	140.25 – 420.25
LSP	cod	106	753.62	584.20	631.50	366.25	989.75	47.70	4,320.00	366.25 – 989.75
LSP	ecoli	106	243,599.06	385,198.44	85,500.00	10000.00	257500.00	0.00	2,150,000.00	10000 – 257500
SSU	bod	183	251.52	160.62	206.00	128.00	359.00	3.00	998.00	128 – 359
SSU	cod	183	727.31	506.52	553.00	339.00	1041.50	17.40	2,730.00	339 – 1041.5
SSU	ecoli	183	559,797.81	1,457,019.95	80,000.00	10000.00	330000.00	0.00	11,500,000.00	10000 – 330000
UFF	bod	298	255.26	304.07	192.00	108.25	360.75	13.00	4,554.00	108.25 – 360.75
UFF	cod	298	724.22	1,083.19	505.60	296.25	1063.00	42.80	17,438.00	296.25 – 1063
UFF	ecoli	298	644,952.55	3,913,022.90	100,000.00	10000.00	340000.00	0.00	60,800,000.00	10000 – 340000
WSP	bod	98	165.29	176.49	119.00	63.25	210.25	15.00	1,207.00	63.25 – 210.25
WSP	cod	98	431.67	371.40	306.75	176.00	524.75	58.00	1,927.00	176 – 524.75
WSP	ecoli	98	4,774,150.31	40,070,278.72	60,000.00	6275.00	205000.00	0.00	395,000,000.00	6275 – 205000

Check data distributions (normal/non-normal)

Data distributions (original values)

Code

# Simple min/max for each parameter
min(effluent_clean$bod,   na.rm = TRUE)

[1] 3

Code

max(effluent_clean$bod,   na.rm = TRUE)

[1] 4554

Code

min(effluent_clean$cod,   na.rm = TRUE)

[1] 17.4

Code

max(effluent_clean$cod,   na.rm = TRUE)

[1] 17438

Code

min(effluent_clean$ecoli, na.rm = TRUE)

[1] 0

Code

max(effluent_clean$ecoli, na.rm = TRUE)

[1] 3.95e+08

Code

effluent_long <- effluent_clean |>
  pivot_longer(
    cols = c(bod, cod, ecoli),
    names_to = "parameter",
    values_to = "value"
  )

ggplot(effluent_long, aes(x = value)) +
  geom_histogram(bins = 30, color = "black", fill = "steelblue") +
  facet_wrap(~ parameter, scales = "free_x") +
  labs(
    title = "Distribution of Effluent Parameters",
    x = "Value",
    y = "Count"
  )

Because the distributions are heavily right-skewed and dominated by a few extreme high values, the data clearly deviate from normality. This supports the use of log‑transformed values, geometric means, and non‑parametric statistical tests in the subsequent analyses, rather than assuming normally distributed residuals or relying on arithmetic mean.

Code

shapiro.test(effluent_clean$bod)


    Shapiro-Wilk normality test

data:  effluent_clean$bod
W = 0.65982, p-value < 2.2e-16

Code

shapiro.test(effluent_clean$cod)


    Shapiro-Wilk normality test

data:  effluent_clean$cod
W = 0.51361, p-value < 2.2e-16

Code

shapiro.test(effluent_clean$ecoli)


    Shapiro-Wilk normality test

data:  effluent_clean$ecoli
W = 0.029727, p-value < 2.2e-16

Shapiro–Wilk tests provided strong evidence against normality for all three parameters (BOD: W = 0.66, COD: W = 0.51, E. coli: W = 0.03; p < 0.001 for each).

Data distributions (Log transformed values)

Code

log_conversion_table <- effluent_clean |> 
  mutate(
    log_bod = log10(bod),
    log_cod = log10(cod),
    log_ecoli = log10(ecoli + 1)
  )

Code

# Long format for log10 values
effluent_long_log <- log_conversion_table |>
  pivot_longer(
    cols = c(log_bod, log_cod, log_ecoli),
    names_to = "parameter",
    values_to = "value"
  )

ggplot(effluent_long_log, aes(x = value)) +
  geom_histogram(bins = 30, color = "black", fill = "steelblue") +
  facet_wrap(~ parameter, scales = "free_x") +
  labs(
    title = "Distribution of log10-transformed effluent values",
    x = "log10(value)",
    y = "Count"
  )

After applying a base‑10 logarithmic transformation (log10) to BOD, COD, and E. coli, the histograms of the transformed values (Figure Y) were much more symmetric and bell‑shaped compared with the raw data. For BOD and COD, the log‑transformed distributions are centred around log10 values of roughly 2–3, with noticeably reduced skewness and more balanced tails. For E. coli, most non‑zero observations fall between log10 values of about 4 and 6, with a separate bar at 0 representing samples below 1 CFU/100 mL after adding 1 prior to transformation.

These patterns suggest that the log10‑transformed data better satisfy the assumptions of many statistical methods and provide more stable measures of central tendency. Consequently, subsequent analyses use log‑transformed values and geometric means, rather than the raw effluent measurements, when comparing technologies and rounds.

Code

shapiro.test(log_conversion_table$log_bod)


    Shapiro-Wilk normality test

data:  log_conversion_table$log_bod
W = 0.9677, p-value = 2.038e-15

Code

shapiro.test(log_conversion_table$log_cod)


    Shapiro-Wilk normality test

data:  log_conversion_table$log_cod
W = 0.98132, p-value = 4.359e-11

Code

shapiro.test(log_conversion_table$log_ecoli)


    Shapiro-Wilk normality test

data:  log_conversion_table$log_ecoli
W = 0.81184, p-value < 2.2e-16

Effluent BOD, COD, and E. coli were strongly right‑skewed in the original scale, and although log10 transformation improved symmetry, all three parameters remained statistically non‑normal (Shapiro–Wilk p < 0.001). Therefore, non-parametric tests (Kruskal–Wallis with Dunn post-hoc) were used for inter-technology comparisons.

Results:

Plant-level geometric means (for technology comparison)

Code

log_means_table <- log_conversion_table |> 
  group_by(fstp_id, fstp_type_short, fstp_type_long) |>
  summarise(
    log_mean_bod   = mean(log_bod,   na.rm = TRUE),
    log_mean_cod   = mean(log_cod,   na.rm = TRUE),
    log_mean_ecoli = mean(log_ecoli, na.rm = TRUE),
    .groups = 'drop'
  )

geom_means_table <- log_means_table |>
  mutate(
    geom_mean_bod   = 10^log_mean_bod,
    geom_mean_cod   = 10^log_mean_cod,
    geom_mean_ecoli = 10^log_mean_ecoli
  )

Code

geom_means_long <- geom_means_table |>
  pivot_longer(
    cols = c(geom_mean_bod, geom_mean_cod, geom_mean_ecoli),
    names_to = "parameter",
    values_to = "geom_mean"
  )

Code

geom_means_long2 <- geom_means_long |>
  mutate(parameter_label = case_when(
    parameter == "geom_mean_bod"   ~ "A) BOD",
    parameter == "geom_mean_cod"   ~ "B) COD",
    parameter == "geom_mean_ecoli" ~ "C) E. coli",
    TRUE                           ~ parameter
  ))

ggplot(geom_means_long2,
       aes(x = fstp_type_short, y = geom_mean)) +
  geom_boxplot(outlier.shape = NA) +
  geom_jitter(width = 0.2, alpha = 0.5) +
  facet_wrap(~ parameter_label, scales = "free_y") +
  labs(
    title = "Plant-level geometric mean effluent by technology",
    x = "Technology",
    y = "Geometric mean (original units)"
  )

Figure 3: Plant-level geometric mean effluent by technology

BOD geometric means

Figure 3 (A) shows that across technologies most plants have geometric mean BOD between roughly 100 and 300 mg/L, but the central tendency and spread differ. LSP and UFF show the highest medians and wide interquartile ranges, indicating higher and more variable BOD in their effluents. In contrast, WSP has the lowest median BOD and a relatively narrow IQR, showing better and more consistent organic matter removal than the other technologies.

COD geometric means

Figure 3 (B) shows a similar pattern. SSU and UFF have higher median geometric mean COD and broader IQR, indicating poorer and more variable performance. ABR and DEWATS lie in the middle, while WSP again has the lowest median COD with a tighter distribution, reinforcing its relatively strong performance for organic parameters.

E. coli geometric means

For E. coli Figure 3 (C), ABR and WSP have similar, relatively low medians, and their IQRs are lower than those of LSP, SSU, and UFF. DEWATS also has the lowest median compared to other technologies. LSP and SSU show largea geometric mean E. coli, indicating weaker and more variable microbial reduction.

Overall, these plots indicate that WSP performs best for BOD and COD, DEWATS and WSP are relatively favourable for E. coli. LSP, SSU and UFF are generally the least effective and most variable across all three indicators.

Non‑parametric test

Code

kruskal.test(geom_mean_bod ~ fstp_type_short,
             data = geom_means_table)


    Kruskal-Wallis rank sum test

data:  geom_mean_bod by fstp_type_short
Kruskal-Wallis chi-squared = 17.301, df = 5, p-value = 0.003963

Code

kruskal.test(geom_mean_cod ~ fstp_type_short,
             data = geom_means_table)


    Kruskal-Wallis rank sum test

data:  geom_mean_cod by fstp_type_short
Kruskal-Wallis chi-squared = 17.782, df = 5, p-value = 0.003233

Code

kruskal.test(geom_mean_ecoli ~ fstp_type_short,
             data = geom_means_table)


    Kruskal-Wallis rank sum test

data:  geom_mean_ecoli by fstp_type_short
Kruskal-Wallis chi-squared = 42.67, df = 5, p-value = 4.31e-08

Non‑parametric Kruskal–Wallis tests indicated statistically significant differences in plant‑level geometric mean effluent across technologies for BOD, COD, and E. coli (BOD: χ²(5) = 17.3, p = 0.004; COD: χ²(5) = 17.8, p = 0.003; E. coli: χ²(5) = 42.7, p < 0.001). These results confirm that at least one technology differs from the others in its typical effluent quality for each parameter. Pairwise Dunn post‑hoc tests were therefore used to identify which technologies differed significantly from each other.

Code

dunn_bod <- dunnTest(
  geom_mean_bod ~ fstp_type_short,
  data = geom_means_table,
  method = "bh" # Benjamini-Hochberg for multiple comparisons
)

dunn_bod

     Comparison          Z      P.unadj       P.adj
1  ABR - DEWATS -0.7312004 0.4646567431 0.536142396
2     ABR - LSP -2.4237398 0.0153616076 0.046084823
3  DEWATS - LSP -1.6091924 0.1075742717 0.230516296
4     ABR - SSU -1.5809986 0.1138783634 0.213521931
5  DEWATS - SSU -0.7403527 0.4590860162 0.573857520
6     LSP - SSU  0.9606806 0.3367127849 0.459153798
7     ABR - UFF -1.3586064 0.1742713332 0.290452222
8  DEWATS - UFF -0.4634691 0.6430281871 0.688958772
9     LSP - UFF  1.3290096 0.1838447956 0.275767193
10    SSU - UFF  0.3453412 0.7298378442 0.729837844
11    ABR - WSP  2.1413557 0.0322453658 0.080613414
12 DEWATS - WSP  2.5396835 0.0110952819 0.041607307
13    LSP - WSP  3.7557023 0.0001728561 0.002592841
14    SSU - WSP  3.1811014 0.0014671626 0.011003719
15    UFF - WSP  3.0743170 0.0021098511 0.010549256

For plant‑level geometric mean BOD, Dunn post‑hoc tests showed that WSP plants had significantly lower effluent BOD than DEWATS, LSP, SSU, and UFF (adjusted p < 0.05). The difference between ABR and LSP was borderline significant. Other pairs do not differ significantly.

Code

dunn_cod <- dunnTest(
  geom_mean_cod ~ fstp_type_short,
  data = geom_means_table,
  method = "bh"
)

dunn_cod

     Comparison          Z      P.unadj       P.adj
1  ABR - DEWATS -0.6856366 0.4929423227 0.616177903
2     ABR - LSP -1.5828690 0.1134513271 0.243109987
3  DEWATS - LSP -0.8773875 0.3802762257 0.570414338
4     ABR - SSU -2.0196592 0.0434187550 0.108546888
5  DEWATS - SSU -1.1751378 0.2399395719 0.399899287
6     LSP - SSU -0.1878678 0.8509802868 0.850980287
7     ABR - UFF -1.3664313 0.1718036230 0.322131793
8  DEWATS - UFF -0.5156981 0.6060653042 0.699306120
9     LSP - UFF  0.4849079 0.6277416960 0.672580389
10    SSU - UFF  0.7749077 0.4383941789 0.597810244
11    ABR - WSP  2.4371642 0.0148029575 0.044408873
12 DEWATS - WSP  2.7827351 0.0053902789 0.020213546
13    LSP - WSP  3.3740094 0.0007408185 0.003704092
14    SSU - WSP  3.7802005 0.0001567021 0.002350532
15    UFF - WSP  3.3747376 0.0007388611 0.005541459

For COD, WSP had significantly lower plant‑level geometric mean effluent than all other technologies (ABR, DEWATS, LSP, SSU, UFF; adjusted p < 0.05), whereas no other technology pairs differed significantly.

Code

dunn_ecoli <- dunnTest(
  geom_mean_ecoli ~ fstp_type_short,
  data = geom_means_table,
  method = "bh"
)

dunn_ecoli

     Comparison          Z      P.unadj        P.adj
1  ABR - DEWATS  4.1587607 3.199788e-05 1.199921e-04
2     ABR - LSP -1.8112285 7.010549e-02 1.752637e-01
3  DEWATS - LSP -5.1257489 2.963574e-07 2.222680e-06
4     ABR - SSU -1.3576098 1.745875e-01 3.273516e-01
5  DEWATS - SSU -5.0549196 4.305720e-07 2.152860e-06
6     LSP - SSU  0.5724273 5.670325e-01 8.505488e-01
7     ABR - UFF -1.7753939 7.583288e-02 1.624990e-01
8  DEWATS - UFF -5.6977791 1.213782e-08 1.820673e-07
9     LSP - UFF  0.3855384 6.998386e-01 8.747982e-01
10    SSU - UFF -0.2538562 7.996066e-01 9.226231e-01
11    ABR - WSP -1.0118784 3.115962e-01 5.193270e-01
12 DEWATS - WSP -4.0041726 6.223493e-05 1.867048e-04
13    LSP - WSP  0.4898176 6.242630e-01 8.512677e-01
14    SSU - WSP  0.0224240 9.821097e-01 9.821097e-01
15    UFF - WSP  0.2175253 8.277990e-01 8.869275e-01

For E. coli, DEWATS plants had significantly lower plant‑level geometric mean effluent E. coli than all other technologies (ABR, LSP, SSU, UFF, WSP; adjusted p < 0.001). No statistically significant differences were detected among the remaining technologies after multiple‑comparison adjustment.

Trends over rounds-Geom-mean (technology-level)

Code

tech_round_geom <- log_conversion_table |>
  group_by(fstp_type_short, data_round) |>
  summarise(
    log_mean_bod   = mean(log_bod,   na.rm = TRUE),
    log_mean_cod   = mean(log_cod,   na.rm = TRUE),
    log_mean_ecoli = mean(log_ecoli, na.rm = TRUE),
    .groups = "drop"
  ) |>
  mutate(
    geom_mean_bod   = 10^log_mean_bod,
    geom_mean_cod   = 10^log_mean_cod,
    geom_mean_ecoli = 10^log_mean_ecoli,
    data_round      = factor(data_round, levels = sort(unique(data_round)))
  )

Code

ggplot(tech_round_geom,
       aes(x = data_round,
           y = geom_mean_bod,
           color = fstp_type_short,
           shape = fstp_type_short,
           group = fstp_type_short)) +
  geom_line() +
  geom_point(size = 2) +
  labs(
    title = "Technology-level geometric mean BOD by round",
    x = "Monitoring round",
    y = "Geometric mean BOD (mg/L)",
    color = "Technology",
    shape = "Technology"
  )

Figure 4: Technology-level geometric mean BOD by monitoring round

Figure 4 WSP consistently had the lowest BOD, starting around 125 mg/L in R and remaining below about 150 mg/L from R3 - R8, with only a modest increase in the later rounds. ABR also showed relatively stable and moderate BOD (below 200 mg/L) from R1-R5 R8 and close to 220 mg/L in R6-R7, without the significant peaks seen for other technologies.

In contrast, LSP, SSU, UFF, and to a lesser extent DEWATS exhibited larger fluctuations and higher peaks. UFF, SSU and DEWATS reached BOD values near or above 300 mg/L in R7. These patterns suggest that WSP, and to some extent ABR, provided more consistently low BOD over the monitoring period, while LSP, SSU, UFF, and DEWATS showed more variable and occasionally much poorer organic matter removal.

Code

ggplot(tech_round_geom,
       aes(x = data_round,
           y = geom_mean_cod,
           color = fstp_type_short,
           shape = fstp_type_short,
           group = fstp_type_short)) +
  geom_line() +
  geom_point(size = 2) +
  labs(
    title = "Technology-level geometric mean COD by round",
    x = "Monitoring round",
    y = "Geometric mean COD (mg/L)",
    color = "Technology",
    shape = "Technology"
  )

Figure 5: Technology-level geometric mean COD by monitoring round

Figure 5 WSP consistently had the lowest COD values, starting around 350–400 mg/L and remaining below about 450 mg/L through rounds R1–R8. ABR and DEWATS were intermediate, with COD typically between ~350 and 600 mg/L, while UFF showed a similar pattern but with higher peaks in the later rounds.
LSP and SSU exhibited the highest and most variable COD, particularly towards the end of the monitoring period. LSP increased from around 400 mg/L in R4 to nearly 1,000 mg/L by R8, and SSU rose from about 350 mg/L in R4 to similarly high values in R7–R8. These trends mirror the BOD results, suggesting that WSP performs best and most consistently for organic matter removal, whereas LSP and SSU often have much poorer COD effluent quality, especially in later rounds.

Code

ggplot(tech_round_geom,
       aes(x = data_round,
           y = geom_mean_ecoli,
           color = fstp_type_short,
           shape = fstp_type_short,
           group = fstp_type_short)) +
  geom_line() +
  geom_point(size = 2) +
  scale_y_log10() +
  labs(
    title = "Technology-level geometric mean E. coli by round",
    x = "Monitoring round",
    y = "Geometric mean E. coli (CFU/100 mL, log scale)",
    color = "Technology",
    shape = "Technology"
  )

Figure 6: Technology-level geometric mean E. coli by monitoring round.

Figure 6 Geometric mean effluent E. coli declined over time for most technologies, particularly after round R5. DEWATS started with relatively low E. coli and showed the steepest decline, reaching the lowest geometric means by R6–R8. ABR, LSP, SSU, and UFF began at higher levels but all showed marked reductions from R5 onwards, converging to geometric means around or below 10²–10³ CFU/100 mL in the last rounds.
WSP displayed somewhat different behaviour: E. coli decreased between R1 and R7 but increased again in R8, indicating less consistent microbial removal late in the monitoring period. Overall, these trends suggest improved E. coli reduction performance over time for most technologies, with DEWATS performing best and WSP showing more variable behaviour despite good organic removal.

Microbial risk: proportion of fstp with high effluent E. coli only

Still thinking how to represent this in better way

Code

plant_camp <- effluent_clean |>
  dplyr::distinct(fstp_id, camp_id)

Code

geom_means_with_camp <- geom_means_table |>
  dplyr::left_join(plant_camp, by = "fstp_id")

Code

plants_flag <- geom_means_with_camp |>
  dplyr::mutate(
    high_risk_plant = geom_mean_ecoli > 1e4
  )

Code

camp_risk_simple <- plants_flag |>
  dplyr::group_by(camp_id) |>
  dplyr::summarise(
    n_plants       = dplyr::n(),
    n_high_risk    = sum(high_risk_plant, na.rm = TRUE),
    prop_high_risk = n_high_risk / n_plants,
    .groups = "drop"
  )

Code

camp_risk_simple <- camp_risk_simple |>
  dplyr::mutate(
    camp_id = reorder(camp_id, -prop_high_risk)
  )

ggplot(camp_risk_simple,
       aes(x = 1, y = camp_id, fill = prop_high_risk)) +
  geom_tile(color = "grey70") +
  geom_text(aes(label = paste0(n_high_risk, "/", n_plants)),
            color = "black", size = 3) +
  scale_fill_gradient(
    low  = "#FFE5E5",
    high = "#8B0000"
  ) +
  labs(
    title = "Camp-wise share of high-E. coli plants (sorted by risk)",
    x = NULL,
    y = "Camp",
    fill = "Proportion of plants\ngeom. mean E. coli > 10^4"
  ) +
  theme_bw() +
  theme(
    axis.text.x  = element_blank(),
    axis.ticks.x = element_blank()
  )

Figure 7: Camp-wise share of high E. coli risk among FSTPs

Code

camp_tech_risk <- plants_flag |>
  group_by(camp_id, fstp_type_short) |>
  summarise(
    n_plants      = n(),
    n_high_risk   = sum(high_risk_plant, na.rm = TRUE),
    prop_high_risk = n_high_risk / n_plants,
    .groups = "drop"
  )

Code

camp_tech_risk <- camp_tech_risk |>
  group_by(fstp_type_short) |>
  arrange(desc(prop_high_risk), .by_group = TRUE) |>
  mutate(
    camp_tech_id = factor(
      paste(fstp_type_short, camp_id, sep = "__"),
      levels = paste(fstp_type_short, camp_id, sep = "__")
    )
  ) |>
  ungroup()

ggplot(camp_tech_risk,
       aes(x = 1, y = camp_tech_id, fill = prop_high_risk)) +
  geom_tile(color = "grey70") +
  geom_text(aes(label = paste0(n_high_risk, "/", n_plants)),
            color = "black", size = 3) +
  scale_y_discrete(
    labels = function(x) sub("^[^_]+__", "", x)  # strip "TECH__" so only camp name shows
  ) +
  scale_fill_gradient(
    low  = "#FFE5E5",
    high = "#8B0000"
  ) +
  facet_wrap(~ fstp_type_short, scales = "free_y") +
  labs(
    title = "Camp-wise share of high-E. coli plants by technology",
    x = NULL,
    y = "Camp (sorted high to low risk within each technology)",
    fill = "Proportion of plants\ngeom. mean E. coli > 10^4"
  ) +
  theme_bw() +
  theme(
    axis.text.x  = element_blank(),
    axis.ticks.x = element_blank()
  )

Figure 8: Camp-wise share of high-E. coli plants by technology

Sanky (used AI) need to change the load_band case_when for e coli load based on literature

Code

library(networkD3)

## 0) CHECK INPUT -----------------------------------------------------------
# geom_means_with_camp should have:
# fstp_id, fstp_type_short, camp_id, geom_mean_ecoli

## 1) Plant-level E. coli risk bands ---------------------------------------

plants_risk_bands <- geom_means_with_camp |>
  mutate(
    ecoli_band = case_when(
      geom_mean_ecoli <= 1e3  ~ "≤ 10^3 (compliant)",
      geom_mean_ecoli <= 1e4  ~ "> 10^3–10^4",
      geom_mean_ecoli <= 1e5  ~ "> 10^4–10^5",
      TRUE                    ~ "> 10^5"
    ),
    ecoli_band = factor(
      ecoli_band,
      levels = c("≤ 10^3 (compliant)",
                 "> 10^3–10^4",
                 "> 10^4–10^5",
                 "> 10^5")
    )
  )

## 2) Camp-level total E. coli load and load bands -------------------------

camp_load <- plants_risk_bands |>
  group_by(camp_id) |>
  summarise(
    total_geom_ecoli = sum(geom_mean_ecoli, na.rm = TRUE),
    .groups = "drop"
  )

# Choose load bands based on your range (edit thresholds if needed)
camp_load <- camp_load |>
  mutate(
    load_band = case_when(
      total_geom_ecoli <= 1e4        ~ "Low load (≤ 10^4)",
      total_geom_ecoli <= 1e5        ~ "Moderate load (10^4–10^5)",
      total_geom_ecoli <= 5e5        ~ "High load (10^5–5×10^5)",
      TRUE                           ~ "Very high load (> 5×10^5)"
    ),
    load_band = factor(
      load_band,
      levels = c("Low load (≤ 10^4)",
                 "Moderate load (10^4–10^5)",
                 "High load (10^5–5×10^5)",
                 "Very high load (> 5×10^5)")
    )
  )

## 3) Rank camps from low to high risk (for nicer ordering) ----------------

camp_risk_rank <- plants_risk_bands |>
  mutate(high_band = ecoli_band == "> 10^5") |>
  group_by(camp_id) |>
  summarise(
    n_plants    = n(),
    n_high_band = sum(high_band, na.rm = TRUE),
    prop_high   = n_high_band / n_plants,
    .groups = "drop"
  ) |>
  arrange(prop_high, n_high_band)   # low → high risk

camp_nodes <- camp_risk_rank$camp_id   # ordered camps

## 4) Build link tables for each layer -------------------------------------

# Technology -> plant risk band
tech_to_band <- plants_risk_bands |>
  count(fstp_type_short, ecoli_band, name = "value")

# Risk band -> camp
band_to_camp <- plants_risk_bands |>
  count(ecoli_band, camp_id, name = "value")

# Camp -> load band
camp_to_load <- camp_load |>
  count(camp_id, load_band, name = "value")

## 5) Build node list (technologies, bands, camps, load bands) -------------

tech_nodes <- sort(unique(plants_risk_bands$fstp_type_short))
band_nodes <- levels(plants_risk_bands$ecoli_band)
load_nodes <- levels(camp_load$load_band)

nodes <- data.frame(
  name = c(tech_nodes, band_nodes, camp_nodes, load_nodes),
  stringsAsFactors = FALSE
)

# helper for index
node_id <- function(x) match(x, nodes$name) - 1

## 6) Create combined links -------------------------------------------------

links_tech_band <- data.frame(
  source = node_id(tech_to_band$fstp_type_short),
  target = node_id(as.character(tech_to_band$ecoli_band)),
  value  = tech_to_band$value
)

links_band_camp <- data.frame(
  source = node_id(as.character(band_to_camp$ecoli_band)),
  target = node_id(band_to_camp$camp_id),
  value  = band_to_camp$value
)

links_camp_load <- data.frame(
  source = node_id(camp_to_load$camp_id),
  target = node_id(as.character(camp_to_load$load_band)),
  value  = camp_to_load$value
)

links <- rbind(links_tech_band, links_band_camp, links_camp_load)

## 7) Node groups for colouring by level -----------------------------------

nodes$group <- c(
  rep("Technology", length(tech_nodes)),
  rep("Band",       length(band_nodes)),
  rep("Camp",       length(camp_nodes)),
  rep("Load",       length(load_nodes))
)

## 8) Plot Sankey: Technology → risk band → camp → load band --------------

sankeyNetwork(
  Links     = links,
  Nodes     = nodes,
  Source    = "source",
  Target    = "target",
  Value     = "value",
  NodeID    = "name",
  NodeGroup = "group",
  fontSize  = 11,
  nodeWidth = 20
)

Figure 9: Sankey diagram of technology, plant-level E. coli risk bands, camps, and total camp E. coli load bands

Compliance

Code

compliance_long <- effluent_clean |>
  mutate(
    bod   = as.numeric(bod),
    cod   = as.numeric(cod),
    ecoli = as.numeric(ecoli)
  ) |>
  # make long by parameter
  tidyr::pivot_longer(
    cols = c(bod, cod, ecoli),
    names_to = "parameter",
    values_to = "value"
  ) |>
  mutate(
    parameter = recode(parameter,
                       bod   = "BOD",
                       cod   = "COD",
                       ecoli = "E. coli"),
    status = case_when(
      parameter == "BOD"   & value <= 30   ~ "Meets standard",
      parameter == "COD"   & value <= 125  ~ "Meets standard",
      parameter == "E. coli" & value <= 1000 ~ "Meets standard",
      TRUE ~ "Exceeds standard"
    )
  ) |>
  group_by(parameter, fstp_type_short, status) |>
  summarise(n = n(), .groups = "drop") |>
  group_by(parameter, fstp_type_short) |>
  mutate(pct = 100 * n / sum(n)) |>
  ungroup()

ggplot(compliance_long,
       aes(x = fstp_type_short, y = pct, fill = status)) +
  geom_col(position = "stack") +
  geom_text(
    aes(label = sprintf("%.1f%%", pct)),
    position = position_stack(vjust = 0.5),
    size = 3
  ) +
  facet_wrap(~ parameter, nrow = 1) +
  scale_y_continuous(labels = scales::percent_format(scale = 1)) +
  scale_fill_manual(
    values = c("Meets standard" = "#009E73",
               "Exceeds standard" = "#D55E00")
  ) +
  labs(
    title = "Percentage of samples meeting discharge standards by technology and parameter",
    x = "Technology",
    y = "Samples (%)",
    fill = "Status"
  ) +
  theme_bw()

Figure 10: Percentage of samples meeting discharge standards by technology and parameter

Conclusion:

Decentralized fecal sludge treatment in the Rohingya camps substantially reduces pollutant concentrations but not showed high variability over the years across all technologies.
None of the assessed technologies consistently achieve microbiologically safe or standards‑compliant effluent, especially for BOD and COD.
WSPs provided the highest BOD and COD reduction but only moderate E. coli removal, whereas DEWATS achieved the highest microbial reduction.
A subset of camps contributes disproportionately high E. coli loads.
These findings highlight the need for technology‑specific upgrades, strengthened operation and monitoring, and risk‑based prioritization of high‑burden camps to better protect public health in protracted humanitarian crises.

References:

Arnold, Jeffrey B. 2024. “Ggthemes: Extra Themes, Scales and Geoms for ’Ggplot2’.” https://jrnold.github.io/ggthemes/.

Dinno, Alexis. 2024. “Dunn.test: Dunn’s Test of Multiple Comparisons Using Rank Sums.”

Wickham, Hadley, Mara Averick, Jennifer Bryan, Winston Chang, Lucy D’Agostino McGowan, Romain François, Garrett Grolemund, et al. 2019. “Welcome to the Tidyverse” 4: 1686. https://doi.org/10.21105/joss.01686.