Quantitative Multi-Pathway Assessment of Infant Exposure to Escherichia coli in Rural Ethiopia

Author

Alazar Negash Horecha

Published

December 18, 2025

Introduction

Exposure to fecal contamination remains a major driver of enteric infections among infants and young children in low- and middle-income countries (LMICs) . Despite decades of investment in Water, Sanitation and Hygiene (WASH) interventions, evidence from large trials in Kenya and Zimbabwe shows limited improvements in growth or diarrheal outcomes, in part due to insufficient understanding of how and where infants acquire fecal microbes in their environments (Gough et al., 2020),(Null et al., 2018), (Rogawski McQuade et al., 2020). Emerging evidence illustrates that young children are exposed along diverse pathways, not only drinking water, but also soil ingestion, hand-to-mouth behavior, contaminated food, and contact with caregivers’ hands or areola during breastfeeding (Horecha & Mekonen, 2025). Yet, critical knowledge gaps remain, including how infant behavior changes over time, which pathways dominate exposure, and how environmental contamination correlates with such behavior. To address these gaps, this study systematically quantified infants’ exposure to Escherichia coli (E. coli), used as a fecal indicator organism, in rural Haramaya Woreda, Ethiopia. High-resolution, structured behavioral observations with extensive environmental sampling and a multi-pathway agent-based exposure model were integrated. This study aimed to characterize infant behaviors relevant to fecal exposure and how they evolve from early to late infancy, and quantify exposure to E. coli from multiple environmental and social pathways across two critical developmental stages .

library(readxl) 
library(tidyverse) 
library(MPN)

Methods

Between June 2021 and June 2022, 79 infants were enrolled from rural Haramaya Woreda, Ethiopia, and observed at two developmental stages: Timepoint 1: 4–8 months and Timepoint 2: 11–15 months Three infants were lost to follow-up before the second timepoint .

file <- "Final E.coli data 20221202.xlsx"
## Read and clean data
ecmug <- read_excel(file, sheet = "EC-MUG")
names(ecmug) <- c("sample_id","sample_type2","vol_collected","vol_tested","vol_tested_undiluted","dilution","num_reaction","conc_factor","num_well_flur")

meta.dat <- read_excel(file, sheet = "Metadata")

chromo <- read_excel(file, sheet = "Chromocult")

#merge
ecmug <- merge(ecmug,chromo,all=T)
ecmug <- merge(ecmug,meta.dat,all.x=T)

Three datasets were imported from an Excel file containing laboratory measurements: EC-MUG sheet: Fluorescence-based quantification, including wells positive, number of reactions, dilution factors, and sample metadata. Chromocult sheet: Plate-based E. coli colony counts. Metadata sheet: Sample-level identifiers and contextual information. Column names in the EC-MUG sheet were standardized, and all three datasets were merged into a unified dataset using sample IDs.

file <- "Final E.coli data 20221202.xlsx"
## Read and clean data
ecmug <- read_excel(file, sheet = "EC-MUG")
names(ecmug) <- c("sample_id","sample_type2","vol_collected","vol_tested","vol_tested_undiluted","dilution","num_reaction","conc_factor","num_well_flur")

meta.dat <- read_excel(file, sheet = "Metadata")

chromo <- read_excel(file, sheet = "Chromocult")

#merge
ecmug <- merge(ecmug,chromo,all=T)
ecmug <- merge(ecmug,meta.dat,all.x=T)# Extract information from sample_id;
id.nchar <- nchar(ecmug$sample_id)
ecmug$study_id <- NA
ecmug$source <- NA
ecmug$member <- NA
ecmug$timepoint <- NA
for (i in 1:length(id.nchar)){
  ecmug$study_id[i] <- substr(ecmug$sample_id[i],1,3)
  ecmug$source[i] <- substr(ecmug$sample_id[i],4,4)
  ecmug$member[i] <- substr(ecmug$sample_id[i],5,id.nchar[i]-2)
  ecmug$timepoint[i] <- substr(ecmug$sample_id[i],id.nchar[i]-1,id.nchar[i])
}

Using the extracted source and member codes, each record was assigned into one of several standardized sample types, including: Areola swab , Breastmilk , Bathing water , Drinking water , Food , Hand rinses (mother/child/sibling) , Soil, Fomite surfaces These categories define the unit of measurement (per mL, per swab, per gram, per pair of hands, etc.) and inform how concentrations are calculated.

# Create sample_type;
ecmug$sample_type <- NA
ecmug$sample_type[which(ecmug$source=="H" & ecmug$member=="A")] <- "areola swab"
ecmug$sample_type[which(ecmug$source=="E" & ecmug$member=="B")] <- "bathing water"
ecmug$sample_type[which(ecmug$source=="H" & ecmug$member=="B")] <- "breastmilk"
ecmug$sample_type[which(ecmug$source=="E" & ecmug$member=="D")] <- "drinking water"
ecmug$sample_type[which(ecmug$source=="E" & ecmug$member=="F")] <- "food"
ecmug$sample_type[which(ecmug$source=="H" & ecmug$member=="H")] <- "sibling handrinse"
ecmug$sample_type[which(ecmug$source=="H" & ecmug$member=="P")] <- "mother handrinse"
ecmug$sample_type[which(ecmug$source=="H" & ecmug$member=="R")] <- "child handrinse"
ecmug$sample_type[which(ecmug$source=="E" & ecmug$member=="S")] <- "soil"
# Check.
ecmug$sample_type[which(ecmug$member %in% c("V(1)","V(2)","V(3)"))] <- "fomite"

Across both timepoints, 1,338 environmental samples were collected and tested for E. coli, including : Infant, mother, and sibling hand rinses, Areola swabs, Breast milk, Food items (injera, cow milk, rice, biscuits, etc.), Fomites (cups, bottles, mobile phones, etc.) , Drinking water , Bathing water, Soil (collected only at Timepoint 2). E. coli concentrations were analyzed in log10 CFU using left-censored normal models; negative samples (below LLOD) were assigned LLOD values. See Table 1

Table 1

Sample sizes by sample type and timepoint

library(kableExtra)

table(ecmug$sample_type, ecmug$timepoint) |>
  kable() |>
  kable_styling(full_width = FALSE,
                bootstrap_options = c("striped", "hover"))

	01	02
areola swab	78	75
bathing water	24	21
breastmilk	51	69
child handrinse	79	76
drinking water	75	76
fomite	129	145
food	34	68
mother handrinse	79	76
sibling handrinse	64	62
soil	0	57

# calculate concentration of samples tested and then back calculate the concentration of raw samples with different units;
# areola swab (per swab) and fomite (per swab) 
# water samples (per mL)
# food (per g) 
# handrinse (per pair of hands).
#concentration factor NA means no dilution
ecmug$conc_factor[which(is.na(ecmug$conc_factor) & !is.na(ecmug$Concentration_factor))] <- ecmug$Concentration_factor[which(is.na(ecmug$conc_factor) & !is.na(ecmug$Concentration_factor))]
ecmug$conc_factor[which(ecmug$conc_factor==0)] <- 1

ecmug$conc <- NA

# bathing water, drinking water (per mL)
ecmug$conc[which(ecmug$sample_type %in% c("bathing water","drinking water"))] <- (ecmug$num_well_flur/(ecmug$num_reaction*180)*ecmug$conc_factor*1000)[which(ecmug$sample_type %in% c("bathing water","drinking water"))]
# handrinse (per pair of hands)
ecmug$conc[which(ecmug$sample_type %in% c("mother handrinse","child handrinse","sibling handrinse"))] <- (ecmug$num_well_flur/(ecmug$num_reaction*180)*ecmug$conc_factor*1000*200)[which(ecmug$sample_type %in% c("mother handrinse","child handrinse","sibling handrinse"))]
# swab (per swab); 
ecmug$conc[which(ecmug$sample_type %in% c("areola swab"))] <- (ecmug$num_well_flur/(ecmug$num_reaction*180)*ecmug$conc_factor*1000*3/4*5)[which(ecmug$sample_type %in% c("areola swab"))]
# fomite sponge (per sponge); 
ecmug$conc[which(ecmug$sample_type %in% c("fomite"))] <- (ecmug$num_well_flur/(ecmug$num_reaction*180)*ecmug$conc_factor*1000*3/9*10)[which(ecmug$sample_type %in% c("fomite"))]
# food (per gram)
ecmug$conc[which(ecmug$sample_type %in% c("food"))] <- (ecmug$num_well_flur/(ecmug$num_reaction*180)*ecmug$conc_factor*1000*3/8*9)[which(ecmug$sample_type %in% c("food"))]
# breastmilk (per mL)
ecmug$conc[which(ecmug$sample_type %in% c("breastmilk"))] <- (ecmug$Ecoli_colony_counted/0.2*ecmug$conc_factor)[which(ecmug$sample_type %in% c("breastmilk"))]
# soil surface (per surface): 
ecmug$conc[which(ecmug$sample_type %in% c("soil"))] <- (ecmug$Ecoli_colony_counted/0.2*ecmug$conc_factor*15)[which(ecmug$sample_type %in% c("soil"))]
# for food tested using chromo; food (per gram);
ecmug$conc[which(ecmug$sample_type %in% c("food") & is.na(ecmug$conc))] <- (ecmug$Ecoli_colony_counted/0.2*ecmug$conc_factor*3/8*9)[which(ecmug$sample_type %in% c("food") & is.na(ecmug$conc))]

# Replace samples with more than 1 well tested using MPN calculation;

ecmug_sub <- ecmug[which(ecmug$num_reaction>1),]

mpn_res <- rep(NA, nrow(ecmug_sub))

excam_dat <- data.frame(
  positive = ecmug_sub$num_well_flur,
  tubes = ecmug_sub$num_reaction,
  amount = rep(180,length(ecmug_sub$vol_tested_undiluted))
)

for(i in 1:nrow(excam_dat)) {
  result = mpn(positive = excam_dat[i, 1],
               tubes = excam_dat[i, 2],
               amount = excam_dat[i, 3])
  mpn_res[i] = result$MPN
}

mpn_res[which(ecmug_sub$sample_type %in% c("bathing water","drinking water"))] = mpn_res[which(ecmug_sub$sample_type %in% c("bathing water","drinking water"))]*1000
mpn_res[which(ecmug_sub$sample_type %in% c("mother handrinse","child handrinse","sibling handrinse"))] = mpn_res[which(ecmug_sub$sample_type %in% c("mother handrinse","child handrinse","sibling handrinse"))]*1000*200
mpn_res[which(ecmug_sub$sample_type %in% c("areola swab"))] = mpn_res[which(ecmug_sub$sample_type %in% c("areola swab"))]*1000*3/4*5
mpn_res[which(ecmug_sub$sample_type %in% c("fomite"))] = mpn_res[which(ecmug_sub$sample_type %in% c("fomite"))]*1000*3/9*10
mpn_res[which(ecmug_sub$sample_type %in% c("food"))] = mpn_res[which(ecmug_sub$sample_type %in% c("food"))]*1000*3/8*9

mpn_res[which(mpn_res==Inf)] <- ecmug_sub$conc[which(mpn_res==Inf)]

ecmug$conc[which(ecmug$num_reaction>1)] <- mpn_res

For each sample type, censored lognormal models are fitted using: fitdistrplus, Maximum likelihood estimation and Censoring at the LOD. Parameters (mean and SD on the log scale) and covariance matrices are saved for use in simulation and downstream modeling. Simulated datasets (10,000 draws) are created for comparison, and diagnostic plots (histograms, QQ plots) are generated. Separate parameter sets are produced for: Timepoint 1 and Timepoint 2.

#Limit of detection. 
# "areola swab"       "bathing water"     "breastmilk"        "child handrinse"  
# "drinking water"    "fomite"            "food"              "mother handrinse" 
# "sibling handrinse" "soil"
# using 1 wells or 1 count as LOD for each sample type.
list.LOD <- c(25/6,25/396,5,1250/99,
              25/396,100/27,3.75,1250/99,
              1250/99,75)


library(fitdistrplus)
library(MASS)

list_sample_type <- sort(unique(ecmug$sample_type))
n.iter <- 10000

env.paras <- list()

pdf(file="/cloud/project/docs/processed/plots/est_conc_082323.pdf",width=8,height=8)

From Figure 1 and Figure 2 , we can see that for the areola swab samples the distribution is highly skewed, with most observations near zero and a few larger values meaning a few higher samples deviate from the model, suggesting occasional contamination spikes.

The histograms at Figure 3 and Figure 4 for bathing water amples show a distribution that is moderately right-skewed, with most observed concentrations falling near or slightly below log10 ≈ 0 (i.e., around 1 CFU/mL) and a tail of higher values.

From Figure 5 and Figure 6 , we can see that, for the breast milk samples, the \(\text{log}_{10}\) concentration is successfully modeled by a normal distribution using methods for left-censored data (non-detects) at a detection limit of \(\text{log}_{10}(\text{conc}) \approx -1.0\).

From Figure 7 and Figure 8 , the child hand rinse samples data, when viewed on a log10 scale, is expected to follow a normal distribution, but the observed data is significantly affected by a detection limit that groups all lower concentrations into a single highly frequent value.

From Figure 9 and Figure 10, the drinking water sample data is approximately log-normally distributed but is heavily impacted by a left-censoring detection limit at \(\text{log}_{10}(\text{conc}) \approx -1.0\).

From Figure 11 and Figure 12 , we can see that, the concentration on fomites is successfully modeled by a log-normal distribution using methods appropriate for left-censored data (non-detects). The detection limit for these samples is approximately \(\text{log}_{10}(\text{conc}) = 0.5\).

From Figure 13 and Figure 14 , we can see that, the concentration in food samples is successfully modeled by a log-normal distribution using methods for left-censored data (non-detects). The effective detection limit for these samples appears to be around \(\text{log}_{10}(\text{conc}) = 0.5\).

From Figure 15 and Figure 16 , we can see that, he contaminant concentration on mother hand rinse samples is successfully modeled by a log-normal distribution using methods for left-censored data (non-detects) at an effective detection limit of \(\text{log}_{10}(\text{conc}) \approx 1.0\).

From Figure 17 and Figure 18 , we can see that, the contaminant concentration on sibling hand rinse samples is successfully modeled by a log-normal distribution using methods for left-censored data (non-detects) at an effective detection limit of \(\text{log}_{10}(\text{conc}) \approx 1.0\) to \(1.5\).

From Figure 19 and Figure 20 , we can see that, the contaminant concentration in soil samples is successfully modeled by a log-normal distribution, with the mean \(\text{log}_{10}\) concentration being the highest among all samples analyzed (around 5.0). This dataset appears to have minimal or no significant censoring (non-detects), allowing the observed distribution to closely reflect the underlying log-normal distribution.

for (i in 1:length(list_sample_type)){
  temp <- data.frame(cbind(ecmug$conc[which(ecmug$sample_type==list_sample_type[i])],
                ecmug$conc[which(ecmug$sample_type==list_sample_type[i])]))
  names(temp) <- c("left","right")
  temp$left <- replace(temp$left,temp$left==0,NA)
  temp$right <- replace(temp$right,temp$right==0,list.LOD[i])
  fit_cens <- fitdistcens(log10(temp),"norm")
  env.paras[[i]] <- list(list_sample_type[i],summary(fit_cens)$estimate,summary(fit_cens)$vcov)
  para <- mvrnorm(n.iter,summary(fit_cens)$estimate,summary(fit_cens)$vcov)
  conc <- 10^(rnorm(n.iter,mean = para[,1], sd = para[,2]))
  conc_cens <- conc
  conc_cens[which(conc<list.LOD[i])] <- list.LOD[i]
  par(mfrow=c(3,1))
  hist(log10(conc),freq = F,main=paste0(list_sample_type[i],", Simulated"),xlim=c(-10,10),breaks=seq(-10,10,by=0.5))
  hist(log10(conc_cens),freq = F,main="Simulated Censored",xlim=c(-10,10),breaks=seq(-10,10,by=0.5))
  hist(log10(temp$right),freq = F,main="Observed",xlim=c(-10,10),breaks=seq(-10,10,by=0.5))
  par(mfrow=c(1,1))
  qqplot(log10(conc_cens),log10(temp$right),main=list_sample_type[i])
  abline(coef = c(0, 1))
}
dev.off()

pdf 
  3

save(env.paras,file="/cloud/project/docs/processed/res/env.paras_080823.rda")

Figure 1: Histograms of simulated, censored, and observed concentrations for each sample type.

Figure 2: Histograms of simulated, censored, and observed concentrations for each sample type.

Figure 3: Histograms of simulated, censored, and observed concentrations for each sample type.

Figure 4: Histograms of simulated, censored, and observed concentrations for each sample type.

Figure 5: Histograms of simulated, censored, and observed concentrations for each sample type.

Figure 6: Histograms of simulated, censored, and observed concentrations for each sample type.

Figure 7: Histograms of simulated, censored, and observed concentrations for each sample type.

Figure 8: Histograms of simulated, censored, and observed concentrations for each sample type.

Figure 9: Histograms of simulated, censored, and observed concentrations for each sample type.

Figure 10: Histograms of simulated, censored, and observed concentrations for each sample type.

Figure 11: Histograms of simulated, censored, and observed concentrations for each sample type.

Figure 12: Histograms of simulated, censored, and observed concentrations for each sample type.

Figure 13: Histograms of simulated, censored, and observed concentrations for each sample type.

Figure 14: Histograms of simulated, censored, and observed concentrations for each sample type.

Figure 15: Histograms of simulated, censored, and observed concentrations for each sample type.

Figure 16: Histograms of simulated, censored, and observed concentrations for each sample type.

Figure 17: Histograms of simulated, censored, and observed concentrations for each sample type.

Figure 18: Histograms of simulated, censored, and observed concentrations for each sample type.

Figure 19: Histograms of simulated, censored, and observed concentrations for each sample type.

Figure 20: Histograms of simulated, censored, and observed concentrations for each sample type.

Results

Across sample types, E. coli contamination was widespread. The Food samples showed high concentrations at both timepoints (mean log10 2.51 - 3.35 CFU/g) , The Fomites samples showed increased contamination at Timepoint 2 (mean log10 2.07 - 3.55 CFU/sample) , The Soil samples have extremely high contamination at Timepoint 2 (mean log10 5.94 CFU/sample), The Breast milk samples are 21.6% positive at Timepoint 1; contamination likely originating from the areola (16–19% positive) rather than the milk itself and Infant hands samples have higher positivity and concentrations at Timepoint 2 correlating with mother’s hands (ρ=0.39), siblings’ hands (ρ=0.35), areola (ρ=0.28), and fomites (ρ=0.29).

#percent of positive
table(ecmug$sample_type[which(ecmug$conc>0)],ecmug$timepoint[which(ecmug$conc>0)])/table(ecmug$sample_type,ecmug$timepoint)

                   
                            01         02
  areola swab       0.16666667 0.18666667
  bathing water     0.91666667 0.80952381
  breastmilk        0.21568627 0.02898551
  child handrinse   0.51898734 0.77631579
  drinking water    0.77333333 0.51315789
  fomite            0.63565891 0.56551724
  food              0.35294118 0.25000000
  mother handrinse  0.81012658 0.84210526
  sibling handrinse 0.89062500 0.85483871
  soil                         0.98245614

table(ecmug$timepoint[which(ecmug$conc>0)])/table(ecmug$timepoint)


       01        02 
0.5872757 0.5558621

#mean
aggregate(data=ecmug[which(ecmug$conc>0),],conc ~ timepoint + sample_type,FUN=mean,na.rm=T)

   timepoint       sample_type         conc
1         01       areola swab 1.288554e+01
2         02       areola swab 1.612028e+02
3         01     bathing water 2.535479e+02
4         02     bathing water 5.154619e+00
5         01        breastmilk 8.863636e+01
6         02        breastmilk 2.250000e+01
7         01   child handrinse 4.204252e+02
8         02   child handrinse 4.248072e+03
9         01    drinking water 1.181066e+00
10        02    drinking water 4.759671e-01
11        01            fomite 1.182663e+02
12        02            fomite 3.562254e+03
13        01              food 3.264156e+02
14        02              food 2.217984e+03
15        01  mother handrinse 4.412484e+03
16        02  mother handrinse 1.798970e+04
17        01 sibling handrinse 1.951779e+05
18        02 sibling handrinse 2.131045e+04
19        02              soil 8.675183e+05

Across all environmental and biological sample types, the script summarizes the number of samples collected at each survey timepoint and quantifies the proportion of samples with detectable E. coli concentrations. Sample counts varied across matrices, but all types were represented in both survey rounds. Consistent with expectations for household environments, several matrices such as breastmilk and areola swabs, were dominated by non-detects, while bathing water, drinking water, and handrinse samples displayed higher detection frequencies.

Mean concentrations were calculated among positive samples only, providing a measure of typical contamination levels when E. coli was present. Study-level averages were also computed by reshaping the dataset into a wide format, allowing assessment of within-household contamination patterns for each matrix at each timepoint.

To investigate changes in contamination between the two survey rounds, two complementary comparisons were conducted for each sample type.

Two-proportion tests compared the fraction of samples exceeding the detection limit at Timepoint 1 versus Timepoint 2. These tests indicate whether the likelihood of detecting E. coli changed across rounds. Some matrices showed stable detection patterns across timepoints, while others exhibited increases or decreases, suggesting shifts in environmental or behavioral conditions.

For samples with detectable E. coli, log₁₀-transformed concentrations were compared between rounds using Welch’s t-tests. These tests evaluate whether contamination intensity (conditional on being positive) differed across time. In several sample types, positive-sample concentrations remained similar between rounds; in others, mean concentrations changed, indicating temporal variation in contamination magnitude.

Together, the detection-frequency tests and concentration comparisons provide a complementary view of how both the presence and levels of contamination evolved across study visits.

library(reshape)

ecmug_data <- ecmug[,c("study_id","timepoint","sample_type","conc")]
ecmug_data1 <- ecmug_data %>% filter(timepoint=="01")
ecmug_data2 <- ecmug_data %>% filter(timepoint=="02")

ec.data1 <- cast(ecmug_data1[,-which(names(ecmug_data1)=="timepoint")], study_id ~ sample_type, mean)

ec.data <- cast(ecmug_data, timepoint + study_id ~ sample_type, mean, na.rm=T)

#two prop test;
for (i in 3:11){
  print(names(ec.data[i]))
  temp <- matrix(c(length(which(ec.data[,i]>0 & ec.data$timepoint=="01")),length(which(ec.data[,i]==0 & ec.data$timepoint=="01")),
                 length(which(ec.data[,i]>0 & ec.data$timepoint=="02")),length(which(ec.data[,i]==0 & ec.data$timepoint=="02"))),
                 nrow=2,byrow = T, dimnames = list(c("01", "02"), c("Pos", "Neg")))
  print(prop.test(temp))
}

[1] "areola swab"

    2-sample test for equality of proportions with continuity correction

data:  temp
X-squared = 0.01261, df = 1, p-value = 0.9106
alternative hypothesis: two.sided
95 percent confidence interval:
 -0.1539755  0.1139755
sample estimates:
   prop 1    prop 2 
0.1666667 0.1866667 

[1] "bathing water"


    2-sample test for equality of proportions with continuity correction

data:  temp
X-squared = 0.37861, df = 1, p-value = 0.5384
alternative hypothesis: two.sided
95 percent confidence interval:
 -0.1385802  0.3528659
sample estimates:
   prop 1    prop 2 
0.9166667 0.8095238 

[1] "breastmilk"

    2-sample test for equality of proportions with continuity correction

data:  temp
X-squared = 8.7374, df = 1, p-value = 0.003117
alternative hypothesis: two.sided
95 percent confidence interval:
 0.05003038 0.32337116
sample estimates:
    prop 1     prop 2 
0.21568627 0.02898551 

[1] "child handrinse"

    2-sample test for equality of proportions with continuity correction

data:  temp
X-squared = 10.108, df = 1, p-value = 0.001476
alternative hypothesis: two.sided
95 percent confidence interval:
 -0.41486082 -0.09979607
sample estimates:
   prop 1    prop 2 
0.5189873 0.7763158 

[1] "drinking water"

    2-sample test for equality of proportions with continuity correction

data:  temp
X-squared = 10.019, df = 1, p-value = 0.001549
alternative hypothesis: two.sided
95 percent confidence interval:
 0.09994062 0.42041026
sample estimates:
   prop 1    prop 2 
0.7733333 0.5131579 

[1] "fomite"

    2-sample test for equality of proportions with continuity correction

data:  temp
X-squared = 1.3091, df = 1, p-value = 0.2526
alternative hypothesis: two.sided
95 percent confidence interval:
 -0.05771855  0.24773905
sample estimates:
   prop 1    prop 2 
0.7792208 0.6842105 

[1] "food"

    2-sample test for equality of proportions with continuity correction

data:  temp
X-squared = 0.41028, df = 1, p-value = 0.5218
alternative hypothesis: two.sided
95 percent confidence interval:
 -0.1301179  0.2967846
sample estimates:
   prop 1    prop 2 
0.3333333 0.2500000 

[1] "mother handrinse"

    2-sample test for equality of proportions with continuity correction

data:  temp
X-squared = 0.097933, df = 1, p-value = 0.7543
alternative hypothesis: two.sided
95 percent confidence interval:
 -0.1640525  0.1000951
sample estimates:
   prop 1    prop 2 
0.8101266 0.8421053 

[1] "sibling handrinse"

    2-sample test for equality of proportions with continuity correction

data:  temp
X-squared = 0.1126, df = 1, p-value = 0.7372
alternative hypothesis: two.sided
95 percent confidence interval:
 -0.0964325  0.1680051
sample estimates:
   prop 1    prop 2 
0.8906250 0.8548387

#t.test;
for (i in 3:11){
  print(names(ec.data[i]))
  print(t.test(log10(ec.data[which(ec.data[,i]>0 & ec.data$timepoint=="01"),i]),log10(ec.data[which(ec.data[,i]>0 & ec.data$timepoint=="02"),i])))
}

[1] "areola swab"

    Welch Two Sample t-test

data:  log10(ec.data[which(ec.data[, i] > 0 & ec.data$timepoint == "01"), i]) and log10(ec.data[which(ec.data[, i] > 0 & ec.data$timepoint == "02"), i])
t = -0.8875, df = 20.011, p-value = 0.3854
alternative hypothesis: true difference in means is not equal to 0
95 percent confidence interval:
 -0.6552791  0.2641036
sample estimates:
mean of x mean of y 
0.9402204 1.1358081 

[1] "bathing water"

    Welch Two Sample t-test

data:  log10(ec.data[which(ec.data[, i] > 0 & ec.data$timepoint == "01"), i]) and log10(ec.data[which(ec.data[, i] > 0 & ec.data$timepoint == "02"), i])
t = -0.31603, df = 36.994, p-value = 0.7538
alternative hypothesis: true difference in means is not equal to 0
95 percent confidence interval:
 -0.7303371  0.5332517
sample estimates:
 mean of x  mean of y 
-0.2442870 -0.1457443 

[1] "breastmilk"

    Welch Two Sample t-test

data:  log10(ec.data[which(ec.data[, i] > 0 & ec.data$timepoint == "01"), i]) and log10(ec.data[which(ec.data[, i] > 0 & ec.data$timepoint == "02"), i])
t = 0.18451, df = 1.4625, p-value = 0.8759
alternative hypothesis: true difference in means is not equal to 0
95 percent confidence interval:
 -3.009144  3.192586
sample estimates:
mean of x mean of y 
 1.242236  1.150515 

[1] "child handrinse"

    Welch Two Sample t-test

data:  log10(ec.data[which(ec.data[, i] > 0 & ec.data$timepoint == "01"), i]) and log10(ec.data[which(ec.data[, i] > 0 & ec.data$timepoint == "02"), i])
t = -3.0122, df = 97.532, p-value = 0.003304
alternative hypothesis: true difference in means is not equal to 0
95 percent confidence interval:
 -0.7587544 -0.1560389
sample estimates:
mean of x mean of y 
 1.792643  2.250039 

[1] "drinking water"

    Welch Two Sample t-test

data:  log10(ec.data[which(ec.data[, i] > 0 & ec.data$timepoint == "01"), i]) and log10(ec.data[which(ec.data[, i] > 0 & ec.data$timepoint == "02"), i])
t = 1.7589, df = 91.149, p-value = 0.08194
alternative hypothesis: true difference in means is not equal to 0
95 percent confidence interval:
 -0.02709644  0.44629027
sample estimates:
 mean of x  mean of y 
-0.5228181 -0.7324150 

[1] "fomite"

    Welch Two Sample t-test

data:  log10(ec.data[which(ec.data[, i] > 0 & ec.data$timepoint == "01"), i]) and log10(ec.data[which(ec.data[, i] > 0 & ec.data$timepoint == "02"), i])
t = -2.4596, df = 79.83, p-value = 0.01607
alternative hypothesis: true difference in means is not equal to 0
95 percent confidence interval:
 -0.82849617 -0.08741157
sample estimates:
mean of x mean of y 
 1.184547  1.642501 

[1] "food"

    Welch Two Sample t-test

data:  log10(ec.data[which(ec.data[, i] > 0 & ec.data$timepoint == "01"), i]) and log10(ec.data[which(ec.data[, i] > 0 & ec.data$timepoint == "02"), i])
t = 0.86783, df = 25.087, p-value = 0.3937
alternative hypothesis: true difference in means is not equal to 0
95 percent confidence interval:
 -0.4682432  1.1504217
sample estimates:
mean of x mean of y 
 1.680718  1.339628 

[1] "mother handrinse"

    Welch Two Sample t-test

data:  log10(ec.data[which(ec.data[, i] > 0 & ec.data$timepoint == "01"), i]) and log10(ec.data[which(ec.data[, i] > 0 & ec.data$timepoint == "02"), i])
t = 1.4495, df = 125.98, p-value = 0.1497
alternative hypothesis: true difference in means is not equal to 0
95 percent confidence interval:
 -0.08068333  0.52243599
sample estimates:
mean of x mean of y 
 2.387111  2.166235 

[1] "sibling handrinse"

    Welch Two Sample t-test

data:  log10(ec.data[which(ec.data[, i] > 0 & ec.data$timepoint == "01"), i]) and log10(ec.data[which(ec.data[, i] > 0 & ec.data$timepoint == "02"), i])
t = -0.24882, df = 107.98, p-value = 0.804
alternative hypothesis: true difference in means is not equal to 0
95 percent confidence interval:
 -0.3521837  0.2736268
sample estimates:
mean of x mean of y 
 2.172355  2.211633

t.test(log10(ec.data[which(ec.data[,3]>0 & ec.data$timepoint=="01"),3]),log10(ec.data[which(ec.data[,3]>0 & ec.data$timepoint=="02"),3]))


    Welch Two Sample t-test

data:  log10(ec.data[which(ec.data[, 3] > 0 & ec.data$timepoint == "01"), 3]) and log10(ec.data[which(ec.data[, 3] > 0 & ec.data$timepoint == "02"), 3])
t = -0.8875, df = 20.011, p-value = 0.3854
alternative hypothesis: true difference in means is not equal to 0
95 percent confidence interval:
 -0.6552791  0.2641036
sample estimates:
mean of x mean of y 
0.9402204 1.1358081

LOD-adjusted, log₁₀-transformed concentrations across sample types are used to produce: Pairwise scatterplots, Spearman correlation coefficients. This assesses co-occurrence patterns of environmental contamination across sample types.

To evaluate co-occurrence patterns of contamination across environmental compartments, all concentrations were adjusted for limits of detection and transformed to log₁₀ scale. Pairwise scatterplots and Spearman correlation coefficients were generated across the ten sample types.

Correlation strengths varied across matrices. Some environmental sample types, particularly those related to water and hand hygiene, showed moderate positive correlations, suggesting shared pathways or common contamination sources. Conversely, biological samples such as breastmilk and areola swabs showed weak or no correlation with other matrices, consistent with their low detection frequencies and distinct biological origins.

In the below Figure 21 correlation matrix, plots in the bottom left half show the scatterplots of log10 scale E. coli concentration levels for pairs of sample types. Plots on the diagonal line show the histogram of log10 scaleE. coliconcentration levels by sample type. Plots on the top right half show the Spearman correlation coefficients of log10 scale E. coli concentration levels between different sample types. The stars in the figure define the level of significance. * = 0.05, ** = 0.01, *** = 0.001.

# Save identical PDF
pdf("/cloud/project/docs/processed/plots/corr_plot.pdf", width = 8, height = 8)
corr_plot()
dev.off()

png 
  2

ecmug1 <- ecmug[which(ecmug$timepoint=="01"),]
ecmug2 <- ecmug[which(ecmug$timepoint=="02"),]

env.paras <- list()
for (i in 1:(length(list_sample_type)-1)){
  temp <- data.frame(cbind(ecmug1$conc[which(ecmug1$sample_type==list_sample_type[i])],
                           ecmug1$conc[which(ecmug1$sample_type==list_sample_type[i])]))
  names(temp) <- c("left","right")
  temp$left <- replace(temp$left,temp$left==0,NA)
  temp$right <- replace(temp$right,temp$right==0,list.LOD[i])
  fit_cens <- fitdistcens(log10(temp),"norm")
  env.paras[[i]] <- list(list_sample_type[i],summary(fit_cens)$estimate,summary(fit_cens)$vcov)
  para <- mvrnorm(n.iter,summary(fit_cens)$estimate,summary(fit_cens)$vcov)
  conc <- 10^(rnorm(n.iter,mean = para[,1], sd = para[,2]))
  conc_cens <- conc
  conc_cens[which(conc<list.LOD[i])] <- list.LOD[i]
  # par(mfrow=c(3,1))
  # hist(log10(conc),freq = F,main=paste0(list_sample_type[i],", Simulated"),xlim=c(-10,10),breaks=seq(-10,10,by=0.5))
  # hist(log10(conc_cens),freq = F,main="Simulated Censored",xlim=c(-10,10),breaks=seq(-10,10,by=0.5))
  # hist(log10(temp$right),freq = F,main="Observed",xlim=c(-10,10),breaks=seq(-10,10,by=0.5))
  # par(mfrow=c(1,1))
  # qqplot(log10(conc_cens),log10(temp$right),main=list_sample_type[i])
  # abline(coef = c(0, 1))
}

#env.paras[[10]] <- env.paras2[[10]]
env.paras

[[1]]
[[1]][[1]]
[1] "areola swab"

[[1]][[2]]
       mean          sd 
-0.04069082  0.66989273 

[[1]][[3]]
            mean          sd
mean  0.04367984 -0.02747175
sd   -0.02747175  0.02383614


[[2]]
[[2]][[1]]
[1] "bathing water"

[[2]][[2]]
     mean        sd 
-0.378888  1.128873 

[[2]][[3]]
             mean           sd
mean  0.054277265 -0.001910912
sd   -0.001910912  0.029779422


[[3]]
[[3]][[1]]
[1] "breastmilk"

[[3]][[2]]
      mean         sd 
-0.2029659  1.1033146 

[[3]][[3]]
            mean          sd
mean  0.11887234 -0.07449359
sd   -0.07449359  0.07365827


[[4]]
[[4]][[1]]
[1] "child handrinse"

[[4]][[2]]
    mean       sd 
1.097656 0.942558 

[[4]][[3]]
             mean           sd
mean  0.016465136 -0.006081775
sd   -0.006081775  0.013066704


[[5]]
[[5]][[1]]
[1] "drinking water"

[[5]][[2]]
      mean         sd 
-0.7851218  0.7626990 

[[5]][[3]]
             mean           sd
mean  0.008433971 -0.001016928
sd   -0.001016928  0.005466481


[[6]]
[[6]][[1]]
[1] "fomite"

[[6]][[2]]
     mean        sd 
0.7630791 0.8651518 

[[6]][[3]]
             mean           sd
mean  0.007115732 -0.001701648
sd   -0.001701648  0.005216147


[[7]]
[[7]][[1]]
[1] "food"

[[7]][[2]]
       mean          sd 
-0.01338436  1.56939991 

[[7]][[3]]
            mean          sd
mean  0.17170459 -0.09690086
sd   -0.09690086  0.13561393


[[8]]
[[8]][[1]]
[1] "mother handrinse"

[[8]][[2]]
    mean       sd 
2.028015 1.088193 

[[8]][[3]]
             mean           sd
mean  0.015853242 -0.001488147
sd   -0.001488147  0.010129949


[[9]]
[[9]][[1]]
[1] "sibling handrinse"

[[9]][[2]]
    mean       sd 
1.999679 0.948677 

[[9]][[3]]
              mean            sd
mean  0.0144405529 -0.0006749755
sd   -0.0006749755  0.0082758511

save(env.paras,file="/cloud/project/docs/processed/res/env.paras_T1_101023.rda")

env.paras <- list()
for (i in 1:length(list_sample_type)){
  temp <- data.frame(cbind(ecmug2$conc[which(ecmug2$sample_type==list_sample_type[i])],
                           ecmug2$conc[which(ecmug2$sample_type==list_sample_type[i])]))
  names(temp) <- c("left","right")
  temp$left <- replace(temp$left,temp$left==0,NA)
  temp$right <- replace(temp$right,temp$right==0,list.LOD[i])
  fit_cens <- fitdistcens(log10(temp),"norm")
  env.paras[[i]] <- list(list_sample_type[i],summary(fit_cens)$estimate,summary(fit_cens)$vcov)
  para <- mvrnorm(n.iter,summary(fit_cens)$estimate,summary(fit_cens)$vcov)
  conc <- 10^(rnorm(n.iter,mean = para[,1], sd = para[,2]))
  conc_cens <- conc
  conc_cens[which(conc<list.LOD[i])] <- list.LOD[i]
  # par(mfrow=c(3,1))
  # hist(log10(conc),freq = F,main=paste0(list_sample_type[i],", Simulated"),xlim=c(-10,10),breaks=seq(-10,10,by=0.5))
  # hist(log10(conc_cens),freq = F,main="Simulated Censored",xlim=c(-10,10),breaks=seq(-10,10,by=0.5))
  # hist(log10(temp$right),freq = F,main="Observed",xlim=c(-10,10),breaks=seq(-10,10,by=0.5))
  # par(mfrow=c(1,1))
  # qqplot(log10(conc_cens),log10(temp$right),main=list_sample_type[i])
  # abline(coef = c(0, 1))
}
env.paras2 <- env.paras
save(env.paras,file="/cloud/project/docs/processed/res/env.paras_T2_101023.rda")

Conclusions

This study provides one of the most comprehensive, high-resolution assessments of infant fecal exposure conducted in a rural LMIC setting. The integration of multi-source environmental sampling and agent-based exposure modelling revealed several crucial insights:

Dominant exposure pathways evolve as infants grow, driven by increased mobility, soil contact, and diversification of diet.
Contaminated food, breast milk (via contaminated areola), and soil are critical sources of E. coli exposure.
Hygiene practices of caregivers, especially before breastfeeding and feeding, are likely major determinants of early infant exposure.
WASH interventions focusing only on water, latrines, and handwashing infrastructure may miss the most important infant-specific exposure pathways.

References

Gough, E. K., Moulton, L. H., Mutasa, K., Ntozini, R., Stoltzfus, R. J., Majo, F. D., Smith, L. E., Panic, G., Giallourou, N., Jamell, M., Kosek, P., Swann, J. R., Humphrey, J. H., Prendergast, A. J., & for the Sanitation Hygiene Infant Nutrition Efficacy (SHINE) Trial Team. (2020). Effects of improved water, sanitation, and hygiene and improved complementary feeding on environmental enteric dysfunction in children in rural Zimbabwe: A cluster-randomized controlled trial. PLOS Neglected Tropical Diseases, 14(2), e0007963. https://doi.org/10.1371/journal.pntd.0007963

Horecha, A. N., & Mekonen, S. (2025). Assessment of Escherichia coli contamination risk across different environmental media in rural Ethiopia. Journal of Water and Health, 23(8), 940–951. https://doi.org/10.2166/wh.2025.050

Null, C., Stewart, C. P., Pickering, A. J., Dentz, H. N., Arnold, B. F., Arnold, C. D., Benjamin-Chung, J., Clasen, T., Dewey, K. G., Fernald, L. C. H., Hubbard, A. E., Kariger, P., Lin, A., Luby, S. P., Mertens, A., Njenga, S. M., Nyambane, G., Ram, P. K., & Colford, J. M. (2018). Effects of water quality, sanitation, handwashing, and nutritional interventions on diarrhoea and child growth in rural Kenya: A cluster-randomised controlled trial. The Lancet Global Health, 6(3), e316–e329. https://doi.org/10.1016/S2214-109X(18)30005-6

Rogawski McQuade, E. T., Platts-Mills, J. A., Gratz, J., Zhang, J., Moulton, L. H., Mutasa, K., Majo, F. D., Tavengwa, N., Ntozini, R., Prendergast, A. J., Humphrey, J. H., Liu, J., & Houpt, E. R. (2020). Impact of Water Quality, Sanitation, Handwashing, and Nutritional Interventions on Enteric Infections in Rural Zimbabwe: The Sanitation Hygiene Infant Nutrition Efficacy (SHINE) Trial. The Journal of Infectious Diseases, 221(8), 1379–1386. https://doi.org/10.1093/infdis/jiz179