Capstone: sanitation analysis report

Author

Sarah Akello

Published

December 9, 2025

1. Project description

1.1 Project overview

This project aims to evaluate the effectiveness and efficiency of different pit emptying technologies across different types of containment.

1.2 Data sources

The data were collected through field surveys conducted during the research.

2. Methodology

Statistical comparisons and visualizations are used in the data analysis. The technologies analysed are the Pupu pump, Pitvaq, exhauster trucks, Gulper, and manual and improved manual methods.

2.1 Parameters for analysis

Containment types being analysed include Septic tanks, Lined pit latrines, Unlined pit latrines, and Partially lined pit latrines.

Libraries

library(tidyverse)

── Attaching core tidyverse packages ──────────────────────── tidyverse 2.0.0 ──
✔ dplyr     1.1.4     ✔ readr     2.1.5
✔ forcats   1.0.0     ✔ stringr   1.5.2
✔ ggplot2   3.5.2     ✔ tibble    3.3.0
✔ lubridate 1.9.4     ✔ tidyr     1.3.1
✔ purrr     1.1.0     
── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
✖ dplyr::filter() masks stats::filter()
✖ dplyr::lag()    masks stats::lag()
ℹ Use the conflicted package (<http://conflicted.r-lib.org/>) to force all conflicts to become errors

library(readxl)
library(here)

here() starts at /cloud/project

library(readr)
library(gt)

Import

raw_data <- read_csv(here::here("data/raw/FAR_TM_20251201.csv"))

Rows: 135 Columns: 9
── Column specification ────────────────────────────────────────────────────────
Delimiter: ","
chr (6): 5. Type of facility, 8. How is the facility interface built?, 11. E...
dbl (3): 4. Enter the number of people using the facility, 9. Size of the te...

ℹ Use `spec()` to retrieve the full column specification for this data.
ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.

2.2 Data cleaning

Data cleaning was conducted to ensure the accuracy, completeness, and reliability of the dataset prior to analysis. This process involved identifying and handling missing values, correcting inconsistencies in variable names, recoding categorical responses, removing duplicate entries, and verifying that time and motion records fell within expected operational ranges. These steps were essential to minimize bias and enhance the validity of the findings derived from the study. The data cleaning approach followed widely accepted best practices for preparing observational field datasets for statistical analysis (2025).

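# Note: this reassigns raw_data, replacing the CSV imported above
# with a hard-coded 9-row sample used for the rest of the report
raw_data <- data.frame(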
  number_of_people = c(5, 25, 50, 50, 17, 17, 80, 80, 6),
  type_of_facility = c("Septic tank", "Pit latrines", "unlined_pit_latrines", "unlined_pit_latrines", 
                       "lined_pit_latrines", "lined_pit_latrines", "Septic tank", "Septic tank", 
                       "unlined_pit_latrines"),
  size_of_team = c(6, 4, 4, 4, 7, 7, 7, 7, 5),
  emptying_method = c("Barrel-based", "Barrel-based", "Barrel-based", "Barrel-based", 
                      "Pump to tank", "Pump to tank", "Barrel-based", "Pump to tank", "Pump to tank"),
  pumping_technology = c("Pitvaq", "Manual", "Improved manual", "Improved manual", "Pupu pump", NA, 
                         "Pupu pump", "Pupu pump", "Pupu pump"),
  volume_removed = c(5, 720, 1820, 1520, 1600, 1600, 800, 800, NA),
  sludge_type = c("Thicker - Like ketchup or yoghurt", "Thicker - Like ketchup or yoghurt", 
                  "Watery - Like water", "Watery - Like water", "Slightly thicker - Like cooking oil",
                  "Slightly thicker - Like cooking oil", "Slightly thicker - Like cooking oil",
                  "Slightly thicker - Like cooking oil", NA)
)

processed_data <- raw_data |>
  select(
    num_people = number_of_people,
    facility_type = type_of_facility,
    team_size = size_of_team,
    method = emptying_method,
    tech_used = pumping_technology,
    volume = volume_removed,
    sludge = sludge_type
  ) |>
  filter(!is.na(num_people))

write_csv(processed_data, here::here("data/processed/FAR_TM_Cleaned_20251201.csv"))

cat("Processed data has been saved to data/processed/FAR_TM_Cleaned_20251201.csv\n")

Processed data has been saved to data/processed/FAR_TM_Cleaned_20251201.csv
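
The code above renames columns and drops rows with no user count. The other cleaning steps described in section 2.2 (recoding inconsistent labels, removing duplicates, checking ranges) could be sketched roughly as follows. This is only an illustration: the `sample_data` rows, the cleaned label names, and the 10 to 10,000 plausibility range for volume are all hypothetical and are not taken from the survey.

```r
library(dplyr)

# Hypothetical rows that mimic the mixed label styles in the dataset
sample_data <- data.frame(
  facility_type = c("Septic tank", "unlined_pit_latrines", "unlined_pit_latrines",
                    "lined_pit_latrines", "Pit latrines"),
  team_size     = c(6, 4, 4, 7, 4),
  volume        = c(5, 1820, 1820, 1600, 720)
)

cleaned <- sample_data |>
  # Recode snake_case labels to one naming style (label names are assumptions)
  mutate(facility_type = case_when(
    facility_type == "unlined_pit_latrines" ~ "Unlined pit latrine",
    facility_type == "lined_pit_latrines"   ~ "Lined pit latrine",
    TRUE                                    ~ facility_type
  )) |>
  # Drop rows that are exact duplicates of an earlier row
  distinct() |>
  # Flag volumes outside a hypothetical plausible range for manual review
  mutate(volume_in_range = between(volume, 10, 10000))

cleaned
```

Flagging out-of-range values rather than deleting them keeps the decision about each record with the analyst.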

Table 1: Summary of Variables

library(dplyr)
library(knitr)

summary_table <- processed_data %>%
  summarise(
    avg_people = mean(num_people, na.rm = TRUE),
    avg_team = mean(team_size, na.rm = TRUE),
    avg_volume = mean(volume, na.rm = TRUE),
    tech_count = n_distinct(tech_used),
    facility_types = n_distinct(facility_type)
)
kable(summary_table, caption = "Summary Statistics Table")

Summary Statistics Table
avg_people	avg_team	avg_volume	tech_count	facility_types
36.66667	5.666667	1108.125	5	4
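
One detail worth checking in this table is that `n_distinct()` treats `NA` as a value of its own, so a technology count of 5 includes the one job with no technology recorded. A minimal sketch, using the technology column from the hard-coded sample in section 2.2:

```r
library(dplyr)

# Technology column from the 9-row sample in section 2.2
tech_used <- c("Pitvaq", "Manual", "Improved manual", "Improved manual",
               "Pupu pump", NA, "Pupu pump", "Pupu pump", "Pupu pump")

# Default behaviour: NA is counted as a distinct value
n_with_na <- n_distinct(tech_used)

# With na.rm = TRUE only the named technologies are counted
n_named <- n_distinct(tech_used, na.rm = TRUE)

c(with_na = n_with_na, named_only = n_named)
```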

Figure 1: Facility type frequency

library(ggplot2)

ggplot(processed_data, aes(x = facility_type, fill = facility_type)) +
  geom_bar(show.legend = FALSE) + 
  labs(
    title = "Frequency of Facility Types",
    x = "Facility Type",
    y = "Count"
  ) +
  scale_fill_brewer(palette = "Set3") +  # Change this palette as needed
  theme_minimal()

Interpretation:

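Pit latrines make up most of the records: 3 unlined pit latrines, 2 lined pit latrines, and 1 entry labelled only "Pit latrines", against 3 septic tanks. Pit latrines as a group are therefore the predominant containment type encountered in field operations, although septic tanks are as common as any single pit latrine category. This distribution reflects the sanitation structures in the sampled areas.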
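
The counts behind the bar chart can also be tabulated directly, which makes the shares easier to quote. A minimal sketch, using the facility column from the hard-coded sample in section 2.2 (the `share` column name is an illustrative choice):

```r
library(dplyr)

# Facility column from the 9-row sample in section 2.2
facility_type <- c("Septic tank", "Pit latrines", "unlined_pit_latrines",
                   "unlined_pit_latrines", "lined_pit_latrines",
                   "lined_pit_latrines", "Septic tank", "Septic tank",
                   "unlined_pit_latrines")

facility_share <- data.frame(facility_type) |>
  count(facility_type, sort = TRUE) |>   # one row per facility type
  mutate(share = n / sum(n))             # proportion of all records

facility_share
```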

Figure 2: Distribution of team size

ggplot(processed_data, aes(x = team_size, fill = after_stat(count))) +  
  geom_histogram(binwidth = 1, color = "black", alpha = 0.7) +  
  labs(
    title = "Distribution of Team Sizes",
    x = "Team Size",
    y = "Frequency"
  ) +
  scale_fill_gradient(low = "lightblue", high = "darkblue") +  
  theme_minimal() +
  theme(
    plot.title = element_text(hjust = 0.5)  
  )

Interpretation:

All recorded jobs were completed by teams of 4 to 7 members, with teams of 4 and 7 the most common. This suggests a relatively consistent labour requirement across technologies. No job used fewer than 4 or more than 7 workers, indicating limited variation in staffing patterns.

Table 2: Average volume removed per technology

library(dplyr)
library(knitr)

avg_volume_tech <- processed_data %>%
  group_by(tech_used) %>%
  summarise(
    avg_volume = mean(volume, na.rm = TRUE),
    n = n()
  )

# Displaying with kable
kable(avg_volume_tech, caption = "Average Volume by Technology Used")

Average Volume by Technology Used
tech_used	avg_volume	n
Improved manual	1670.000	2
Manual	720.000	1
Pitvaq	5.000	1
Pupu pump	1066.667	4
NA	1600.000	1

Interpretation:

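Average volume removed varies by pumping technology. In this sample, improved manual emptying has the highest average (1,670 L over 2 jobs), followed by the Pupu pump (about 1,067 L over 4 jobs, one of which has no recorded volume) and manual emptying (720 L in 1 job). The single Pitvaq job removed only 5 L. With so few jobs per technology, these averages describe the jobs observed rather than the capacity of each technology.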
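
Because `n()` counts rows even when the volume is missing, it can help to report how many jobs actually contribute to each average. A sketch under that assumption, using the technology and volume columns from the sample in section 2.2 (the names `n_jobs` and `n_with_volume` are illustrative):

```r
library(dplyr)

# Technology and volume columns from the 9-row sample in section 2.2
jobs <- data.frame(
  tech_used = c("Pitvaq", "Manual", "Improved manual", "Improved manual",
                "Pupu pump", NA, "Pupu pump", "Pupu pump", "Pupu pump"),
  volume    = c(5, 720, 1820, 1520, 1600, 1600, 800, 800, NA)
)

avg_volume_checked <- jobs |>
  group_by(tech_used) |>
  summarise(
    n_jobs        = n(),                      # all rows for the technology
    n_with_volume = sum(!is.na(volume)),      # rows that feed the mean
    avg_volume    = mean(volume, na.rm = TRUE),
    .groups = "drop"
  ) |>
  arrange(desc(avg_volume))

avg_volume_checked
```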

Figure 3: Average Volume Removed by Facility Type

avg_volume_facility <- processed_data %>%
  group_by(facility_type) %>%
  summarise(avg_volume = mean(volume, na.rm = TRUE))

ggplot(avg_volume_facility, aes(x = facility_type, y = avg_volume)) +
  geom_col() +
  labs(
    title = "Average Volume Removed by Facility Type",
    x = "Facility Type",
    y = "Average Volume (Liters)"
  ) +
  theme_minimal()

Interpretation:

Volumes removed differ across facility types. Unlined and lined pit latrines show the highest averages (1,670 L and 1,600 L), compared with 720 L for the single unspecified pit latrine and 535 L for septic tanks. Pit capacity and occupancy patterns may explain part of this gap, but they were not measured here. These differences highlight the need to tailor equipment choice to containment characteristics.
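
The methodology mentions statistical comparisons, and one way such a comparison across facility types might be run is a Kruskal-Wallis rank test. This is a sketch only: the choice of test is an assumption rather than the method used in this report, and with 8 usable observations spread over 4 groups the result carries very little weight.

```r
# Facility and volume columns from the 9-row sample in section 2.2
jobs <- data.frame(
  facility_type = factor(c("Septic tank", "Pit latrines", "unlined_pit_latrines",
                           "unlined_pit_latrines", "lined_pit_latrines",
                           "lined_pit_latrines", "Septic tank", "Septic tank",
                           "unlined_pit_latrines")),
  volume = c(5, 720, 1820, 1520, 1600, 1600, 800, 800, NA)
)

# Rank-based comparison of volume across facility types;
# the row with a missing volume is dropped by the formula interface
kw <- kruskal.test(volume ~ facility_type, data = jobs)

kw$statistic  # Kruskal-Wallis chi-squared
kw$parameter  # degrees of freedom (number of groups minus one)
kw$p.value
```

A rank-based test is suggested here because the volumes are few and unevenly spread, which makes a test that assumes normality hard to justify.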

Figure 4: Technology usage frequency

tech_freq <- processed_data %>%
  count(tech_used)

ggplot(tech_freq, aes(x = tech_used, y = n)) +
  geom_col() +
  labs(
    title = "Frequency of Pumping Technologies Used",
    x = "Technology",
    y = "Count"
  ) +
  theme_minimal()

Interpretation:

The Pupu pump is the most frequently used technology (4 of 9 jobs), followed by improved manual methods (2 jobs), with manual emptying and the Pitvaq recorded once each. One job has no technology recorded. Uneven usage patterns may reflect availability, operator preference, and suitability for certain pit conditions. This insight can inform technology support and training programs.
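
For readability, the bars could be ordered by frequency and the missing technology given an explicit label. A minimal sketch, assuming the label "Not recorded" (a hypothetical choice) and using the technology column from the sample in section 2.2:

```r
library(dplyr)
library(forcats)
library(ggplot2)

# Technology column from the 9-row sample in section 2.2
tech_used <- c("Pitvaq", "Manual", "Improved manual", "Improved manual",
               "Pupu pump", NA, "Pupu pump", "Pupu pump", "Pupu pump")

tech_plot_data <- data.frame(tech_used) |>
  mutate(
    tech_used = coalesce(tech_used, "Not recorded"),  # label the missing entry
    tech_used = fct_infreq(tech_used)                 # most frequent level first
  )

tech_plot <- ggplot(tech_plot_data, aes(x = tech_used)) +
  geom_bar() +
  labs(
    title = "Frequency of Pumping Technologies Used",
    x = "Technology",
    y = "Count"
  ) +
  theme_minimal()

tech_plot
```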

3. Discussions

1. Distribution of Containment Types

The analysis shows that lined and unlined pit latrines form the majority of containment structures represented in the dataset (5 of 9 records, or 6 counting one unspecified pit latrine). Septic tanks account for the remaining 3, indicating that pit latrines remain the dominant sanitation option among the sampled communities. This distribution is consistent with common urban sanitation patterns in low-income settlements.

2. Team size

Team sizes ranged from 4 to 7 workers, suggesting a consistent staffing pattern across different enterprises. The narrow distribution indicates that most technologies require similar labour intensity. Limited variation in team size may also reflect standardized enterprise operational procedures.

3. Technology usage and efficiency

The Pupu pump was the most frequently used technology (4 of 9 jobs) and averaged about 1,067 L per job. The Pitvaq was recorded once and removed 5 L, so the sample says little about its performance. Improved manual emptying recorded the highest average volume (1,670 L over 2 jobs) and manual emptying removed 720 L in its single job, so the data do not show mechanical technologies removing more sludge than manual methods. With so few jobs per technology, these differences may reflect pit size or site access as much as the technology itself.

4. Volume Removed Across Facility Types

Average sludge volumes varied by containment type. Unlined and lined pit latrines yielded the highest averages (1,670 L and 1,600 L), well above septic tanks (535 L). The gap between lined and unlined pits is small, so the data do not separate the two. Differences in storage capacity and filling cycles are plausible explanations but were not measured in this study. Such findings underscore the importance of selecting technology based on containment characteristics.

5. Summary of Dataset Structure

Descriptive statistics show variation in the number of people served per facility, the types of technologies employed, and the physical characteristics of waste removed. Together, these variables form an important basis for understanding the operational realities of pit emptying services.

4. Conclusions

Lined and unlined pit latrines remain the most common containment systems, shaping the operational landscape for pit emptying services.
Team sizes are relatively uniform, indicating standardized labour requirements across technologies and enterprises.
Facility type appears to influence the volume of sludge removed, although no formal significance test was run, reinforcing the value of matching technologies to specific containment conditions.
The findings collectively highlight the importance of technology choice, team organization, and structural characteristics in delivering safe and efficient faecal sludge management services.

5. Reference

Akello, Sarah. (2025-12-09). *Capstone*. This report evaluates the effectiveness and efficiency of various pit emptying technologies concerning different types of containment. It analyzes data collected from field surveys, employing statistical comparisons and visualizations to derive insights on the technologies and their relation to multiple facility types.

As discussed in Akello (2025), the findings show that lined and unlined pit latrines are the predominant containment systems in the studied areas.

References

R Core Team. 2025. “R: A Language and Environment for Statistical Computing.” https://www.R-project.org/.
