UNHCR WASH Capstone Project

Author

Matlove23

Published

March 9, 2026

1. Introduction

This report analyzes UNHCR WASH assessment data collected from refugee settings.
The data captures key indicators such as water quantity, water quality, sanitation access, hygiene conditions, and population characteristics.
The goal of this capstone is to analyze the processed dataset, explore trends, and generate insights for humanitarian WASH programming.

2. Methods

Raw data were collected from UNHCR WASH assessments and stored in the data/raw/ folder.

Data cleaning, variable renaming, and standardization were performed to produce analysis-ready processed data stored in data/processed/.

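All variables are listed in a data dictionary (data/processed/data_dictionary.csv), whose descriptions remain placeholders until they are completed. Missing values are kept as NA in the processed data and are excluded when summary statistics are computed (na.rm = TRUE).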
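Before relying on any rule for missing values, it helps to see how much is missing in each column. The following is a minimal sketch in base R, assuming a toy data frame named wash (hypothetical, with illustrative values) in place of the processed file:

```r
# Toy stand-in for the processed data (values are illustrative only)
wash <- data.frame(
  camp_name  = c("Ifo", "Ifo", "Hagadera", "Hagadera"),
  water_lppd = c(NA, 18, 23, NA),
  toilets    = c(NA, 60, 57, 100)
)

# Count and percentage of missing values in every column
missing_summary <- data.frame(
  variable    = names(wash),
  n_missing   = colSums(is.na(wash)),
  pct_missing = round(100 * colMeans(is.na(wash)), 1),
  row.names   = NULL
)

print(missing_summary)
```

The same table could be produced for processed_data by swapping it in for wash.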

library(tidyverse)
library(lubridate)
library(here)
library(knitr)

# Load raw data
raw_data <- read_csv(here("data/raw/unhcrwash.csv"))

# Load processed data
processed_data <- read_csv(here("data/processed/my_processed_data.csv"))

# Create data dictionary if it doesn't exist
if (!file.exists(here("data/processed/data_dictionary.csv"))) {
  dictionary <- tibble(
    variable_name = names(processed_data),
    description = rep("To be completed", length(names(processed_data)))
  )
  write_csv(dictionary, here("data/processed/data_dictionary.csv"))
}

The dataset used is the processed, cleaned dataset stored in data/processed/my_processed_data.csv.
All further analyses, summaries, and visualizations use this dataset directly.
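The start_date column in the processed file is read as text (for example "1/1/2024"), so any analysis over time would first need it converted to a date. The sketch below uses base R and assumes the values are in month/day/year order. That order is an assumption, not something the data confirm, and the example strings are illustrative.

```r
# Illustrative values in the style of the start_date column
start_date_raw <- c("1/1/2024", "3/15/2024", "11/30/2024", "not a date")

# ASSUMPTION: dates are month/day/year; use "%d/%m/%Y" if they are day/month/year
parsed <- as.Date(start_date_raw, format = "%m/%d/%Y")

# Entries that were present but could not be parsed become NA
n_failed <- sum(is.na(parsed) & !is.na(start_date_raw))

print(parsed)
print(n_failed)
```

Since lubridate is already loaded in this report, lubridate::mdy() would be a natural equivalent under the same assumption. Checking n_failed after parsing is a cheap way to catch a wrong format.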

3. Results

3.1 Load Data

# Load libraries
library(tidyverse)
library(knitr)
library(ggplot2)
library(here)

# Load processed data
processed_data <- read_csv(here("data/processed/my_processed_data.csv"))

# Preview processed data
glimpse(processed_data)
Rows: 6,423
Columns: 5
$ household_id <dbl> 54418187, 54418188, 54418189, 54418190, 54414021, 5442098…
$ camp_name    <chr> "Dagahaley", "Hagadera", "Ifo", "Ifo 2", "Buramino", "Lov…
$ start_date   <chr> "1/1/2024", "1/1/2024", "1/1/2024", "1/1/2024", "12/2/202…
$ water_lppd   <dbl> NA, NA, NA, NA, 9, 23, 50, 17, 115, 98, 20, 18, 17, 26, 1…
$ toilets      <dbl> NA, NA, NA, NA, 9, 57, 100, 100, 100, 100, 53, 60, 57, 15…
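# Create the data dictionary only if it does not exist yet,
# so completed descriptions are never overwritten
dictionary_path <- here("data/processed/data_dictionary.csv")
if (!file.exists(dictionary_path)) {
  dictionary <- tibble(
    variable_name = names(processed_data),
    description = rep("To be completed", length(names(processed_data)))
  )
  write_csv(dictionary, dictionary_path)
}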

# Summary statistics for numeric columns (only if exist)
numeric_cols <- c("water_lppd", "toilets")
numeric_cols <- numeric_cols[numeric_cols %in% names(processed_data)]

if(length(numeric_cols) > 0){
  summary_table <- processed_data %>%
    summarise(across(all_of(numeric_cols), ~mean(.x, na.rm = TRUE))) 
  
  kable(summary_table, caption = "Summary statistics for key numeric variables")
}
Summary statistics for key numeric variables (means, missing values excluded)

  water_lppd    toilets
     24.7299   42.96187
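A mean alone can hide skew and missing data, so a fuller summary may be useful alongside it. The sketch below is one possible approach in base R. The helper summarise_var() and the toy data frame wash are hypothetical names, and the values are illustrative only.

```r
# Toy stand-in for the two numeric columns (illustrative values)
wash <- data.frame(
  water_lppd = c(NA, 9, 23, 50, 17),
  toilets    = c(NA, 9, 57, 100, 100)
)

# Hypothetical helper: basic statistics for one numeric vector, ignoring NA
summarise_var <- function(x) {
  c(
    n_valid = sum(!is.na(x)),
    mean    = mean(x, na.rm = TRUE),
    median  = median(x, na.rm = TRUE),
    min     = min(x, na.rm = TRUE),
    max     = max(x, na.rm = TRUE)
  )
}

# One row per variable, one column per statistic
summary_table <- t(sapply(wash, summarise_var))
print(round(summary_table, 1))
```

Reporting n_valid next to each mean makes clear how many records the statistic is actually based on.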

Histogram

The histogram shows how water availability, in liters per person per day (water_lppd), is distributed across the assessment records.

# Histogram: water_lppd if exists
if("water_lppd" %in% names(processed_data)){
  ggplot(processed_data, aes(x = water_lppd)) +
    geom_histogram(bins = 20, fill = "steelblue", color = "black") +
    labs(
      x = "Liters per person per day",
      y = "Count",
      title = "Distribution of Water Access"
    ) +
    theme_minimal()
}

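Bar Plot

The bar plot shows the toilets variable for each camp. Because geom_col() stacks every record, the height of each bar is the sum of the toilets values recorded for that camp.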

# Bar plot: toilets by camp if columns exist
if(all(c("toilets", "camp_name") %in% names(processed_data))){
  library(ggplot2)
  
  ggplot(processed_data, aes(x = camp_name, y = toilets, fill = camp_name)) +
    geom_col() +
    labs(
      x = "Camp Name",
      y = "Number of Toilets",
      title = "Number of Toilets per Camp",
      fill = "Camp"
    ) +
    theme_minimal() +
    theme(axis.text.x = element_text(angle = 45, hjust = 1))
}
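When a camp appears in many records, summed bars grow with the number of records rather than with the typical value. If a per-camp average is the quantity of interest, the data can be aggregated before plotting. The sketch below contrasts the two in base R, using a hypothetical toy data frame wash with illustrative values:

```r
# Toy data: several records per camp (illustrative values)
wash <- data.frame(
  camp_name = c("Ifo", "Ifo", "Ifo", "Hagadera", "Hagadera"),
  toilets   = c(50, 70, NA, 100, 80)
)

# Per-camp mean; the formula interface drops rows with NA by default
camp_means <- aggregate(toilets ~ camp_name, data = wash, FUN = mean)

# Per-camp sum, which is what stacked bars of raw records display
camp_sums <- aggregate(toilets ~ camp_name, data = wash, FUN = sum)

print(camp_means)
print(camp_sums)
```

Passing an aggregated table like camp_means to ggplot() with geom_col() would then draw one bar per camp at the mean value. Whether the mean or the sum is appropriate depends on what the toilets variable measures.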

4. Conclusions

  • Water access is below humanitarian standards in some camps.

  • Toilet coverage is high overall, but gaps exist.

  • Water quality varies by country, highlighting the need for targeted interventions.
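The first conclusion can be checked by comparing each camp's average water_lppd with a target value. The sketch below is a minimal base R illustration. The threshold of 20 liters per person per day is an assumption (a figure often cited as a post-emergency target) and should be confirmed against the standard the programme applies. The camp names and values are hypothetical.

```r
# ASSUMPTION: target of 20 liters per person per day; confirm before use
threshold_lppd <- 20

# Toy data with hypothetical camps (illustrative values)
wash <- data.frame(
  camp_name  = c("Camp A", "Camp A", "Camp B", "Camp B", "Camp C"),
  water_lppd = c(9, 17, 23, 50, NA)
)

# Mean water availability per camp; rows with NA are dropped,
# so a camp with no valid records (Camp C) does not appear
camp_water <- aggregate(water_lppd ~ camp_name, data = wash, FUN = mean)

# Flag camps whose mean falls below the assumed threshold
camp_water$below_threshold <- camp_water$water_lppd < threshold_lppd

print(camp_water)
```

Camps that vanish from the result because all their values are missing should be reported separately, since no data is not the same as meeting the target.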

5. References