library(tidyverse)
library(lubridate)
library(here)
library(knitr)
# Load raw data
raw_data <- read_csv(here("data/raw/unhcrwash.csv"))
# Load processed data
processed_data <- read_csv(here("data/processed/my_processed_data.csv"))
# Create data dictionary if it doesn't exist
if(!file.exists(here("data/processed/data_dictionary.csv"))) {
dictionary <- tibble(
variable_name = names(processed_data),
description = rep("To be completed", length(names(processed_data)))
)
write_csv(dictionary, here("data/processed/data_dictionary.csv"))
}
library(tidyverse)
library(lubridate)
library(here)
library(knitr)
# Load raw data
raw_data <- read_csv(here("data/raw/unhcrwash.csv"))
# Load processed data
processed_data <- read_csv(here("data/processed/my_processed_data.csv"))
# Create data dictionary if it doesn't exist
if(!file.exists(here("data/processed/data_dictionary.csv"))) {
dictionary <- tibble(
variable_name = names(processed_data),
description = rep("To be completed", length(names(processed_data)))
)
write_csv(dictionary, here("data/processed/data_dictionary.csv"))
}UNHCR WASH Capstone Project
1 1. Introduction
This report analyzes UNHCR WASH assessment data collected from refugee settings.
The data captures key indicators such as water quantity, water quality, sanitation access, hygiene conditions, and population characteristics.
The goal of this capstone is to analyze the processed dataset, explore trends, and generate insights for humanitarian WASH programming.
2 2.Methods
Raw data were collected from UNHCR WASH assessments and stored in the data/raw/ folder.
Data cleaning, variable renaming, and standardization were performed to produce analysis-ready processed data stored in data/processed/.
All variables are documented in a data dictionary, and missing values were handled appropriately.
The dataset used is the processed, cleaned dataset stored in data/processed/my_processed_data.csv.
All further analyses, summaries, and visualizations use this dataset directly
3 3. Results
3.1 3.1 Load Data
# Load libraries
library(tidyverse)
library(knitr)
library(ggplot2)
library(here)
# Load processed data
processed_data <- read_csv(here("data/processed/my_processed_data.csv"))
# Preview processed data
glimpse(processed_data)Rows: 6,423
Columns: 5
$ household_id <dbl> 54418187, 54418188, 54418189, 54418190, 54414021, 5442098…
$ camp_name <chr> "Dagahaley", "Hagadera", "Ifo", "Ifo 2", "Buramino", "Lov…
$ start_date <chr> "1/1/2024", "1/1/2024", "1/1/2024", "1/1/2024", "12/2/202…
$ water_lppd <dbl> NA, NA, NA, NA, 9, 23, 50, 17, 115, 98, 20, 18, 17, 26, 1…
$ toilets <dbl> NA, NA, NA, NA, 9, 57, 100, 100, 100, 100, 53, 60, 57, 15…
# Create data dictionary (optional)
dictionary <- tibble(
variable_name = names(processed_data),
description = rep("To be completed", length(names(processed_data)))
)
write_csv(dictionary, here("data/processed/data_dictionary.csv"))
# Summary statistics for numeric columns (only if exist)
numeric_cols <- c("water_lppd", "toilets")
numeric_cols <- numeric_cols[numeric_cols %in% names(processed_data)]
if(length(numeric_cols) > 0){
summary_table <- processed_data %>%
summarise(across(all_of(numeric_cols), ~mean(.x, na.rm = TRUE)))
kable(summary_table, caption = "Summary statistics for key numeric variables")
}| water_lppd | toilets |
|---|---|
| 24.7299 | 42.96187 |
Histogram:
this histogram giving the results of various points
# Histogram: water_lppd if exists
if("water_lppd" %in% names(processed_data)){
ggplot(processed_data, aes(x = water_lppd)) +
geom_histogram(bins = 20, fill = "steelblue", color = "black") +
labs(
x = "Liters per person per day",
y = "Count",
title = "Distribution of Water Access"
) +
theme_minimal()
}
BAR PLOT
this bar plot is showing the results on diferent areas
# Bar plot: toilets by camp if columns exist
if(all(c("toilets", "camp_name") %in% names(processed_data))){
library(ggplot2)
ggplot(processed_data, aes(x = camp_name, y = toilets, fill = camp_name)) +
geom_col() +
labs(
x = "Camp Name",
y = "Number of Toilets",
title = "Number of Toilets per Camp",
fill = "Camp"
) +
theme_minimal() +
theme(axis.text.x = element_text(angle = 45, hjust = 1))
}
4 4. Conclusions
Water access is below humanitarian standards in some camps.
Toilet coverage is high overall, but gaps exist.
Water quality varies by country, highlighting the need for targeted interventions.