library(tidyverse)
library(here)
library(dplyr)
library(ggplot2)
library(gt)
library(gtsummary)
library(knitr)
library(DT)
library(kableExtra)

This project assesses historical EPA data on materials in the U.S. municipal waste stream from 1960 to 2018 (measured in tons). It includes data on the generation, recycling, and composting of the materials in question.
The main aim is to assess how material waste generation has evolved in comparison with waste management practices.
The questions to be answered are:
The dataset was tidied by renaming columns for clarity, converting the non-numeric "Neg." entries in Yardtfood_recy to 0 so that the column could be stored as numeric, and reshaping the data from wide to long format. The reshaping was done with pivot_longer(), which split the column names into two new columns, material and mat_status, and gathered their values into a single column called qty_ton. Finally, the cleaned data were saved as a CSV file.
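The conversion step can be illustrated on a toy vector. This is a minimal sketch in base R, assuming "Neg." is the only non-numeric entry in the column; the vector `x` and its values are made up for illustration and do not come from the raw file.

```r
# Hypothetical values standing in for the raw Yardtfood_recy column
x <- c("Neg.", "12", "4200")

# Swap the "Neg." marker for "0" first, then convert, so that
# as.numeric() never sees a non-numeric string and raises no warning
x_num <- as.numeric(ifelse(x == "Neg.", "0", x))

x_num
```

Replacing the marker before the conversion keeps any coercion warning meaningful: if one appears, it points to an unexpected value other than "Neg.".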
# Read the raw data
raw_data <- read_csv(here::here("data/raw/mat_raw.csv"))

# Rename columns to the pattern <material>_<status> (gen = generated, recy = recycled)
raw_data <- raw_data |>
  rename(
    Paper_gen = Paper,
    Glass_gen = Glass,
    Metals_gen = Metals,
    Plastics_gen = Plastics,
    Food_gen = Food,
    Yardt_gen = `Yard trimmings`,
    Allothers_gen = `All other`,
    Paper_recy = Paper_recycled,
    Glass_recy = Glass_recycled,
    Metals_recy = Metals_recycled,
    Plastics_recy = Plastics_recycled,
    Yardtfood_recy = `Yard trimmings, Food waste_Composted`,
    Allothers_recy = `All others_Recycled`
  )

# Replace the "Neg." entries with 0 and convert the column to numeric
raw_data <- raw_data |>
  mutate(Yardtfood_recy = as.numeric(ifelse(Yardtfood_recy == "Neg.", "0", Yardtfood_recy)))

# Reshape from wide to long: one row per year, material, and status
processed_data <- raw_data |>
  pivot_longer(
    cols = Paper_gen:Allothers_recy,
    names_to = c("material", "mat_status"),
    names_pattern = "(.*)_(.*)",
    values_to = "qty_ton"
  )

# Save the cleaned data
write_csv(processed_data,
          here::here("data/processed/my-processed-data.csv"))

To answer the three research questions, the following methods were applied:
A linear regression was applied to total waste generation over the study period to quantify long-term trends. In addition, the Compound Annual Growth Rate (CAGR) was calculated to summarize the average annual growth in waste generation.
Waste generation quantities (in tons) by material were analyzed using descriptive statistics, including mean, median, minimum, and maximum values. Line plots were used to assess temporal trends in the quantity of each material. Furthermore, the CAGR (expressed as a percentage) was calculated for each material to identify differences in growth patterns among materials.
The dataset was then prepared to produce a combined visualization showing total waste generation, total waste recycled, and the recycling rate. A line plot was used to display these three indicators simultaneously for total waste. Additionally, a separate plot was created to illustrate differences in recycling rates among materials over time, enabling a comparative analysis of recycling performance by material.
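Since the CAGR appears in all three parts of the analysis, it may help to see it as a small helper. The sketch below assumes the standard definition, (end / start)^(1 / years) - 1. The function name `cagr` is introduced here for illustration and is not part of the original scripts; the example inputs are the plastics quantities for 1960 and 2018 reported in the CAGR table of this report.

```r
# Compound annual growth rate between two quantities `years` apart
# (hypothetical helper, not part of the original analysis)
cagr <- function(start, end, years) {
  stopifnot(start > 0, years > 0)
  (end / start)^(1 / years) - 1
}

# Plastics generation in 1960 and 2018, as reported in the CAGR table
plastics_cagr <- cagr(start = 391, end = 35680, years = 2018 - 1960)
round(plastics_cagr * 100, 1)  # about 8.1 (% per year)
```

Because the CAGR depends only on the first and last values, it summarizes average growth but says nothing about how the series behaved in between; the regression and line plots cover that part.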
Waste generation increased over time, with a compound annual growth rate (CAGR; Figure 1) of approximately 2% between 1960 and 2018.
# Total waste generated per year
total_gen_year <- processed_data |>
  filter(mat_status == "gen") |>
  group_by(Year) |>
  summarise(total_waste_gen = sum(qty_ton))

# Total waste recycled per year
total_recy_year <- processed_data |>
  filter(mat_status == "recy") |>
  group_by(Year) |>
  summarise(total_waste_recy = sum(qty_ton))

# Calculation of CAGR for total waste generated
years <- 2018 - 1960
cagr_gene <- (total_gen_year$total_waste_gen[total_gen_year$Year == 2018] /
  total_gen_year$total_waste_gen[total_gen_year$Year == 1960])^(1 / years) - 1
cagr_round <- round(cagr_gene * 100, 2)

# Display the CAGR as a text-only figure
ggplot() +
  labs(title = "CAGR of total waste generated between 1960 and 2018") +
  xlim(0, 1) +
  ylim(0, 1) +
  theme_void() +
  annotate("text", x = 0.5, y = 0.5,
           label = paste("CAGR =", cagr_round, "%"),
           size = 8, color = "black")

Figure 2 shows that total waste generation follows a strong linear trend, with an R² of 0.97, indicating a highly consistent increase over time. Based on the regression results, the average annual increase is approximately 3,442 tons.
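A quick cross-check of the regression slope is the average yearly change between the two endpoints. The sketch below assumes that the yearly totals equal the sum of the seven material quantities listed for 1960 and 2018 in the CAGR table of this report; the variable names are illustrative.

```r
# Generation by material in 1960 and 2018, copied from the CAGR table
# (order: Allothers, Food, Glass, Metals, Paper, Plastics, Yardt)
q_1960 <- c(8002, 12200, 6721, 10818, 29985, 391, 20000)
q_2018 <- c(52910, 63130, 12250, 25600, 67390, 35680, 35400)

total_1960 <- sum(q_1960)
total_2018 <- sum(q_2018)
years <- 2018 - 1960

# Average change per year using only the first and last observation
endpoint_slope <- (total_2018 - total_1960) / years

# CAGR of the totals, in percent
total_cagr <- ((total_2018 / total_1960)^(1 / years) - 1) * 100

round(endpoint_slope)  # about 3,521 per year
round(total_cagr, 2)   # about 2.09
```

The endpoint estimate is in the same range as the regression slope of about 3,442, as expected for a nearly linear series. The two differ because the regression uses every observation rather than only the first and last.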
# Fit the linear model
Waste_evol <- lm(total_waste_gen ~ Year, data = total_gen_year)

# Extract coefficients (named coef_table to avoid masking stats::coef)
coef_table <- summary(Waste_evol)$coefficients
intercept <- coef_table[1, 1]
slope <- coef_table[2, 1]
r_squared <- summary(Waste_evol)$r.squared

# Create the equation and R-squared strings
equation <- paste("y =", round(intercept, 2), "+", round(slope, 2), "* Year")
r_squared_text <- paste("R² =", round(r_squared, 3))

# Create the ggplot
ggplot(total_gen_year, aes(x = Year, y = total_waste_gen)) +
  geom_point() +
  geom_smooth(method = "lm", se = TRUE) +
  labs(title = "Linear Regression: Total Waste Generation") +
  annotate("text", x = 1970, y = 250000, label = equation, size = 4, color = "black") +
  annotate("text", x = 1970, y = 190000, label = r_squared_text, size = 4, color = "black")

The CAGR results show that plastics had the largest increase over time, followed by the category “other materials”, which includes rubber, leather, textiles, and wood. Food waste ranks third in terms of long-term growth. This is summarized in Table 1.
mat_gen_summa <- processed_data |>
  filter(mat_status == "gen")

summary_materials <- mat_gen_summa |>
  filter(Year %in% c(1960, 2018)) |>
  group_by(material, Year) |>
  summarise(quantity = sum(qty_ton, na.rm = TRUE), .groups = "drop") |>
  tidyr::pivot_wider(
    names_from = Year,
    values_from = quantity,
    names_prefix = "q_"
  )

summary_materials <- summary_materials |>
  mutate(
    CAGR_percent = ((q_2018 / q_1960)^(1 / (2018 - 1960)) - 1) * 100
  )

summary_materials |>
  gt() |>
  tab_header(title = "Compound Annual Growth Rate (CAGR) in percent for each material",
             subtitle = "q_year = Total waste in tons") |>
  fmt_number(columns = material:CAGR_percent, decimals = 0)

Table 1. Compound Annual Growth Rate (CAGR) in percent for each material (q_year = total waste in tons)

| material | q_1960 | q_2018 | CAGR_percent |
|---|---|---|---|
| Allothers | 8,002 | 52,910 | 3 |
| Food | 12,200 | 63,130 | 3 |
| Glass | 6,721 | 12,250 | 1 |
| Metals | 10,818 | 25,600 | 1 |
| Paper | 29,985 | 67,390 | 1 |
| Plastics | 391 | 35,680 | 8 |
| Yardt | 20,000 | 35,400 | 1 |
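With zero decimals, several materials in the table share the same CAGR, so the ranking described in the text is not visible from the table alone. The sketch below recomputes the CAGR from the tabulated 1960 and 2018 quantities and keeps two decimals. It uses base R and a data frame named `cagr_tbl`, both of which are choices made for this illustration rather than part of the original analysis.

```r
# Quantities copied from the table above
cagr_tbl <- data.frame(
  material = c("Allothers", "Food", "Glass", "Metals", "Paper", "Plastics", "Yardt"),
  q_1960 = c(8002, 12200, 6721, 10818, 29985, 391, 20000),
  q_2018 = c(52910, 63130, 12250, 25600, 67390, 35680, 35400),
  stringsAsFactors = FALSE
)

# Same CAGR formula as in the analysis, rounded to two decimals
n_years <- 2018 - 1960
cagr_tbl$CAGR_percent <- round(
  ((cagr_tbl$q_2018 / cagr_tbl$q_1960)^(1 / n_years) - 1) * 100,
  2
)

# Sort from fastest to slowest growth
cagr_tbl <- cagr_tbl[order(-cagr_tbl$CAGR_percent), ]
cagr_tbl
```

Sorted this way, plastics come first, followed by all other materials and then food, which matches the ranking given in the text. In the original pipeline, a similar effect could be obtained by passing a larger `decimals` value to `fmt_number()` for the `CAGR_percent` column.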
In terms of absolute quantity rather than growth, paper is the largest waste stream in the U.S., followed by yard trimmings, all other materials, and food. This is supported by the means and medians shown in Table 2 and Figure 3.
waste_tbl_gene <- mat_gen_summa |>
  group_by(material) |>
  summarise(
    mean = mean(qty_ton),
    sd = sd(qty_ton),
    median = median(qty_ton),
    min = min(qty_ton),
    max = max(qty_ton)
  )

waste_tbl_gene |>
  gt() |>
  tab_header(title = "Total waste generation by material",
             subtitle = "Data between 1960-2018") |>
  fmt_number(columns = material:max, decimals = 0)

Table 2. Total waste generation by material (data between 1960-2018)

| material | mean | sd | median | min | max |
|---|---|---|---|---|---|
| Allothers | 27,671 | 14,468 | 27,911 | 8,002 | 53,090 |
| Food | 22,652 | 11,363 | 13,400 | 12,200 | 63,130 |
| Glass | 12,244 | 1,949 | 12,612 | 6,721 | 15,355 |
| Metals | 16,848 | 3,953 | 15,661 | 10,818 | 25,600 |
| Paper | 63,255 | 16,878 | 67,480 | 29,985 | 88,260 |
| Plastics | 16,171 | 12,065 | 15,871 | 391 | 35,680 |
| Yardt | 28,976 | 4,727 | 30,000 | 20,000 | 35,400 |
ggplot(mat_gen_summa, aes(x = material, y = qty_ton)) +
  geom_boxplot(fill = "lightblue", outlier.colour = "red", outlier.shape = 16, outlier.size = 2) +
  labs(title = "Box Plot of Waste generation by material",
       x = "Material",
       y = "Waste generated (tons)") +
  theme_minimal()

Comparing waste generation and recycling in Figure 4 shows that the quantity recycled remains substantially lower than the quantity generated in terms of total tons. This gap is reflected in the recycling rate curve, which reaches a maximum of approximately 35%. The recycling rate increased from about 5% to about 35% over the 58-year period, while the total quantity recycled grew at a compound annual growth rate (CAGR) of approximately 5%.
joined_totals <- right_join(total_gen_year, total_recy_year, by = "Year")

joined_totals <- joined_totals |>
  mutate(
    recycling_rate = (total_waste_recy / total_waste_gen) * 100
  )

# Factor used to draw the recycling rate (%) on the tonnage axis
scale_factor <- max(joined_totals$total_waste_gen, na.rm = TRUE) / 100

# Initialize the plot
ggplot(joined_totals, aes(x = Year)) +
  geom_line(aes(y = total_waste_gen, color = "Waste Generated"), linewidth = 1) +
  geom_line(aes(y = total_waste_recy, color = "Waste Recycled"), linewidth = 1) +
  geom_line(
    aes(y = recycling_rate * scale_factor, color = "Recycling rate"),
    linewidth = 1,
    linetype = "dashed"
  ) +
  scale_y_continuous(
    name = "Total Waste (tons)",
    sec.axis = sec_axis(
      ~ . / scale_factor,
      name = "Recycling rate (%)"
    )
  ) +
  scale_color_manual(
    values = c(
      "Waste Generated" = "black",
      "Waste Recycled" = "blue",
      "Recycling rate" = "red"
    )
  ) +
  labs(
    title = "Waste Generation, Recycling, and Recycling Rate Over Time",
    x = "Year",
    color = ""
  ) +
  theme_minimal()
# Calculation of CAGR for total waste recycled
years <- 2018 - 1960
cagr_recy <- (total_recy_year$total_waste_recy[total_recy_year$Year == 2018] /
  total_recy_year$total_waste_recy[total_recy_year$Year == 1960])^(1 / years) - 1

Comparing the recycling rates by material in Figure 5 shows that paper exhibits the largest and most consistent increase over time. For most of the other materials, an inflection point in recycling rates is evident between 1980 and 2000. In contrast, plastic has the lowest recycling rate throughout the entire study period.
# Prepare the data: combine food and yard trimmings generation into one category
# (Yardtfood), matching the combined composting column
recy_rate_mat <- processed_data |>
  mutate(material = ifelse(material %in% c("Food", "Yardt"), "Yardtfood", material)) |>
  group_by(Year, material, mat_status) |>
  summarise(qty_ton = sum(qty_ton, na.rm = TRUE), .groups = "drop")

# Spread generated and recycled quantities into separate columns
recy_rate_mat <- recy_rate_mat |>
  pivot_wider(
    names_from = mat_status,
    values_from = qty_ton,
    values_fill = 0
  )

# Recycling rate (%) for each material and year
recy_rate_mat <- recy_rate_mat |>
  mutate(recy_percent = (recy / gen) * 100)

# Filter out NA and Inf values
recy_rate_plot <- recy_rate_mat |>
  filter(!is.na(recy_percent) & is.finite(recy_percent))

# Plot the recycling rate by material over time
ggplot(recy_rate_plot, aes(x = Year, y = recy_percent, color = material)) +
  geom_line(linewidth = 1) +
  labs(
    title = "Recycling Rate by Material Over Time",
    x = "Year",
    y = "Recycling rate (%)",
    color = "Material"
  ) +
  theme_minimal()
- Total waste generation grew steadily, with a compound annual growth rate (CAGR) of about 2%. The regression results indicate an average annual increase of approximately 3,442 tons.
- An inflection point in recycling rates was observed between 1980 and 2000, during which metals, glass, and organic waste experienced the largest increases. This change was likely associated with the introduction of new recycling technologies and the strengthening of municipal waste management systems during that period, as noted by Ross and Law (2023a, 2023b).
- Plastic consistently exhibited the lowest recycling rate throughout the study period. In contrast, paper, the largest waste stream in terms of quantity, also achieved the highest recycling rates, a pattern that Kinnaman (2000) attributes to well-established paper recycling programs and infrastructure.
- Although total waste generation grew at an average rate of 2% per year, the quantity recycled increased more rapidly, with a CAGR of 5%, indicating a relative improvement in the system’s capacity to recover materials over time. However, this trend applies to the system as a whole and does not necessarily reflect improvements for every individual material.
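
The relative improvement noted in the last point can be quantified. Since the recycling rate is the recycled quantity divided by the generated quantity, its yearly growth factor is the ratio of the two growth factors. The sketch below assumes the rounded CAGRs of 5% (quantity recycled) and 2% (quantity generated) quoted above; the function name `implied_rate_growth` is hypothetical.

```r
# Yearly growth of a ratio whose numerator and denominator
# each grow at a constant compound rate
implied_rate_growth <- function(cagr_recycled, cagr_generated) {
  (1 + cagr_recycled) / (1 + cagr_generated) - 1
}

# Rounded CAGRs from the conclusions: 5% recycled, 2% generated
round(implied_rate_growth(0.05, 0.02) * 100, 1)  # about 2.9 (% per year)
```

Under these rounded inputs, the recycling rate itself grows by roughly 3% per year, which is slower than the 5% growth of the recycled quantity because generation keeps rising at the same time.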