Integrity Practices of Global Water Service Providers

A Review of Utility Integrity Assessment Data

Author

Claire Grandadam

Published

January 31, 2026

Introduction

To reach national policy objectives for water and sanitation service delivery and SDG6, water service providers globally must increase service coverage and service quality in the face of growing challenges related to financing, population growth, and climate change. This has spurred strong interest in understanding the drivers of service provider performance, particularly from service provider managers, policy-makers, regulators, banks and other funders, as well as sector analysts.

National and global utility benchmarking processes have been used widely for such analyses, to assess the way service providers operate and to orient sector reforms. The most common benchmarking tools tend to focus on service quality, financial management, and operational efficiency. Recently, water sector utility benchmarking tools have also been evolving, in part in response to two developments:

  • First, the recognition that performance is affected by governance issues and corruption, and the related evidence that anti-corruption and integrity management can support utility performance (Acuña Mantilla & Vergara Stuardo, 2024; Barreto-Dillon et al., 2018; Water Integrity Network (WIN), 2021). There is a need for better tools that can support service providers striving for integrity, equity, sustainability, and resilience.

  • Second, growing criticism of the limits and bias in major utility benchmarking tools that have primarily focused on efficiency, to the detriment of distributional equity (Bhatt, 2024).

Major utility benchmarking tools such as Aquarating and NewIBnet recently introduced new process indicators focusing on management practices (in 2022 and 2023 respectively) (Cubillo et al., 2022). There is however limited data available on their use. The Water Integrity Network (WIN) also developed a set of indicators in 2018, to assess integrity management practices of service providers.

This analysis examines anonymised data from applications of the Water Integrity Network tool since 2018, to assess the tool’s relevance and to understand patterns of use, possible resistance and challenges to assessing integrity practices, and trends in integrity management practices.

Methods

WIN developed integrity indicators for utilities with support from the Inter-American Development Bank and in partnership with the Consortium for Water Integrity in Latin America (with SIWI and cewas). This set of indicators is generally referred to as the Utility Integrity Assessment. It is part of WIN’s InWASH integrity management toolbox for water and sanitation service providers, an intensive process for prioritising and managing the integrity risks service providers face in their daily operations (a process which can also be run without the Utility Integrity Assessment). The tool is also freely available separately as an online survey at https://www.waterintegritynetwork.net/inwash.

Early pilots and analyses by WIN indicated that there are patterns in how good governance and integrity practices are implemented in organisations in the water and sanitation sectors globally and that benchmarking can be useful for urban service providers of a certain size. This is the case even though local context and the local regulatory environment of course influence which governance and integrity practices are possible and effective for different utilities.

Based on these observations, the Utility Integrity Assessment was designed as a short survey examining 15 indicators categorised under 5 integrity principles (the sketch after this list maps them to the indicator codes used in the data):

  1. Tone at the top (with 2 indicators: on leadership and on codes of conduct);

  2. Integrity risk assessment (with 1 indicator);

  3. Integrity controls (with 7 indicators, on control of conflicts of interest, whistleblowing, recruitment, procurement, disclosure, participation, and financial management);

  4. Corrective action (with 2 indicators, on sanctions for staff and on sanctions for contractors);

  5. Monitoring (with 3 indicators, on risk monitoring, review of risk management functions, and responsiveness to external accountability mechanisms).
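
For reference in the analysis below, this structure can be captured in a small lookup table. This is an illustrative sketch only: the indicator codes and the `_short` suffix follow the column names of the processed data, and the table itself is not part of the original tool.

Code
library(tidyverse)

# Map the 15 indicator codes (as named in the processed data) to their principles
indicator_map <- tibble(
  indicator = paste0(
    c(paste0("irm1", 1:2),   # tone at the top
      "irm21",               # integrity risk assessment
      paste0("irm3", 1:7),   # integrity controls
      paste0("irm4", 1:2),   # corrective action
      paste0("irm5", 1:3)),  # monitoring
    "_short"),
  principle = rep(
    c("Tone at the top", "Integrity risk assessment", "Integrity controls",
      "Corrective action", "Monitoring"),
    times = c(2, 1, 7, 2, 3))
)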

Service providers use the tool for a self-assessment. They score themselves on each indicator on a scale of 1 to 5, based on clear descriptions of the practices they would have in place at each score (a sketch of how indicator scores roll up to principle-level means follows the list below). All indicators are formulated in similar ways. For example, the second indicator (IRM1.2) examines how effectively a service provider sets integrity standards and enforces them through a code of conduct or similar document. The 5 possible scores are:

  • Score 1: There is no written code of conduct (or similar document) outlining what the Utility expects from staff regarding values, rules, standards, and principles.

  • Score 2: Practices fall between the descriptions for Score 1 and Score 3.

  • Score 3: The code of conduct (or a similar document) contains most of the following elements (some elements may be missing): an ethical framework for decision making, generic examples of what constitute acceptable and unacceptable behaviour, guidelines on reporting problems anonymously, accountability and disciplinary policies for unethical behaviour, a list of ethics and compliance resources. The code of conduct was not revised in the last 3 calendar years. There is evidence that the Utility has organised a training on the code of conduct in the past, but the training is not routinely provided.

  • Score 4: Practices fall between the descriptions for Score 3 and Score 5.

  • Score 5: The code of conduct (or a similar document) contains ALL of the elements outlined in Level 3. The code of conduct was revised within the last 3 years, or more recently following 1) the most recent changes to workplace profiles, including restructuring, relocation, changes in key roles or decision-making processes, or 2) the most recent changes in the external environment, including sector reform, new relevant legislation, changes in government strategies or in contractors’ business practices. Code of conduct training is routinely given to new employees as part of their induction programme.
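
In the processed data, each principle also carries a mean score. Continuing the sketch above, a minimal illustration of that aggregation, assuming (based on the mean_principle* columns in the data) that a principle score is the mean of the indicator scores actually provided under it:

Code
# Sketch (assumption): principle scores as means of available indicator scores.
# na.rm = TRUE matches the processed data, where a principle mean is computed
# over the indicators that were actually scored.
compute_principle_means <- function(df) {
  df |>
    mutate(
      mean_principle1 = rowMeans(across(all_of(c("irm11_short", "irm12_short"))), na.rm = TRUE),
      mean_principle2 = irm21_short,  # single indicator under principle 2
      mean_principle3 = rowMeans(across(all_of(paste0("irm3", 1:7, "_short"))), na.rm = TRUE),
      mean_principle4 = rowMeans(across(all_of(c("irm41_short", "irm42_short"))), na.rm = TRUE),
      mean_principle5 = rowMeans(across(all_of(paste0("irm5", 1:3, "_short"))), na.rm = TRUE)
    )
}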

This analysis looks at available data since 2018, from service providers that completed the survey online independently, and from service providers responding offline, either as pilot users of the Utility Integrity Assessment or as participants in an InWASH process. The offline respondents received some support to fill in the survey and had to provide some justification for their scores (a brief explanation or a link to evidence). A WIN or partner expert or consultant also read and validated the scores.

The analysis looks at:

  • whether there are clear trends, strengths, and weaknesses across indicators and principles,

  • whether scores differ based on the way the survey was taken,

  • whether there is significant variation in the practices of service providers across regions.

Code
library(tidyverse)
library(gt)
library(gtsummary)
library(knitr)
library(DT)

# Read the processed assessment data (one row per survey response)
ia_data <- read_csv(here::here("data/processed/processed_data.csv"))

glimpse(ia_data)
Rows: 75
Columns: 30
$ on_off          <chr> "online", "online", "online", "online", "online", "onl…
$ id              <dbl> 20, 21, 26, 27, 29, 31, 33, 43, 47, 54, 49, 50, 51, 52…
$ date_submitted  <date> 2020-09-20, 2020-09-20, NA, NA, NA, 2030-09-20, NA, 2…
$ last_page       <dbl> 6, 6, 5, 4, 2, 6, 5, 6, 6, 6, 6, 2, 6, 6, 6, 6, 2, 5, …
$ date_started    <date> 2020-09-20, 2020-09-20, 2028-09-20, 2029-09-20, 2029-…
$ date_last_act   <date> 2020-09-20, 2020-09-20, 2028-09-20, 2029-09-20, 2029-…
$ irm11_short     <dbl> 5, 3, 1, 4, 5, 4, 3, 5, 5, 5, 5, 5, 5, 5, 5, 5, 1, 1, …
$ irm12_short     <dbl> 5, 3, 1, 4, 3, 4, 2, 4, 5, 3, 5, 5, 5, 5, 5, 5, 1, 1, …
$ mean_principle1 <dbl> 5.0, 3.0, 1.0, 4.0, 4.0, 4.0, 2.5, 4.5, 5.0, 4.0, 5.0,…
$ irm21_short     <dbl> 5, 1, 1, 3, 4, 5, 1, 5, 5, 5, 5, 5, 5, 5, 5, 5, 4, 2, …
$ mean_principle2 <dbl> 5, 1, 1, 3, 4, 5, 1, 5, 5, 5, 5, 5, 5, 5, 5, 5, 4, 2, …
$ irm31_short     <dbl> 4, 1, 1, 4, NA, 4, 4, 4, 5, 5, 5, NA, 5, 5, 3, 5, NA, …
$ irm32_short     <dbl> 4, 1, 1, 3, NA, 4, 4, 5, 4, 5, 4, NA, 5, 5, 3, 5, NA, …
$ irm33_short     <dbl> 3, 4, 3, 4, NA, 4, 3, 3, 4, 3, 4, NA, 5, 4, 3, 4, NA, …
$ irm34_short     <dbl> 5, 3, 1, 4, NA, 4, 2, 4, 5, 5, 5, NA, 5, 5, 2, 5, NA, …
$ irm35_short     <dbl> 5, 4, 2, 5, NA, 5, 2, 3, 5, 3, 5, NA, 5, 4, 2, 5, NA, …
$ irm36_short     <dbl> 4, 5, 3, 5, NA, 5, 2, 4, 5, 4, 5, NA, 5, 5, 4, 5, NA, …
$ irm37_short     <dbl> 4, 5, 1, 4, NA, 4, 5, 5, 5, 5, 5, NA, 5, 5, 5, 5, NA, …
$ mean_principle3 <dbl> 4.142857, 3.285714, 1.714286, 4.142857, NA, 4.285714, …
$ irm41_short     <dbl> 4, 3, 2, 4, NA, 4, 1, 4, 4, 5, 4, NA, 5, 5, 3, 5, NA, …
$ irm42_short     <dbl> 5, 4, 2, NA, NA, 4, 4, 4, 2, 4, 4, NA, 5, 1, 2, 5, NA,…
$ mean_principle4 <dbl> 4.5, 3.5, 2.0, 4.0, NA, 4.0, 2.5, 4.0, 3.0, 4.5, 4.0, …
$ irm51_short     <dbl> 3, 3, 1, NA, NA, 4, 1, 5, 5, 4, 5, NA, 5, 5, 3, 5, NA,…
$ irm52_short     <dbl> 4, 5, 2, NA, NA, 3, 4, 5, 5, 3, 4, NA, 5, 5, 2, 5, NA,…
$ irm53_short     <dbl> 4, 5, 1, NA, NA, 5, 4, 5, 5, 5, 5, NA, 5, 5, 5, 5, NA,…
$ mean_principle5 <dbl> 3.666667, 4.333333, 1.333333, NA, NA, 4.000000, 3.0000…
$ utility_code    <dbl> 1, 2, 3, NA, NA, 5, NA, 6, 7, 7, 8, NA, 7, 10, 11, 12,…
$ country_id      <dbl> 252, 252, 252, NA, NA, 252, NA, NA, 3, 3, 3, NA, 3, 3,…
$ region          <chr> "Eastern and Southern Africa", "Eastern and Southern A…
$ version         <chr> "current", "current", "current", "current", "current",…
Code
tbl_data_overview <- ia_data |>
  summarise(
      online_count = sum(on_off == "online", na.rm = TRUE),
      offline_count = sum(on_off == "offline", na.rm = TRUE),
      completed = sum(last_page > 4),   # counted complete if the respondent reached page 5 (of 6)
      incomplete = sum(last_page < 5),
      anon = sum(region == "UNKNOWN"),  # anonymous entries carry no region information
      utilities = n_distinct(utility_code, na.rm = TRUE),
      countries = n_distinct(country_id, na.rm = TRUE)
  )

Table 1 shows the number of entries assessed, noting how many were submitted online and offline, how many were complete or not, how many were anonymous, and the number of utilities and countries represented.

Code
tbl_data_overview |>
    gt() |>
    tab_header(title = "Utility integrity assessment",
              subtitle = "Data from 75 surveys") |>
    cols_label(online_count = "Submitted online",
               offline_count = "Submitted offline",
               completed = "Complete surveys",
               incomplete = "Incomplete surveys",
               anon = "Anonymous entries",
               utilities = "Distinct utilities",
               countries = "Distinct countries")
Table 1: Overview of Utility Integrity Assessment data

Utility integrity assessment
Data from 75 surveys

  Submitted online:     62
  Submitted offline:    13
  Complete surveys:     55
  Incomplete surveys:   20
  Anonymous entries:    44
  Distinct utilities:   27
  Distinct countries:   11

Results

Survey completeness

Code
library(ggplot2)
library(ggthemes)
library(ggpattern)

# Flag each response as complete (reached page 5 or 6) or incomplete
ia_data <- ia_data |> mutate(status = case_when(
    last_page > 4 ~ "complete",
    last_page < 5 ~ "incomplete"
    ))

Most respondents were able to complete the survey (55 of 75, over 70%). This could indicate that the indicators are generally well understood and relevant for different service providers.

There are many possible reasons for not completing the survey, but it is possible that there are still categories of service providers for whom the indicators are not easy to follow, relevant, or applicable in their context. WIN indicates that utility size is most likely a factor in how relevant the survey is; this could not be confirmed, nor could a size threshold be specified, with the available data.
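
As a quick check, the completion rate by input method can be derived directly from the status flag defined above; a minimal sketch:

Code
# Share of complete and incomplete responses per input method
ia_data |>
  count(on_off, status) |>
  group_by(on_off) |>
  mutate(share = n / sum(n)) |>
  ungroup()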

Figure 1 shows that all respondents using the offline tool (with support or some validation) completed the survey in full and provided contact information on page 6. Most online respondents also completed the survey, even without dedicated support, though 12 abandoned early, providing information for at most 3 of the 15 indicators, on page 1 (indicators IRM1.1 and IRM1.2) and page 2 (IRM2.1).

Code
ggplot(ia_data, aes(
    x = last_page,
    fill = on_off)) +
  geom_bar_pattern(aes(pattern_fill = status),
                   pattern = 'circle') +
    labs(title = "Integrity assessment completeness by input method (online / offline)",
       x = "Last page acted on",
       y = "Count",
       fill = "Online / Offline",
       pattern_fill = "Status") +
  scale_fill_grey(start = 0.4, end = 0.8) +
  theme_minimal()
Figure 1: Integrity assessment completeness by input method (online/offline)

Impact of anonymity

Figure 2 shows the mean scores per indicator for all respondents and indicates, with dots, the impact of anonymity on mean scores. Anonymous respondents appear to have assessed themselves more severely across all indicators except IRM3.1 (on control of conflicts of interest), IRM3.2 (on whistleblower protection), IRM5.1 (on integrity risk monitoring), and IRM5.2 (on review of the integrity risk management function and processes).
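
The summary tables used in this and the following figures are not shown in the rendered chunks. A possible construction, assuming anonymity is flagged by region == "UNKNOWN" as in the overview table above:

Code
# Long format: one row per response and indicator
ia_long <- ia_data |>
  pivot_longer(cols = starts_with("irm"),
               names_to = "indicator", values_to = "score")

# Mean score per indicator, across all respondents
score_overview <- ia_long |>
  group_by(indicator) |>
  summarise(mean_scores = mean(score, na.rm = TRUE))

# Mean score per indicator, split by anonymity
# (assumption: anonymous entries are those with region == "UNKNOWN")
score_overview_anon <- ia_long |>
  mutate(anonymity = if_else(region == "UNKNOWN", "anonymous", "known")) |>
  group_by(anonymity, indicator) |>
  summarise(mean_scores = mean(score, na.rm = TRUE), .groups = "drop")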

Code
ggplot(score_overview, aes(
    x = indicator,
    y = mean_scores)) +
  geom_col() +
  geom_point(data = score_overview_anon, 
            aes(x = indicator, y = mean_scores, group = anonymity, color = anonymity)) +
  labs(title = "Mean scores across integrity assessment indicators, with comparison of entries by known or anonymous respondents",
       x = "Indicator",
       y = "Mean scores", 
       color = "Data anonymity") +
  theme_minimal() +
  theme(axis.text.x = element_text(angle = 80, hjust = 0.8)
        )
Figure 2: Mean scores across integrity assessment indicators, with comparison of entries by known or anonymous respondents

Impact of input method, support and survey validation

Figure 3 shows the differences between scores of known utilities that responded online and those of known utilities that responded offline, received some support, briefly justified their responses, and had their responses reviewed and commented on.

There is a marked difference between the scores. Offline, verified scores are significantly lower than online scores, except for indicator IRM5.3 on responsiveness to external accountability mechanisms. The difference is most pronounced for indicators IRM5.2 on review of the integrity risk management function and processes, IRM3.1 on control of conflicts of interest, and IRM3.2 on whistleblowing. The validation process therefore does appear to have an impact, though it is possible that some of the differences in scores are also tied to regional differences and differing regulatory requirements on service providers (regional distribution is not the same across online and offline responses).
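
The plotted summary can be built from the long-format data sketched earlier; an illustrative construction, again treating region == "UNKNOWN" as anonymous:

Code
# Mean score per indicator for known respondents, split by input method
score_overview_filtered_source <- ia_long |>
  filter(region != "UNKNOWN") |>
  group_by(on_off, indicator) |>
  summarise(mean_scores = mean(score, na.rm = TRUE), .groups = "drop")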

Code
ggplot(score_overview_filtered_source, aes(
    x = indicator,
    y = mean_scores, 
    fill = on_off)) +
  geom_col(position = "dodge") +
  labs(title = "Comparison of mean scores across integrity assessment indicators from known respondents, according to source (online/offline)",
       x = "Indicator",
       y = "Mean scores", 
       fill = "Source (online/offline)") +
  theme_minimal() +
  theme(axis.text.x = element_text(angle = 80, hjust = 0.8)
        )
Figure 3: Comparison of mean scores across integrity assessment indicators from known respondents, according to source (online/offline)

Regional variations

Figure 4 shows mean scores of all known utilities and how these scores differ for utilities grouped by region. Note that some regions are less widely represented than others. Utilities in Western Europe appear to have somewhat higher scores on many indicators, but other scores are quite mixed. The main differences can possibly be explained by different policy environments and legal statuses for service providers, or different regulatory requirements.

However, as shown in Table 4, there is still significant spread in the data for utilities within a region. This suggests that other factors, possibly including the date of establishment of formal service provision in a country, the size of surveyed utilities, or other more local contextual factors, could also be influencing the regional trends. These require further exploration.
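
As before, the overall and regional means plotted below can be derived from the long-format data; a sketch:

Code
# Mean score per indicator for known respondents, overall and by region
score_overview_filtered <- ia_long |>
  filter(region != "UNKNOWN") |>
  group_by(indicator) |>
  summarise(mean_scores = mean(score, na.rm = TRUE))

score_overview_filtered_regional <- ia_long |>
  filter(region != "UNKNOWN") |>
  group_by(region, indicator) |>
  summarise(mean_scores = mean(score, na.rm = TRUE), .groups = "drop")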

Code
ggplot(score_overview_filtered, aes(
    x = indicator,
    y = mean_scores)) +
  geom_col() +
  geom_point(data = score_overview_filtered_regional, 
            aes(x = indicator, y = mean_scores, group = region, color = region)) +
  geom_line(data = score_overview_filtered_regional, 
            aes(x = indicator, y = mean_scores, group = region, color = region)) + 
  labs(title = "Comparison of mean scores across integrity assessment indicators from known respondents, according to region",
       x = "Indicator",
       y = "Mean scores", 
       color = "Region") +
  theme_minimal() +
  theme(axis.text.x = element_text(angle = 80, hjust = 0.8)
        )
Figure 4: Comparison of mean scores across integrity assessment indicators from known respondents, according to region
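
Table 4 reports the spread of scores within each region. A possible construction of the underlying table, assuming the spread is measured as the standard deviation of indicator scores per region (as the sd_ column prefix in the output suggests):

Code
# Standard deviation of scores per indicator within each region, pivoted wide
# (the rendered table shows the indicator codes as row labels)
ia_region_wide <- ia_long |>
  filter(region != "UNKNOWN") |>
  group_by(indicator, region) |>
  summarise(sd = sd(score, na.rm = TRUE), .groups = "drop") |>
  pivot_wider(names_from = region, values_from = sd, names_prefix = "sd_")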
Code
ia_region_wide |> 
    gt() |>
    tab_header(title = "Utility Integrity Assessment: Variation in regions") |>
    fmt_number(
      columns = everything(), 
      decimals = 2)
Table 4: Assessment score variation in regions
Utility Integrity Assessment: Variation in regions (standard deviation of indicator scores by region)

Indicator     Eastern and Southern Africa   South Asia   Latin America and Caribbean   Western Europe
irm11_short   1.54                          1.03         1.93                          0.00
irm12_short   1.36                          0.41         1.85                          0.76
irm21_short   1.64                          0.75         1.69                          0.00
irm31_short   1.66                          0.52         1.25                          0.76
irm32_short   1.32                          0.82         1.49                          0.79
irm33_short   1.22                          1.52         1.16                          0.69
irm34_short   1.50                          0.84         0.64                          1.13
irm35_short   1.67                          0.82         1.06                          1.21
irm36_short   1.39                          0.82         0.89                          0.49
irm37_short   1.59                          0.84         0.99                          0.00
irm41_short   1.39                          0.55         1.19                          0.79
irm42_short   1.56                          0.75         1.41                          1.60
irm51_short   1.12                          0.75         1.75                          0.79
irm52_short   1.27                          0.55         1.41                          1.21
irm53_short   1.59                          0.84         0.74                          0.00

Conclusions

  • The data from Utility Integrity Assessments is still too limited to confirm decisive trends in the integrity management practices of water and sanitation service providers. However, the data does suggest that integrity management practices could be improved, especially to mitigate conflicts of interest, take action against non-compliant contractors, and protect whistleblowers.

  • Many organisational and sector anti-corruption processes focus on procurement control. This is an important risk area, though it is not the weakest management area, and some progress appears to have already been made.

  • The completeness of responses, the spread of scores, and the spread of scores even within regions suggest the tool is able to capture practices that are influenced by the management of individual utilities, and not just broad contextual trends. The tool can thus provide important insight for utility managers (and for regulatory authorities) looking to mitigate corruption and integrity risks and in this way strengthen performance and contribute to achieving SDG6.

References

Acuña Mantilla, K., & Vergara Stuardo, J. (2024). Poner manos a la obra con la integridad: herramientas para evaluar, gestionar y fortalecer la integridad en organizaciones del sector de agua y saneamiento en América Latina y el Caribe. Inter-American Development Bank. https://doi.org/10.18235/0012895
Avello, P., Allakulov, U., Barreto-Dillon, L., Das, B., Hermann-Friede, J., Hubendick, L., Feuerstein, L., & Jiménez Fdez de Palencia, A. (2023). The integrity management toolbox in action: A study of 22 urban water service cases. Journal of Water, Sanitation and Hygiene for Development, 13(12), 952–961. https://doi.org/10.2166/washdev.2023.137
Barreto-Dillon, L., Basani, M., Simone, F. D., & Cotlear, B. (2018). Transparencia: Impulsando eficiencia en empresas proveedoras de servicios de agua y saneamiento: Buenas prácticas en cuatro empresas de América Latina. IDB Publications. https://doi.org/10.18235/0001114
Bhatt, J. D. (2024). The Politics of Performance Benchmarking in Urban Water Supply: Sacrificing Equity on the Altar of Efficiency. 17(2).
Cubillo, F., Allakulov, U., Patiño Piñeros, D., & Basani, M. (2022). Análisis focalizado: Integridad empresarial en empresas prestadoras de servicios de agua y saneamiento. IDB Publications. https://doi.org/10.18235/0004230
Water Integrity Network (WIN). (2021). Water Integrity Global Outlook 2021: Water Integrity in Urban Water and Sanitation (2). WIN.