Integrity Practices of Global Water Service Providers

A Review of Utility Integrity Assessment Data

Author

Claire Grandadam

Published

January 31, 2026

Introduction

To reach national policy objectives for water and sanitation service delivery and SDG6, water service providers globally must increase service coverage and service quality in the face of growing challenges related to financing, population growth, and climate change. This has spurred strong interest in understanding the drivers of service provider performance, particularly from service provider managers, policy-makers, regulators, banks and other funders, as well as sector analysts.

National and global utility benchmarking processes have been used widely for such analyses, to assess the way service providers operate and to orient sector reforms. The most common benchmarking tools tend to focus on service quality, financial management, and operational efficiency. Recently, water sector utility benchmarking tools have also been evolving, in part in response to two developments:

  • First, the recognition that performance is affected by governance issues and corruption, and the related evidence that anti-corruption and integrity management can support utility performance (Acuña Mantilla & Vergara Stuardo, 2024; Barreto-Dillon et al., 2018; Water Integrity Network (WIN), 2021). There is a need for better tools that can support service providers striving for integrity, equity, sustainability, and resilience.

  • Second, growing criticism of the limits and bias in major utility benchmarking tools that have primarily focused on efficiency, to the detriment of distributional equity (Bhatt, 2024).

Major utility benchmarking tools such as Aquarating and NewIBnet recently introduced new process indicators focusing on management practices (in 2022 and 2023 respectively) (Cubillo et al., 2022). There is however limited data available on their use. The Water Integrity Network (WIN) also developed a set of indicators in 2018, to assess integrity management practices of service providers.

This analysis examines anonymised data from applications of the Water Integrity Network tool since 2018, to assess the tool’s relevance and to understand patterns of use, possible resistance and challenges to assessing integrity practices, and trends in integrity management practices.

Methods

WIN developed integrity indicators for utilities with support from the Inter-American Development Bank and in partnership with the Consortium for Water Integrity in Latin America (with SIWI and cewas). This set of indicators is generally referred to as the Utility Integrity Assessment. It is part of WIN’s InWASH integrity management toolbox for water and sanitation service providers, an intensive process for prioritising and managing the integrity risks service providers face in their daily operations (a process which can also be run without the Utility Integrity Assessment). The tool is also freely available separately as an online survey at https://www.waterintegritynetwork.net/inwash.

Early pilots and analyses by WIN indicated that there are patterns in how good governance and integrity practices are implemented in organisations in the water and sanitation sectors globally and that benchmarking can be useful for urban service providers of a certain size. This is the case even though local context and the local regulatory environment of course influence which governance and integrity practices are possible and effective for different utilities.

Based on these observations, the Utility Integrity Assessment was designed as a short survey examining 15 indicators categorised under 5 integrity principles (the sketch after this list maps them to the indicator codes used in the data):

  1. Tone at the top (with 2 indicators: on leadership and on codes of conduct);

  2. Integrity risk assessment (with 1 indicator);

  3. Integrity controls (with 7 indicators, on control of conflicts of interest, whistleblowing, recruitment, procurement, disclosure, participation, and financial management);

  4. Corrective action (with 2 indicators, on sanctions for staff and on sanctions for contractors);

  5. Monitoring (with 3 indicators, on risk monitoring, review of risk management functions, and responsiveness to external accountability mechanisms).
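
For reference in the analysis below, this structure can be captured in a small lookup table. This is an illustrative sketch only: the indicator codes and the `_short` suffix follow the column names of the processed data, and the table itself is not part of the original tool.

Code
library(tidyverse)

# Map the 15 indicator codes (as named in the processed data) to their principles
indicator_map <- tibble(
  indicator = paste0(
    c(paste0("irm1", 1:2),   # tone at the top
      "irm21",               # integrity risk assessment
      paste0("irm3", 1:7),   # integrity controls
      paste0("irm4", 1:2),   # corrective action
      paste0("irm5", 1:3)),  # monitoring
    "_short"),
  principle = rep(
    c("Tone at the top", "Integrity risk assessment", "Integrity controls",
      "Corrective action", "Monitoring"),
    times = c(2, 1, 7, 2, 3))
)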

Service providers use the tool for a self-assessment. They score themselves on each indicator on a scale of 1 to 5, based on clear descriptions of the practices they would have in place at each score (a sketch of how indicator scores roll up to principle-level means follows the list below). All indicators are formulated in similar ways. For example, the second indicator (IRM1.2) examines how effectively a service provider sets integrity standards and enforces them through a code of conduct or similar document. The 5 possible scores are:

  • Score 1: There is no written code of conduct (or similar document) outlining what the Utility expects from staff regarding values, rules, standards, and principles.

  • Score 2: Practices fall between the descriptions for Score 1 and Score 3.

  • Score 3: The code of conduct (or a similar document) contains most of the following elements (some elements may be missing): an ethical framework for decision making, generic examples of what constitute acceptable and unacceptable behaviour, guidelines on reporting problems anonymously, accountability and disciplinary policies for unethical behaviour, a list of ethics and compliance resources. The code of conduct was not revised in the last 3 calendar years. There is evidence that the Utility has organised a training on the code of conduct in the past, but the training is not routinely provided.

  • Score 4: Practices fall between the descriptions for Score 3 and Score 5.

  • Score 5: The code of conduct (or a similar document) contains ALL of the elements outlined in Level 3. The code of conduct was revised within the last 3 years, or more recently following 1) the most recent changes to workplace profiles, including restructuring, relocation, changes in key roles or decision-making processes, or 2) the most recent changes in the external environment, including sector reform, new relevant legislation, changes in government strategies or in contractors’ business practices. Code of conduct training is routinely given to new employees as part of their induction programme.
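
In the processed data, each principle also carries a mean score. Continuing the sketch above, a minimal illustration of that aggregation, assuming (based on the mean_principle* columns in the data) that a principle score is the mean of the indicator scores actually provided under it:

Code
# Sketch (assumption): principle scores as means of available indicator scores.
# na.rm = TRUE matches the processed data, where a principle mean is computed
# over the indicators that were actually scored.
compute_principle_means <- function(df) {
  df |>
    mutate(
      mean_principle1 = rowMeans(across(all_of(c("irm11_short", "irm12_short"))), na.rm = TRUE),
      mean_principle2 = irm21_short,  # single indicator under principle 2
      mean_principle3 = rowMeans(across(all_of(paste0("irm3", 1:7, "_short"))), na.rm = TRUE),
      mean_principle4 = rowMeans(across(all_of(c("irm41_short", "irm42_short"))), na.rm = TRUE),
      mean_principle5 = rowMeans(across(all_of(paste0("irm5", 1:3, "_short"))), na.rm = TRUE)
    )
}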

This analysis looks at available data since 2018, from service providers that completed the survey online independently, and from service providers responding offline, either as pilot users of the Utility Integrity Assessment or as participants in an InWASH process. The offline respondents received some support to fill in the survey and had to provide some justification for their scores (a brief explanation or a link to evidence). A WIN or partner expert or consultant also read and validated the scores.

The analysis looks at:

  • whether there are clear trends, strengths, and weaknesses across indicators and principles,

  • whether scores differ based on the way the survey was taken,

  • whether there is significant variation in the practices of service providers across regions.

Code
library(tidyverse)
library(gt)
library(gtsummary)
library(knitr)
library(DT)

# Read the processed assessment data (one row per survey response)
ia_data <- read_csv(here::here("data/processed/processed_data.csv"))

glimpse(ia_data)
Rows: 75
Columns: 30
$ on_off          <chr> "online", "online", "online", "online", "online", "onl…
$ id              <dbl> 20, 21, 26, 27, 29, 31, 33, 43, 47, 54, 49, 50, 51, 52…
$ date_submitted  <date> 2020-09-20, 2020-09-20, NA, NA, NA, 2030-09-20, NA, 2…
$ last_page       <dbl> 6, 6, 5, 4, 2, 6, 5, 6, 6, 6, 6, 2, 6, 6, 6, 6, 2, 5, …
$ date_started    <date> 2020-09-20, 2020-09-20, 2028-09-20, 2029-09-20, 2029-…
$ date_last_act   <date> 2020-09-20, 2020-09-20, 2028-09-20, 2029-09-20, 2029-…
$ irm11_short     <dbl> 5, 3, 1, 4, 5, 4, 3, 5, 5, 5, 5, 5, 5, 5, 5, 5, 1, 1, …
$ irm12_short     <dbl> 5, 3, 1, 4, 3, 4, 2, 4, 5, 3, 5, 5, 5, 5, 5, 5, 1, 1, …
$ mean_principle1 <dbl> 5.0, 3.0, 1.0, 4.0, 4.0, 4.0, 2.5, 4.5, 5.0, 4.0, 5.0,…
$ irm21_short     <dbl> 5, 1, 1, 3, 4, 5, 1, 5, 5, 5, 5, 5, 5, 5, 5, 5, 4, 2, …
$ mean_principle2 <dbl> 5, 1, 1, 3, 4, 5, 1, 5, 5, 5, 5, 5, 5, 5, 5, 5, 4, 2, …
$ irm31_short     <dbl> 4, 1, 1, 4, NA, 4, 4, 4, 5, 5, 5, NA, 5, 5, 3, 5, NA, …
$ irm32_short     <dbl> 4, 1, 1, 3, NA, 4, 4, 5, 4, 5, 4, NA, 5, 5, 3, 5, NA, …
$ irm33_short     <dbl> 3, 4, 3, 4, NA, 4, 3, 3, 4, 3, 4, NA, 5, 4, 3, 4, NA, …
$ irm34_short     <dbl> 5, 3, 1, 4, NA, 4, 2, 4, 5, 5, 5, NA, 5, 5, 2, 5, NA, …
$ irm35_short     <dbl> 5, 4, 2, 5, NA, 5, 2, 3, 5, 3, 5, NA, 5, 4, 2, 5, NA, …
$ irm36_short     <dbl> 4, 5, 3, 5, NA, 5, 2, 4, 5, 4, 5, NA, 5, 5, 4, 5, NA, …
$ irm37_short     <dbl> 4, 5, 1, 4, NA, 4, 5, 5, 5, 5, 5, NA, 5, 5, 5, 5, NA, …
$ mean_principle3 <dbl> 4.142857, 3.285714, 1.714286, 4.142857, NA, 4.285714, …
$ irm41_short     <dbl> 4, 3, 2, 4, NA, 4, 1, 4, 4, 5, 4, NA, 5, 5, 3, 5, NA, …
$ irm42_short     <dbl> 5, 4, 2, NA, NA, 4, 4, 4, 2, 4, 4, NA, 5, 1, 2, 5, NA,…
$ mean_principle4 <dbl> 4.5, 3.5, 2.0, 4.0, NA, 4.0, 2.5, 4.0, 3.0, 4.5, 4.0, …
$ irm51_short     <dbl> 3, 3, 1, NA, NA, 4, 1, 5, 5, 4, 5, NA, 5, 5, 3, 5, NA,…
$ irm52_short     <dbl> 4, 5, 2, NA, NA, 3, 4, 5, 5, 3, 4, NA, 5, 5, 2, 5, NA,…
$ irm53_short     <dbl> 4, 5, 1, NA, NA, 5, 4, 5, 5, 5, 5, NA, 5, 5, 5, 5, NA,…
$ mean_principle5 <dbl> 3.666667, 4.333333, 1.333333, NA, NA, 4.000000, 3.0000…
$ utility_code    <dbl> 1, 2, 3, NA, NA, 5, NA, 6, 7, 7, 8, NA, 7, 10, 11, 12,…
$ country_id      <dbl> 252, 252, 252, NA, NA, 252, NA, NA, 3, 3, 3, NA, 3, 3,…
$ region          <chr> "Eastern and Southern Africa", "Eastern and Southern A…
$ version         <chr> "current", "current", "current", "current", "current",…
Code
tbl_data_overview <- ia_data |>
  summarise(
      online_count = sum(on_off == "online", na.rm = TRUE),
      offline_count = sum(on_off == "offline", na.rm = TRUE),
      completed = sum(last_page > 4),   # counted complete if the respondent reached page 5 (of 6)
      incomplete = sum(last_page < 5),
      anon = sum(region == "UNKNOWN"),  # anonymous entries carry no region information
      utilities = n_distinct(utility_code, na.rm = TRUE),
      countries = n_distinct(country_id, na.rm = TRUE)
  )

Table 1 shows the number of entries assessed, noting how many were submitted online and offline, how many were complete or not, how many were anonymous, and the number of utilities and countries represented.

Code
tbl_data_overview |>
    gt() |>
    tab_header(title = "Utility integrity assessment",
              subtitle = "Data from 75 surveys") |>
    cols_label(online_count = "Submitted online",
               offline_count = "Submitted offline",
               completed = "Complete surveys",
               incomplete = "Incomplete surveys",
               anon = "Anonymous entries",
               utilities = "Distinct utilities",
               countries = "Distinct countries")
Table 1: Overview of Utility Integrity Assessment data

Utility integrity assessment
Data from 75 surveys

  Submitted online:     62
  Submitted offline:    13
  Complete surveys:     55
  Incomplete surveys:   20
  Anonymous entries:    44
  Distinct utilities:   27
  Distinct countries:   11

Results

Survey completeness

Code
library(ggplot2)
library(ggthemes)
library(ggpattern)

# Flag each response as complete (reached page 5 or 6) or incomplete
ia_data <- ia_data |> mutate(status = case_when(
    last_page > 4 ~ "complete",
    last_page < 5 ~ "incomplete"
    ))

Most respondents were able to complete the survey (55 of 75, over 70%). This could indicate that the indicators are generally well understood and relevant for different service providers.

There are many possible reasons for not completing the survey, but it is possible that there are still categories of service providers for whom the indicators are not easy to follow, relevant, or applicable in their context. WIN indicates that utility size is most likely a factor in how relevant the survey is; this could not be confirmed, nor could a size threshold be specified, with the available data.
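
As a quick check, the completion rate by input method can be derived directly from the status flag defined above; a minimal sketch:

Code
# Share of complete and incomplete responses per input method
ia_data |>
  count(on_off, status) |>
  group_by(on_off) |>
  mutate(share = n / sum(n)) |>
  ungroup()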

Figure 1 shows that all respondents using the offline tool (with support or some validation) completed the survey in full and provided contact information on page 6. Most online respondents also completed the survey, even without dedicated support, though 12 abandoned early, providing information for at most 3 of the 15 indicators, on page 1 (indicators IRM1.1 and IRM1.2) and page 2 (IRM2.1).

Code
ggplot(ia_data, aes(
    x = last_page,
    fill = on_off)) +
  geom_bar_pattern(aes(pattern_fill = status),
                   pattern = 'circle') +
    labs(title = "Integrity assessment completeness by input method (online / offline)",
       x = "Last page acted on",
       y = "Count",
       fill = "Online / Offline",
       pattern_fill = "Status") +
  scale_fill_grey(start = 0.4, end = 0.8) +
  theme_minimal()
Figure 1: Integrity assessment completeness by input method (online/offline)

Impact of anonymity

Figure 2 shows the mean scores per indicator for all respondents and indicates, with dots, the impact of anonymity on mean scores. Anonymous respondents appear to have assessed themselves more severely across all indicators except IRM3.1 (on control of conflicts of interest), IRM3.2 (on whistleblower protection), IRM5.1 (on integrity risk monitoring), and IRM5.2 (on review of the integrity risk management function and processes).
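
The summary tables used in this and the following figures are not shown in the rendered chunks. A possible construction, assuming anonymity is flagged by region == "UNKNOWN" as in the overview table above:

Code
# Long format: one row per response and indicator
ia_long <- ia_data |>
  pivot_longer(cols = starts_with("irm"),
               names_to = "indicator", values_to = "score")

# Mean score per indicator, across all respondents
score_overview <- ia_long |>
  group_by(indicator) |>
  summarise(mean_scores = mean(score, na.rm = TRUE))

# Mean score per indicator, split by anonymity
# (assumption: anonymous entries are those with region == "UNKNOWN")
score_overview_anon <- ia_long |>
  mutate(anonymity = if_else(region == "UNKNOWN", "anonymous", "known")) |>
  group_by(anonymity, indicator) |>
  summarise(mean_scores = mean(score, na.rm = TRUE), .groups = "drop")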

Code
ggplot(score_overview, aes(
    x = indicator,
    y = mean_scores)) +
  geom_col() +
  geom_point(data = score_overview_anon, 
            aes(x = indicator, y = mean_scores, group = anonymity, color = anonymity)) +
  labs(title = "Mean scores across integrity assessment indicators, with comparison of entries by known or anonymous respondents",
       x = "Indicator",
       y = "Mean scores", 
       color = "Data anonymity") +
  theme_minimal() +
  theme(axis.text.x = element_text(angle = 80, hjust = 0.8)
        )
Figure 2: Mean scores across integrity assessment indicators, with comparison of entries by known or anonymous respondents

Impact of input method, support and survey validation

Figure 3 shows the differences between scores of known utilities that responded online and those of known utilities that responded offline, received some support, briefly justified their responses, and had their responses reviewed and commented on.

There is a marked difference between the scores. Offline, verified scores are significantly lower than online scores, except for indicator IRM5.3 on responsiveness to external accountability mechanisms. The difference is most pronounced for indicators IRM5.2 on review of the integrity risk management function and processes, IRM3.1 on control of conflicts of interest, and IRM3.2 on whistleblowing. The validation process therefore does appear to have an impact, though it is possible that some of the differences in scores are also tied to regional differences and differing regulatory requirements on service providers (regional distribution is not the same across online and offline responses).
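
The plotted summary can be built from the long-format data sketched earlier; an illustrative construction, again treating region == "UNKNOWN" as anonymous:

Code
# Mean score per indicator for known respondents, split by input method
score_overview_filtered_source <- ia_long |>
  filter(region != "UNKNOWN") |>
  group_by(on_off, indicator) |>
  summarise(mean_scores = mean(score, na.rm = TRUE), .groups = "drop")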

Code
ggplot(score_overview_filtered_source, aes(
    x = indicator,
    y = mean_scores, 
    fill = on_off)) +
  geom_col(position = "dodge") +
  labs(title = "Comparison of mean scores across integrity assessment indicators from known respondents, according to source (online/offline)",
       x = "Indicator",
       y = "Mean scores", 
       fill = "Source (online/offline)") +
  theme_minimal() +
  theme(axis.text.x = element_text(angle = 80, hjust = 0.8)
        )
Figure 3: Comparison of mean scores across integrity assessment indicators from known respondents, according to source (online/offline)

Regional variations

Figure 4 shows mean scores of all known utilities and how these scores differ for utilities grouped by region. Note that some regions are less widely represented than others. Utilities in Western Europe appear to have somewhat higher scores on many indicators, but other scores are quite mixed. The main differences can possibly be explained by different policy environments and legal statuses for service providers, or different regulatory requirements.

However, as shown in Table 4, there is still significant spread in the data for utilities within a region. This suggests that other factors, possibly including the date of establishment of formal service provision in a country, the size of surveyed utilities, or other more local contextual factors, could also be influencing the regional trends. These require further exploration.
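
As before, the overall and regional means plotted below can be derived from the long-format data; a sketch:

Code
# Mean score per indicator for known respondents, overall and by region
score_overview_filtered <- ia_long |>
  filter(region != "UNKNOWN") |>
  group_by(indicator) |>
  summarise(mean_scores = mean(score, na.rm = TRUE))

score_overview_filtered_regional <- ia_long |>
  filter(region != "UNKNOWN") |>
  group_by(region, indicator) |>
  summarise(mean_scores = mean(score, na.rm = TRUE), .groups = "drop")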

Code
ggplot(score_overview_filtered, aes(
    x = indicator,
    y = mean_scores)) +
  geom_col() +
  geom_point(data = score_overview_filtered_regional, 
            aes(x = indicator, y = mean_scores, group = region, color = region)) +
  geom_line(data = score_overview_filtered_regional, 
            aes(x = indicator, y = mean_scores, group = region, color = region)) + 
  labs(title = "Comparison of mean scores across integrity assessment indicators from known respondents, according to region",
       x = "Indicator",
       y = "Mean scores", 
       color = "Region") +
  theme_minimal() +
  theme(axis.text.x = element_text(angle = 80, hjust = 0.8)
        )
Figure 4: Comparison of mean scores across integrity assessment indicators from known respondents, according to region
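
Table 4 reports the spread of scores within each region. A possible construction of the underlying table, assuming the spread is measured as the standard deviation of indicator scores per region (as the sd_ column prefix in the output suggests):

Code
# Standard deviation of scores per indicator within each region, pivoted wide
# (the rendered table shows the indicator codes as row labels)
ia_region_wide <- ia_long |>
  filter(region != "UNKNOWN") |>
  group_by(indicator, region) |>
  summarise(sd = sd(score, na.rm = TRUE), .groups = "drop") |>
  pivot_wider(names_from = region, values_from = sd, names_prefix = "sd_")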
Code
ia_region_wide |> 
    gt() |>
    tab_header(title = "Utility Integrity Assessment: Variation in regions") |>
    fmt_number(
      columns = everything(), 
      decimals = 2)
Table 4: Assessment score variation in regions
Utility Integrity Assessment: Variation in regions (standard deviation of indicator scores by region)

Indicator     Eastern and Southern Africa   South Asia   Latin America and Caribbean   Western Europe
irm11_short   1.54                          1.03         1.93                          0.00
irm12_short   1.36                          0.41         1.85                          0.76
irm21_short   1.64                          0.75         1.69                          0.00
irm31_short   1.66                          0.52         1.25                          0.76
irm32_short   1.32                          0.82         1.49                          0.79
irm33_short   1.22                          1.52         1.16                          0.69
irm34_short   1.50                          0.84         0.64                          1.13
irm35_short   1.67                          0.82         1.06                          1.21
irm36_short   1.39                          0.82         0.89                          0.49
irm37_short   1.59                          0.84         0.99                          0.00
irm41_short   1.39                          0.55         1.19                          0.79
irm42_short   1.56                          0.75         1.41                          1.60
irm51_short   1.12                          0.75         1.75                          0.79
irm52_short   1.27                          0.55         1.41                          1.21
irm53_short   1.59                          0.84         0.74                          0.00

Conclusions

  • The data from Utility Integrity Assessments is still too limited to confirm decisive trends in the integrity management practices of water and sanitation service providers. However, the data does suggest that integrity management practices could be improved, especially to mitigate conflicts of interest, take action against non-compliant contractors, and protect whistleblowers.

  • Many organisational and sector anti-corruption processes focus on procurement control. This is an important risk area, though it is not the weakest management area, and some progress appears to have already been made.

  • The completeness of responses, the spread of scores, and the spread of scores even within regions suggest the tool is able to capture practices that are influenced by the management of individual utilities, and not just broad contextual trends. The tool can thus provide important insight for utility managers (and for regulatory authorities) looking to mitigate corruption and integrity risks and in this way strengthen performance and contribute to achieving SDG6.

References

Acuña Mantilla, K., & Vergara Stuardo, J. (2024). Poner manos a la obra con la integridad: herramientas para evaluar, gestionar y fortalecer la integridad en organizaciones del sector de agua y saneamiento en América Latina y el Caribe. Inter-American Development Bank. https://doi.org/10.18235/0012895
Avello, P., Allakulov, U., Barreto-Dillon, L., Das, B., Hermann-Friede, J., Hubendick, L., Feuerstein, L., & Jiménez Fdez de Palencia, A. (2023). The integrity management toolbox in action: A study of 22 urban water service cases. Journal of Water, Sanitation and Hygiene for Development, 13(12), 952–961. https://doi.org/10.2166/washdev.2023.137
Barreto-Dillon, L., Basani, M., Simone, F. D., & Cotlear, B. (2018). Transparencia: Impulsando eficiencia en empresas proveedoras de servicios de agua y saneamiento: Buenas prácticas en cuatro empresas de América Latina. IDB Publications. https://doi.org/10.18235/0001114
Bhatt, J. D. (2024). The Politics of Performance Benchmarking in Urban Water Supply: Sacrificing Equity on the Altar of Efficiency. 17(2).
Cubillo, F., Allakulov, U., Patiño Piñeros, D., & Basani, M. (2022). Análisis focalizado: Integridad empresarial en empresas prestadoras de servicios de agua y saneamiento. IDB Publications. https://doi.org/10.18235/0004230
Water Integrity Network (WIN). (2021). Water Integrity Global Outlook 2021: Water Integrity in Urban Water and Sanitation (2). WIN.