Worldwide Water Access: Tapping into a Well of Data

October 8th, 2019

This article is by Chandler McCann and originally appeared on the DataRobot Blog.


Access to water is a fundamental human right, and it is one of the United Nations’ global Sustainable Development Goals (SDG #6). Around the world, nearly one billion people (mostly in Africa and Asia) rely on rural water points, such as hand pumps or taps, for their daily water use. These water points are a big part of the community and an essential factor for life. Unfortunately, after about three years of service, these water points tend to break. In fact, it’s estimated that at any given time, roughly 25% of the world’s water points are not functioning.

In addition to causing health problems, the lack of access to drinkable water has huge negative impacts on other aspects of life. Without functioning water points, people are forced to walk long distances, often 30 minutes to an hour, wait in line for their daily water supply, and then carry it all the way back to their homes. This task usually falls to women and can occupy a substantial part of their day, which has large impacts on gender equity and education.


The Data-Driven Solution

DataRobot’s customer, the Global Water Challenge, wanted to understand why these breaks were occurring, so they began gathering data for the first time. Although there had been massive investments in water point construction, no one had a complete picture of water point functionality. Data was scattered across multiple sources, even within one country, and generally collected in different formats and mediums.

Enter Brian Banks, Director of Strategic Initiatives at the Global Water Challenge. Brian wanted to harness the existing data in a holistic, useful way. But what data is the right data to collect? Brian spent nearly two years traveling around the world asking experts this question: “How do we create a data standard for water points?”

Out of these conversations, the team built what is now known as The Water Point Data Exchange (WPDx), the first harmonized database of water points from around the world. WPDx allows countries and organizations to share their water data, resulting in a database that grew from tens of thousands of data points to over half a million today.

Consolidating the data was a huge task, and once it was complete, it raised the question: What do we do with it? Brian is not a data scientist, but he knew there were useful insights in the data that went beyond simple dashboarding. Brian tried all of the ‘data for good’ routes available to him: free consulting, cloud resources, and even working for months to set up a hackathon. Some results were interesting, but none really had the impact he was hoping for, and in all cases (since Brian couldn’t code), he couldn’t work with the code-based products they left him.

When Brian started working with DataRobot, things changed. In a few hours, Brian was able to upload his data from WPDx and build a model to answer some of the important questions he’d been looking for, such as, “Can we predict which water point will be broken in the future?” In an afternoon, he was able to accomplish on his own what other groups had attempted to do over the course of a year.

Working with DataRobot, Brian built models for 13 countries and began integrating these predictions into a web app that maps out which water points are working (or not working), along with metadata on the type of water point, the water source, location, repair priority, and (crucially) which water points are likely to be broken in the future.