This article was written by Susan CS and originally appeared on the Alteryx Data Science Blog here: https://community.alteryx.com/t5/Data-Science/Locating-Sustainable-Living-with-Data-Science-an-Analytic-App/ba-p/706297
With more companies accepting that remote work is feasible for their employees, some formerly location-locked workers are exploring new possibilities for places to live.
Our very own @ewoodard suggested that we could use data and Alteryx to check out new options! We ran with her great idea and thought it would be interesting to see how different places measured up on aspects of sustainability. If sustainable living is important to you, why not pick a place that reflects those values?
While you can find plenty of lists of “green cities” on the interwebz, neighborhoods vary within cities. And since Alteryx lets us work through big datasets very quickly, why not consider, oh, every neighborhood in the U.S.? Mashing up public data with spatial information, and applying some analytic thinking and data science, all resulted in an analytic app you can try now in the Alteryx Analytics Gallery! (But first, read on to find out how it works …)
Gathering Neighborhood Data
I gathered data on the level of the census block group (CBG), which is the smallest unit for which the U.S. Census Bureau provides its sample data. Each CBG typically contains 600 to 3,000 people. CBGs may work a bit better than ZIP codes for grouping households in consistent ways. (Check out this article for a full discussion of some of the possible issues with ZIP codes.)
The analytic app uses data from these sources:
- The Environmental Protection Agency’s Walkability Index, Smart Location Database, EnviroAtlas, and Air Quality Index Reports
- The U.S. Department of Agriculture’s Food Environment Atlas and National Farmers Market Directory
- The U.S. Department of Energy’s Alternative Fueling Station Locator
Fortunately, many government agencies offer data at the CBG level, letting us drill down to small areas that best fit our sustainability criteria. When the agencies didn’t specify CBGs, I was able to use Alteryx’s Spatial Match and/or Allocate Append tools to identify them myself.
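For readers curious what a spatial match does conceptually, here is a minimal Python sketch of assigning points (say, farmers markets) to the polygon that contains them, using the classic ray-casting test. This is not how the Alteryx tools are implemented. The polygon coordinates, CBG IDs, and market names are hypothetical toy values, and real CBG boundaries would come from Census shapefiles.

```python
from typing import Dict, List, Optional, Tuple

Point = Tuple[float, float]  # (x, y), e.g. (longitude, latitude)


def point_in_polygon(point: Point, polygon: List[Point]) -> bool:
    """Ray casting: count how many edges a ray heading right from the point crosses."""
    x, y = point
    inside = False
    n = len(polygon)
    for i in range(n):
        x1, y1 = polygon[i]
        x2, y2 = polygon[(i + 1) % n]
        # Only edges that straddle the horizontal line through the point can be crossed.
        if (y1 > y) != (y2 > y):
            # x-coordinate where the edge meets that horizontal line
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside  # an odd number of crossings means "inside"
    return inside


def assign_cbg(point: Point, cbg_polygons: Dict[str, List[Point]]) -> Optional[str]:
    """Return the ID of the first polygon containing the point, or None if there is none."""
    for cbg_id, polygon in cbg_polygons.items():
        if point_in_polygon(point, polygon):
            return cbg_id
    return None


# Hypothetical toy data: two square "block groups" and three points.
cbg_polygons = {
    "CBG_A": [(0.0, 0.0), (2.0, 0.0), (2.0, 2.0), (0.0, 2.0)],
    "CBG_B": [(2.0, 0.0), (4.0, 0.0), (4.0, 2.0), (2.0, 2.0)],
}
markets = {"Market 1": (1.0, 1.0), "Market 2": (3.0, 0.5), "Market 3": (5.0, 5.0)}

market_cbg = {name: assign_cbg(pt, cbg_polygons) for name, pt in markets.items()}
print(market_cbg)
```

A point that falls outside every polygon gets `None`, which mirrors the idea of an unmatched record.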
Finally, CBG-level data wasn’t available for air quality, so I assigned each neighborhood the air quality metrics for its metro area, when available. Air does move around, after all!
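The metro-level fallback amounts to a lookup with a gap for places that have no metro area or no reported metric. A minimal sketch in Python, where the metro names, CBG IDs, and metric values are all hypothetical:

```python
from typing import Dict, Optional

# Hypothetical metro-level air quality metric (lower is better).
metro_air_quality: Dict[str, float] = {"Metro X": 42.0, "Metro Y": 55.0}

# Hypothetical mapping of each CBG to its metro area (None = outside any metro area).
cbg_metro: Dict[str, Optional[str]] = {
    "CBG_A": "Metro X",
    "CBG_B": "Metro Y",
    "CBG_C": None,       # rural block group
    "CBG_D": "Metro Z",  # metro area with no reported metric
}


def attach_metro_metric(
    cbg_to_metro: Dict[str, Optional[str]], metric: Dict[str, float]
) -> Dict[str, Optional[float]]:
    """Give each CBG its metro's metric, or None when no metric is available."""
    return {cbg: metric.get(metro) for cbg, metro in cbg_to_metro.items()}


cbg_air_quality = attach_metro_metric(cbg_metro, metro_air_quality)
print(cbg_air_quality)
```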
Using Clustering to Diversify Results
Unfortunately, it can be tricky to find neighborhoods that satisfy all of the sustainability criteria, especially if you prefer to be in a smaller city or outside of a metro area. I wanted the app to still offer something to users who might receive only a couple of matches for their criteria.
I used clustering (the K-Centroids Cluster Analysis and Append Clusters tools) to group all of the neighborhoods and assign each one to a cluster. With the clusters identified, the app offers not just the perfect matches for the user’s chosen criteria, but also five more neighborhoods from the same cluster.
In addition to ensuring every user sees more than just a few results, this approach might also spark new ideas: users may consider new geographic possibilities, notice patterns, or even rethink their original criteria selections.
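To make the clustering idea concrete, here is a small Python sketch of plain k-means on toy data. The article does not say which K-Centroids method or settings were used, so k-means, the two feature columns, the scaled values, and the fixed starting centroids are all assumptions made for illustration.

```python
import math
from typing import Dict, List, Sequence, Tuple


def kmeans(
    points: List[Sequence[float]],
    init_centroids: List[Sequence[float]],
    iterations: int = 20,
) -> List[int]:
    """Plain k-means: alternate between assigning points and recomputing centroids."""
    centroids = [list(c) for c in init_centroids]
    labels = [0] * len(points)
    for _ in range(iterations):
        # Assignment step: each point joins its nearest centroid.
        labels = [
            min(range(len(centroids)), key=lambda k: math.dist(p, centroids[k]))
            for p in points
        ]
        # Update step: each centroid moves to the mean of its members.
        for k in range(len(centroids)):
            members = [p for p, lab in zip(points, labels) if lab == k]
            if members:  # leave an empty cluster's centroid where it is
                centroids[k] = [sum(col) / len(members) for col in zip(*members)]
    return labels


# Hypothetical features per CBG, already scaled to 0-1:
# (walkability, farmers market access)
features: Dict[str, Tuple[float, float]] = {
    "CBG_A": (0.90, 0.80),
    "CBG_B": (0.85, 0.90),
    "CBG_C": (0.20, 0.10),
    "CBG_D": (0.10, 0.25),
    "CBG_E": (0.80, 0.75),
}

ids = list(features)
points = [features[i] for i in ids]
# Fixed starting centroids keep the toy example deterministic.
labels = kmeans(points, init_centroids=[points[0], points[2]])
cluster_of = dict(zip(ids, labels))
print(cluster_of)
```

Once every neighborhood carries a cluster label, finding "more like this" is just a matter of looking up other neighborhoods with the same label.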
Building the App
With the data all tidied up and the clusters assigned to each CBG, the app was straightforward to construct (despite the fact that this is my first analytic app!). I found the resources that @WillM compiled for Santalytics 2020 to be quite helpful, so check those out if you’re a fellow app newbie.
The app allows the user to choose how important various criteria are for their location selection, and then filters the CBGs to find those meeting (or exceeding) the criteria. The app provides maps of the locations and a table of the key information for each, plus a link to Google Maps for each place so it’s easy to investigate further.
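The filter-then-suggest logic can be sketched in a few lines of Python. Everything here is an assumption for illustration: the field names, the thresholds, the toy records, and the way extra suggestions are drawn from the clusters of the exact matches. The link format follows the Google Maps URLs search pattern, and how the app itself builds its links is not described in the article.

```python
from typing import Dict, List, Tuple
from urllib.parse import urlencode

# Hypothetical neighborhood records with a precomputed cluster label.
neighborhoods: List[Dict] = [
    {"cbg": "CBG_A", "walkability": 16.0, "markets": 3, "cluster": 0, "lat": 40.0, "lon": -105.0},
    {"cbg": "CBG_B", "walkability": 14.5, "markets": 1, "cluster": 0, "lat": 40.1, "lon": -105.1},
    {"cbg": "CBG_C", "walkability": 6.0, "markets": 0, "cluster": 1, "lat": 41.0, "lon": -100.0},
    {"cbg": "CBG_D", "walkability": 17.0, "markets": 2, "cluster": 0, "lat": 40.2, "lon": -105.2},
]


def meets_criteria(row: Dict, minimums: Dict[str, float]) -> bool:
    """True when the row meets or exceeds every minimum the user chose."""
    return all(row[name] >= threshold for name, threshold in minimums.items())


def recommend(
    rows: List[Dict], minimums: Dict[str, float], extras: int = 5
) -> Tuple[List[Dict], List[Dict]]:
    """Return exact matches plus up to `extras` other rows from the matches' clusters."""
    matches = [r for r in rows if meets_criteria(r, minimums)]
    matched_ids = {r["cbg"] for r in matches}
    matched_clusters = {r["cluster"] for r in matches}
    similar = [
        r for r in rows
        if r["cluster"] in matched_clusters and r["cbg"] not in matched_ids
    ]
    return matches, similar[:extras]


def google_maps_link(row: Dict) -> str:
    """Build a Google Maps search link for the row's coordinates."""
    query = urlencode({"api": 1, "query": f"{row['lat']},{row['lon']}"})
    return "https://www.google.com/maps/search/?" + query


matches, similar = recommend(neighborhoods, {"walkability": 15.0, "markets": 2})
match_ids = [r["cbg"] for r in matches]
similar_ids = [r["cbg"] for r in similar]
first_link = google_maps_link(matches[0])
print(match_ids, similar_ids, first_link)
```

In this toy version, a search with no exact matches also returns no extras. A real app would need a fallback for that case, such as picking the cluster closest to the user's criteria.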
Are you ready for a greener new neighborhood? Try the app to see which places might fit you and your sustainability goals best.