Enriching Tax Data with Location Information

February 13th, 2020

This article is by Robert Lenius and originally appeared on the Alteryx Engine Works Blog here: https://community.alteryx.com/t5/Engine-Works-Blog/Enriching-Tax-Data-with-Location-Information/ba-p/509848


Working in finance, I’m somewhat bummed that I don’t often get to explore the possibilities within location-based data. That’s why I was thrilled when a project fell into my lap to visually analyze taxes paid across German villages. However, this excitement was short-lived after seeing the dataset I would be working with. See, the problem is spatial tools are easy to work with if you have data points such as zip codes, states, or countries, but trying to locate German villages within these tools simply doesn’t work. For that, we would need the latitude and longitude of the villages themselves, and this, unfortunately, didn’t exist in my data set.

Luckily for me, I stumbled across the organization GeoNames, which provides an online database that “…covers all countries and contains over eleven million placenames that are available for download free of charge.” Better yet, they offer a free API for querying that database. So, using Alteryx, I was able to build a macro that works with this API to run location-based searches and return latitudes and longitudes. If you’re interested in using this macro, you can find it in the Alteryx Gallery. Through this tool, I was able to enrich my dataset with the latitude and longitude of these German villages, making visual analysis possible.
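For readers working outside Alteryx, the kind of request such a macro might send can be sketched in Python. This is a minimal sketch and not the macro itself: it assumes the GeoNames `searchJSON` web service with its `q`, `maxRows`, and `username` parameters, and the function names and the `your_username` value are placeholders invented for illustration.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

# Assumed endpoint: the GeoNames JSON search web service.
GEONAMES_SEARCH_URL = "http://api.geonames.org/searchJSON"


def build_search_url(place_name, username, max_rows=1):
    """Build a search URL for one place name (hypothetical helper)."""
    params = {"q": place_name, "maxRows": max_rows, "username": username}
    return GEONAMES_SEARCH_URL + "?" + urlencode(params)


def parse_first_match(payload):
    """Return (lat, lng) of the first match, or None if nothing was found."""
    matches = payload.get("geonames", [])
    if not matches:
        return None
    first = matches[0]
    # GeoNames returns coordinates as strings, so convert them to floats.
    return float(first["lat"]), float(first["lng"])


def lookup(place_name, username):
    """Query GeoNames over the network and return (lat, lng) or None."""
    url = build_search_url(place_name, username)
    with urlopen(url, timeout=30) as response:
        return parse_first_match(json.load(response))
```

Calling `lookup("Dublin", "your_username")` would need a registered GeoNames account name in place of the placeholder. Taking only the first match keeps the sketch short, but it also means the service decides which of several same-named places comes back.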

The setup is straightforward: I first use a Unique tool to grab only the unique villages from my dataset and then run this list through the macro. I then join the results back to the original data and output everything to a file for Tableau to pick up.
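The same three steps (deduplicate, look up, join back) can be sketched in Python. This is an illustration under assumptions rather than a translation of the workflow: the `village` column name and the `lookup` callable, which returns a `(lat, lng)` pair or `None`, are hypothetical.

```python
def enrich(rows, lookup):
    """Attach lat/lng to each row, calling lookup once per distinct village."""
    # Step 1: keep only the unique village names (the Unique tool's role).
    unique_villages = sorted({row["village"] for row in rows})

    # Step 2: run each unique name through the lookup (the macro's role).
    coords = {name: lookup(name) for name in unique_villages}

    # Step 3: bring everything back together (the join).
    enriched = []
    for row in rows:
        match = coords[row["village"]]
        lat, lng = match if match else (None, None)
        enriched.append({**row, "lat": lat, "lng": lng})
    return enriched
```

Deduplicating first means a village that appears in many tax records costs only one web call.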


Now that the data has been successfully enriched, a visualization can be generated. My example set had 102 villages in it; I’d hate to imagine looking up the latitude and longitude of each one and manually keying them in!


Here’s the visualized portion of that data set, but we have a problem. Immediately I can see that there are seven null values, meaning that these locations were not found within the GeoNames database. What’s more interesting, though, is that I have data points showing up in the United States and Africa even though my data set was intended for Germany only. What gives? Well, the problem is that multiple places around the world share the same name. For example, when booking a trip to Dublin, you’d probably want to go to Dublin, Ireland, not Dublin, Ohio. We can correct this in our data set by making some modifications to the Alteryx workflow.
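Both kinds of problem rows, missing coordinates and coordinates far outside Germany, can also be flagged automatically before anyone looks at a map. The check below is a rough sketch and not part of the workflow described here: the bounding box values are approximations assumed for illustration, and a box this coarse also covers parts of neighboring countries.

```python
# Rough bounding box for Germany (approximate values, assumed for illustration).
LAT_MIN, LAT_MAX = 47.2, 55.1
LNG_MIN, LNG_MAX = 5.8, 15.1


def needs_review(lat, lng):
    """Return True when a lookup result should be checked by hand."""
    # A missing coordinate means the place was not found at all.
    if lat is None or lng is None:
        return True
    # A point outside the box is likely a same-named place elsewhere.
    inside = LAT_MIN <= lat <= LAT_MAX and LNG_MIN <= lng <= LNG_MAX
    return not inside
```

A point near Dublin, Ohio (roughly latitude 40.1, longitude -83.1) falls outside the box and gets flagged, while a point in Berlin (roughly 52.5, 13.4) passes.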


Above is the modified workflow. Essentially, the results are now sent to an Excel file where the locations can be manually adjusted if needed. I’ve also added a join against this file so that only new villages are run through the GeoNames macro. This speeds up the workflow as well, since web calls like this can take a while compared to normal data processing.
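The idea of only sending new villages to the web service can be sketched in Python as follows. This is a simplified stand-in for the Excel file and join: the `reviewed` mapping of village names to hand-checked coordinates and the `lookup` callable are hypothetical.

```python
def resolve(villages, reviewed, lookup):
    """Return coordinates for all villages, calling lookup only for new ones."""
    # Copy the reviewed entries so the caller's mapping is not mutated.
    resolved = dict(reviewed)
    for name in villages:
        # Villages already in the reviewed file keep their (possibly
        # hand-corrected) coordinates and never trigger a web call.
        if name not in resolved:
            resolved[name] = lookup(name)
    return resolved
```

Writing the returned mapping back to the reviewed file after each run means manual corrections persist, and each village is only ever looked up once.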


And here’s the result! With my fully enriched and cleansed data, I can now visually analyze the tax data. And better yet, I have a scalable process that can easily be maintained and run again as more data arises.


The Nerd Stuff

If you’re interested in how the macro works under the hood, this section’s for you!