This article is by Sydney Firmin and originally appeared on the Alteryx Data Science Blog here: https://community.alteryx.com/t5/Data-Science-Blog/Filling-in-the-Blanks-An-Introduction-to-Spatial-Interpolation/ba-p/336957
When there are missing values in a typical data set, you have a few options on how to handle them. You can create a new category for the missing values, you can remove the observations with missing values, or you can interpolate values for the missing observations.
But what about spatial data? What if you have a spatial data set for a continuous feature (e.g., annual rainfall), but that data set doesn't include a value for a point that you need?
This is a very common scenario for spatial data, particularly for environmental phenomena that are captured with sensors (e.g., precipitation, temperature, elevation, mineral concentration samples, etc.). Setting up sampling sites can be expensive, and it is often unreasonable to cover every square inch of a study site with sensors.
Scenarios like this are where spatial interpolation comes in handy. Spatial interpolation is the process of using points with known data values to estimate values at other, unknown points. Spatial interpolation is only relevant when there can be a meaningful value at every possible point in your study area (e.g., average rainfall is well suited to interpolation; the locations of volcanoes are not). Most spatial interpolation methods take point data and create a continuous raster surface of values.
Spatial interpolation works because of Tobler’s first law of Geography, which states: “everything is related to everything else, but near things are more related than distant things.”
Tobler’s first law of Geography also implies the existence of spatial autocorrelation, which is a fundamental concept in the fields of GIS and spatial statistics. Autocorrelation (of any type) violates the independence assumption that many standard statistical techniques rely on. However, the lack of independence between spatial points can be leveraged to perform a wide variety of spatial analyses.
Inverse Distance Weighting (IDW)
IDW is one of the most straightforward methods for spatial interpolation. It is a deterministic method (meaning no randomness is incorporated into estimates), based on the assumption that the value at an unsampled point can be estimated as the weighted average of the values at nearby sampled points. Weights decrease as distance increases (i.e., points farther away have less influence on the estimated value).
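Written out, one common form of this weighted average looks like the following. The symbols are chosen here for illustration and are not taken from any particular tool: s_0 is the unsampled location, s_1 through s_n are the sampled locations with observed values z(s_i), d is the distance between two locations, and p is a positive exponent.

$$
\hat{z}(s_0) = \frac{\sum_{i=1}^{n} w_i \, z(s_i)}{\sum_{i=1}^{n} w_i},
\qquad
w_i = \frac{1}{d(s_0, s_i)^{p}}
$$

As a quick worked example, take two sampled points with values 10 and 20 at distances 1 and 3 from the unsampled point, and p = 2. The weights are 1 and 1/9, so the estimate is (10 + 20/9) / (1 + 1/9) = 11, which sits much closer to the value of the nearer point.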
To improve processing time, it is not uncommon to limit the number of points that influence each estimate, using either a fixed search radius or a numeric cutoff (known as a variable search radius, where only the x closest points are considered).
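Here is a minimal sketch of the numeric cutoff in Python with NumPy. The function name idw_nearest and its parameters k and power are made up for this illustration (they are not part of the Alteryx tool), and it uses a brute-force distance calculation rather than a spatial index.

```python
import numpy as np

def idw_nearest(known_xy, known_values, target_xy, k=3, power=2.0):
    """Estimate the value at target_xy from its k nearest known points.

    Illustrative helper only; names and defaults are assumptions.
    """
    known_xy = np.asarray(known_xy, dtype=float)
    known_values = np.asarray(known_values, dtype=float)
    target_xy = np.asarray(target_xy, dtype=float)

    # Euclidean distance from the target to every known point.
    dists = np.linalg.norm(known_xy - target_xy, axis=1)

    # Keep only the k closest points (the "variable search radius").
    k = min(k, len(dists))
    nearest = np.argsort(dists)[:k]
    d, z = dists[nearest], known_values[nearest]

    # A target that coincides with a known point takes that point's value.
    if d[0] == 0.0:
        return float(z[0])

    weights = 1.0 / d ** power
    return float(np.sum(weights * z) / np.sum(weights))

# Three nearby stations and one distant station with a very different value.
points = [(0, 0), (1, 0), (0, 1), (10, 10)]
values = [1.0, 2.0, 3.0, 100.0]
print(idw_nearest(points, values, (0.5, 0.5), k=3))  # distant station ignored
print(idw_nearest(points, values, (0.5, 0.5), k=4))  # distant station included
```

A fixed search radius would work the same way, except that the selection step would keep every point whose distance is at most the radius instead of the k closest.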
Another specification for IDW is the power, which determines the distance decay function used to estimate the weights for points that are averaged to estimate the unknown value. Higher power values emphasize the influence of the points nearest to the unknown point, resulting in a more detailed and less smooth interpolated surface. A smaller power value gives more influence to distant points, and results in a more averaged and smoothed interpolated surface.
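To get a feel for how the power changes the weights, here is a small Python sketch. The helper idw_weights and the example distances are made up for this illustration; the weights are normalized so that they sum to 1.

```python
import numpy as np

def idw_weights(distances, power):
    """Normalized IDW weights for a set of non-zero distances (illustrative)."""
    d = np.asarray(distances, dtype=float)
    raw = 1.0 / d ** power   # inverse distance raised to the power
    return raw / raw.sum()   # rescale so the weights sum to 1

# Three known points at increasing distance from the unknown point.
distances = [1.0, 2.0, 4.0]
for p in (1, 2, 4):
    print(p, np.round(idw_weights(distances, p), 3))
```

With these distances, the nearest point's share of the total weight rises from about 57% at a power of 1 to about 94% at a power of 4, which is why higher powers produce a surface that hugs the closest observations.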
There are a few things to keep in mind about IDW. It will not produce estimates outside the range of the sampled values (an estimate never exceeds the maximum or falls below the minimum observed value), it will not reproduce the local shape suggested by the data values, and it creates local extrema at the measured data points. IDW is an exact interpolation method, meaning that it will create values exactly equal to the observed values at all measured locations, which can result in jagged contour lines or bull's-eye patterns in the surface. IDW also treats all points that fall within the search radius the same way: weights depend only on distance, not on direction.
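Two of these properties, exactness at measured locations and estimates that stay within the range of the observed values, are easy to check with a toy one-dimensional example. The function idw_1d and the sample data below are made up for this illustration.

```python
import numpy as np

def idw_1d(xs, zs, x0, power=2.0):
    """IDW estimate at x0 from samples (xs, zs) along a line (illustrative)."""
    xs = np.asarray(xs, dtype=float)
    zs = np.asarray(zs, dtype=float)
    d = np.abs(xs - x0)
    # Exact interpolator: return the observed value at a measured location.
    if np.any(d == 0.0):
        return float(zs[np.argmin(d)])
    w = 1.0 / d ** power
    return float(np.sum(w * zs) / np.sum(w))

# Samples that follow a perfectly linear trend, z = 2x + 1.
xs = [0.0, 1.0, 2.0, 3.0]
zs = [1.0, 3.0, 5.0, 7.0]

print(idw_1d(xs, zs, 1.0))   # a measured location: returns the observed value
print(idw_1d(xs, zs, 10.0))  # far beyond the samples: stays between 1 and 7
```

Even though the samples follow a clean upward trend that would suggest a value of 21 at x = 10, the IDW estimate there comes out at roughly 4.6, because a weighted average with positive weights cannot leave the range of the inputs.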
IDW is best for point data that is dense and relatively evenly distributed throughout the study area. IDW assumes that a point's influence declines steadily (monotonically) with distance and will not account for trends that occur within the data.
Spatial Interpolation in Alteryx
If you'd like to start dabbling with spatial interpolation yourself, feel free to use the IDW tool that @DrDan and I worked on together as an Alteryx Innovation Days project. To use the IDW tool, you need to provide a series of points with values for the phenomenon you would like to interpolate. You can also provide a mask shapefile to clip your interpolated values to (e.g., a state boundary or the boundaries of the study area). Currently, the IDW tool produces two outputs: an image of the interpolated raster surface, and a series of points with the estimated values at the centroid of each raster cell in the interpolated surface.
This tool is available for download here.