This article was written by Andrew Meyendorff and originally appeared on the Snowflake Blog here: https://www.snowflake.com/blog/how-infutor-uses-the-placekey-external-function-to-extend-the-power-of-snowflake/
The Snowflake Data Cloud provides the unique ability for anyone to join their own data sets with thousands of live third-party data sets near-instantly, securely, and without moving data. Businesses operating in the Data Cloud gain a huge advantage over their competitors who are stuck in data silos and struggling with stale data sets downloaded from their legacy data providers weeks, months, or years ago.
Consider how it would change your business if you could instantly analyze or enrich your own data with any other live data set you can imagine.
For example, location intelligence is a quickly growing segment in the world of analytics, and there is certainly no shortage of providers with interesting location data, including companies like SafeGraph. Partnering with these providers enables you to add a new dimension to your business’s analytics. However, location data such as coordinates and addresses from different sources can have minor differences despite referring to the same place. Placekey, a third-party address matching and entity resolution API service, solves this challenge by encoding the coordinates and other characteristics of a location into a single identifier. This gives you a simple and effective way to join location data sets together regardless of their source. Best of all, the Placekey API is free, so you can easily use it to enrich your own data or data you get from elsewhere.
Snowflake recently enhanced its platform’s flexibility with the introduction of the External Functions feature, which allows you to securely call third-party applications and custom code from within Snowflake. This feature simplifies accessing API services such as geocoders, your own machine learning models, and complex code that can be written in any language. External Functions allows you to simplify your data pipelines by eliminating the need to export and reimport data when using third-party services.
This blog post demonstrates how Infutor, a Snowflake customer and data provider in Snowflake Data Marketplace, leverages Placekey by using the External Functions framework. This use case is an example of how to join disparate data sets in the Data Cloud to derive useful business insights.
USE CASE OVERVIEW
Infutor is an identity resolution service that curates an identity graph containing over 260 million records. An important part of its sales and onboarding process is comparing Infutor’s identity data to data from potential customers to ascertain data coverage and quantify value. One way to accomplish this is by matching the physical addresses in the respective datasets. Since addresses are non standardized, Infutor and potential customers can match their data effectively by using the Placekey API, which assigns a unique identifier to each address and facilitates easy comparison.
PLACEKEY SNOWFLAKE EXTERNAL FUNCTION DEMO
The Placekey Snowflake external function enables Infutor and other Snowflake users to execute bulk queries to the Placekey API and directly append Placekeys to their database tables from a Snowflake worksheet. For more information, see the External Functions documentation.
To append Placekeys to your data using this Snowflake external function, follow these steps:
1. Get a free Placekey API key.
2. Sign in with an ACCOUNTADMIN role to your Snowflake instance. Alternatively, sign in with a role that has access to the CREATE INTEGRATION privilege.
3. Create the API Integration object:
CREATE OR REPLACE API INTEGRATION placekey_api_integration API_PROVIDER = aws_api_gateway API_AWS_ROLE_ARN = 'arn:aws:iam::886725170148:role/placekey-lambda-production' ENABLED = true API_ALLOWED_PREFIXES = ('https://lbdl9njufi.execute-api.us-east-1.amazonaws.com/api/') ;
4. Create the external function. Enter your API key into the indicated string below:
CREATE OR REPLACE EXTERNAL FUNCTION get_placekeys( id number, name varchar, street_address varchar, city varchar, state varchar, postal_code varchar, latitude varchar, longitude varchar, country varchar ) RETURNS variant API_INTEGRATION = placekey_api_integration HEADERS = ('api-key' = '') MAX_BATCH_ROWS = 1000 AS 'https://lbdl9njufi.execute-api.us-east-1.amazonaws.com/api/placekeys' ;
At this point, you can call the external function to append Placekeys to your data.
5. For the purposes of this example, create some testing data in a new table:
CREATE OR REPLACE TABLE test_addresses ( ID VARCHAR(16777216), NAME VARCHAR(16777216), STREETADDRESS VARCHAR(16777216), CITY VARCHAR(16777216), STATE VARCHAR(16777216), ZIPCODE VARCHAR(16777216), LATITUDE VARCHAR(16777216), LONGITUDE VARCHAR(16777216), COUNTRY VARCHAR(16777216), OTHER_COLUMN VARCHAR(16777216) ); INSERT INTO test_addresses VALUES ('0', 'Twin Peaks Petroleum', '598 Portola Dr', 'San Francisco', 'CA', '94131', '37.7371', '-122.44283', 'US', 'other_value_1'), ('1', null, null, null, null, null, '37.7371', '-122.44283', 'US', 'other_value_2'), ('2', 'Beretta', '1199 Valencia St', 'San Francisco', 'CA', '94110', null, null, 'US', 'other_value_3'), ('3', 'Tasty Hand Pulled Noodle', '1 Doyers St', 'New York', 'ny', '10013', null, null, 'US', 'other_value_4'), ('4', null, '1 Doyers St', 'New York', 'NY', '10013', null, null, null, null);
6. To use the external function directly, call it as shown here:
SELECT get_placekeys(joined.*) AS result FROM ( SELECT ID, NAME, STREETADDRESS, CITY, STATE, ZIPCODE, LATITUDE, LONGITUDE, COUNTRY FROM test_addresses ) AS joined;
The function returns the following:
You can join this result back to the original table using the first element in the result as the join key.