This article was written by Sri Chintala and originally appeared on the Snowflake Blog here: https://www.snowflake.com/blog/bringing-extensibility-to-data-pipelines-whats-new-with-snowflake-external-functions/
Modern analytical workloads often require complex transformations or augmentations that require using custom code or third-party services. However, using external services and libraries can often complicate data pipelines. To simplify using remote services, Snowflake created External Functions, which enables users to invoke external APIs and custom code from within Snowflake and blend the results into their query results.
This blog post provides an overview of the External Functions feature,and covers the latest announcements and updates.
Overview of External Functions
As shown in Figure 1, the Snowflake External Functions feature enables customers to securely call third-party applications and external custom code from within Snowflake via a cloud-native proxy, thus simplifying access to external API services such as geocoders, machine learning models, and custom code running outside of Snowflake.
Figure 1: Using external functions to call a remote service
External Functions support for AWS API Gateway & Azure API Management is now GA
Earlier this month Snowflake announced the general availability (GA) of external functions for AWS API Gateway and Azure API Management.
External functions feature was initially released into public preview in June 2020 and it is one of the key features for building extensible data pipelines in the Snowflake Data Cloud. We are thrilled to see all the varied and interesting use cases our customers and partners are using external functions for.
Customer Example: How Sainsbury’s Uses External Functions
Sainsbury’s is the UK’s second largest retailer with over 1,400 stores and a large digital presence. The Sainsbury’s teams have leveraged External Functions to enable different use cases. In this blog, we will walk through two use cases at a high level.
Sainsbury’s “Banana” squad have built continuous data pipelines using Snowpipe, tasks and streams with Snowflake. In order to make more robust pipelines and to take immediate actions when needed, the team wanted to monitor task executions and notify the operations team in case of a failure. This has helped reduce remediation time; crucial in a world where Sainsbury’s is ingesting millions of online events a day, billions per week; the aggregation of which contributes to key business metrics and subsequent decisioning. With External Functions, the team can now capture the reason for the task failure. The TASK_HISTORY table is also monitored to check whether the task itself fails. This External Function, calls a processing function through an API endpoint, which sends a notification to Slack.
“For example, when we were loading some transaction data from an external stage, into Snowflake, the task failed. This was a result of the definition of the underlying table having changed. Our notification system, based on External Functions, immediately notified us of the failure and we were able to fix the issue”, explained by Joan Fuerte, Data Engineering Manager.
“External Functions provide Sainsbury’s with an excellent way of communicating with the external world from Snowflake. This has opened up a range of possibilities regarding how we are able to integrate Snowflake more seamlessly into the wider data environment and improve our data workflows”, said Fuerte.
External Functions also allow the Sainsbury’s team to implement a HTTPS asynchronous query pattern and enable an event-based orchestrated workflow around Snowflake operations. External Functions are used to implement a call-back mechanism which is triggered once a query run against Snowflake completes.
Here are a few other common use cases on how customers are leveraging external functions:
Machine Learning Scoring
Customers can train machine learning (ML) models using a platform of their choice and deploy the trained model as a REST service, making it available as a remote service to Snowflake. Using external functions, the model endpoint can then be invoked from within Snowflake to score data. All of this is executed with familiar SQL statements and commands. With the use of external functions, customers no longer need to export data from Snowflake to score data. By calling the external function in SQL using the Snowflake user interface, customers can update tables with predictions directly in Snowflake, simplifying ML scoring pipelines.
See these blog posts for examples of using external functions with different ML providers:
Geocoding provides geographical context for a string that represents a location. The string input may be broad (“California”) or specific (“450 Concar Drive, San Mateo, CA, USA”).
A geocoder generally does three things:
- Parses the location string and attempts to “understand” what it means
- Maps the string to a geographic location (latitude/longitude)
- Returns location and geographic contextual information such as the city, county, and state
Geocoders typically have sophisticated logic and are complex to implement, so customers commonly use third-party APIs such as Google’s Geocoding API, Mapbox API, HERE Geocoder API etc. Using external functions, customers can call out to these third-party geocoding services and directly append the results to database tables from a Snowflake worksheet.
See these blog posts for examples of using external functions for geocoding:
- Infutor’s use of the Placekey external function
- Serverless plugin for adding driving times from Mapbox
Tokenization is a process of replacing sensitive elements (for example, social security numbers and credit card numbers) with unique symbols, referred to as tokens. Tokens by themselves don’t have any exploitable value, and can be mapped back to sensitive elements by the tokenization system or service. External tokenization enables organizations to tokenize sensitive data externally using partner solutions or in-house customer-developed solutions before loading that data into Snowflake. Organizations can then dynamically detokenize data at query runtime for authorized users through masking policies that call the external tokenization service using external functions. This provides enhanced data security to the most sensitive data in an organization.
Multiple partners, including Protegrity and Microfocus Voltage, have integrated their solutions with Snowflake to provide external tokenization services.
See the Snowflake documentation or Protegrity for Snowflake solution brief for more information.
Custom business logic
Customers can write and call their own remote services with external functions. These remote services can be written using any HTTP server stack, including cloud serverless compute services such as AWS Lambda and Azure Functions, and the programming language of choice, including Python, C#, Go, and others, making it easier to write custom business logic.
External Functions Support for Amazon API Gateway Private Endpoints Now in Public Preview
In January 2021, Snowflake announced the public preview of external functions support for Amazon API Gateway Private Endpoints. This enables Snowflake customers with higher security needs to communicate between virtual private clouds via PrivateLink. This feature requires Business Critical edition (or higher).
For more information about Private Endpoints support see the Snowflake documentation.
Support for Asynchronous External Functions
In December 2020, Snowflake released support for asynchronous external functions, which enables users to write remote services that process requests without blocking on the initial request, and return the results in a future response.
Asynchronous functions use polling to reduce timeout errors, which can occur when remote services are heavily loaded, data volumes are large, or remote computations take a long time to run. This feature is particularly useful for users who might call long-running remote services, such as translation services, and run into the cloud-native gateway’s service timeouts. For example, AWS API Gateway has a maximum 30-second response timeout limit.
For more information about asynchronous external functions, see the Snowflake Documentation.
Templates and Examples
Several templates and examples are available to help you get started with External Features, as described in this section.
Cloud deployment templates
External functions require you to set up the cloud-native proxy service, security roles, and often either a Lambda or Azure function for JSON data formatting before requests can be forwarded to an external API. To make it easier to get started with external functions, we have added documentation and sample templates for deploying resources on AWS and Azure.
Snowflake External Functions github repository
In addition to the basic examples provided as part of the Snowflake documentation, see the Snowflake External Functions repository on github for additional real-world examples to help you get started. Bookmark the link and check back as the External Functions team will continue to add new examples.
The Snowflake External Functions feature effectively eliminates the need to export and reimport data when using third-party services, making your data pipelines more effective.