This article was written by David Ha and originally appeared on the Alteryx Engine Works Blog here: https://community.alteryx.com/t5/Engine-Works/AMPlify-your-Workflows/ba-p/617590#
In 2020.2, Alteryx released the new Alteryx Multi-threaded Processing (AMP) Engine. The AMP engine is purpose-built to enable lightning fast analytic execution. There are many resources available to provide an overview of the AMP engine, its purpose, capabilities, usage, and recommendations. Here are a few I'd recommend starting with:
- Alteryx AMP Engine
- Alteryx Engine and AMP: Main Differences
- AMP Memory Use
- AMP Engine Technical Deep Dive | Part 1 | Why AMP?
- The Alteryx AMP Engine: Explained
The purpose of this blog is to explore what kind of performance benefits one might see by switching to the AMP Engine! Please note, this is NOT a performance benchmarking paper. These are observations meant to show you possible performance improvements that can be seen by using the AMP Engine. The exact performance differences will be determined by the tools used in the workflows, data sizes being analyzed by the workflow, and underlying hardware.
To keep results consistent and repeatable, I didn't want to use my laptop, where other applications could affect the timings. So I went to AWS and created two EC2 instances, each with 4 cores (8 vCPUs) and 16 GB of RAM. I installed Alteryx Server 2020.2 and configured them per the diagram below, with one machine serving as the Controller & Gallery and the other as a dedicated Worker. This provided a controlled testing environment in which the Worker was isolated and the only variable that changed was the engine: the original engine (E1) or the AMP engine.
To evaluate different workflow patterns, I settled on three workflows: a traditional Prep & Blend workflow, a spatial analysis workflow, and a predictive model building workflow.
Workflow #1 - Prep & Blend
The Prep & Blend workflow is a familiar one that joins two data sets then sorts and summarizes the output. These types of workflows are typically memory intensive as they require all records to be read in before the sorts, joins, or summarizations can be performed.
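To make the join, sort, and summarize pattern concrete, here is a minimal sketch in plain Python. It is not how Alteryx implements these tools; the function name, field names, and sample records are all made up for illustration. The point it shows is that each step has to hold its full input before it can emit anything.

```python
from collections import defaultdict

def join_sort_summarize(orders, customers):
    """Inner-join orders to customers, sort by region, and total amount per region.

    Hypothetical helper for illustration only; field names are assumptions.
    """
    # Join: the whole right-hand input is loaded into a lookup before any match is made.
    region_by_customer = {c["customer_id"]: c["region"] for c in customers}
    joined = [
        {**o, "region": region_by_customer[o["customer_id"]]}
        for o in orders
        if o["customer_id"] in region_by_customer
    ]
    # Sort: the first output row is unknown until the last input row has been seen.
    joined.sort(key=lambda r: (r["region"], r["order_id"]))
    # Summarize: totals are final only after every record has been consumed.
    totals = defaultdict(float)
    for row in joined:
        totals[row["region"]] += row["amount"]
    return joined, dict(totals)

# Made-up sample records, not data from the test described in this article.
orders = [
    {"order_id": 1, "customer_id": "a", "amount": 10.0},
    {"order_id": 2, "customer_id": "b", "amount": 5.0},
    {"order_id": 3, "customer_id": "a", "amount": 2.5},
    {"order_id": 4, "customer_id": "z", "amount": 99.0},  # no matching customer
]
customers = [
    {"customer_id": "a", "region": "West"},
    {"customer_id": "b", "region": "East"},
]

joined, totals = join_sort_summarize(orders, customers)
print(totals)
```

Because every step needs the complete input in hand, memory use grows with the size of the data sets, which is why this pattern tends to be memory intensive.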
Workflow #2 - Spatial
The spatial workflow uses some of the spatial tools, which can be CPU intensive.
Workflow #3 - Predictive
The predictive workflow uses the R-based predictive tools to build two models (logistic regression and boosted), then uses the Model Comparison tool to determine the champion model.
The test executed each workflow several times with both the E1 engine and the AMP engine to confirm that the results were consistently repeatable. The average execution times are shown below. Each workflow also produced the same outputs whether it was executed with E1 or with AMP.
The results show a staggering 98x faster execution time for the Prep & Blend workflow, a 5x faster execution time for the spatial workflow, and no change for the predictive workflow. These results will be explained in more detail below.
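The measurement approach described above (run each workflow several times per engine, average the times, then compare) can be sketched roughly as follows. This is an assumption about the general shape of such a test, not the harness actually used here: `average_runtime`, `speedup`, and the two stand-in workloads are hypothetical names, and the `sleep` calls are placeholders rather than real workflow runs.

```python
import statistics
import time

def average_runtime(run_workflow, repeats=5):
    """Call run_workflow() `repeats` times and return the mean wall-clock seconds."""
    durations = []
    for _ in range(repeats):
        start = time.perf_counter()
        run_workflow()
        durations.append(time.perf_counter() - start)
    return statistics.mean(durations)

def speedup(baseline_seconds, candidate_seconds):
    """Return how many times faster the candidate is than the baseline."""
    return baseline_seconds / candidate_seconds

# Placeholder workloads so the sketch runs anywhere. In a real test these would
# launch the same workflow with the E1 engine and with the AMP engine.
def placeholder_e1_run():
    time.sleep(0.02)

def placeholder_amp_run():
    time.sleep(0.005)

e1_avg = average_runtime(placeholder_e1_run, repeats=3)
amp_avg = average_runtime(placeholder_amp_run, repeats=3)
print(f"E1 avg: {e1_avg:.3f}s, AMP avg: {amp_avg:.3f}s, "
      f"speedup: {speedup(e1_avg, amp_avg):.1f}x")
```

A figure such as "98x faster" is simply the baseline average divided by the AMP average, so `speedup(98.0, 1.0)` returns `98.0`.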
Prep & Blend
- With the E1 engine, the Sort & Join tools are single-threaded when they reach the final merge. On the machine with 8 logical processors, that means only one of them was doing any work (12.5% CPU utilization). With the AMP engine, the Sort & Join tools can use all 8 logical processors, potentially increasing the amount of work accomplished by a factor of 8 (100% CPU utilization).
- The AMP engine handles data much more efficiently through the sharing of records and the use of 4MB packets. This can easily be seen by comparing the amount of data that passed through the Join tool in the E1 workflow execution (9.4 GB) with the AMP workflow execution (152 MB). The AMP execution moved only 1.57% of the data that the E1 execution did.
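The arithmetic behind those two bullets can be checked with a few lines of Python. The only assumption is the unit conversion: treating 1 GB as 1024 MB gives a ratio in line with the percentage quoted above, while 1 GB = 1000 MB would give roughly 1.62%.

```python
logical_processors = 8

# E1 final merge: one busy thread out of eight logical processors.
e1_utilization = 1 / logical_processors

# AMP: all eight can be busy, so threading alone offers at most an 8x gain.
max_parallel_gain = logical_processors / 1

# Data that passed through the Join tool in each execution.
e1_join_mb = 9.4 * 1024   # 9.4 GB, assuming 1 GB = 1024 MB
amp_join_mb = 152         # 152 MB
data_ratio = amp_join_mb / e1_join_mb

print(f"E1 CPU utilization: {e1_utilization:.1%}")       # 12.5%
print(f"Max gain from threading alone: {max_parallel_gain:.0f}x")
print(f"AMP data volume relative to E1: {data_ratio:.2%}")
```

Since threading alone tops out at 8x on this machine, the much larger overall speedup for this workflow suggests that the reduced data movement contributes substantially as well.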
Spatial
The spatial workflow saw a substantial benefit from the AMP engine. This is easily explained by looking at the list of tools that have been converted to AMP (see Tool Use with AMP). The Spatial Info tool has been fully converted, providing a multi-threaded execution benefit. The Find Nearest tool has been partially converted, as the drive time/distance calculations still use the original E1 engine. However, the configuration used in this test did not use drive time, so the full benefit of AMP was realized.
Predictive
The predictive workflow execution time was the same with the E1 and AMP engines. This was expected, as the predictive tools have not been converted to use the AMP engine. A majority of the predictive workflow execution time is consumed by the R processes, which are externally launched and executed outside of the control of the engine.
This article has shown some of the performance benefits that might be seen from using the AMP engine. The important takeaways here are that:
- The most commonly used tools will perform best on AMP. See the Tool Use with AMP article for the full list.
- The benefit of AMP typically increases as data sizes become larger.
- Mileage will vary based on data sizes, underlying hardware, and workflow construction.