This article was written by Megan Dibble and originally appeared on the Alteryx Engine Works Blog here: https://community.alteryx.com/t5/Engine-Works/7-Alteryx-Best-Practices-to-Simplify-Your-Life/ba-p/805418
I often give new Alteryx users the same pieces of advice. So I thought, why not write an article with those tips? Then even more people can improve and simplify their Alteryx development process.
So here it is: 7 Alteryx best practices to make your life easier.
(A big thank you to my manager, Shalini Polimetla, for teaching me many of these best practices.)
1. Keep Your Workflow Organized
As your Alteryx workflows get more complex, it gets harder to follow the train of logic. Luckily, there are several tools that can help you keep your workflows straightforward:
- Tool containers: You can use containers to separate out sections of the workflow. This allows you to group the process into logical sections, and it helps others understand your process faster. Tool containers can also be disabled so that a particular section of the workflow does not run, which can be useful to speed up run time during development.
- Comment boxes: These are great to track your own notes and questions during the development of your workflow. Once the workflow is ready to productionalize or send to other teammates, notes can help others understand the business logic or reasons behind different data processing steps.
- Wireless tool connections: You can right-click on any tool and select “Make outgoing connections wireless.” This will reduce the visual clutter caused by a tool where the outgoing data branches out to many other tools.
Image by Author. Example use of containers, comments, and wireless tool connections for workflow organization.
2. Package Workflows Before Sharing
Packaging a workflow is very simple, and it could save the recipient of the workflow a headache too. In Alteryx, simply go to Options → Export Workflow. This will create an Alteryx Package file (.yxzp), which behaves like a zipped file. The great thing about the packaged workflow is that it includes all of the input files needed to run a workflow, so users do not have to search for files or redo all of the input paths. The package also includes any supporting macros that are used in the workflow.
3. Use the Record ID and Unique Tools When Joining Data
This may be the most important tip I have learned over the past year of using Alteryx. Essentially, data joins can get messy. If you are pulling together a variety of data sources, it can be easy to get lost in the joins, and then all of a sudden you have an issue with duplicate records. Or, if you are working with a dataset for the first time, this technique can help you understand the primary keys of your tables and figure out how you should join them together.
Image by Author.
The technique is simple: place a Record ID tool on the left side of the data, and then place a Unique tool, with the RecordID field selected, right after the inner join output (J). The unique (U) output of that tool then contains each record from the original data stream (the left side) at most once, so none of them are duplicated. If you do see records coming out of the duplicate (D) output of the Unique tool, it’s important that you understand why this is occurring. It could be that you have not structured your join correctly for your two tables, for example by joining on a field that is not unique in the right-side table.
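For readers who think in code, the Record ID and Unique pattern can be sketched outside of Alteryx. The snippet below is only an illustration in plain Python, not how Alteryx works internally; the helper functions, table names, and sample values are all made up for this example.

```python
def add_record_id(rows):
    """Mimic a Record ID tool: tag each row with a 1-based sequential id."""
    return [{**row, "RecordID": i} for i, row in enumerate(rows, start=1)]


def inner_join(left, right, key):
    """Mimic the J output of a join: one row per matching left/right pair."""
    index = {}
    for r in right:
        index.setdefault(r[key], []).append(r)
    return [{**l, **r} for l in left for r in index.get(l[key], [])]


def split_unique(rows, field):
    """Mimic a Unique tool: first occurrence goes to U, repeats go to D."""
    seen, unique, duplicates = set(), [], []
    for row in rows:
        if row[field] in seen:
            duplicates.append(row)
        else:
            seen.add(row[field])
            unique.append(row)
    return unique, duplicates


# Hypothetical left-side data, tagged with a RecordID before the join
orders = add_record_id([
    {"customer_id": "A", "amount": 10},
    {"customer_id": "B", "amount": 25},
])

# Hypothetical right-side table where customer "A" appears twice,
# so customer_id is not actually a unique key
customers = [
    {"customer_id": "A", "region": "East"},
    {"customer_id": "A", "region": "West"},
    {"customer_id": "B", "region": "North"},
]

joined = inner_join(orders, customers, "customer_id")
unique, duplicates = split_unique(joined, "RecordID")

print("joined rows:", len(joined))
print("unique rows:", len(unique))
print("duplicate rows:", duplicates)
```

Any row that lands in `duplicates` points to a left-side record that matched more than one right-side record, which is the signal to go back and check the join keys.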
4. Make Macros to Simplify Repetitive Processes
If you find yourself duplicating a process in a workflow by copying tools over and over, you likely have an opportunity to simplify your workflow by turning these processes into a batch macro. To learn more about batch macros, take a look at Alteryx’s “Getting Started with Batch Macros” post.
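As a loose analogy in code, a batch macro plays a role similar to a function that is written once and then run once per value of a control list, instead of copying the same steps for every case. The sketch below is plain Python rather than anything Alteryx generates, and the function name, field names, and sample data are hypothetical.

```python
def clean_amounts(rows, currency):
    """The repeated process: keep one currency and round its amounts."""
    return [
        {**row, "amount": round(row["amount"], 2)}
        for row in rows
        if row["currency"] == currency
    ]


# Hypothetical input data
rows = [
    {"currency": "USD", "amount": 10.456},
    {"currency": "EUR", "amount": 3.14159},
    {"currency": "USD", "amount": 7.0},
]

# Plays the role of the control input: one run of the process per value
control_values = ["USD", "EUR"]
results = {currency: clean_amounts(rows, currency) for currency in control_values}

for currency, cleaned in results.items():
    print(currency, cleaned)
```

The logic lives in one place, so a change to the process is made once rather than in every copied section.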
5. Save Your Workflow Before Running the First Time
This was an insight I learned from my manager: if you run a workflow without saving first (i.e., it will still show as NewWorkflow1), it uses the processing power of the temp drive. If you save and then run, it uses the processing power on your C drive. I have seen this take a 10-minute run time down to 7 minutes or less.
6. Eliminate Browse Tools & Unnecessary Tools When Productionalizing
Depending on the size of your dataset, having Browse tools can significantly slow down the processing time of your workflow. Once your workflow is in a steady state, you should go back through it, eliminate Browse tools, and also re-evaluate whether all tools are necessary.
When I go back through workflows that I developed under a time crunch, I often realize that I did not design them in the most efficient way. Or perhaps the requirements changed over the duration of the project, so I can now delete sections or tools in the workflow that no longer have a purpose.
7. Offload Data or Use In-Database Tools for Database Queries
Image from Alteryx.
In-Database tools allow you to perform data cleaning/data blending activities without moving the data out of the database. This can make your workflow run significantly faster when you are working with large queries.
For a comprehensive overview of In-Database tools, you can refer to this Alteryx documentation page.
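The idea behind in-database processing can be illustrated with a small Python sketch that uses an in-memory SQLite database as a stand-in for a real database. This is not how Alteryx's In-Database tools are implemented; the table, columns, and threshold are assumptions made up for the example. It compares pulling every row out and processing locally with letting the database filter and aggregate first.

```python
import sqlite3

# In-memory SQLite database standing in for a real database server
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (region TEXT, amount INTEGER)")
conn.executemany(
    "INSERT INTO sales VALUES (?, ?)",
    [("East", 10), ("East", 5), ("West", 25), ("West", 1), ("North", 7)],
)

# Approach 1: move every row out of the database, then filter and sum locally
all_rows = conn.execute("SELECT region, amount FROM sales").fetchall()
local_totals = {}
for region, amount in all_rows:
    if amount >= 5:
        local_totals[region] = local_totals.get(region, 0) + amount

# Approach 2: let the database filter and aggregate, and move only the summary
summary_rows = conn.execute(
    "SELECT region, SUM(amount) FROM sales WHERE amount >= 5 GROUP BY region"
).fetchall()
in_db_totals = dict(summary_rows)

print("rows moved, local approach:", len(all_rows))
print("rows moved, in-database approach:", len(summary_rows))
print("same totals:", local_totals == in_db_totals)
```

With five rows the difference is trivial, but the same pattern applied to millions of rows means far less data leaves the database.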
Additionally, when I am developing a workflow, I frequently query databases and store the results in a .yxdb (Alteryx database file format). I then use the .yxdb files as inputs to my workflow so that (1) the workflow runs faster and (2) I am not hitting the database with tons of queries as I rework and test my process.
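The same "query once, then work from a local extract" habit can be sketched in Python. In this illustration a JSON file stands in for the .yxdb file and an in-memory SQLite database stands in for the real source; the function name `load_with_cache`, the table, and the file name are all hypothetical.

```python
import json
import os
import sqlite3
import tempfile


def load_with_cache(conn, query, cache_path):
    """Return (rows, source), reading the local extract when it exists."""
    if os.path.exists(cache_path):
        with open(cache_path) as f:
            return json.load(f), "cache"
    # No extract yet: query the database once and save the result locally
    rows = [list(r) for r in conn.execute(query).fetchall()]
    with open(cache_path, "w") as f:
        json.dump(rows, f)
    return rows, "database"


# In-memory SQLite database standing in for the real source system
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (region TEXT, amount INTEGER)")
conn.executemany("INSERT INTO sales VALUES (?, ?)", [("East", 10), ("West", 25)])

query = "SELECT region, amount FROM sales ORDER BY region"
cache_path = os.path.join(tempfile.mkdtemp(), "sales_extract.json")

first, source_1 = load_with_cache(conn, query, cache_path)   # hits the database
second, source_2 = load_with_cache(conn, query, cache_path)  # reads the extract

print(source_1, source_2, first == second)
```

The trade-off is the usual one for any extract: it can go stale, so it needs to be refreshed when the underlying data changes.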
That’s all I have for you today on Alteryx best practices, although I am sure there are more that I did not cover. If you have a strategy you have learned from experience, feel free to leave a comment with your expertise!