This article is by Joshua Burkhow and originally appeared on the Alteryx Engine Works Blog here: https://community.alteryx.com/t5/Engine-Works-Blog/5-MORE-Useful-Design-Patterns-in-Alteryx-You-Need-to-Master/ba-p/520943
Solution #6: Keeping Sort Order After a Summarize, Unique or Other Tools
You have data that you want to summarize or get rid of duplicates and in developing a workflow realize that the data gets automatically sorted on the output, but you would like to keep the order that the data is in or some order other than what is given, even with aggregation!
You can simply do this by adding a Record ID tool, and in the case of a Summarize tool, just use the Sort Order column as a ‘First’ or in the case of using the Unique tool just sort on that column after the Unique tool. Now you are not at the mercy of the sort given to you.
- Input Data
- Add Record ID (I name mine "Sort Order")
- Add Summarize, Unique or any other tool (that has the sorting embedded)
- For Summarize - use the "Sort Order" column with ‘Action’ dropdown set to ‘First’
- For Unique - Add a sort after the Unique tool
Solution #7: Parsing HTML Tables
I have spoken with many users tasked with parsing HTML or XML data, serialized data, or even unstructured data, and many of them end up doing this task with 10-20 tools just to data munge it down to the values that they want. I have found that many of these users go down this road of using so many tools basically because they either:
- a) Haven't seen the true power of RegEx and its capabilities, or
- b) It scares them.... like scary monsters under your bed type of scary.
Hopefully, I can help or at least challenge you not to shy away. I fully realize I am likely going to start a full-scale street riot amongst a few of you by going so far in saying this, but I have learned that RegEx is actually quite easy. Yep, I just said that.
If there is any interest, I could expand on this more in the future, but there are many posts and threads already on the community and great training videos as well. Now for those of you still with me, I am going to show you a great design pattern example having to do with tables that reside on a webpage.
The main idea is super simple:
- Get your data (the HTML)
- Parse to Rows (while also filtering down to only what you want)
- Parse to Columns (while also removing any of the tags and stuff that is not the data)
- Clean up rows
- Clean up columns
- Input Data with only URL
- Use Download tool to 'scrape' HTML from webpage
- Use the RegEx tool to parse out using the <tr> tags. (Notice I am splitting to rows, not columns)
- Use the RegEx tool to pull out only the values you want
NOTE: Steps 3 and 4 are just great examples of why the RegEx tool is so powerful. Also pay attention to the fact that I figured out what the RegEx expression is to parse out all the pieces, but more importantly, I put parentheses around only what I wanted to "Parse" out into the columns. I use https://regex101.com/ every time I am building the expression, then just pop it in the RegEx tool. Looks something like this:
Solution #8: Tagging or Flagging What Rows to Keep or Discard
I have come across many scenarios where there just isn't a nice simple clean way to filter on numerous columns AND records. Now if you read my previous post you learned about how to filter many different columns but what if you have a situation where there are a group of columns along with the need to look backward or forward (i.e. up or down records) and in which TOGETHER these elements provide the base to build logic for whether you want it in the data or not. The alternative being that you have to create multiple filters and or many tools to get to the same spot.
[Tomorrow] != 'Monday' AND
[AlcoholAvailable] = 'Y' AND
[Time] > 6pm
[Start Drinking] = 'Y'
[Start Drinking] = ‘N’
Now, my friends, we have a good situation to use the Multi-Row Formula tool specifically for the purpose of tagging or flagging records for filtering based on relatively or truly complex logic. To be honest, it doesn't even solely pertain to just using the multi-row formula tool because using the formula tool would work as well in many cases.
The important part here is making clear 'flags' to delineate your data. I can think of many useful examples where you would use the RegEx "Match" function even to look for something and then flag it for review, deeper inspection, or again for deletion.
- Input Data
- Create logic you want to properly set a column value to T or F, Y or N, or even a value from a list of many
- Filter using that Flag you set
Solution #9: Record ID and Text to Rows
Another useful design pattern to have at your fingertips is the ability to parse out a single row of a value(s) into multiple rows BUT—and this is the caveat that makes it all worth it—we need to keep track of the initial row so that when we're done cleaning and parsing out what we need then we can bring it all back together! Just like that time when I broke my mom's favorite coffee cup and then used wood glue to make it look like it never even happened......sort of.
One thing to note: in my example, I use the Find and Replace tool, but this is one of a million other possibilities based on what you need to have happen. One could use the formula tool, the multi-row tool, the multi-field tool, or whatever else gets the job done.
- Input Data
- Add RecordID Tool
- Data parsing & cleanup
- Use the Text to Columns tool (with "Split to Rows" selected)
- Tie up data to look the way you need
Solution #10: Appending Aggregations
I often get asked how I can add in Weekly, Monthly, Yearly averages, and I usually point them to this design pattern. The power in this that many people miss is that by doing the aggregations separately like this, you are actually creating a 'dynamic' functionality in that the Summarize tools will always look at ALL the data in order to create the aggregation function (i.e., Average or Sum). If next week you run this and you have much more data, this will take that into consideration.
The appending part is just using joins and that's because we want to do a match append where the appropriate aggregations get aligned with the appropriate buckets (i.e., Weekly, Monthly, Yearly).
- Input Data
- Add the aggregations you want
- Use Joins to 'append' each aggregation to your original dataset
Again, I hope these help you with your day to day development! I have attached the updated workflow which also includes the 5 examples from the previous post (5 Useful Design Patterns in Alteryx You Need to Master) so you can walk through the configurations as well. If you use any of these already, let me know! Also, if you have any other design patterns you love using, please share below!