What is Assisted Modeling?

August 3rd, 2020

This article is by Jand originally appeared on the Alteryx Analytics Blog here: https://community.alteryx.com/t5/Analytics-Blog/What-is-Assisted-Modeling/ba-p/591359



There has been a lot of excitement in the industry and the Alteryx community about the new Assisted Modeling features in Designer. As we start the analytics journey with Assisted Modeling, it may help to understand what Assisted Modeling is and what it can do.

Assisted Modeling, part of the new Alteryx Intelligence Suite, is very powerful, and new features are being added all the time. We will not cover every aspect here, but we will cover everything essential to get you started on your Assisted Modeling workflows.


Good news! There are not many prerequisites for using Assisted Modeling. Assisted Modeling was built to be educational and effective, and to simplify the machine learning experience, all within Alteryx Designer. Therefore, you do not need any machine learning experience or deep knowledge of predictive analytics to get started! All you need is a licensed Designer application and an Assisted Modeling license. You can also sign up for the Beta Program to test new features.


What is Assisted Modeling?

Assisted Modeling is a new feature in Designer that allows our customers to build state-of-the-art Machine Learning models that predict outcomes based on historical data.

For many, the analytic journey ends at the start of predictive modeling because of the steep learning curve and the significant amount of knowledge required to understand and implement predictive models. Alteryx has flattened that curve by introducing the Assisted Modeling tools, which guide citizen data scientists in constructing an effective model for making predictions on their data.

Let's say you work in retail distribution and your manager has asked you to put together a workflow to predict when inventory needs to be restocked at different store locations, based on past sales trends for that inventory. With Assisted Modeling, any citizen data scientist can build such a predictive model with excellent accuracy and little difficulty.

As you build and train your first model, you can learn more about what Assisted Modeling is doing behind the scenes and use this knowledge to improve your machine learning models in the future.


What are the Steps to Assisted Modeling?

Assisted Modeling will guide you through the steps to build your Machine Learning model. Each step adds context and capability to the Machine Learning model that is created at the end. The Assisted Modeling teams are constantly working to expand and improve the steps, but we'll discuss a few of the basic ones that give your model the knowledge and tools necessary to predict as accurately as possible.

The first prompt you will encounter in Assisted Modeling will ask you to choose a target variable. The target variable is the piece of data you would like to predict. This could be a projected stock price, an estimate of inventory in the future, or a predicted category for a given row of data. Based on the target variable, Assisted Modeling will help guide the decision on whether you should use a classification model (assigning rows to categories) or a regression model (predicting continuous numeric values).
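To make the classification-versus-regression decision concrete, here is a minimal Python sketch of the kind of rule of thumb a tool could apply to a target column. It is an illustration only, not how Assisted Modeling actually decides: the function name suggest_model_type and the max_classes cutoff are assumptions made for this example.

```python
def suggest_model_type(target_values, max_classes=10):
    """Hypothetical heuristic: guess a model type from the target column.

    A numeric target with many distinct values looks like a regression
    problem; anything else is treated as classification.
    """
    values = [v for v in target_values if v is not None]
    all_numeric = all(
        isinstance(v, (int, float)) and not isinstance(v, bool) for v in values
    )
    distinct = len(set(values))
    if all_numeric and distinct > max_classes:
        return "regression"
    return "classification"


# A category label as target versus a continuous quantity as target.
print(suggest_model_type(["restock", "hold", "hold", "restock"]))
print(suggest_model_type([x * 1.5 for x in range(50)]))
```

A numeric target with only a handful of distinct values (for example 0 and 1) is treated as classification here, which is one reason a tool would still ask you to confirm the choice.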

The second step goes through the data type of each column in the dataset you are using to build your model. Assisted Modeling will attempt to infer the data types but will ask you to verify that they are all correct. There are generally three data types Assisted Modeling is concerned with: categorical, numeric, and ID. Categorical data represents any data with a discrete, and reasonably small, number of values. For example, gender is typically considered categorical data because the number of values used to define gender is reasonably small. Numeric data represents continuous numeric values. A person's age would be considered numeric data. The last data type is called ID. This represents data that may be unique to the row and is not helpful in making predictions. ID values include any categorical data with a large number of distinct values (or whose values are not guaranteed to be discrete), such as a person's name, or a primary key from a database where every value is unique. ID data types essentially denote data that is not conducive to predicting the target variable defined in the first step and will, therefore, be dropped from the columns of data used in training the model.
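The three data types above can be illustrated with a small Python sketch. This is a simplified guess at what type inference might look like, not the logic Assisted Modeling uses; the function name infer_column_type and the max_categories cutoff are hypothetical.

```python
def infer_column_type(values, max_categories=20):
    """Hypothetical heuristic: label a column as numeric, categorical, or id."""
    present = [v for v in values if v is not None]
    is_numeric = bool(present) and all(
        isinstance(v, (int, float)) and not isinstance(v, bool) for v in present
    )
    if is_numeric:
        return "numeric"
    # Non-numeric columns with many distinct values (names, keys) are
    # unlikely to help prediction, so they are labeled as ID columns.
    if len(set(present)) > max_categories:
        return "id"
    return "categorical"


columns = {
    "age": [34, 51, 27, 45],
    "gender": ["F", "M", "F", "M"],
    "name": [f"person_{i}" for i in range(100)],
}
for column_name, column_values in columns.items():
    print(column_name, infer_column_type(column_values))
```

Note that this sketch would label an integer primary key as numeric, which shows why the verification prompt matters: an inferred type is only a starting point.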




The third step you will encounter on your journey to build a Machine Learning model is to clean up any missing values in the data. When training a Machine Learning model, you generally cannot have null or missing values in a column because many training algorithms do not know how to handle them. So in this step Assisted Modeling will identify which columns have missing data, show how much of each column's data is missing, and provide suggestions on how to proceed. The suggestions depend on the data type we defined in the previous step. For categorical data, you can choose to replace the missing values with a constant. For numeric data, you can replace missing values with the median or mode of the column. You also have the option to drop a column, and if a large share of a column's values are missing, Assisted Modeling will recommend that the column be dropped.
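The options described above (fill with a constant, fill with the median, or drop the column) can be sketched in a few lines of Python. This is an illustrative assumption of how such a step could work, not Alteryx's implementation; the function name handle_missing, the 50% drop_threshold, and the "Missing" fill value are all made up for the example.

```python
from statistics import median


def handle_missing(column, column_type, drop_threshold=0.5, fill_constant="Missing"):
    """Return a cleaned copy of the column, or None to signal 'drop this column'."""
    missing = sum(1 for v in column if v is None)
    # Recommend dropping when too much of the column is missing.
    if not column or missing / len(column) > drop_threshold:
        return None
    if column_type == "numeric":
        # Numeric columns: fill gaps with the median of the known values.
        fill = median(v for v in column if v is not None)
    else:
        # Categorical columns: fill gaps with a constant placeholder.
        fill = fill_constant
    return [fill if v is None else v for v in column]


print(handle_missing([1, None, 3, 5], "numeric"))         # [1, 3, 3, 5]
print(handle_missing(["a", None, "b"], "categorical"))    # ['a', 'Missing', 'b']
print(handle_missing([None, None, None, 1], "numeric"))   # None
```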



The last step we'll discuss in the Assisted Modeling journey is the feature selection step. In this step, you select the final features you will use in training your Machine Learning model. Assisted Modeling is exceptionally helpful here. It will analyze all of the remaining columns of data (those not already dropped in previous steps) and use data science algorithms to estimate how useful each column is for predicting the target variable. Some columns may be too closely associated with the target variable, which can be a sign that they effectively give away the answer. Other columns may be too weakly associated with the target variable and only add unnecessary noise for the Machine Learning algorithm to sift through. Assisted Modeling will provide recommendations for each column to maximize the efficiency of your Machine Learning model.
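As a rough illustration of "too closely" and "too weakly" associated, the Python sketch below scores each numeric column by its Pearson correlation with the target. The source does not say which measures Assisted Modeling uses, so correlation is a stand-in chosen for simplicity, and the names recommend_features, too_weak, and too_strong are hypothetical.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation between two equal-length numeric lists."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        return 0.0  # a constant column carries no signal
    return cov / sqrt(var_x * var_y)


def recommend_features(features, target, too_weak=0.05, too_strong=0.95):
    """Hypothetical rule: keep columns whose association is neither extreme."""
    recommendations = {}
    for name, values in features.items():
        strength = abs(pearson(values, target))
        if strength > too_strong:
            recommendations[name] = "drop: too closely associated"
        elif strength < too_weak:
            recommendations[name] = "drop: too weakly associated"
        else:
            recommendations[name] = "keep"
    return recommendations


target = [10, 20, 30, 40, 50, 60]
features = {
    "target_copy": [1, 2, 3, 4, 5, 6],  # a rescaled copy of the target
    "noise": [1, 0, 0, 0, 0, 1],        # unrelated to the target
    "weekly_sales": [3, 1, 4, 1, 5, 9], # partially related
}
print(recommend_features(features, target))
```

The thresholds are arbitrary; in practice the recommendations are suggestions to review rather than rules to follow blindly.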



What are the Machine Learning tools?

Building and training a machine learning model is streamlined through Assisted Modeling. After we have a model we're happy with, we want to incorporate the model into our workflow. To do this, we have built new tools for Designer that mirror the Assisted Modeling experience and sit right on your canvas, with your guided modeling decisions laid out and easy to understand.

If you want to make adjustments to your model as you predict and test with more data, each tool can be modified on the canvas to allow you to customize your model to meet the needs of your use cases.

If you're familiar with machine learning and do not want to follow the guided Assisted Modeling flow, never fear! Using Expert Mode in the Code Free Machine Learning tools lets you create your own Machine Learning pipeline right on the canvas.


How does it all fit together?

So how does Assisted Modeling fit into the analytic experience? Inferring results from historical data has become ubiquitous across businesses in every industry. From inventory management to predicting weather patterns, Machine Learning will continue to transform the way we view and utilize data.

Now you can be at the forefront of these changes! When you have finished prepping and blending your data with Alteryx Designer, all you need to do is add a Modeling Tool to the canvas, follow the guided Assisted Modeling journey, and output your favorite models to the canvas.

With your model on the canvas, connect your new data to the Predict tool to start making predictions based on your data. It's as easy as that! You can even output your predictions to a file or database to present later, or store and compare the results of your prediction to the actual values when you receive them.
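Once the actual values arrive, comparing them with your stored predictions can be as simple as computing an average error. The Python sketch below uses mean absolute error as one common choice; it is an assumption for illustration and not a metric the article says Assisted Modeling reports, and the sample numbers are invented.

```python
def mean_absolute_error(predicted, actual):
    """Average absolute difference between predictions and actual values."""
    if len(predicted) != len(actual) or not predicted:
        raise ValueError("predicted and actual must be non-empty and equal length")
    return sum(abs(p - a) for p, a in zip(predicted, actual)) / len(predicted)


# Invented example: predicted versus actual units sold at four stores.
predicted_units = [100, 150, 200, 50]
actual_units = [110, 140, 200, 60]
print(mean_absolute_error(predicted_units, actual_units))  # 7.5
```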




What's Next?

You should now have a pretty good understanding of what Assisted Modeling is. But you probably want to see how it's used in the real world. Next in the Assisted Modeling Blog Series, we'll start diving into Kaggle Competitions and other Machine Learning challenges that Assisted Modeling can solve, and we'll compare benchmarks to assess how effective our models are at predicting on new data.

For more information, check out Is Assisted Modeling Right for Me? and Using Assisted Modeling in the Alteryx Academy. Or check out this demo: https://www.youtube.com/watch?v=k0K6hrWpVGs&feature=emb_title