De-Confusing the Matrix

This article was written by David P and originally appeared on the Alteryx Engine Works Blog.


“Confusion Matrix” is not just a term for how we all felt in 1999 after watching the Wachowskis’ masterpiece for the first time. A confusion matrix is also a way of evaluating a model’s predictive power. So jack in, as I use this blog as an excuse to write about my favourite film.


Hang on, what is a Confusion Matrix? 

A confusion matrix is simply a table that cross-tabulates the actual categories in a data set against the categories predicted by a model: each cell counts the records that have a given combination of predicted and actual category. For a binary classification, the generic layout looks like this:

                 Actual True      Actual False
Predicted True   True Positives   False Positives
Predicted False  False Negatives  True Negatives
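
To make the table concrete, here is a minimal Python sketch that builds the four cells from two lists of labels. The function name `confusion_matrix` and the (predicted, actual) key layout are illustrative choices for this sketch, not part of any Alteryx tool; the keys are ordered that way to mirror the table, where rows are predictions and columns are actual values.

```python
from collections import Counter


def confusion_matrix(actual, predicted, labels=(True, False)):
    """Count how often each (predicted, actual) pair occurs.

    Returns a dict keyed by (predicted_label, actual_label), so
    (True, True) is True Positives, (True, False) is False Positives,
    (False, True) is False Negatives and (False, False) is True Negatives.
    """
    if len(actual) != len(predicted):
        raise ValueError("actual and predicted must be the same length")
    counts = Counter(zip(predicted, actual))
    # Include every cell, even ones with a count of zero.
    return {(p, a): counts[(p, a)] for p in labels for a in labels}
```

For example, with actual values `[True, True, False, False, True]` and predictions `[True, False, False, True, True]`, the sketch returns 2 True Positives and 1 each of False Positives, False Negatives and True Negatives.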


That’s all very nice, but why should I care? 

Why should you use a confusion matrix? Let’s imagine Morpheus is trying to find The One. He’s given his spiel to tons of people, but most of them end up choosing the Blue Pill. The one thing he knows about The One is that they will definitely take the Red Pill. So, based on other information he’s gathered on potential candidates, he wants to predict which pill each of them will take and limit the time he spends talking to unviable candidates.

With a dataset of 92 Blue Pill takers and 8 Red Pill takers, Morpheus builds a first model for predicting who will take the Blue Pill.


Morpheus looks at some stats for this model and sees that it is 92% accurate (<Correct Predictions> / <Total Predictions> = 92 / 100), but the confusion matrix shows that this stat doesn’t tell the whole story.

                Actual Blue  Actual Red  Total
Predicted Blue  92           8           100
Predicted Red   0            0           0
Total           92           8           100


This ‘model’ correctly predicted every Blue Pill taker but none of the Red Pill takers: it simply predicted Blue for all 100 candidates. Other stats also miss this model’s shortfall:

  • Precision (<True Blues> / <Total Predicted Blues>) = 92 / 100 = 92%
  • Recall (<True Blues> / <Total Actual Blues>) = 92 / 92 = 100%
  • F1 Score (the harmonic mean of Precision and Recall) = 95.83%
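
The arithmetic behind these numbers can be checked with a short Python sketch. It treats Blue as the positive class, so the first model's matrix gives TP = 92 (predicted Blue, actual Blue), FP = 8 (predicted Blue, actual Red) and FN = TN = 0; the function name `classification_metrics` is just an illustrative choice.

```python
def classification_metrics(tp, fp, fn, tn):
    """Standard binary metrics computed from the four confusion-matrix cells."""
    total = tp + fp + fn + tn
    accuracy = (tp + tn) / total
    precision = tp / (tp + fp) if tp + fp else 0.0   # of everything predicted positive
    recall = tp / (tp + fn) if tp + fn else 0.0      # of everything actually positive
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)            # harmonic mean of the two
    return {"accuracy": accuracy, "precision": precision, "recall": recall, "f1": f1}


# Morpheus' first model, with Blue as the positive class.
metrics = classification_metrics(tp=92, fp=8, fn=0, tn=0)
for name, value in metrics.items():
    print(f"{name}: {value:.2%}")
# accuracy: 92.00%
# precision: 92.00%
# recall: 100.00%
# f1: 95.83%
```

Every one of these figures looks healthy, because none of them is computed from the Red Pill takers' point of view.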

Based on these stats one might think this is a great model, but it is actually just reflecting the imbalance of the data set. From the confusion matrix, however, Morpheus can immediately see the problem with the ‘model’ and goes in search of a better one.
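
Reading the matrix one actual class at a time is what exposes the problem. The sketch below computes recall separately for each actual class from a matrix keyed by (predicted, actual), the same orientation as the table above. The function name `per_class_recall` and the dict layout are assumptions made for this illustration.

```python
def per_class_recall(matrix, labels):
    """Share of each actual class that the model labelled correctly.

    `matrix` maps (predicted, actual) -> count; rows are predictions and
    columns are actual values, as in the table.
    """
    recalls = {}
    for actual in labels:
        # Column total: every record whose actual class is `actual`.
        actual_total = sum(matrix.get((pred, actual), 0) for pred in labels)
        correct = matrix.get((actual, actual), 0)
        recalls[actual] = correct / actual_total if actual_total else 0.0
    return recalls


first_model = {("Blue", "Blue"): 92, ("Blue", "Red"): 8,
               ("Red", "Blue"): 0, ("Red", "Red"): 0}
print(per_class_recall(first_model, ["Blue", "Red"]))
# {'Blue': 1.0, 'Red': 0.0}
```

Red recall of 0 states plainly what the confusion matrix shows: the model never finds a single Red Pill taker.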

Nice, so how does it work in practice? 

This time Morpheus is going to use his full dataset of 1,000 candidates he has previously met. He has some data about them (Age, Sex, Salary, Job Type), and he has also logged which pill each of them took.


Like a good data analyst, Morpheus has cleaned and normalised his variables and split the data set 70/30: 70% to train a number of models and 30% held back to evaluate them. He’s opted for:

  • Decision Tree 
  • Forest Model 
  • Logistic Regression 
  • Boosted Model
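
A 70/30 split can be sketched in a few lines of Python. This is a simple random split under the assumption that each record can be shuffled independently; the helper name `train_validation_split` and the integer stand-ins for candidate records are hypothetical, and the workflow in the article may split the data differently.

```python
import random


def train_validation_split(records, train_fraction=0.7, seed=0):
    """Shuffle the records, then cut them into a training part and a held-out part."""
    shuffled = list(records)                   # copy so the caller's list is untouched
    random.Random(seed).shuffle(shuffled)      # fixed seed -> reproducible split
    cut = round(len(shuffled) * train_fraction)
    return shuffled[:cut], shuffled[cut:]


candidate_ids = list(range(1000))  # hypothetical stand-in for the 1,000 candidate records
train, validation = train_validation_split(candidate_ids)
print(len(train), len(validation))  # 700 300
```

Shuffling before cutting matters: if the records were stored in any meaningful order (say, by the date Morpheus met them), slicing off the first 70% without shuffling would give the models a biased training set.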

He then uses the Model Comparison Community Macro to evaluate which model to use.



The accuracy of the 4 models is as follows: 

Model     Decision Tree  Forest Model  Logistic Regression  Boosted Model
Accuracy  77.33%         79.67%        77.67%               78.33%

However, as we’ve seen, this stat isn’t the be-all and end-all. Looking at the confusion matrices for the 4 models gives a more nuanced picture.


From these we can see that, despite the Forest Model having the highest overall accuracy, the Logistic Regression was better at predicting who took the Red Pill. As that is what Morpheus is actually interested in, he decides to take his Logistic Regression and use it on his dataset of potential candidates.  
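
Choosing by the outcome you care about, rather than by overall accuracy, can be sketched as below. The article does not say exactly which measure of "better at predicting who took the Red Pill" was used; this sketch assumes recall on the Red class, and the helper names (`class_recall`, `accuracy`, `pick_model`) are hypothetical. Each model's matrix is assumed to be a dict keyed by (predicted, actual).

```python
def class_recall(matrix, label):
    """Correct predictions of `label` divided by all actual `label` cases."""
    actual_total = sum(n for (pred, act), n in matrix.items() if act == label)
    correct = matrix.get((label, label), 0)
    return correct / actual_total if actual_total else 0.0


def accuracy(matrix):
    """Overall share of predictions that match the actual class."""
    total = sum(matrix.values())
    correct = sum(n for (pred, act), n in matrix.items() if pred == act)
    return correct / total if total else 0.0


def pick_model(matrices, target_label="Red"):
    """Return the name of the model with the highest recall on `target_label`.

    `matrices` maps model name -> confusion matrix.
    """
    return max(matrices, key=lambda name: class_recall(matrices[name], target_label))
```

Comparing `pick_model(matrices)` with `max(matrices, key=lambda n: accuracy(matrices[n]))` makes the trade-off explicit: the two can name different models, just as the Forest Model wins on accuracy while the Logistic Regression wins on Red Pill takers.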


Now he can filter his list of potential candidates down to only those who are more likely to take the Red Pill than the Blue Pill. This means he can spare most of the people on his list from having their entire world view destroyed.
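
The final filtering step amounts to a probability threshold. This sketch assumes the model outputs a probability of choosing the Red Pill for each candidate; the `Candidate` class and its `p_red` field are hypothetical names used only for illustration.

```python
from dataclasses import dataclass


@dataclass
class Candidate:
    name: str
    p_red: float  # hypothetical field: the model's predicted probability of choosing the Red Pill


def likely_red_pill_takers(candidates):
    """Keep candidates the model rates as more likely to take Red than Blue.

    With only two outcomes, P(Blue) = 1 - P(Red), so "more likely Red than
    Blue" is the same as P(Red) > 0.5.
    """
    return [c for c in candidates if c.p_red > 0.5]
```

A candidate scored at exactly 0.5 is left out, since being equally likely is not "more likely". If talking to a real candidate matters more than wasting an evening, the threshold could be lowered, trading extra conversations for fewer missed Red Pill takers.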