Automated Labeling of Training Data: All About Compose

This article was written by Kalyanv and originally appeared on the Alteryx Data Science Blog.


A constant challenge in applying machine learning is defining the outcome to predict and labeling the data to find the right training examples for modeling. You also need enough training examples to train a model that performs well.


Fortunately, Compose, one of our Alteryx Open Source Python libraries, can help you label your data and automatically extract training examples for your modeling problem. It also adapts easily to domain-specific needs (for example, predicting an outcome five hours ahead instead of one hour ahead), can reduce biases in training examples, and much more.
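To make the labeling idea concrete, here is a minimal pure-Python sketch. It is not Compose's API: the `(entity_id, timestamp, value)` event layout and the `make_labels` helper and its parameters are hypothetical, chosen only to illustrate the concept. Each entity's timeline is cut into prediction windows. The cutoff time at the start of each window becomes one training example, and its label is a user-supplied labeling function applied to the data inside the window. In this sketch, predicting five hours ahead instead of one is just a change to `window_size`, and `max_examples_per_entity` caps how many examples any one entity contributes. That cap is one plausible way to keep entities with long histories from skewing the training set.

```python
from collections import defaultdict
from datetime import datetime, timedelta
from typing import Any, Callable, Dict, List, Optional, Tuple

# Hypothetical input layout for this sketch: (entity_id, timestamp, value).
Event = Tuple[str, datetime, float]


def make_labels(
    events: List[Event],
    labeling_function: Callable[[List[float]], Any],
    window_size: timedelta,
    minimum_data: timedelta = timedelta(0),
    max_examples_per_entity: Optional[int] = None,
) -> List[Dict[str, Any]]:
    """Return one labeled training example per non-empty prediction window.

    For each entity, the first cutoff time is its first event plus
    minimum_data (history reserved for features); later cutoffs advance by
    window_size. The label at a cutoff is labeling_function applied to the
    values observed in [cutoff, cutoff + window_size), i.e. the outcome over
    the next window_size. Features must use only data before the cutoff.
    """
    if window_size <= timedelta(0):
        raise ValueError("window_size must be positive")

    # Group each entity's (timestamp, value) pairs so timelines are labeled separately.
    by_entity: Dict[str, List[Tuple[datetime, float]]] = defaultdict(list)
    for entity_id, ts, value in events:
        by_entity[entity_id].append((ts, value))

    labels: List[Dict[str, Any]] = []
    for entity_id, rows in sorted(by_entity.items()):
        rows.sort()  # chronological order
        cutoff = rows[0][0] + minimum_data
        last_event = rows[-1][0]
        count = 0
        while cutoff <= last_event:
            # Cap examples per entity so long histories do not dominate.
            if max_examples_per_entity is not None and count >= max_examples_per_entity:
                break
            window_end = cutoff + window_size
            window_values = [v for ts, v in rows if cutoff <= ts < window_end]
            if window_values:  # skip windows with no observations
                labels.append({
                    "entity_id": entity_id,
                    "cutoff_time": cutoff,
                    "label": labeling_function(window_values),
                })
                count += 1
            cutoff = window_end
    return labels


if __name__ == "__main__":
    # Toy data: one machine with an hourly reading equal to the hour (0.0 .. 9.0).
    t0 = datetime(2020, 1, 1)
    readings = [("m1", t0 + timedelta(hours=h), float(h)) for h in range(10)]

    # Same data, two horizons: peak reading over the next hour vs. the next five hours.
    one_hour = make_labels(readings, max, timedelta(hours=1))
    five_hours = make_labels(readings, max, timedelta(hours=5))
    print(len(one_hour), len(five_hours))  # 10 2
    print([row["label"] for row in five_hours])  # [4.0, 9.0]
```

Only the horizon changes between the two calls. The raw data and the labeling function stay the same, which is the kind of adaptability the paragraph above describes.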


I spoke about Compose at the Alteryx Virtual Global Inspire conference. The talk recording goes into more detail, and in it I also demonstrate labeling data for building predictive models in healthcare.

Compose integrates well with Featuretools for automated feature engineering and EvalML for automated machine learning, so your workflow can run from labeling (Compose) to feature engineering (Featuretools) to model search (EvalML).
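The hand-off between these steps hinges on time. Here is a rough, hypothetical sketch in plain Python, not the Featuretools or EvalML APIs. Each label row carries a cutoff time, and the features for that row are computed only from data recorded strictly before it. This way the model never sees the outcome window it is asked to predict. Featuretools works with the same idea under the name "cutoff times." Here, `features_at_cutoff` and `build_training_table` are illustrative names, and the three aggregates are arbitrary examples. The resulting rows, features plus a label column, are the kind of table an automated machine learning step would then search over.

```python
from datetime import datetime, timedelta
from typing import Any, Dict, List, Tuple

# Hypothetical input layout for this sketch: (entity_id, timestamp, value).
Event = Tuple[str, datetime, float]


def features_at_cutoff(events: List[Event], entity_id: str, cutoff: datetime) -> Dict[str, float]:
    """Aggregate one entity's history using only events strictly before the cutoff."""
    past = [v for e, ts, v in sorted(events, key=lambda r: r[1])
            if e == entity_id and ts < cutoff]
    return {
        "count": float(len(past)),
        "mean": sum(past) / len(past) if past else 0.0,
        "last": past[-1] if past else 0.0,  # most recent value before the cutoff
    }


def build_training_table(events: List[Event], label_times: List[Dict[str, Any]]) -> List[Dict[str, Any]]:
    """Pair each labeled cutoff time with features computed as of that time."""
    table = []
    for row in label_times:
        features = features_at_cutoff(events, row["entity_id"], row["cutoff_time"])
        table.append({"entity_id": row["entity_id"], **features, "label": row["label"]})
    return table


if __name__ == "__main__":
    t0 = datetime(2020, 1, 1)
    readings = [("m1", t0 + timedelta(hours=h), float(h)) for h in range(10)]
    # One hand-written label row: the peak reading over hours 5-9 is 9.0.
    label_times = [{"entity_id": "m1", "cutoff_time": t0 + timedelta(hours=5), "label": 9.0}]
    # Features see only hours 0-4: count 5.0, mean 2.0, last 4.0.
    print(build_training_table(readings, label_times))
```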




Thanks for taking the time to learn about Compose. In my group at MIT, we use Compose to solve a variety of industrial problems ranging from healthcare to predictive maintenance. I hope you’re intrigued and ready to put it to work. We look forward to hearing about your projects and getting your feedback.