There is a right way to automate machine learning, and there are many wrong ways. We created a guide to the ten critical components of successful automation.
But first, here are the most common wrong ways to automate machine learning.
Some people seem confused about what it means to automate something.
One vendor, for example, offers a drag-and-drop tool for analysis and calls it “automation.” Drag-and-drop UIs are nice. It’s a lot easier to drag and drop than it is to write good Python code. But users still have to know what to drag and where to drop it. That takes knowledge. Drag-and-drop tools won’t help you “democratize” machine learning if that’s your goal.
Another company claims that batch scoring jobs running under a scheduler “automate” machine learning. Schedulers are handy. They automate routine production jobs. That’s great. But they don’t automate the hard parts of machine learning. Someone still has to train and validate a model.
DataRobot automates the hard things, like model training and validation.
If you want to fool people, tell them you have a special algorithm. It does everything, you claim, so there’s no need to use those other algorithms that most data scientists use. Show customers a white paper that explains why your algorithm is special.
To pull this off, you may need one or two university professors to help explain your algorithm’s specialness.
There are two problems with this approach.
First, there is no such thing as an algorithm that outperforms all others on all problems. Machine learning developers deal in tradeoffs. You build an algorithm that works well on some problems, at the expense of good performance on others.
The second problem is transparency. If you use an algorithm outside the mainstream, few people will understand how to use it. Your customers will have a hard time finding and hiring people to work with your tools.
Of course, if your goal is to lock in customers, that’s a feature, not a bug.
The One-Trick Pony
Some vendors build an automated machine learning engine on a single algorithm. They claim that one algorithm is all you need. You just need to engineer features and tune the model properly.
This is nonsense. One algorithm can outperform others on one use case, but it won’t outperform others on all use cases. For consistent quality across diverse use cases, you have to try many different algorithms.
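To make the point concrete, here is a minimal sketch of trying more than one algorithm on more than one problem. It uses only the Python standard library. The two toy datasets, the two simple classifiers, and every function name below are illustrative assumptions, not part of any product. Each candidate is scored with the same 5-fold cross-validation and then ranked.

```python
import random


def dist2(a, b):
    """Squared Euclidean distance between two points."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def make_blobs(n, rng):
    """Toy data: two overlapping Gaussian clusters (a roughly linear problem)."""
    data = []
    for _ in range(n):
        label = rng.randint(0, 1)
        center = 1.0 if label else -1.0
        data.append(((rng.gauss(center, 1.0), rng.gauss(center, 1.0)), label))
    return data


def make_xor(n, rng):
    """Toy data: label is 1 when the coordinates have opposite signs (nonlinear)."""
    data = []
    for _ in range(n):
        x, y = rng.uniform(-1, 1), rng.uniform(-1, 1)
        data.append(((x, y), int((x > 0) != (y > 0))))
    return data


def fit_nearest_centroid(train):
    """Predict the class whose mean point is closest."""
    centroids = {}
    for label in {lbl for _, lbl in train}:
        pts = [p for p, lbl in train if lbl == label]
        centroids[label] = tuple(sum(c) / len(pts) for c in zip(*pts))
    return lambda p: min(centroids, key=lambda lbl: dist2(p, centroids[lbl]))


def fit_knn(train, k=5):
    """Predict by majority vote among the k closest training points."""
    def predict(p):
        nearest = sorted(train, key=lambda row: dist2(p, row[0]))[:k]
        votes = [lbl for _, lbl in nearest]
        return max(set(votes), key=votes.count)
    return predict


def cross_val_accuracy(fit, data, folds=5):
    """Mean accuracy over interleaved folds; every candidate sees the same splits."""
    scores = []
    for i in range(folds):
        test = data[i::folds]
        train = [row for j, row in enumerate(data) if j % folds != i]
        predict = fit(train)
        scores.append(sum(predict(p) == lbl for p, lbl in test) / len(test))
    return sum(scores) / folds


def leaderboard(data, candidates):
    """Rank candidate algorithms by cross-validated accuracy, best first."""
    scores = {name: cross_val_accuracy(fit, data) for name, fit in candidates.items()}
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)


rng = random.Random(0)
candidates = {"nearest_centroid": fit_nearest_centroid, "knn": fit_knn}
for name, data in [("blobs", make_blobs(200, rng)), ("xor", make_xor(200, rng))]:
    print(name, leaderboard(data, candidates))
```

On the XOR-style data, the nearest-neighbor model should finish well ahead, because a single straight boundary cannot separate the classes. On the blob data, the two candidates are close, and the simpler model is often competitive. Neither ranking is knowable without running both candidates on the data in question.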
Several vendors use nothing but deep learning. Deep learning is cool. For problems with a massive number of features, like image recognition, it’s often the best technique to use. DataRobot uses deep learning, together with many other techniques.
Why do vendors trust in a single algorithm? Sometimes, it’s blind faith. In the machine learning community, some people prefer to specialize in one technique, such as deep learning.
Sometimes, it’s convenience. Delivering software is easier if you use just one algorithm. Machine learning is messy. If you can convince customers that one algorithm is all you need, you can save on software engineering, testing, and product development. You don’t have to build tools that automatically compare algorithms, because there’s nothing to compare!
Old Wine in New Bottles
Distributed machine learning is just 8-10 years old. Tools that are older than that run on one server only. If your computing problem exceeds the capacity of one server, well, that’s just too bad.
Legacy software vendors figure they can build an automated machine learning engine that runs on top of their existing software. That approach rarely works well. Automated machine learning engines run a lot of experiments. You have to run those experiments in parallel, or you will wait a long time to see results.
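The cost of running experiments one at a time is easy to see in a small sketch. The code below is an illustration under stated assumptions: `run_experiment` is a hypothetical stand-in that sleeps for a fraction of a second instead of training a model, and the candidate names are made up. It runs the same eight experiments with one worker and then with four.

```python
import time
from concurrent.futures import ThreadPoolExecutor


def run_experiment(config):
    """Hypothetical stand-in for training and validating one candidate model."""
    time.sleep(0.05)  # simulated training time, not real work
    return config["name"]


def run_all(configs, max_workers):
    """Run every experiment and return (results in input order, elapsed seconds)."""
    start = time.perf_counter()
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        results = list(pool.map(run_experiment, configs))
    return results, time.perf_counter() - start


configs = [{"name": f"candidate_{i}"} for i in range(8)]
serial_results, serial_time = run_all(configs, max_workers=1)
parallel_results, parallel_time = run_all(configs, max_workers=4)
print(f"one worker: {serial_time:.2f}s, four workers: {parallel_time:.2f}s")
```

Threads are enough here because the simulated experiments only wait. Real training is CPU-bound, so a process pool or a cluster of machines would be the more usual choice, which is exactly what single-server tools cannot offer.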
Half a Bridge
Automated machine learning is powerful because it helps you bring new users into the process. With built-in quality assurance, you can trust novice users to build reliable models. Your most valuable experts can take on coaching and advisory roles, or they can work on the most challenging models.
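As one illustration of what a built-in quality check might look like, the sketch below rejects any model that fails to beat a naive baseline on held-out data. The function names and the `min_lift` threshold are assumptions made for this example, not a description of any product’s actual checks.

```python
from collections import Counter


def majority_baseline_accuracy(train_labels, holdout_labels):
    """Accuracy of always predicting the most common training label."""
    majority = Counter(train_labels).most_common(1)[0][0]
    return sum(lbl == majority for lbl in holdout_labels) / len(holdout_labels)


def passes_quality_gate(model_accuracy, train_labels, holdout_labels, min_lift=0.05):
    """Accept a model only if it beats the naive baseline by at least min_lift."""
    baseline = majority_baseline_accuracy(train_labels, holdout_labels)
    return model_accuracy >= baseline + min_lift


# Imbalanced toy labels: 90% of cases belong to class 0.
train_labels = [0] * 90 + [1] * 10
holdout_labels = [0] * 9 + [1]

print(passes_quality_gate(0.90, train_labels, holdout_labels))
print(passes_quality_gate(0.97, train_labels, holdout_labels))
```

A novice might be pleased with 90% accuracy. On data this imbalanced, always guessing the majority class scores the same, so the gate rejects that model and accepts only the one that adds real lift.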
You can’t do that with partially automated tools. If any part of the process is manual, your expert users must perform every task. Otherwise, there is too much risk that novices will make mistakes on the manual parts. As a result, the capacity of your expert data scientists limits your entire machine learning program.
How to Automate Machine Learning
Automate the hard things.
Use mainstream algorithms.
Use diverse algorithms.
Build an engine that works out of the box.
Build for high performance and scalability.
Automate the complete machine learning workflow.