This article was written by Peter Simon and originally appeared on the DataRobot Blog here: https://www.datarobot.com/blog/ai-in-financial-markets-beyond-the-market-predicting-magic-box/
Financial markets may be the one area of life where it’s reasonable to expect different results from doing the same thing over and over again, without it being considered a definition of madness. So, it shouldn’t be a surprise that there is no such thing as a market-predicting magic machine learning box that ingests markets data and spits out lots of money. After all, it’s next to impossible for an AI system (or a human, for that matter) to learn from behaviors that don’t have consistent outcomes. That said, recent developments in automating machine learning have already helped participants in financial markets build, test, and understand powerful AI models that support and enhance their investment processes and many other areas of their businesses. But can AI and data science be used to generate alpha?
While machine learning techniques cannot magically make “signal” appear out of thin air, nor make unstable factors more stable, they have a number of useful characteristics that set them apart from the traditional quant investing toolbox. Machine learning models are often non-linear and don’t assume a particular statistical distribution in advance, focusing on the quality of actual predictions rather than statistical measures of fit and significance. Machine learning techniques are also more flexible when it comes to the input data: they are better at dealing with multiple highly correlated variables, more robust to missing values, and can be very good at extracting value from non-traditional forms of data. They’re also surprisingly easy to understand and interpret—and definitely no longer deserve their bad reputation of being black boxes.
With automated machine learning, you no longer need armies of rocket scientists and PhDs in order to participate in the AI revolution. Instead, the competitive edge in quant finance today comes from a deep understanding of the domain and the data being used, not from which particularly innovative machine learning algorithm is being deployed, the specific data preparation steps it needs, or the nuances that make it “better” than others. Modern automated machine learning platforms, such as DataRobot, hold benefits for users at a variety of skill levels and backgrounds, making the technology accessible to users who might previously not have had the requisite experience. Ultimately, it’s about giving those who understand their domain and data well the ability to massively expand the problem space—the range of questions they can explore—and focus on adding value from their expertise and edge.
At its heart, DataRobot is an enterprise AI platform for the automated building of machine learning models to address two very common classes of AI problems:
● Supervised machine learning. You have a body of historical observations (data), you know various things about them (variables/features), and you know the outcomes of these observations (the target variable).
● Unsupervised anomaly detection. You have a body of historical observations (data), you know various things about them (variables/features), and you want to score how similar or different new observations are to those historical ones as they occur.
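The article describes these two problem classes at the platform level rather than in code. As an illustration only, here is a minimal sketch of both on synthetic data, using scikit-learn as a stand-in for whatever modeling stack you actually use (the data, model choices, and variable names are all assumptions, not anything DataRobot-specific):

```python
# Minimal sketch of the two problem classes on synthetic data.
import numpy as np
from sklearn.ensemble import GradientBoostingClassifier, IsolationForest

rng = np.random.default_rng(0)
X = rng.normal(size=(500, 4))               # historical observations (features)
y = (X[:, 0] + X[:, 1] > 0).astype(int)     # known outcomes (the target variable)
X_new = rng.normal(size=(5, 4))             # new, unscored observations

# Supervised learning: features plus known outcomes -> predictions on new data.
clf = GradientBoostingClassifier(random_state=0).fit(X, y)
preds = clf.predict(X_new)

# Unsupervised anomaly detection: no target at all; score how unusual each
# new observation is relative to history (lower score = more anomalous).
detector = IsolationForest(random_state=0).fit(X)
scores = detector.score_samples(X_new)
```

The key structural difference is visible in the `fit` calls: the supervised model needs the target `y`, while the anomaly detector learns only the shape of the historical data.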
The ability to solve both these classes of machine learning problems is becoming increasingly commoditized, and it makes sense to automate it. This frees up the humans in the loop to focus on areas that can’t be automated and where the ultimate value-add will come — the domain and the data — and ensuring that the models they build will be robust and work well on new data after models are deployed. Here are a few best practices to ensure that this happens:
- Specificity rules. Be as specific as possible with your problem statement. Don’t test so much that you confuse random luck with skill.
- Unseen data is your best friend. There is no such thing as too much out-of-sample testing. (On the other hand, too little, or badly designed, out-of-sample testing is possibly the single biggest source of career risk in quantitative finance.)
- “One model for all seasons” is not an actual thing. Economies and markets evolve over time. Be ready for these changes and responsive when they occur.
- Backtest for stability and decay. Simulate what the model maintenance and refresh process will look like in production. Will your methodology still work after a few updates?
- Sometimes less data is more. Especially when behaviors vary over time. Focus on shorter periods of more consistent behavior for better results.
- Smell tests and sense checks are important. If you can’t fully understand the behavior that your model has found, or find a sensible explanation that matches up with your domain knowledge, chances are you’ve found a statistical artifact that won’t persist.
- Algorithmic diversity is useful for sense checking. If you can’t replicate your results across multiple different machine learning approaches, be suspicious of them.
- Naïveté is a virtue. Don’t forget to compare the models you build to naïve baselines (such as always predicting the same thing)—your models might not look so impressive by comparison.
- Don’t expect to shoot the lights out. It pays to be realistic about outcomes. And you don’t need to shoot the lights out to be profitable.
- Self-awareness is paramount. If it looks too good to be true, it probably is. If your model makes you feel like you’re a genius, spend some time actively trying to make your results worse (by identifying and squashing data leakage, which comes in many guises)—if you can’t, you’re on to something.
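Two of the practices above—relentless out-of-sample testing and comparison to naïve baselines—can be combined in a single evaluation loop. The following is a hedged sketch on synthetic data, using scikit-learn’s walk-forward `TimeSeriesSplit` (the model, data, and baseline choice are illustrative assumptions, not a prescribed methodology):

```python
# Sketch: walk-forward out-of-sample testing of a model against a naïve
# baseline that always predicts the majority class seen in training.
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import TimeSeriesSplit

rng = np.random.default_rng(42)
X = rng.normal(size=(1000, 5))                                  # features
y = (X[:, 0] + 0.1 * rng.normal(size=1000) > 0).astype(int)     # noisy target

model_acc, naive_acc = [], []
# Each split trains only on the past and tests only on the future,
# simulating how the model would actually be refreshed in production.
for train_idx, test_idx in TimeSeriesSplit(n_splits=5).split(X):
    model = LogisticRegression().fit(X[train_idx], y[train_idx])
    model_acc.append((model.predict(X[test_idx]) == y[test_idx]).mean())

    majority = int(y[train_idx].mean() >= 0.5)
    naive_acc.append((y[test_idx] == majority).mean())

print(f"model accuracy (out-of-sample): {np.mean(model_acc):.3f}")
print(f"naive accuracy (out-of-sample): {np.mean(naive_acc):.3f}")
```

If the model’s out-of-sample accuracy doesn’t clearly beat the naïve baseline across the later folds, or degrades from fold to fold, that is exactly the instability and decay the list above warns about.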
Machine Learning vs. Quants
So, it’s not a magic AI box, but machine learning can be a versatile, useful addition to the traditional quant’s toolbox of mathematical and statistical techniques. Download our ebook to dive deeper into why and how increasing numbers of sell-side and buy-side professionals are harnessing the power of automated machine learning in their daily work and using DataRobot to build, deploy, and monitor sophisticated models that generate millions of dollars annually.