Detect Accounting Fraud with AI

This article was written by Wei Zhong Toh, Nina Xing, and Clifton Phua and originally appeared on the DataRobot Blog.


Accounting fraud is the manipulation or misstatement of a company’s financial statements to create an illusion of strong financial health. This type of fraud boosts investor or shareholder confidence when, in fact, the company could be in poor financial health, with liquidity or solvency issues.

There are several ways to falsify financial statements to paint a rosier picture, such as overstating revenue or assets, not recording expenses, or understating liabilities. These falsifications mislead investors into believing that a company is making money or has sufficient liquidity, which in turn leads to inflated share prices.

One of the most famous examples of accounting fraud is the Enron scandal, in which share prices plummeted to a few cents after the company’s fraudulent practices were uncovered.

Fortunately, regulators now have an additional tool to fight accounting fraud: artificial intelligence. With the right financial datasets and models, fraudulent accounting practices can be accurately detected and reasonably explained.

For example, data entries in balance sheets and profit-and-loss statements, such as total current assets, cost of goods sold (COGS), or total debt in current liabilities, can serve as features in a machine learning model that predicts accounting fraud.

Along with raw data entries in these statements, additional financial ratios such as year-on-year changes in return on assets or book-to-market value are useful machine learning features as well.
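As a rough illustration of how such ratio features could be derived from raw statement entries, here is a minimal Python sketch. The `AnnualReport` class, its field names, and the figures are hypothetical and are not taken from the dataset. Return on assets is computed with the simple definition of net income over total assets, and book-to-market as book equity over market capitalization.

```python
from dataclasses import dataclass


@dataclass
class AnnualReport:
    """One fiscal year of statement data (hypothetical field names)."""
    year: int
    net_income: float
    total_assets: float
    book_equity: float
    market_cap: float


def return_on_assets(report: AnnualReport) -> float:
    """Net income divided by total assets."""
    return report.net_income / report.total_assets


def ratio_features(previous: AnnualReport, current: AnnualReport) -> dict:
    """Derive ratio features for the current year from two consecutive reports."""
    return {
        "roa": return_on_assets(current),
        # Year-on-year change: current-year ROA minus prior-year ROA.
        "roa_yoy_change": return_on_assets(current) - return_on_assets(previous),
        "book_to_market": current.book_equity / current.market_cap,
    }


# Made-up figures for illustration only.
fy1 = AnnualReport(year=2001, net_income=50.0, total_assets=1000.0,
                   book_equity=400.0, market_cap=800.0)
fy2 = AnnualReport(year=2002, net_income=90.0, total_assets=1200.0,
                   book_equity=450.0, market_cap=1500.0)

features = ratio_features(fy1, fy2)
print(features)
```

Each resulting dictionary could then be joined to the raw statement entries to form one row of model input per company and year.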

The dataset used in the following example was published in the Journal of Accounting Research.

Note that DataRobot also automatically runs Data Quality Assessments on the dataset to identify and remedy potential data quality issues.

Using this dataset, DataRobot’s Automated Machine Learning built over 100 different machine learning models, identified the best-performing model in the project, and recommended it for deployment. Our best model is a Light Gradient Boosted Trees Classifier with Early Stopping, which has an area under the curve (AUC) score of over 0.80 (80 percent).
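To make the AUC figure more concrete: AUC can be read as the probability that a randomly chosen fraudulent case receives a higher score than a randomly chosen non-fraudulent one, so 0.5 corresponds to random ranking and 1.0 to perfect ranking. The sketch below computes it by direct pairwise comparison. The `roc_auc` function, labels, and scores are made up for illustration and are not output from the model described here.

```python
def roc_auc(labels, scores):
    """Share of (fraud, non-fraud) pairs where the fraud case scores higher.

    Ties count as half a win.
    """
    positives = [s for y, s in zip(labels, scores) if y == 1]
    negatives = [s for y, s in zip(labels, scores) if y == 0]
    if not positives or not negatives:
        raise ValueError("AUC needs at least one positive and one negative label")

    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(positives) * len(negatives))


# Hypothetical labels (1 = fraud) and predicted fraud scores.
labels = [1, 0, 1, 0, 0, 1]
scores = [0.9, 0.2, 0.4, 0.5, 0.1, 0.8]

auc = roc_auc(labels, scores)
print(round(auc, 3))
```

The pairwise loop is quadratic in the number of rows, so on a real dataset a rank-based implementation from an established library would be the more practical choice.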

Light Gradient Boosted Trees Classifier with Early Stopping - DataRobot AutoML

Through Feature Impact, DataRobot also informs us of the top features in the dataset that contribute significantly to the model’s predictive accuracy. The top features are % Soft Assets, Sale of Common and Preferred Stock, Total Receivables, and others.

Using this model, we can then predict the risk of accounting fraud for a given company based on the company’s financial statements and characteristics. The following shows how DataRobot’s No-Code App Builder allows users to design applications in a code-free fashion, query a given model, and generate predictions.

DataRobot No-Code App Builder
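One common, model-agnostic way to estimate how much each feature contributes to predictive accuracy, as discussed for Feature Impact earlier, is permutation importance: shuffle one feature column at a time and measure how much the model’s accuracy drops. The sketch below illustrates that general technique and is not necessarily how DataRobot computes Feature Impact. The `toy_model`, the data, and all function names are hypothetical.

```python
import random


def accuracy(model, rows, labels):
    """Fraction of rows the model classifies correctly."""
    return sum(model(r) == y for r, y in zip(rows, labels)) / len(rows)


def permutation_importance(model, rows, labels, n_features, repeats=20, seed=0):
    """Average accuracy drop when each feature column is shuffled in turn."""
    rng = random.Random(seed)
    baseline = accuracy(model, rows, labels)
    drops = []
    for j in range(n_features):
        total = 0.0
        for _ in range(repeats):
            # Shuffle column j only, leaving every other column untouched.
            column = [r[j] for r in rows]
            rng.shuffle(column)
            shuffled = [r[:j] + (v,) + r[j + 1:] for r, v in zip(rows, column)]
            total += baseline - accuracy(model, shuffled, labels)
        drops.append(total / repeats)
    return drops


def toy_model(row):
    """Hypothetical stand-in: flags fraud when feature 0 exceeds 0.6, ignores feature 1."""
    return 1 if row[0] > 0.6 else 0


# Made-up rows of (feature 0, feature 1) and labels (1 = fraud).
rows = [(0.9, 0.1), (0.8, 0.7), (0.7, 0.3), (0.2, 0.9), (0.3, 0.2), (0.1, 0.5)]
labels = [1, 1, 1, 0, 0, 0]

impact = permutation_importance(toy_model, rows, labels, n_features=2)
print(impact)
```

Because the toy model never reads feature 1, shuffling that column leaves accuracy unchanged and its importance is zero, while shuffling feature 0 degrades accuracy.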

By some accounts, when executed properly, AI fraud detection systems can reduce fraud by 95 percent while also reducing the associated costs. These results provide significant value for public agencies and regulators.

If you’re interested in using AI, sign up for the DataRobot AI Platform Free Trial. To request the fraud detection demo shown in this blog post, contact us and we’ll explore a proof of value together.