This article was written by Sonia Prakasam and originally appeared on the Alteryx Analytics Blog here: https://community.alteryx.com/t5/Data-Science/Unlocking-Insights-from-Images-using-Computer-Vision/ba-p/754517
Of all the human senses, vision tends to be the one through which we take in the most information, the fastest. Computer Vision gives technology a similar ability to digest information quickly. That’s why we’ve added a new Computer Vision tool group to Intelligence Suite: to help you process large sets of documents quickly and automatically. The new Computer Vision tools use machine learning models to help you pull information out of documents and images.
But a model is only as good as its dataset. You cannot show an algorithm 500 images of a dog and ask it what a cat looks like. Poor image quality, bad orientations, varying formats – all get in the way of acquiring speedy insights from large and complex datasets – from financial statements to healthcare claims.
As you take away structure from these documents, the amount of manual intervention required to extract meaningful data grows sharply. By an often-cited estimate, data scientists spend about 80% of their time prepping data and only about 20% actually building models.
I think of every time I walk into a medical office and someone hands me a clipboard with a form and a pen. There is always one box too few to fit my name or my address, and something eventually spills over. On the one hand, I cannot keep changing my name and address; on the other, I understand that forms need some structure and a limit to their size.
The Alteryx team believes algorithms should be flexible enough to work around humans and not the other way around. And so, the models you’ll find in Alteryx hold these values as well.
Ingest Almost Any Image with Image Input
With the upgraded Image Input tool, you can handle a variety of formats, whether PDFs or standard image formats like JPEG, PNG, and Bitmap. Yes, this means the PDF Input tool you’ve grown to know and love has a new home! It’s moving from the Text Mining tool group to the Computer Vision tool group and has become part of the new Image Input tool.
And more good news: there is no need to update your existing workflows! The Image Input tool is built with backward compatibility. When you upgrade to version 21.2 of Alteryx Intelligence Suite, your workflows will be updated to the new tool and will run seamlessly.
Extract Every Word with Image Processing
One of the main factors in the success and accuracy of Optical Character Recognition (OCR) is the quality of the image. Low-contrast, blurry images make character recognition tough. The closer you can get an image to its original printed form, the easier it will be to figure out what is in it. The solution to this (and a few other issues!) is processing the image before running it through the OCR engine.
The new Image Processing tool does exactly that and a lot more. It helps you quickly perform the steps typically used to improve image quality: you can align, threshold, scale, and crop images, balance their brightness, and convert them to grayscale. All these steps are integral to improving the quality of text recognition.
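To make a few of these steps concrete, here is a minimal sketch in plain Python of what grayscale conversion, thresholding, and cropping do to pixel values. It is an illustration only, not how the Image Processing tool is implemented: the helper names (`to_grayscale`, `threshold`, `crop`), the tiny sample `page`, and the cutoff of 128 are all assumptions made for this example.

```python
# Illustrative only: plain-Python versions of three common OCR preprocessing steps.
# These helpers are hypothetical and are NOT the Image Processing tool's implementation.

def to_grayscale(rgb_image):
    """Convert rows of (R, G, B) tuples to rows of 0-255 luminance values.

    Uses the common ITU-R BT.601 weights for red, green, and blue.
    """
    return [
        [round(0.299 * r + 0.587 * g + 0.114 * b) for (r, g, b) in row]
        for row in rgb_image
    ]

def threshold(gray_image, cutoff=128):
    """Binarize: pixels at or above the cutoff become white (255), the rest black (0)."""
    return [[255 if px >= cutoff else 0 for px in row] for row in gray_image]

def crop(image, top, left, height, width):
    """Keep only the rectangle that starts at (top, left) with the given size."""
    return [row[left:left + width] for row in image[top:top + height]]

# A made-up 2x3 "scan": light paper with a few darker ink pixels.
page = [
    [(250, 250, 250), (40, 40, 40), (245, 240, 235)],
    [(230, 230, 230), (60, 50, 55), (90, 90, 90)],
]

gray = to_grayscale(page)
binary = threshold(gray)          # [[255, 0, 255], [255, 0, 0]]
region = crop(binary, top=0, left=1, height=2, width=2)
```

After thresholding, every pixel is either pure black or pure white, which is the kind of high-contrast input that tends to make character recognition easier.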
What’s extra neat is that the order of execution of the various steps follows the order in which you add them. So, you can always re-order them by dragging the widgets around in the Configuration window! We’ve all had that experience where we’ve applied one too many filters to our Instagram pictures … right? Well, we’ve got you covered there. As a design philosophy, all steps come with intuitive options to reset and remove them as needed.
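Why does the order of steps matter? The sketch below, again in plain Python with hypothetical helpers (`brighten`, `threshold`, `run_pipeline`) and made-up pixel values, applies the same two steps in both orders to a small grayscale image. It is an assumption-laden illustration of the idea, not the tool's internals.

```python
# Hypothetical sketch: each step is a plain function, applied in list order,
# mirroring the idea that steps run in the order you arrange them.

def brighten(image, amount=60):
    """Add a fixed amount to every pixel, capped at 255."""
    return [[min(255, px + amount) for px in row] for row in image]

def threshold(image, cutoff=128):
    """Pixels at or above the cutoff become white (255), the rest black (0)."""
    return [[255 if px >= cutoff else 0 for px in row] for row in image]

def run_pipeline(image, steps):
    """Apply each step to the output of the previous one."""
    for step in steps:
        image = step(image)
    return image

# A made-up, underexposed 2x2 grayscale scan.
dim_scan = [[100, 30], [120, 200]]

brighten_first = run_pipeline(dim_scan, [brighten, threshold])
threshold_first = run_pipeline(dim_scan, [threshold, brighten])

print(brighten_first)   # [[255, 0], [255, 255]]
print(threshold_first)  # [[60, 60], [60, 255]]
```

Brightening first rescues the dim pixels before they are binarized; thresholding first turns them black, and brightening afterwards only lifts them to a dark gray. Same steps, different order, different image.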
The motivation behind the Image Processing tool is to make it as simple as possible for you to take an image captured under various circumstances (say, for example, you have a shaky clicker finger, or are struggling against bad lighting conditions) and extract meaningful data out of it.
Now that you have a pre-processed image, you can run it through the algorithms that translate it from a quick snap on your phone into a digitally editable list of data. Once you have these digitally translated images, the possibilities for insights are endless.
Gain Ultimate Flexibility with Automatic Table Detection
You’ve asked, and we’ve heard you. We understand that trying to perform OCR or even utilizing it with Machine Learning (ML) has its challenges, and we want to ensure you get maximum flexibility with minimum manual work. Pulling tables out of an intricate document like the one shown above is laborious, especially when it has complex structures that vary on every page. If you work with such documents today, you probably have to spend hours contending with changes in table layouts, both big and small, all while doing extensive reworking to maintain consistent data formats.
With the introduction of automatic table detection, you can now extract data from unstructured images and documents. The ML model finds, cleans, and extracts information without needing any template. All you do is connect the output of the Image Input tool to the optional input anchor of the Image Template tool.
The user interface (UI) of the Image Template tool then changes to confirm that you’ve configured it correctly and that it is running in automatic table detection mode.
You can now automate this process to interpret documents like invoices from Walmart, Costco, or even your local healthcare providers. Alteryx Intelligence Suite’s tools are designed to do all this at scale, while still giving you the flexibility to choose specific areas of focus when you extract information from detailed documents like financial statements. Intelligence Suite does much of this work with OCR and with machine learning extensions of OCR.