The Hole Story and Bias in AI

October 10th, 2019

This article is by Elif Tutuk and originally appeared on the Qlik Blog.


Bias in AI

According to new research commissioned by Qlik, surveying over 2,000 UK citizens, the ‘AI debate’ is shifting away from fears over job loss toward a fresh debate over the role of humans in AI programming, the potential for bias, and where accountability should lie for alleviating that bias. 41% of respondents stated that AI in its current state is biased and, as a result, said they are worried about its impact. However, there’s a misconception among the UK public that it’s the human analysis or misinterpretation of the data (rather than the data itself) that can cause such bias. After all, data gives AI sustenance, including its ability to learn at rates far faster than humans. And the data that AI systems use as input can have hidden biases.


Causes for Hidden Biases

Bias is often caused by incomplete data sets and, perhaps most importantly, a lack of context around those data sets. For example, when we ask a question as humans, we ask it based on a hypothesis, which makes that question inherently biased from the get-go. That is why AI has to have context ‘built in’, so it can analyze all of the data on behalf of humans and provide more objective outcomes.

I would like to give you an example from World War II showing how incomplete data can cause biased results.

During WWII, Hungarian-born mathematician Abraham Wald undertook a study with the British Air Ministry to use statistical analysis to help protect bombers flying over enemy territory. The data to be crunched included the number and location of bullet holes on returning aircraft, and the goal was to use this information to determine where to best add armor to the plane's structure.

This information was laid out visually to better understand the data, showing where the maximum number of bullet holes were located on returning aircraft.

This chart showed the greatest damage not on the main wing and tail spars, engines, and core fuselage areas, but rather on the aircraft extremities. Based on this, the Air Ministry suggested adding armor to those extremities.

But Wald suggested they were dead wrong. He argued that more armor should go on the places with the fewest holes, because the Ministry was forgetting that the data did not include the planes that had been lost. If the returning planes had no holes in their wing spars and engines, the better assumption was that even a few holes in those places were deadly: no damage was recorded in those areas because those planes were the ones that had crashed. Wald recommended more armor in those “data-free” areas.

The lesson: the data that isn't there may tell as important a story as the data that is.
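The lesson above can be made concrete with a small sketch. The numbers below are hypothetical, purely for illustration (the original study's data is not in this article): averaging bullet-hole counts over only the surviving planes points armor at the wrong sections, while including the lost planes reverses the conclusion.

```python
# Hypothetical survivorship-bias illustration (made-up numbers).
# Each plane records bullet holes per section. Planes hit hard in
# the engine tend to be lost, so they never appear in the data
# that analysts on the ground could observe.

returned = [  # planes that made it back
    {"engine": 0, "fuselage": 1, "wing_tips": 8},
    {"engine": 1, "fuselage": 2, "wing_tips": 6},
    {"engine": 0, "fuselage": 0, "wing_tips": 9},
]
lost = [  # planes shot down -- invisible to the analysts
    {"engine": 7, "fuselage": 3, "wing_tips": 2},
    {"engine": 6, "fuselage": 4, "wing_tips": 1},
]

def mean_holes(planes):
    """Average bullet-hole count per section across a fleet."""
    sections = planes[0].keys()
    return {s: sum(p[s] for p in planes) / len(planes) for s in sections}

# Naive view (survivors only): wing tips look most damaged,
# so armor the wing tips.
print(mean_holes(returned))

# Full view (survivors + lost): the engine is where hits were
# fatal, which is exactly what the survivor-only data hides.
print(mean_holes(returned + lost))
```

In the survivor-only averages the engine looks nearly untouched, while in the full data it is among the most-hit sections, which is Wald's point: the missing rows carry the decisive signal.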

At Qlik, we often talk about the power of our associative difference, which understands the entire set of data so users can see what is and is not happening in any selection of data. This can often prompt users to ask questions they may not have thought to ask, or go down paths of inquiry they may not have realized were important. If the data from our World War II example were put into Qlik, the analysis might have looked like this:

By having the full context of the data being examined and seeing what data is excluded, one can quickly understand that damage is concentrated on the wing tips and central body and also quickly understand that some planes are excluded from that data set. Specifically, those excluded planes have no holes and/or were shot down (the data in gray).

This powerful one-of-a-kind associative difference also enables the Qlik Cognitive Engine, Qlik’s AI framework, to learn from all of the data, with the full context ‘built in’.