Second Pillar of Mapping Data to Visualizations: Visual Encoding

June 14th, 2018

You may remember my last post on the First Pillar of Mapping Data to Visualizations: Data Attributes. Following the ordinal pattern we established, the next topic I would like to discuss is the second pillar of mapping data to visualizations: the process of visual encoding.

We have already established a process for determining which type of data you have (nominal, ordinal, interval or ratio) and which axis to map it to. Now we need to figure out how best to display that data visually using color, shape, size and position.

For proper perspective on the subject: in 1984, William S. Cleveland and Robert McGill published a landmark piece of research on graphical perception that articulated standards many data visualizations still follow today. Their research, published in the Journal of the American Statistical Association, found that although everyone perceives visualizations a little differently, there are a few simple guidelines everyone can follow. Cleveland and McGill tested a series of visual encoding theories through experiments and ranked visual markers by how accurately people could read values from them.

Whatever data you are mapping to a visualization, these are your basic display options:

For example, with ratio data the differences between data points matter most, so we should use the most accurate visual markers.

From Cleveland and McGill's paper, we can rank these markers from most to least accurate:

Here, position is the most accurate marker, followed by length and angle. That ranking makes sense if you are mapping the kinds of data points we identified in the prior post (cost, age). If you tried to map those same examples using color instead, how would you determine the value of the dark green if I told you that the light green represents $1,000?
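The ordering described above can be written down as a simple lookup table. This is a minimal sketch, assuming the commonly cited six-level ranking of elementary perceptual tasks from Cleveland and McGill's 1984 paper; the names `ACCURACY_RANK` and `rank_encodings` are hypothetical helpers for illustration, not part of any library.

```python
# Commonly cited ranking of elementary perceptual tasks from
# Cleveland and McGill (1984). A lower number means viewers judge
# values more accurately; encodings that share a number are tied.
ACCURACY_RANK = {
    "position (common scale)": 1,
    "position (nonaligned scales)": 2,
    "length": 3,
    "direction": 3,
    "angle": 3,
    "area": 4,
    "volume": 5,
    "curvature": 5,
    "shading": 6,
    "color saturation": 6,
}


def rank_encodings(candidates):
    """Return candidate encodings sorted from most to least accurate.

    Hypothetical helper: raises KeyError for an encoding not in the table.
    """
    return sorted(candidates, key=lambda name: ACCURACY_RANK[name])
```

For example, `rank_encodings(["color saturation", "length", "position (common scale)", "angle"])` returns `["position (common scale)", "length", "angle", "color saturation"]`. Because Python's `sorted` is stable, tied encodings such as length and angle keep the order in which they were passed in.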

On the subject of position-based and length-based charts, Alberto Cairo notes in his book The Functional Art that the strongest charts place anything that can be measured along the X-axis. A chart in his book showing obesity rates by state in the United States illustrates this well. To map obesity per state, it makes sense to use position; to compare each state's obesity rate with its neighbors', it makes sense to use color shading.
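Color shading suits the neighbor comparison because a reader only needs to tell classes apart, not read exact values. A minimal sketch of how values might be binned into shade classes, assuming hypothetical class breaks and shade names (`BREAKS`, `SHADES` and `shade_for` are illustrative only, not taken from Cairo's chart):

```python
from bisect import bisect_right

# Hypothetical class breaks (e.g. obesity percentages) and shade names;
# a real map would choose its own breaks and palette.
BREAKS = [20, 25, 30, 35]
SHADES = ["lightest", "light", "medium", "dark", "darkest"]


def shade_for(value, breaks=BREAKS, shades=SHADES):
    """Return the shade class a value falls into.

    bisect_right counts how many breaks are <= value, which is the
    index of the value's class. There must be one more shade than breaks.
    """
    if len(shades) != len(breaks) + 1:
        raise ValueError("need exactly one more shade than breaks")
    return shades[bisect_right(breaks, value)]
```

Note that `shade_for(31)` and `shade_for(33)` both return `"dark"`: the shading shows which class a state is in, but the exact value is lost, which is why color is a poor choice when precise values matter.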

Learn about the 2nd pillar of mapping data to visualizations on the Qlik Blog #dataviz

This is just one example. If you have other types of data, you will need a guide to determine which visual encoding method is best for you. Take a look at the image below; it provides a neat priority guideline for how your data should be mapped.

Across the board, using position whenever you can is in your best interest. However, position must not be applied carelessly, as the example below shows. The first chart tries to show cars sold across various countries, but there is a problem: a nominal attribute (country) is being mapped by length, which does not help us understand the data. Let's try mapping this data another way.

In the chart below, both attributes are mapped by position, which lets us learn much more about the data. This is much better. Unlike the previous example, it also invites the reader to spot new patterns, which is always a good thing.
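To make "both attributes mapped by position" concrete, here is a minimal text dot plot in Python. It is a sketch only: the country names and sales figures are hypothetical placeholders, not the data from the charts above, and `dot_plot` is an illustrative helper.

```python
# Hypothetical figures for illustration only; not data from the original charts.
cars_sold = {"Country A": 120, "Country B": 45, "Country C": 80}


def dot_plot(data, width=40):
    """Render a text dot plot as a list of lines.

    Each country (nominal) gets its own row, i.e. a position on the
    vertical axis; cars sold (ratio) sets the dot's position along a
    shared horizontal scale running from 0 to the largest value.
    """
    max_value = max(data.values())
    label_width = max(len(name) for name in data)
    lines = []
    for name, value in data.items():
        # Scale the value onto 0..width so every row shares one axis.
        column = round(value / max_value * width)
        lines.append(f"{name:<{label_width}} |{' ' * column}o")
    return lines


for line in dot_plot(cars_sold):
    print(line)
```

Because every dot sits on the same horizontal scale, differences between countries can be read directly from how far apart the dots are, rather than guessed from lengths attached to category labels.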

Another asset you may be familiar with is our guide to choosing the right visualization from my first blog post. For a popular chart like a scatterplot, the data guide three images above suggests that, for interval/ratio data, it makes more sense to vary the size of the dots than to use multiple colors. There are many other factors to consider, but you will be in good shape if you remember the following:

For Nominal data: No one value is more important than the next, so while position is best, shapes such as circles and squares can also help display your data.

For Ordinal data: Because your data has an inherent ranking, light and dark shades will help emphasize that order.

For Interval/Ratio data: You are mapping numerical values, so the best way to encode them is through position or length.
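The three rules of thumb above, together with the scatterplot advice to vary dot size rather than color, can be sketched as a small lookup plus a sizing helper. This is a minimal sketch under stated assumptions: the table, function names and the `max_area` default are hypothetical; position is listed first for every type because position is the most accurate marker; and `marker_area` assumes values with a meaningful zero (ratio data), so interval data would first need a chosen baseline. Scaling area rather than radius is a design choice, so a dot for twice the value covers twice the area. Matplotlib's `Axes.scatter`, for instance, takes its marker size `s` in points squared, so an area like this could be passed there.

```python
# Rules of thumb from this post as a lookup table (names are hypothetical).
# Position comes first for every type because it is the most accurate marker.
RECOMMENDED_ENCODINGS = {
    "nominal": ["position", "shape"],
    "ordinal": ["position", "shading"],
    "interval": ["position", "length"],
    "ratio": ["position", "length"],
}


def recommend_encodings(data_type):
    """Return suggested encodings for a data type, best first."""
    key = data_type.strip().lower()
    if key not in RECOMMENDED_ENCODINGS:
        raise ValueError(f"unknown data type: {data_type!r}")
    # Return a copy so callers cannot modify the shared table.
    return list(RECOMMENDED_ENCODINGS[key])


def marker_area(value, max_value, max_area=400.0):
    """Map a non-negative ratio value to a scatterplot dot area.

    Area, not radius, grows linearly with the value. max_area is a
    hypothetical size for the largest dot, in the plotting tool's area units.
    """
    if value < 0 or max_value <= 0:
        raise ValueError("expects a non-negative value and a positive maximum")
    return value / max_value * max_area
```

For example, `recommend_encodings("Ratio")` returns `["position", "length"]`, and `marker_area(50, 100)` returns `200.0`, half the area of the largest dot.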

I hope these guides and graphics have been helpful. Be sure to look out for my next post, which addresses the Third (and final) Pillar of Mapping Data to Visualizations: Usage.