Principles and practice of data visualisation

Author

Jaromír Mazák, Department of Sociology, Faculty of Arts, CU, jaromir.mazak@ff.cuni.cz

Published

15 May 2022

What to expect

  • Why visualizations rather than text
  • General principles of visualization
  • Grammar of graphics

Why visualizations rather than text

Florence Nightingale and the Crimean War (1850s)

Source Highcharts.com

Blue: disease from poor hygiene. Red: battle wounds. Black: other causes. Nightingale, the first female fellow of the Royal Statistical Society, teamed up with William Farr, a brilliant statistician born to poor parents who lacked Nightingale’s fame and political connections. On lobbying Queen Victoria: ‘She may look at it because it has pictures.’ 16 000 of the 18 000 deaths were from preventable disease.

John Snow and the cholera epidemic in London

Source Highcharts.com

Thanks to Snow’s visualizations, suspicion fell on water sources rather than on ‘bad air’, which had been the main suspected cause of cholera until then.

Count all the threes

Source Ware (2012)

Count all the threes now

Source Ware (2012)

We remember better visually

Source Medina (2014)

Sight is our main sense; we see the world through images. Text, by contrast, has to be decoded first, and only then can we imagine what it means.

General principles of visualization

Edward Tufte

A key figure in the modern approach to visualization of information.

Chartjunk; data-ink ratio; data density; micro-macro reading.

Chartjunk: unnecessary decoration. Data-ink ratio: something to maximize, i.e. capture a lot of data using little ink. Data density: display as much data as possible while still emphasizing the main trends. Micro-macro reading: the main trends are clear from the chart at a glance, but the chart also allows detailed inspection.
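The data-ink idea can be sketched in ggplot2. This is only a minimal illustration, not a chart from the presentation: the funding data frame and its values are invented, and the choice to drop the grid and the value axis in favour of direct labels is one possible design, not a rule.

```r
library(ggplot2)

# Hypothetical funding data, invented for illustration only
funding <- data.frame(
  source = c("Grants", "Donations", "Sales"),
  amount = c(120, 45, 30)  # thousands CZK
)

p_ink <- ggplot(funding, aes(x = amount, y = reorder(source, amount))) +
  geom_col(fill = "grey40", width = 0.6) +
  geom_text(aes(label = amount), hjust = -0.3) +   # direct labels on the bars
  scale_x_continuous(expand = expansion(mult = c(0, 0.15))) +
  labs(x = NULL, y = NULL, title = "Funding by source, thousands CZK") +
  theme_minimal() +
  theme(
    panel.grid  = element_blank(),   # grid lines add ink but no information here
    axis.text.x = element_blank()    # values are already printed next to the bars
  )
```

Printing `p_ink` draws the chart. Sorting the bars with `reorder()` and labelling them directly lets the remaining ink carry data rather than decoration.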

General principles of visualization

  1. Emphasis on data
  2. Readability
  3. Integrity

PRINCIPLE 1: Emphasis on data

Less is more. Graphs are meant to communicate information effectively; design is meant to support that goal, not obscure it.

Source Harford (2021)

DATA-INK ratio

“Above all else show the data.” (Edward Tufte)

Source

More is less

Source

Sometimes a little extra ink is worth it…

Source

Do not use 3D charts

Source

The 3D effect is not just unnecessary; it actively harms the reading of the chart

Source

Do not rely on defaults

Excel pie chart

Source of funding [thousands CZK]

Excel pie chart - emphasis

Funding by source, in thousands of CZK

Excel bar chart

BEFORE

AFTER

Excel bar chart - time series

BEFORE

AFTER

Excel Line Chart

BEFORE

AFTER

Excel Likert scale (diverging chart)

BEFORE

AFTER

Excel Likert scale - alternative version (diverging chart)

BEFORE

AFTER

PRINCIPLE 2: Readability

Respect for how human cognition works.

Source Christopher G. Healey

Pie charts are not suitable for making comparisons

Source Wiki

% university-educated in new EU members

Example of improved readability and emphasis

ACTUAL PUBLICATION

SUGGESTION FOR IMPROVEMENT

Schwabish, J. A. (2014) An Economist’s Guide to Visualizing Data

“Small multiples” improve readability of time series

ACTUAL PUBLICATION

SUGGESTION FOR IMPROVEMENT

Schwabish, J. A. (2014) An Economist’s Guide to Visualizing Data
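A minimal sketch of small multiples in ggplot2, using `facet_wrap()`. The four countries and their series are simulated for illustration and have nothing to do with the published chart above.

```r
library(ggplot2)

# Simulated yearly values for four made-up countries (illustration only)
set.seed(1)
series <- expand.grid(year = 2010:2020, country = c("A", "B", "C", "D"))
# Random walk per country, starting near 100
series$value <- 100 + ave(rnorm(nrow(series)), series$country, FUN = cumsum)

p_small <- ggplot(series, aes(x = year, y = value)) +
  geom_line() +
  facet_wrap(~ country) +   # one small panel per country, shared axes
  labs(x = NULL, y = "Value") +
  theme_minimal()
```

Because all panels share the same axes by default, the reader can compare the shape of each series without untangling overlapping lines.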

If you have a flexible tool, you can be creative…

Schwabish, J. A. (2014) An Economist’s Guide to Visualizing Data

Careful with this one…

May be useful for two categories

Well-managed data density

Source Financial Times

PRINCIPLE 3: Integrity

You decide which message the visualization brings to the forefront. But you are also responsible for any distortion or manipulation it introduces.

How much are the prices of flats rising?

The big problem with the y axis

y-axis starting at -20 mil. (top) and at 0 (bottom)

Sometimes the y-axis is arbitrary

Sometimes, we just need to “zoom”

So if the y-axis does not start at 0 …

  • … use “line chart” rather than “bar chart”
  • … highlight the fact
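Both recommendations can be sketched in ggplot2. The flat prices below are invented for illustration, and the axis limits are arbitrary assumptions. `coord_cartesian()` zooms the view without discarding data, and the subtitle states openly that the axis is truncated.

```r
library(ggplot2)

# Hypothetical average flat prices, invented for illustration only
prices <- data.frame(
  year  = 2015:2021,
  price = c(3.0, 3.2, 3.5, 3.8, 4.1, 4.4, 5.0)  # millions CZK
)

base <- ggplot(prices, aes(x = year, y = price)) +
  geom_line() +    # a line, not bars: bar length only makes sense from zero
  geom_point() +
  labs(x = NULL, y = "Average price (millions CZK)") +
  theme_minimal()

# Zoomed view: the truncated axis is stated explicitly
p_zoom <- base +
  coord_cartesian(ylim = c(2.5, 5.5)) +
  labs(subtitle = "Note: the y-axis does not start at zero")

# Same data with a zero baseline, for comparison
p_zero <- base + coord_cartesian(ylim = c(0, 5.5))
```

Showing `p_zoom` and `p_zero` side by side is a quick way to check how much of the perceived growth comes from the data and how much from the axis.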

Source Highcharts.com

Should we be the least worried about poverty of all European countries?

Source Eurostat

Visualizing uncertainty

Data from July 2021
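One common way to show sampling uncertainty is a point estimate with a confidence interval. The sketch below uses invented poll shares and an assumed sample size of 1000. It is not the July 2021 data from the slide. It also uses the simple normal approximation for a proportion, which is only one of several possible interval formulas.

```r
library(ggplot2)

# Hypothetical poll results, invented for illustration only
poll <- data.frame(
  party = c("Party A", "Party B", "Party C"),
  share = c(0.28, 0.22, 0.10)
)
n <- 1000  # assumed number of respondents

# Normal-approximation 95% confidence interval for a proportion
poll$se    <- sqrt(poll$share * (1 - poll$share) / n)
poll$lower <- poll$share - 1.96 * poll$se
poll$upper <- poll$share + 1.96 * poll$se

p_poll <- ggplot(poll, aes(x = share, y = reorder(party, share))) +
  geom_pointrange(aes(xmin = lower, xmax = upper)) +   # estimate plus interval
  scale_x_continuous(labels = function(x) paste0(round(100 * x), " %")) +
  labs(x = "Estimated share (95% confidence interval)", y = NULL) +
  theme_minimal()
```

Drawing the interval rather than a single bar makes it visible when two estimates are too close to be told apart.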

Uncertainty can also be visualized in model estimates
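A minimal sketch of a model estimate drawn together with its uncertainty. It assumes a simple linear regression and uses the `mtcars` data set that ships with R, not the model shown on the slide. `geom_smooth()` draws the fitted line with a 95% confidence band.

```r
library(ggplot2)

p_model <- ggplot(mtcars, aes(x = wt, y = mpg)) +
  geom_point(alpha = 0.6) +
  # Fitted regression line with a shaded 95% confidence band
  geom_smooth(method = "lm", formula = y ~ x, se = TRUE, level = 0.95) +
  labs(x = "Weight (1000 lbs)", y = "Miles per gallon") +
  theme_minimal()

# The same model fitted directly, to inspect the interval for the slope
fit <- lm(mpg ~ wt, data = mtcars)
ci  <- confint(fit, level = 0.95)
```

The band widens where data are sparse, which tells the reader where the estimate deserves less trust.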

A true visualization, BUT…

Source

General principles of visualization - SUMMARY

Emphasis on data

  • Default settings often need to be changed
  • Keep only those chart elements that have an informational value
  • Do not use 3D charts
  • Think about what you want the chart to say

Readability

  • Respect human cognition
  • Horizontal chart labels are better than vertical
  • Think about the context in which the reader encounters the chart
  • Be inspired by creative approaches

Integrity

  • Be careful with the y-axis
  • Communicate the meaning of what you visualize
  • Take into account the degree of uncertainty

Emphasis on data: The data does not speak for itself; you decide what the visualization highlights (but you must not manipulate). Use colors as a carrier of information, not as an ornament.

Readability: If possible, label the data directly in the chart. Minimize the use of pie charts. Tailor the visualization to its purpose (simpler for presentations, more complex for articles). A figure caption on a slide does not have to just name the topic shown; it can tell a story. In an article, we mostly tell the story in the text.

Visualization architecture (grammar of graphics)

Leland Wilkinson and ‘The Grammar of Graphics’ (book)

What makes a good visualization? Individual components…

  1. Data
  2. Variables
  3. Algebra
  4. Scale
  5. Geometry (line chart, bar chart, …)
  6. “Aesthetics” (colors, shapes, saturation, …)

Hadley Wickham and a software implementation of Wilkinson’s ideas

ggplot2

Seven chart layers. Three required:

  1. Data

  2. Aesthetics - mapping information to color, shape, saturation, …

  3. Geometry - graphic elements that represent data

Four “extra”:

  1. Facets (small multiples)

  2. Aggregated statistics (e.g. regression curve)

  3. Coordinates (e.g. logarithmic scale)

  4. Theme - chart design
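The seven layers can be seen together in a single call. This is a sketch under assumptions: it supposes the `penguins` data come from the palmerpenguins package, and the choice of variables, facets and axis limits is arbitrary.

```r
library(ggplot2)
library(palmerpenguins)  # assumption: source of the penguins data

# Drop rows with missing measurements so no layer has to handle NA
penguins_complete <- subset(
  penguins,
  !is.na(bill_length_mm) & !is.na(body_mass_g)
)

p_layers <- ggplot(
  data = penguins_complete,                        # 1. data
  aes(x = body_mass_g, y = bill_length_mm,
      colour = species)                            # 2. aesthetics
) +
  geom_point(alpha = 0.6) +                        # 3. geometry
  facet_wrap(~ island) +                           # 4. facets (small multiples)
  geom_smooth(method = "lm", formula = y ~ x,
              se = FALSE) +                        # 5. statistics
  coord_cartesian(xlim = c(2500, 6500)) +          # 6. coordinates
  theme_minimal()                                  # 7. theme
```

Only the first three layers are needed to get a chart; the remaining four refine how it is split, summarized, framed and styled.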

Data

# A tibble: 6 × 8
  species island    bill_length_mm bill_depth_mm flipper_l…¹ body_…² sex    year
  <fct>   <fct>              <dbl>         <dbl>       <int>   <int> <fct> <int>
1 Adelie  Torgersen           39.1          18.7         181    3750 male   2007
2 Adelie  Torgersen           39.5          17.4         186    3800 fema…  2007
3 Adelie  Torgersen           40.3          18           195    3250 fema…  2007
4 Adelie  Torgersen           NA            NA            NA      NA <NA>   2007
5 Adelie  Torgersen           36.7          19.3         193    3450 fema…  2007
6 Adelie  Torgersen           39.3          20.6         190    3650 male   2007
# … with abbreviated variable names ¹​flipper_length_mm, ²​body_mass_g
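
library(ggplot2)
library(palmerpenguins)  # assumption: penguins comes from the palmerpenguins package

ggplot(data = penguins)  # data only: an empty plotting canvas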

Aesthetics

  • Axes
  • Outline
  • Fill
  • Size
  • Transparency
  • Shape

ggplot(data = penguins, 
       aes(x = sex))

Geometry

  • lines
  • points
  • columns
  • histogram
  • boxplot

ggplot(data = penguins, 
       aes(x = sex)) + 
  geom_bar()

Geometry 2

  • lines
  • points
  • columns
  • histogram
  • boxplot

library(dplyr)  # provides %>% and filter()

ggplot(data = penguins %>%
         filter(!is.na(sex)),  # drop penguins with unknown sex
       aes(x = sex,
           y = bill_length_mm)) +
  geom_boxplot() +
  theme_classic()
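Returning to the aesthetics listed earlier: a sketch of the difference between mapping an aesthetic to a variable inside `aes()` and setting it to a constant outside `aes()`. As before, it assumes the `penguins` data come from the palmerpenguins package; the variables chosen are arbitrary.

```r
library(ggplot2)
library(palmerpenguins)  # assumption: source of the penguins data

p_aes <- ggplot(
  data = subset(penguins, !is.na(sex)),
  aes(x = flipper_length_mm, y = body_mass_g,
      colour = species,   # mapped: colour carries information, gets a legend
      shape  = sex)       # mapped: shape carries information, gets a legend
) +
  geom_point(size = 2, alpha = 0.6)  # set: constants for all points, no legend
```

Mapped aesthetics encode data and therefore need a legend; set aesthetics are pure styling and should stay out of `aes()`.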

Gallery 1

ggplot2 gallery

Gallery 2

ggplot2 gallery

Gallery 3

ggplot2 gallery

Gallery 4

ggplot2 gallery

Gallery 5

ggplot2 gallery

Acknowledgements

This presentation naturally draws on a hard-to-imagine volume of work of a hard-to-imagine number of people.

Nevertheless, I would especially like to thank Petr Bouchal. In 2016, we prepared a course together on the methodology of science for Discover, a summer academy for high school students, where we devoted a lot of space to visualization. Petr was also a guest lecturer in my courses at the Faculty of Arts, CU, and it was only during his lectures that I fully appreciated the value of seeing visualization as a full-fledged auxiliary scientific discipline. I became acquainted with a number of the examples in this presentation thanks to Petr.

Additional resources - principles and applications

  1. Jonathan Schwabish - blog
  2. The Economist and the daily charts
  3. Hans Rosling’s Gapminder
  4. Office for National Statistics - Presenting data
  5. (in Czech) Six tips for good visualizations by Průvodce evaluátora - Collection of evaluation tips and recommendations
  6. The Data Visualization Checklist
  7. Selected principles discussed on the blog of a data journalist at the Economist
  8. Makeover Monday
  9. Excel charts
  10. Visualization fuck-ups - for a laugh

Additional resources - working with ggplot2

  1. Some lectures from the course I co-teach, Introduction to data analysis in R
  2. Chapter Graphics for Communication in the book R for Data Science
  3. The book ggplot2 by Hadley Wickham
  4. A large gallery of ggplot2 charts, including the code used to create them
  5. The gallery website mentioned in the previous item also offers an interesting overview of theoretical visualization tips

Referenced literature and other sources

If the resources referenced in the presentation are not interactive (they do not contain a link directly to their location), you can find them in the list here:

References

Harford, Tim. 2021. How to Make the World Add Up: Ten Rules for Thinking Differently About Numbers. 1st edition. London: The Bridge Street Press.
Medina, John. 2014. Brain Rules (Updated and Expanded): 12 Principles for Surviving and Thriving at Work, Home, and School. Second edition. Seattle, WA: Pear Press.
Ware, Colin. 2012. Information Visualization: Perception for Design. 3rd edition. Waltham, MA: Morgan Kaufmann.