ScatterPlots: A Love Story

Ravi Mistry
Jan 11, 2019

If you were to ask me my favourite chart type — perhaps we’re in a bar, or we’re travelling somewhere and we need to make some sort of conversation so the silence is neither stale nor awkward — then I would probably say the scatterplot.

Scatterplots do two things which I find fascinating about data analysis. They highlight outliers, and finding interesting things is the aim of the game; and they map relationships. These two elements combined make scatterplots super versatile as a chart type.

They also sit alongside the bar chart, line chart and pie chart as charts that people just understand. You don’t need to think about how to read a scatterplot; you just start looking for the story. Typically what you look for is the measures being compared, and what an individual dot represents.

Data from WhoScored.com

However, I’ve begun to see various scatterplots on Twitter that have irked me somewhat, and it mainly has to do with scaling.

Typically, when creating a scatterplot the shape of the canvas that the axes are plotted on is square. The reason for this is scaling. If you are looking to plot points to compare measures on a similar scale, then this matters a lot. The chart above is comparing the relationship between Fouls per 90 of Championship footballers to Turnovers per 90 (Data from 2014–15). But how would this change if the chart was rectangular, not square?
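To put a rough number on that question, here is a minimal sketch in Python. It is only arithmetic, not how the charts in this post were made: the axis ranges and canvas sizes are made-up values, and `visual_scale` and `emphasis_ratio` are hypothetical helper names of my own. The idea is to compare how much canvas one unit of each measure gets.

```python
def visual_scale(data_range, axis_length):
    """Return the canvas length used per unit of data along one axis."""
    low, high = data_range
    return axis_length / (high - low)


def emphasis_ratio(x_range, y_range, width, height):
    """Return how much more canvas one x unit gets than one y unit.

    A value of 1.0 means both measures are drawn at the same scale.
    """
    return visual_scale(x_range, width) / visual_scale(y_range, height)


# Hypothetical axis ranges: two "per 90" measures on a similar scale.
fouls_range = (0.0, 4.0)
turnovers_range = (0.0, 4.0)

# Same data ranges, two canvas shapes (sizes in arbitrary units).
square = emphasis_ratio(fouls_range, turnovers_range, width=6, height=6)
wide = emphasis_ratio(fouls_range, turnovers_range, width=12, height=6)

print(f"square canvas: {square:.1f}")  # 1.0
print(f"wide canvas:   {wide:.1f}")    # 2.0
```

Under these assumptions, doubling the width gives every unit on the x-axis twice the room of a unit on the y-axis, which is one way of describing the extra visual weight a wide canvas hands to that measure. If you plot with matplotlib, `ax.set_box_aspect(1)` is one way to keep the plotting area square.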

Have the outliers changed? Not really. Kazenga LuaLua (the outlier to the far right) is still a very prominent data point on both charts above. However, by elongating the x-axis (side note: I work with data mostly daily, and I still hesitate saying x or y axis and think about the ‘x is a cross, and y’s up/wise up’) — anyway, by elongating the x-axis, we almost add higher importance to the Fouls per 90 metric.

So does the pattern change if we swap the axes?

So! This is telling a wholly different story. This time my eyes were drawn to Kazenga LuaLua (up top, in the middle) but more so to the bottom right, where Wes Thomas, Callum Harriott and Eoin Doyle reside. Has the changing of the axis meant the Turnovers per 90 metric has higher importance placed upon it? I’d argue yes.

Hold on a second matey boy. You do realise Hans Rosling, one of the pioneers of data storytelling, used a wide scatter-plot for his most famous piece of work?

Screen cap taken from https://www.youtube.com/watch?v=jbkSRLYSojo

To this, I could default to the field of data visualisation’s favourite term: it depends… but I also want to mention the small matter of scale. Earlier in this piece I talked about comparing measures on a similar scale. For Rosling’s piece, it does make sense to use a wider plot, as it aids the story he is telling. The point is to highlight that rising income in all of these countries has improved the world, in that people are living longer.

Remember, the scatterplot is one of the most recognisable visualisations to interpret and understand. Quadrant analysis, for example, promotes the top right as the nirvana… for Rosling, the top right was where both income and life expectancy were highest.

Conclusions

To end with ‘it depends’ feels a little too easy.

I feel this argument is similar to the axis truncation question from a year or so back: in most cases, truncating an axis can be misleading.

Similarly, when comparing similar metrics, or even different metrics, the default should surely be to square off the scatter plot to allow equal importance to each measure. But of course, it all depends on what the designer is emphasising and the story they are looking to tell.

I’d love to hear the wider thoughts on this — leave a comment below, or tweet me @Scribblr_42
