The Dangers of Working with ‘Averages’
As infographic design and data visualisation are a major part of our day-to-day business, we’re constantly working with huge amounts of information. Whether we’re analysing raw data on behalf of a client, visualising the results of a survey or gathering information via our research department, we’re always working with numbers, percentages and in-depth figures in one form or another.
As a result, we constantly come across the use of ‘averages’ (and if you’re a regular consumer of infographics, you’ll have seen them used a lot too). In some instances their use is perfectly valid and revealing, but in many cases they are completely useless or, in the worst examples, altogether misleading.
The problem with averages is that outliers can completely throw off an entire data set, rendering the average figure entirely meaningless. Let’s illustrate this point with an example. Say we have a room containing 100 randomly selected people, and we want to calculate their average annual salary. 99 of the people in the room earn between £10,000 and £30,000 per year and those salaries are evenly spread, giving us an average of around £20,000 per year. The 100th person, however, just happens to be a professional footballer who earns £250,000 per week, or £13 million per year. How will this affect the average? Suddenly the average salary in the room jumps to roughly £150,000, more than seven times what a typical person in the room earns. The overall figure is completely disproportionate; the 100th person, the outlier, has made the average value useless as a description of the room.
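The arithmetic behind the room example can be sketched in a few lines of Python. This is only an illustration under our own assumptions: the 99 salaries are spaced evenly between £10,000 and £30,000, and the footballer’s weekly wage is converted to an annual figure using 52 weeks. The median is included for comparison, as one common way to summarise data that is less sensitive to a single extreme value.

```python
from statistics import mean, median

# Assumption: 99 salaries spaced evenly from £10,000 to £30,000.
step = 20_000 / 98
salaries = [10_000 + i * step for i in range(99)]

# The footballer earns £250,000 per week; 52 paid weeks are assumed.
footballer = 250_000 * 52

room = salaries + [footballer]

mean_without = mean(salaries)   # average of the 99 "ordinary" salaries
mean_with = mean(room)          # average once the outlier joins the room
median_with = median(room)      # middle value, barely moved by the outlier

print(f"Mean without footballer: £{mean_without:,.0f}")
print(f"Mean with footballer:    £{mean_with:,.0f}")
print(f"Median with footballer:  £{median_with:,.0f}")
```

Under these assumptions the mean moves from about £20,000 to about £149,800, while the median stays close to £20,000.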
Now if we can identify and remove this outlier from the data set, then the resulting average would be more appropriate and the figure would have some value. But how often can this be achieved? We rarely get to see the raw data when we’re looking at an infographic or being given a presentation, so how can we be sure that an outlier (or several outliers) is not throwing off the average value? How do we know that the average being used is in any way representative?
Even if we know of an underlying distribution that is dramatically affecting the average, we can’t always remove the outlier from the data set. An excellent example of this is given by Nassim Taleb, author of The Black Swan. You might know that the average depth of a river is 4ft, and therefore assume it is safe for you (or your children) to swim in, a somewhat reasonable assumption for the majority of people. However, the reality might be that the river is only inches deep for very long stretches, before reaching a 20ft deep section in which you could easily drown. In this example the underlying distribution gives us an average that is completely meaningless for the decision at hand; it tells us nothing useful about whether the river is safe, yet it is technically correct, because the extremes are a genuine part of the river and cannot be removed.
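To make the river example concrete, here is a small Python sketch. The depth readings are entirely hypothetical: we assume the river has been measured in 39 stretches of equal length, 32 of them six inches deep and 7 of them 20ft deep, chosen so that the average works out at exactly 4ft.

```python
from statistics import mean, median

# Hypothetical depth readings in feet, one per equal-length stretch.
# Assumption: 32 shallow stretches (0.5 ft) and 7 deep stretches (20 ft).
depths = [0.5] * 32 + [20.0] * 7

average = mean(depths)      # the headline figure: 4 ft
typical = median(depths)    # what most of the river actually looks like
deepest = max(depths)       # the part that matters for safety

# Share of the river that is deeper than 6 ft (over most people's heads).
share_over_head = sum(d > 6 for d in depths) / len(depths)

print(f"Average depth: {average} ft")
print(f"Median depth:  {typical} ft")
print(f"Deepest point: {deepest} ft")
print(f"Share deeper than 6 ft: {share_over_head:.0%}")
```

The average is a perfectly correct 4ft, but it is the maximum and the share of deep water, not the average, that answer the question of whether it is safe to swim.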
Situations like this are often associated with what mathematicians call a power law: a kind of distribution in which a handful of extremes (or even a single extreme) dominates the total, so that the ‘average’ says very little about a typical value.
There is another effect that can make averages a dangerous metric on which to base your reasoning, referred to as stage migration (also known as the Will Rogers phenomenon). This effect can occur in any industry or walk of life, but is most worrying when it appears in the financial or medical fields.
As an example of how this effect works, let’s say you’re a company director tasked with solving the following business issue:
There are two sales departments in the business; one (department A) is performing admirably, the other (department B) is consistently bringing in low numbers. You want to bring up the average sales figures for department B, so you hire an expensive consultant to deal with the problem. The consultant finds that the lowest-performing salesperson in department A is selling considerably more than the best performer in department B, and simply transfers that person from department A to department B. In one swift move, the consultant has not only improved the average sales figures of department B, which has gained a new top performer, but has also managed to improve those of the already high-performing department A, which has lost its weakest one.
In this situation, the consultant will have achieved exactly what they were tasked with (and more), and you as the company director would feel validated in paying their fee (and would likely receive a pat on the back from the investors). Everyone is happy with the outcome, despite the fact that the overall sales figures have remained exactly the same and the company is out of pocket from paying the consultant. Here the average values have completely misrepresented the situation and skewed the outcome entirely, despite being 100% accurate. This is a perfect example of the dangers of stage migration.
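The consultant’s trick is easy to reproduce with a few made-up numbers. In the Python sketch below, the sales figures for each department are purely hypothetical; the only property that matters is that the weakest person in department A still outsells everyone in department B.

```python
from statistics import mean

# Hypothetical monthly sales per salesperson (units are arbitrary).
dept_a = [100, 120, 140]
dept_b = [20, 30, 40]

total_before = sum(dept_a) + sum(dept_b)
mean_a_before = mean(dept_a)
mean_b_before = mean(dept_b)

# The consultant's move: transfer A's weakest performer to B.
mover = min(dept_a)
new_a = dept_a.copy()
new_a.remove(mover)          # removes a single entry equal to `mover`
new_b = dept_b + [mover]

total_after = sum(new_a) + sum(new_b)
mean_a_after = mean(new_a)
mean_b_after = mean(new_b)

print(f"Department A average: {mean_a_before} -> {mean_a_after}")
print(f"Department B average: {mean_b_before} -> {mean_b_after}")
print(f"Total sales:          {total_before} -> {total_after}")
```

Both departmental averages rise (A from 120 to 130, B from 30 to 47.5 with these numbers), while total sales stay at 450. Nobody has sold a single extra unit.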
Of course this isn’t always the case, and in many circumstances the average value gives us a pertinent insight into the situation. The takeaway lesson, however, is that always taking averages at face value is a risky and often flawed undertaking. So next time you read about the average value, the average IQ, the average weight or the average anything else, try to find out the underlying distribution before you use that information to inform your decision making.