You may well have seen the viral image below, which shows the wear on a weight stack over the years. It’s often shared as evidence that the normal distribution (i.e. bell curve) is everywhere in daily life.
There’s just one problem: it doesn’t show a normal distribution.
A normal distribution often appears when we sample repeatedly from the same population and take the average each time (this is thanks to the central limit theorem). For example, if we calculate the average height of members at a series of different gyms, we’d expect the distribution of these averages to follow a normal curve if we included lots of gyms in the analysis.
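To make the central limit theorem concrete, here is a minimal simulation sketch. It assumes a hypothetical, deliberately skewed population (a gamma distribution with arbitrarily chosen parameters, not real gym data) and compares the skewness of individual values with the skewness of averages taken over 100 individuals at a time.

```python
import random
import statistics

def skewness(values):
    """Sample skewness: the mean of cubed standardised deviations."""
    m = statistics.fmean(values)
    s = statistics.pstdev(values)
    return statistics.fmean([((v - m) / s) ** 3 for v in values])

rng = random.Random(1)

def draw_individual():
    # Hypothetical skewed measurement (an assumption, not real data):
    # gamma with shape 2 and scale 25, so the mean is 50.
    return rng.gammavariate(2.0, 25.0)

# 20,000 individual values, straight from the skewed population.
individuals = [draw_individual() for _ in range(20_000)]

# 2,000 simulated "gyms", each summarised by the average of 100 members.
gym_means = [
    statistics.fmean([draw_individual() for _ in range(100)])
    for _ in range(2_000)
]

print(f"skewness of individual values: {skewness(individuals):.2f}")
print(f"skewness of gym averages:      {skewness(gym_means):.2f}")
```

In this sketch the individual values stay clearly right-skewed, while the averages come out close to symmetric. The bell shape belongs to the averages, not to the individuals.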
But that’s not what happens if we look at a weight stack.
The wear marks on a stack show the distribution of individual efforts over time. And like many measures of individual performance, it has a long tail.
Just look at the image. The wear is concentrated at weights around 40-50. But some people have lifted well over 100.
If the distribution were truly normal, it would be symmetrical. But the marks in the image are not symmetrical; they are skewed to the right.
If we fit a normal curve to the data (well, the approximate area of wear estimated by an AI vision model), we therefore end up predicting negative weights. Which, of course, isn’t possible:
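The negative-weight problem is easy to reproduce with made-up numbers. The sketch below does not use the data from the image. It assumes synthetic right-skewed "weights" drawn from a lognormal distribution with arbitrarily chosen parameters, fits a normal curve by matching the mean and standard deviation, and then asks how much of the fitted curve sits below zero.

```python
import math
import random
import statistics

rng = random.Random(42)

# Synthetic right-skewed "weights" (an assumption, not the image data):
# lognormal with median 55, so the peak is in the 40s with a long right tail.
weights = [rng.lognormvariate(math.log(55.0), 0.5) for _ in range(10_000)]

# Fitting a normal curve here simply means matching mean and standard deviation.
mu = statistics.fmean(weights)
sigma = statistics.stdev(weights)
fitted = statistics.NormalDist(mu, sigma)

# Probability mass the fitted normal places on impossible negative weights.
share_negative = fitted.cdf(0.0)
observed_negative = sum(w < 0 for w in weights) / len(weights)

print(f"fitted normal: mean={mu:.1f}, sd={sigma:.1f}")
print(f"share of fitted curve below zero: {share_negative:.1%}")
print(f"share of observations below zero: {observed_negative:.1%}")
```

None of the synthetic observations is negative, yet the symmetric curve has to put some of its left tail below zero to accommodate the long right tail.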
Just because observations are distributed around an average value, it doesn’t mean they automatically follow a textbook normal distribution.
If we’re focusing on simple distributions, something like a Gamma distribution – which is asymmetric and only generates positive values – gives a better overall fit to the observed data. It’s not perfect, and a more complex shape would do better (Gamma, like normal, has only two parameters) but at least it doesn’t predict a bunch of implausible negative values:
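As a rough illustration of that comparison, the sketch below fits both a normal and a Gamma distribution to synthetic skewed data and compares their log-likelihoods (higher means a better fit). Everything here is an assumption for illustration: the data are lognormal draws rather than the wear estimates from the image, and the Gamma parameters come from a simple method-of-moments fit rather than whatever fitting procedure was used for the figure.

```python
import math
import random
import statistics

rng = random.Random(42)

# Synthetic right-skewed "weights" (an assumption, not the image data).
weights = [rng.lognormvariate(math.log(55.0), 0.5) for _ in range(10_000)]

mean = statistics.fmean(weights)
var = statistics.variance(weights)

# Method-of-moments Gamma fit:
# mean = shape * scale and variance = shape * scale**2.
shape = mean**2 / var
scale = var / mean

def normal_loglik(xs, mu, sigma2):
    """Total log-likelihood of xs under a normal with mean mu, variance sigma2."""
    const = -0.5 * math.log(2 * math.pi * sigma2)
    return sum(const - (x - mu) ** 2 / (2 * sigma2) for x in xs)

def gamma_loglik(xs, k, theta):
    """Total log-likelihood of xs under a Gamma with shape k and scale theta."""
    const = -k * math.log(theta) - math.lgamma(k)
    return sum((k - 1) * math.log(x) - x / theta + const for x in xs)

ll_normal = normal_loglik(weights, mean, var)
ll_gamma = gamma_loglik(weights, shape, scale)

print(f"Gamma fit: shape={shape:.2f}, scale={scale:.2f}")
print(f"log-likelihood, normal: {ll_normal:.0f}")
print(f"log-likelihood, Gamma:  {ll_gamma:.0f}")
```

Both models have two parameters, so the log-likelihoods can be compared directly. Because the synthetic data are not themselves Gamma-distributed, the Gamma fit is better than the normal one without being exact, and it places no probability on negative weights by construction.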
This is the problem with telling people that normal distributions exist in places where they don’t: it creates a temptation to use distributions that are familiar rather than realistic.
We saw this happen a lot during the early stages of the COVID pandemic, when people would fit simplistic bell-shaped curves to outbreaks with very different causes and dynamics, then claim that one could predict another. For example, outbreaks of food poisoning produce a curve that goes up then comes down, but that doesn’t mean they can tell us much that is useful about a major respiratory epidemic.
We also saw this happen dramatically in the run-up to the 2008 financial crisis, when banks used overly simple distributions to model correlations between mortgage risks. One leading hedge fund reportedly kept an abacus in one of its conference rooms; there was a label next to it that read ‘correlation model’.
From racking up weights to racking up debts, it’s a good reminder that we shouldn’t rely too much on inappropriate distributions that don’t capture what’s really generating the patterns we observe.