C

At some point, we've all likely heard the cautionary assertion that correlation is not causation. It sounds reasonable, so we tend to accept it, but what does it really mean? And is it always true?

To answer these questions, we first need to understand what the terms mean and how they differ. Correlation is a statistic that summarizes the measured association between variables. In simpler terms, it's a number between -1 and 1 that describes how consistently, and in which direction, one variable (let's call this variable y) changes when another variable changes (let's call this one x). Causation takes correlation a bit further by demanding more from our variables than a basic association. Causation requires that at least part of the change we see in variable y is actually due to changes in variable x. In other words, a change in one variable has actually caused a change in the other, hence the term causal.

First, let's look at how correlations between variables can be misleading. The scatterplot in Fig. 1 shows simulated data from a sample of 50 elementary students, grades 1-6. The plot shows two variables for each student: a measure of shoe size along the x-axis (var.x) and performance on a common math test along the y-axis (var.y). Each point in the plot represents one student's pair of values on those two variables. The association between these variables is clear: as shoe size (x) increases, so do math scores (y). Math scores vary rather widely at any given shoe size, but this spread doesn't obscure the overall association, which is summarized by the upward slope of the blue line of best fit. To further reinforce this association, we can look at the calculated correlation statistic between shoe size and math performance [r(xy)=.74]. [If this statistic is unfamiliar, see Linear Association and Correlation.] This is a strong correlation, certainly something to take notice of, and it provides further evidence for the association between shoe size and math performance within our sample.
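
To make the correlation statistic more concrete, here is a minimal Python sketch of how a value like r(xy) could be computed for a simulated sample of 50 students. It is not the code used to produce Fig. 1: the function names (`pearson_r`, `simulate_students`), the shoe-size range, the slope, and the noise level are all assumptions chosen for illustration, so the printed value will not necessarily match the .74 reported above.

```python
import math
import random


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of equal length with at least 2 points")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sum of cross-products of deviations from each mean
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    # Sums of squared deviations for each variable
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined when a variable is constant")
    return sxy / math.sqrt(sxx * syy)


def simulate_students(n=50, seed=0):
    """Hypothetical sample: shoe size (x) and math score (y) for n students.

    The numeric settings below are illustrative assumptions, not the
    values behind the article's figure.
    """
    rng = random.Random(seed)
    shoe_size = [rng.uniform(1.0, 7.0) for _ in range(n)]
    # Math score rises with shoe size, plus random student-to-student noise
    math_score = [40.0 + 6.0 * x + rng.gauss(0.0, 8.0) for x in shoe_size]
    return shoe_size, math_score


if __name__ == "__main__":
    var_x, var_y = simulate_students()
    print(f"r(xy) = {pearson_r(var_x, var_y):.2f}")
```

Notice that the calculation only uses how the two lists of numbers vary together. Nothing in it knows or cares why shoe size and math scores move in the same direction, which is exactly why a strong r on its own cannot tell us whether one variable causes the other.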