Cum hoc ergo propter hoc: Wikis



(Redirected to Correlation does not imply causation article)

From Wikipedia, the free encyclopedia

"Correlation does not imply causation" is a phrase used in science and statistics to emphasize that a correlation between two variables does not automatically imply that one causes the other (although a correlation can still be a hint of causation, whether strong or weak[1][2]). The opposite belief, that correlation proves causation, is a logical fallacy by which two events that occur together are claimed to have a cause-and-effect relationship. The fallacy is also known as cum hoc ergo propter hoc (Latin for "with this, therefore because of this") and false cause. The related fallacy post hoc ergo propter hoc additionally requires that one event occur before the other, and so may be considered a special case of cum hoc.

In a widely studied example, numerous epidemiological studies showed that women who were taking combined hormone replacement therapy (HRT) also had a lower-than-average incidence of coronary heart disease (CHD), leading doctors to propose that HRT was protective against CHD. But randomized controlled trials showed that HRT caused a small but statistically significant increase in the risk of CHD. Re-analysis of the data from the epidemiological studies showed that women undertaking HRT were more likely to be from higher socio-economic groups (ABC1), with better than average diet and exercise regimes. The use of HRT and the decreased incidence of coronary heart disease were coincident effects of a common cause (i.e., the benefits associated with a higher socio-economic status), rather than cause and effect as had been supposed.[3]



In the strict logical sense, it is always correct to say "correlation does not imply causation". In casual use, however, the word "imply" loosely means "suggests" rather than "requires". In that looser sense, correlation and causation are certainly connected: correlation is a requirement for causation, even though it is not sufficient to establish it.

In logic, the technical use of the word "implies" means:

To be a sufficient circumstance.

This is the meaning statisticians intend when they say that correlation does not imply causation: a correlation is not sufficient to make causation certain. The statement "p implies q" has the technical meaning of logical implication, "if p then q", symbolized as p → q. That is, "if circumstance p is true, then q necessarily follows."
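The logical sense of "implies" described above can be sketched as a truth table. This is only an illustration of material implication; the function names `implies` and `truth_table` are chosen here for the example and do not come from the article.

```python
from itertools import product


def implies(p: bool, q: bool) -> bool:
    """Material implication: false only when p is true and q is false."""
    return (not p) or q


def truth_table():
    """Return rows (p, q, p -> q) for every combination of truth values."""
    return [(p, q, implies(p, q)) for p, q in product([True, False], repeat=2)]


if __name__ == "__main__":
    for p, q, result in truth_table():
        print(f"p={p!s:5} q={q!s:5} p->q={result}")
```

Read with p as "A and B are correlated" and q as "A causes B", the claim "correlation implies causation" fails as soon as a single case exists in which p holds and q does not.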

In contrast, the everyday English meaning of "imply" is:

To indicate or suggest.

Edward Tufte, in a criticism of the brevity of Microsoft PowerPoint presentations, deprecates the use of "is" to relate correlation and causation (as in "Correlation is not causation"), arguing that the statement is inaccurate because it is incomplete.[1] While it is not the case that correlation is causation, simply stating their nonequivalence omits information about their relationship. Tufte suggests that the shortest true statement that can be made about causality and correlation must be expanded to at least one of the following:

Empirically observed covariation is a necessary but not sufficient condition for causality.

or

Correlation is not equal to causation; it is only a requirement for it.

General pattern

The cum hoc ergo propter hoc logical fallacy can be expressed as follows:

  1. A occurs in correlation with B.
  2. Therefore, A causes B.

In this type of logical fallacy, one draws a premature conclusion about causality after observing only a correlation between two or more factors. If one factor (A) is observed to be correlated with another factor (B), it is sometimes taken for granted that A is causing B, even when no other evidence supports this. This is a logical fallacy because there are at least five possibilities:

  1. A may be the cause of B.
  2. B may be the cause of A.
  3. Some unknown third factor C may actually be the cause of both A and B.
  4. There may be a combination of the above three relationships. For example, B may be the cause of A at the same time as A is the cause of B (contradicting the claim that the only relationship between A and B is that A causes B). This describes a self-reinforcing system.
  5. The "relationship" is a coincidence, or is so complex or indirect that it is more effectively called a coincidence (i.e. two events occurring at the same time that have no direct relationship to each other besides the fact that they occur at the same time). A larger sample size helps to reduce the chance of a coincidence, unless there is a systematic error in the experiment.

In other words, no conclusion can be drawn about the existence or the direction of a cause-and-effect relationship from the fact that A and B are correlated alone. Determining whether there is an actual cause-and-effect relationship requires further investigation, even when the relationship between A and B is statistically significant, a large effect size is observed, or a large part of the variance is explained.
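The remark in possibility 5 about sample size can be illustrated with a small simulation. The sketch below draws pairs of variables that are independent by construction and counts how often they nevertheless show a correlation of at least 0.5 in absolute value. The threshold, sample sizes, number of trials, and function names are all assumptions made for this illustration.

```python
import math
import random


def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


def share_of_strong_correlations(n, trials=2000, threshold=0.5, seed=0):
    """Fraction of trials in which two independent samples of size n
    show |r| >= threshold purely by chance."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(trials):
        xs = [rng.gauss(0, 1) for _ in range(n)]
        ys = [rng.gauss(0, 1) for _ in range(n)]  # generated independently of xs
        if abs(pearson_r(xs, ys)) >= threshold:
            hits += 1
    return hits / trials


if __name__ == "__main__":
    for n in (5, 20, 200):
        print(n, share_of_strong_correlations(n))
```

With only a handful of observations, apparently strong correlations between unrelated variables are common; with a few hundred observations they become rare. As the list notes, this protects only against chance, not against systematic error.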


B causes A (reverse causation)

The more firemen fighting a fire, the bigger the fire is going to be.
Therefore firemen cause fire.

The strong correlation between the number of firemen at a scene and the size of the fire does not imply that the firemen cause the fire. Firemen are sent according to the severity of the fire, and a large fire draws a greater number of firemen; it is therefore the fire that causes firemen to arrive at the scene.
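One reason the direction cannot be read off the data is that the correlation coefficient is symmetric in its two arguments. The sketch below generates data in which fire size determines crew size (the numbers are invented for the illustration, and the function names are hypothetical) and shows that the coefficient is the same whichever variable is listed first.

```python
import math
import random


def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


def simulate_fires(n=500, seed=3):
    """Toy data in which fire size causes crew size, never the reverse."""
    rng = random.Random(seed)
    fire_size = [rng.uniform(1, 10) for _ in range(n)]
    # Assumed dispatch rule: about three firemen per unit of fire size, plus noise.
    firemen = [3 * size + rng.gauss(0, 2) for size in fire_size]
    return fire_size, firemen


if __name__ == "__main__":
    fire_size, firemen = simulate_fires()
    print(pearson_r(fire_size, firemen))
    print(pearson_r(firemen, fire_size))  # same value: r carries no direction
```

The causal arrow exists only in the code that generated the data; nothing in the computed coefficient records it.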

Third factor C (the common-causal variable) causes both A and B

All of these examples involve a lurking variable: a hidden third variable that affects both of the correlated variables. In Example 3, for instance, the lurking variable is the fact that it is summer.

Example 1
Sleeping with one's shoes on is strongly correlated with waking up with a headache.
Therefore, sleeping with one's shoes on causes headache.

The above example commits the correlation-implies-causation fallacy, as it prematurely concludes that sleeping with one's shoes on causes headache. A more plausible explanation is that both are caused by a third factor, in this case alcohol intoxication, which thereby gives rise to a correlation.

Example 2
Young children who sleep with the light on are much more likely to develop myopia in later life.

The above is a scientific example that resulted from a study at the University of Pennsylvania Medical Center. Published in the May 13, 1999 issue of Nature,[4] the study received much coverage at the time in the popular press.[5] However, a later study at The Ohio State University did not find a link between infants sleeping with the light on and the development of myopia. It did find a strong link between parental myopia and the development of child myopia, and also noted that myopic parents were more likely to leave a light on in their children's bedroom.[6][7][8][9] In this case, the cause of both conditions is parental myopia.

Example 3
As ice cream sales increase, the rate of drowning deaths increases sharply.
Therefore, ice cream causes drowning.

The above example overlooks the role of the season. Ice cream is sold at a much greater rate during the summer months, and it is during the summer months that people are more likely to engage in activities involving water, such as swimming. The increased drowning deaths are simply caused by more exposure to water-based activities, not by ice cream.
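The common-cause pattern in these examples can be sketched with simulated data. In the toy model below, temperature drives both ice cream sales and drownings, and neither of those affects the other. The coefficients, units, and function names are assumptions invented for the illustration. The partial correlation uses the standard first-order formula to remove the linear influence of the third variable.

```python
import math
import random


def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


def partial_r(a, b, c):
    """Correlation of a and b after removing the linear effect of c."""
    r_ab = pearson_r(a, b)
    r_ac = pearson_r(a, c)
    r_bc = pearson_r(b, c)
    return (r_ab - r_ac * r_bc) / math.sqrt((1 - r_ac ** 2) * (1 - r_bc ** 2))


def simulate_summer(n=5000, seed=7):
    """Toy model: temperature causes both series; they do not affect each other."""
    rng = random.Random(seed)
    temperature = [rng.gauss(20, 8) for _ in range(n)]
    ice_cream = [10 * t + rng.gauss(0, 40) for t in temperature]  # arbitrary units
    drownings = [0.5 * t + rng.gauss(0, 2) for t in temperature]  # arbitrary index
    return temperature, ice_cream, drownings


if __name__ == "__main__":
    temperature, ice_cream, drownings = simulate_summer()
    print("raw correlation:", pearson_r(ice_cream, drownings))
    print("controlling for temperature:", partial_r(ice_cream, drownings, temperature))
```

The raw correlation between the two series is strong, but it largely disappears once temperature is held fixed. In real data the lurking variable is usually not known in advance, which is why this kind of adjustment cannot by itself rule out a common cause.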


Coincidence

With a decrease in the number of pirates, there has been an increase in global warming over the same period.
Therefore, global warming is caused by a lack of pirates.

The example above is used satirically by the parody religion Pastafarianism to illustrate the logical fallacy of assuming that correlation equals causation.

Since the 1950s, both the atmospheric CO2 level and crime levels have increased sharply.
Hence, atmospheric CO2 causes crime.

The above example arguably makes the mistake of prematurely concluding a causal relationship where the relationship between the variables, if any, is so complex that it may be labeled coincidental. The two events have no simple relationship to each other besides the fact that they occur at the same time. Another possible example is the somewhat jocular Mierscheid Law.

A causes B and B causes A

Increased pressure results in increased temperature.
Therefore pressure causes temperature.

The ideal gas law, PV = nRT, describes a direct relationship between pressure and temperature (along with other factors), so the two properties are directly correlated. For a fixed amount of gas in a fixed volume, an increase in temperature causes an increase in pressure; likewise, an increase in pressure causes an increase in temperature. This illustrates possibility 4 in the list above: the two quantities are directly proportional to each other rather than independent, and either can drive the other.
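The proportionality can be checked numerically. The sketch below assumes a fixed amount of gas n and a fixed volume V (neither is given in the article; the values are illustrative) and solves PV = nRT once for pressure and once for temperature. The function names are hypothetical.

```python
import math

R = 8.314462618  # molar gas constant in J/(mol*K)


def pressure(n_mol, temperature_k, volume_m3):
    """Pressure in pascals from PV = nRT."""
    return n_mol * R * temperature_k / volume_m3


def temperature(n_mol, pressure_pa, volume_m3):
    """Temperature in kelvin from the same equation, solved for T."""
    return pressure_pa * volume_m3 / (n_mol * R)


if __name__ == "__main__":
    n_mol, volume_m3 = 1.0, 0.0248  # assumed: one mole in about 24.8 litres
    p_cold = pressure(n_mol, 300.0, volume_m3)
    p_hot = pressure(n_mol, 600.0, volume_m3)
    print(p_hot / p_cold)  # ratio of pressures after doubling the temperature
    print(temperature(n_mol, 2 * p_cold, volume_m3))  # temperature at doubled pressure
```

The equation itself is symmetric: it states only that the two quantities move together at fixed n and V, and says nothing about which one was changed first.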

Determining causation

David Hume argued that causality is based on experience, and that experience is in turn based on the assumption that the future resembles the past, an assumption which can itself only be based on experience, leading to circular logic. He concluded that causality is not based on actual reasoning: only correlation can actually be perceived.[10]

Intuitively, causation seems to require not just a correlation but a counterfactual dependence. Suppose that a student performed poorly on a test and guesses that the cause was not studying. To prove this, one considers the counterfactual: the same student writing the same test under the same circumstances but having studied the night before. If one could rewind history and change only one small thing (making the student study for the exam), then causation could be observed by comparing version 1 to version 2. Because one cannot rewind history and replay events after making small controlled changes, causation can only be inferred, never exactly known. This is referred to as the Fundamental Problem of Causal Inference: it is impossible to directly observe causal effects.[11]
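A simulation is the one setting in which both versions of history are available, which makes the problem easy to see. In the sketch below every simulated student has two potential scores, one with studying and one without, but an observer would see only the one that matches what the student actually did. The effect size of 5 points, the role of "ability", and all function names are assumptions made for the illustration.

```python
import random


def simulate_students(n=10000, seed=1):
    """Return (studied, score_if_not_studied, score_if_studied) per student."""
    rng = random.Random(seed)
    students = []
    for _ in range(n):
        ability = rng.gauss(0, 1)
        score_if_not_studied = 60 + 10 * ability + rng.gauss(0, 3)
        score_if_studied = score_if_not_studied + 5  # assumed true effect: +5 points
        # Assumed self-selection: abler students are more likely to study.
        studied = ability + rng.gauss(0, 1) > 0
        students.append((studied, score_if_not_studied, score_if_studied))
    return students


def true_average_effect(students):
    """Needs both potential scores, so it is computable only inside a simulation."""
    return sum(y1 - y0 for _, y0, y1 in students) / len(students)


def naive_observed_difference(students):
    """Uses only the score each student actually obtained."""
    studied = [y1 for did_study, _, y1 in students if did_study]
    not_studied = [y0 for did_study, y0, _ in students if not did_study]
    return sum(studied) / len(studied) - sum(not_studied) / len(not_studied)


if __name__ == "__main__":
    students = simulate_students()
    print("true average effect:", true_average_effect(students))
    print("naive observed difference:", naive_observed_difference(students))
```

Because the students who chose to study differ from those who did not, the comparison of observed scores overstates the effect of studying. The observed data alone give no way to detect this, since the unobserved potential score of each student is missing.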

A major goal of scientific experiments and statistical methods is to approximate the counterfactual state of the world as closely as possible.[12] For example, one could run an experiment on identical twins who were known to consistently get the same grades on their tests. One twin is sent to study for six hours while the other is sent to the amusement park. If their test scores suddenly diverged by a large degree, this would be strong evidence that studying (or going to the amusement park) had a causal effect on test scores. In this case, correlation between studying and test scores would almost certainly imply causation.

Well-designed experimental studies replace the equality of individuals in the previous example with equality of groups. This is achieved by randomly assigning the subjects to two or more groups. Although not a perfect system, random assignment to treatment and placebo groups makes it highly likely that the groups are reasonably equal in all relevant respects. If the outcome under treatment differs significantly from the outcome under placebo, one can conclude that the treatment is likely to have a causal effect on the disease. How easily chance alone could have produced such a difference can be quantified in statistical terms by the p-value.
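Random assignment and the resulting p-value can be sketched as follows. The example uses a permutation test, which is one of several ways to compute a p-value and is chosen here only because it needs no distributional formulas. The outcome scale, the group sizes, the effect size, and the function names are assumptions made for the illustration.

```python
import random


def mean(values):
    return sum(values) / len(values)


def randomized_trial(n=200, true_effect=5.0, seed=42):
    """Randomly split n subjects into treatment and control groups."""
    rng = random.Random(seed)
    baseline = [rng.gauss(50, 5) for _ in range(n)]  # outcome without treatment
    order = list(range(n))
    rng.shuffle(order)  # random assignment, independent of baseline
    half = n // 2
    treated = [baseline[i] + true_effect for i in order[:half]]
    control = [baseline[i] for i in order[half:]]
    return treated, control


def permutation_p_value(treated, control, permutations=2000, seed=0):
    """Share of random relabelings whose group difference is at least as
    large as the observed one (two-sided, with the usual +1 correction)."""
    rng = random.Random(seed)
    observed = abs(mean(treated) - mean(control))
    pooled = treated + control
    k = len(treated)
    extreme = 0
    for _ in range(permutations):
        rng.shuffle(pooled)
        if abs(mean(pooled[:k]) - mean(pooled[k:])) >= observed:
            extreme += 1
    return (extreme + 1) / (permutations + 1)


if __name__ == "__main__":
    treated, control = randomized_trial()
    print("difference in means:", mean(treated) - mean(control))
    print("p-value:", permutation_p_value(treated, control))
```

The p-value computed here is the probability of seeing a difference at least this large if the treatment had no effect at all. It is not the probability that the treatment works, and it supports a causal reading only because the groups were formed at random.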

References


  1. Tufte, Edward R. (2006). The Cognitive Style of PowerPoint: Pitching Out Corrupts Within. Cheshire, Connecticut: Graphics Press. p. 5. ISBN 0-9613921-5-0.
  2. Aldrich, John (1995). "Correlations Genuine and Spurious in Pearson and Yule". Statistical Science 10 (4): 364–376. doi:10.2307/2246135.
  3. Lawlor DA, Davey Smith G, Ebrahim S (June 2004). "Commentary: the hormone replacement-coronary heart disease conundrum: is this the death of observational epidemiology?". Int J Epidemiol 33 (3): 464–7. doi:10.1093/ije/dyh124. PMID 15166201.
  4. Quinn GE, Shin CH, Maguire MG, Stone RA (May 1999). "Myopia and ambient lighting at night". Nature 399 (6732): 113–4. doi:10.1038/20094. PMID 10335839.
  5. CNN, May 13, 1999. "Night-light may lead to nearsightedness".
  6. Ohio State University Research News, March 9, 2000. "Night lights don't lead to nearsightedness, study suggests".
  7. Zadnik K, Jones LA, Irvin BC, et al. (March 2000). "Myopia and ambient night-time lighting". Nature 404 (6774): 143–4. doi:10.1038/35004661. PMID 10724157.
  8. Gwiazda J, Ong E, Held R, Thorn F (March 2000). "Myopia and ambient night-time lighting". Nature 404 (6774): 144. doi:10.1038/35004663. PMID 10724158.
  9. Stone et al. (March 2000). "Myopia and ambient night-time lighting". Nature 404 (6774): 144. doi:10.1038/35004665.
  10. David Hume (Stanford Encyclopedia of Philosophy).
  11. Holland, Paul W. (1986). "Statistics and Causal Inference". Journal of the American Statistical Association 81 (396): 945–960.
  12. Pearl, Judea (2000). Causality: Models, Reasoning, and Inference. Cambridge University Press.
