There are a couple of concepts about confounding variables that we must understand:
  1. A confounding variable is a variable that is associated both with the exposure and the outcome. If it is only associated with the exposure but not with the outcome, then it cannot be a confounding variable.
  2. A confounding variable is always defined conceptually; it cannot be an intermediate factor on the conceptual pathway that connects the exposure and the outcome.
For example, consider the pathway that connects smoking and lung cancer. Heavy smoking causes high concentrations of cotinine in urine, so cotinine is a consequence of the exposure. Cotinine therefore cannot be used as a confounding variable for the association between smoking and lung cancer, even though it is associated with both smoking and lung cancer.
If we know that a third variable, Z, is a potential confounder of the association between X and Y, where X is the exposure variable and Y is the outcome variable, we must control for Z. In a randomised controlled trial, the process of randomisation takes care of both observed and unobserved confounding variables: we start from a common pool of participants who may or may not develop the disease or outcome of interest, and we allocate the intervention using a random numbers table, so that allocation is by chance alone. In observational studies, we can control for confounding variables in three ways:
  1. We can restrict on the confounding variables. Say gender is a confounding variable in a study of the association between smoking and lung cancer. In that case, we can restrict the study to males only or to females only, but not include both. The downside is that findings from a study of females alone cannot be extrapolated to males.
  2. We can match on the confounding variables. If we conduct a study on smoking and lung cancer and we know that both gender and age are potential confounding variables, then we can take information from equal numbers of males and females in the groups being compared and match participants for age.
  3. We can use multivariate analysis after we have obtained data from the participants, at the stage where we analyse the association between the exposure and the outcome.

## Hill's Criteria

Many epidemiological or clinical studies aim to establish causation; other forms of studies are non-causal. Some non-causal studies are still concerned with valid associations; for example, they test whether observations were biased or whether all confounding variables were accounted for. It is therefore essential that we are able to distinguish between causal and non-causal linkages. After we have controlled for the confounding variables, eliminated biases, and ruled out the play of chance in the association between X and Y, we should examine the possibility that this association is one of cause and effect.
While no "checklist" is possible, we can be guided by the work of Sir Austin Bradford Hill, the 20th century British medical statistician who, in an address delivered in 1965 to the Section of Occupational Medicine of the Royal Society of Medicine, discussed nine conditions. Hill described these as considerations rather than criteria (Hill 1965). They are briefly described as follows.
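To make "controlling for the confounding variable" concrete before turning to the individual considerations, here is a minimal sketch of a stratified analysis. It compares a crude odds ratio with a Mantel-Haenszel odds ratio pooled across strata of a confounder Z. The counts and the function names are invented for illustration and do not come from any study; the Mantel-Haenszel estimator is used here as one common way of adjusting, not as the method the text prescribes.

```python
# Hypothetical counts, invented for illustration only.
# Each stratum of the confounder Z is a 2x2 table laid out as
# (exposed cases, exposed non-cases, unexposed cases, unexposed non-cases).
strata = {
    "Z present": (80, 20, 40, 10),
    "Z absent": (10, 40, 20, 80),
}


def odds_ratio(a, b, c, d):
    """Odds ratio of a single 2x2 table."""
    return (a * d) / (b * c)


def crude_odds_ratio(tables):
    """Odds ratio after collapsing all strata into one table (ignores Z)."""
    a, b, c, d = (sum(cells) for cells in zip(*tables))
    return odds_ratio(a, b, c, d)


def mantel_haenszel_odds_ratio(tables):
    """Mantel-Haenszel pooled odds ratio across strata (adjusts for Z)."""
    numerator = sum(a * d / (a + b + c + d) for a, b, c, d in tables)
    denominator = sum(b * c / (a + b + c + d) for a, b, c, d in tables)
    return numerator / denominator


tables = list(strata.values())
crude_or = crude_odds_ratio(tables)
adjusted_or = mantel_haenszel_odds_ratio(tables)

for name, table in strata.items():
    print(f"{name}: stratum OR = {odds_ratio(*table):.2f}")
print(f"Crude OR (Z ignored) = {crude_or:.2f}")
print(f"Mantel-Haenszel OR (adjusted for Z) = {adjusted_or:.2f}")
```

In these made-up data the exposure has no association with the outcome within either stratum, yet the collapsed table suggests one. That gap between the crude and the adjusted estimate is what confounding by Z looks like numerically.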
Strength of Association. -- Strength of association denotes the extent to which the exposure and the outcome are linked to each other. The more strongly an exposure is linked to an outcome, the less likely it is that another exposure variable could explain this association, as this other variable would have to be even more strongly associated. The contribution of an exposure to disease in a population is determined by two factors: the prevalence of the exposure variable, and the strength of the association between the exposure and the outcome, measured as a relative risk or an odds ratio. Together these are expressed as the population attributable risk, usually given as a percentage. When these percentages are added up across risk factors, they can exceed 100%, because people can be simultaneously exposed to more than one risk factor. They can also total less than 100%, indicating that there are other, unknown risk factors. If the prevalence of a risk factor is very high, it need not have a very high relative risk to be an important cause. For example, in the association between exposure to inorganic arsenic and bladder cancer, the average relative risk is about 1.55; but where inorganic arsenic exposure is very prevalent in the population, the relative contribution of arsenic to bladder cancer can be very high.
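The interplay between prevalence and relative risk can be illustrated with Levin's formula for the population attributable fraction, PAF = p(RR - 1) / (1 + p(RR - 1)), where p is the prevalence of exposure. The sketch below is an illustration only: the relative risk of 1.55 is the figure quoted above, but the prevalence values and the comparison exposure are assumptions chosen for the example, and Levin's formula in this form ignores confounding.

```python
def population_attributable_fraction(prevalence, relative_risk):
    """Levin's formula: share of cases in the population attributable to
    the exposure, given exposure prevalence (0..1) and relative risk."""
    if not 0.0 <= prevalence <= 1.0:
        raise ValueError("prevalence must be between 0 and 1")
    if relative_risk <= 0:
        raise ValueError("relative_risk must be positive")
    excess = prevalence * (relative_risk - 1.0)
    return excess / (1.0 + excess)


# RR = 1.55 is taken from the text; the prevalences are hypothetical.
common_modest = population_attributable_fraction(prevalence=0.50, relative_risk=1.55)
rare_modest = population_attributable_fraction(prevalence=0.02, relative_risk=1.55)
# A hypothetical rare exposure with a much larger relative risk, for contrast.
rare_strong = population_attributable_fraction(prevalence=0.02, relative_risk=10.0)

print(f"Common exposure, RR 1.55: PAF = {common_modest:.1%}")
print(f"Rare exposure,   RR 1.55: PAF = {rare_modest:.1%}")
print(f"Rare exposure,   RR 10.0: PAF = {rare_strong:.1%}")
```

Under these assumed prevalences, the widespread exposure with a modest relative risk accounts for a larger share of cases than the rare exposure with a tenfold relative risk, which is the point made about arsenic above.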
Consistency. -- This consideration indicates that there should be a consistent pattern of association between the exposure variable of interest and the outcome. For example, if we consider tobacco smoking to be a cause of lung cancer, then we should find the association between tobacco smoking and lung cancer in all or most of the studies conducted to test it, even though they may be conducted in different populations and under different circumstances.
Specificity. -- This refers to the intuitive notion that if X is to be a cause of Y, then there has to be a one-to-one relationship between X and Y. For example, if smoking is to be a cause of lung cancer, then for every case of lung cancer we should find a history of cigarette smoking. Because we know that one exposure can cause many diseases, and one disease can have many causes, this is a weak consideration.
Temporality. -- Temporality refers to the requirement that a cause must precede the effect or the outcome. Of all the considerations that Hill proposed, this is perhaps the most robust one, in the sense that it is a necessary criterion if we want to establish a cause and effect linkage between an exposure and a disease condition, or between an intervention and a health effect. On its own, however, it is not sufficient: an exposure that precedes an outcome need not have caused it.
Coherence. -- By coherence, Hill meant that if we propose a cause and effect relationship between two entities or situations, then we should be able to explain that relationship in a coherent manner, or at least it should make biological sense. Think of cigarette smoking and lung cancer. Cigarette smoke contains nicotine, but it also contains tar and other substances that are dangerous to human health (C. Smith, Livingston, and Doolittle 1997). As these are carried in smoke that reaches the lungs, it makes sense to claim that smoking is associated with lung disease. But the explanation is not always this obvious. When Allan Smith et al. first proposed an association between arsenic ingested in drinking water and lung cancer, they had to answer the objection that, while it was believable that inhaled arsenic could cause lung cancer, there was less reason to believe that ingested arsenic would do so. Smith and colleagues showed that ingested arsenic would indeed be as dangerous as inhaled arsenic (A. H. Smith et al. 2009).
Biological gradient. -- According to Hill, biological gradient means that if there is a cause and effect association between an exposure and an outcome variable, then as the "dose" of the exposure increases, there is a corresponding increase in the effect. This is intuitive, and it is the basis of the dose response curve we see in a number of situations. In reality, however, an association can be causal without the dose response curve being linear. A linear relationship would predict that the effect keeps increasing in proportion to the dose. This is not always true, as there can be a ceiling effect in some cases: as the dose increases, the effect also increases up to a certain point, then reaches a maximum, and further increments in the dose do not lead to a corresponding increase in the effect (Figure 3).
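The contrast between a linear and a ceiling-type dose response can be sketched numerically. The example below uses a simple saturating (Emax-type) curve, E(d) = E_max * d / (ED50 + d), as one convenient way to produce a ceiling; this functional form and every parameter value are assumptions made for illustration and are not taken from Hill or from Figure 3.

```python
def linear_response(dose, slope):
    """Effect that grows in direct proportion to dose (no ceiling)."""
    return slope * dose


def saturating_response(dose, e_max, ed50):
    """Emax-type curve: the effect rises with dose but levels off near e_max.
    ed50 is the dose that produces half of the maximum effect."""
    if dose < 0:
        raise ValueError("dose must be non-negative")
    return e_max * dose / (ed50 + dose)


# Hypothetical doses and parameters, chosen only to show the shape.
doses = [0, 1, 2, 4, 8, 16, 32]
linear = [linear_response(d, slope=10.0) for d in doses]
saturating = [saturating_response(d, e_max=100.0, ed50=2.0) for d in doses]

print(f"{'dose':>5} {'linear':>8} {'ceiling':>8}")
for d, lin, sat in zip(doses, linear, saturating):
    print(f"{d:>5} {lin:>8.1f} {sat:>8.1f}")
```

Both curves rise monotonically with dose, so both are compatible with a biological gradient; only the saturating curve shows the shrinking increments that characterise a ceiling effect.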