Understanding Bias, Confounding & Interaction in Epidemiology: Types, Examples & Prevention Strategies

Introduction to Bias, Confounding and Interaction in Epidemiology
What is Bias in Epidemiology?
Types of Bias
What is Confounding in Epidemiology?
Effect Modification and Interaction
Strategies to minimize bias and confounding
References

Introduction to Bias, Confounding and Interaction in Epidemiology

In epidemiological studies, the results are intended to reflect the true association between exposure and the development of an outcome.
However, these findings may be influenced by alternative explanations such as random error, bias, or confounding.
Such influences can lead researchers to incorrect results and conclusions, including detecting a statistical association when none exists or missing a true association.
These issues—random error, bias, and confounding—are particularly common in observational study designs.
It is therefore essential to consider and address these factors during both the design and analysis phases to minimize their impact on the study.
Bias can arise at any stage of an epidemiological investigation and can affect both the internal and external validity of the results.
Bias, confounding, and interaction among variables all affect whether an association is detected, how strong it appears, and whether it can be interpreted as causal.
Researchers, epidemiologists, and public health professionals should be vigilant in minimizing or avoiding bias to ensure the reliability and validity of research outcomes.

What is Bias in Epidemiology?

In epidemiological studies, bias refers to a systematic error in the study design, data collection, or analysis.
This error distorts the estimated effect of an exposure on an outcome of interest.
Unlike random error, which shrinks as the sample size grows, bias produces a consistent deviation from the true value, so a larger study does not remove it.

Types of Bias

More than fifty types of bias have been identified in epidemiological studies.
These biases result from various errors that may occur from the beginning of the investigation through to the reporting of results.
Such biases can significantly affect the validity and reliability of study findings.
Most of them fall into two broad groups, selection bias and information bias, which are described below together with their most common subtypes.

Selection Bias

Selection bias is a systematic error that occurs when the selection, identification, or screening of the study population is related to both the exposure and the health outcome.
It can compromise both the internal and the external validity of the study, leading to false conclusions about the research hypothesis.
As a result, the findings may become irrelevant to other populations and fail to accurately reflect the true relationship between exposure and outcome.
This bias arises when characteristics of individuals included in the study differ from those in the population to which the study’s results are meant to apply.
These differing characteristics are typically associated with either the exposure or the outcome under investigation.
In general, all forms of selection bias involve a variation in the exposure-outcome relationship between study participants and those who were eligible but not included in the research.

In case-control studies:

Selection bias is a common issue, resulting in non-comparability between cases and controls.
Controls are intended to represent the same population as the cases, but bias occurs when the selected controls do not accurately represent the source population of the cases.
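
The effect of non-representative controls can be illustrated with a small numeric sketch. All counts and selection probabilities below are hypothetical, chosen only to show the mechanism; the function names are illustrative and not taken from any library.

```python
def odds_ratio(exposed_cases, unexposed_cases, exposed_controls, unexposed_controls):
    """Odds ratio for a 2x2 case-control table."""
    return (exposed_cases * unexposed_controls) / (unexposed_cases * exposed_controls)


def sample_controls(exposed_noncases, unexposed_noncases, p_exposed, p_unexposed):
    """Expected number of controls when non-cases are sampled with the given
    selection probabilities (hypothetical values for illustration)."""
    return exposed_noncases * p_exposed, unexposed_noncases * p_unexposed


# Hypothetical source population; all cases are enrolled.
EXPOSED_CASES, UNEXPOSED_CASES = 200, 100
EXPOSED_NONCASES, UNEXPOSED_NONCASES = 4000, 6000

# Representative controls: exposure does not affect the chance of selection.
fair = sample_controls(EXPOSED_NONCASES, UNEXPOSED_NONCASES, 0.01, 0.01)
representative_or = odds_ratio(EXPOSED_CASES, UNEXPOSED_CASES, *fair)

# Biased controls: exposed non-cases are twice as likely to be selected.
skewed = sample_controls(EXPOSED_NONCASES, UNEXPOSED_NONCASES, 0.02, 0.01)
biased_or = odds_ratio(EXPOSED_CASES, UNEXPOSED_CASES, *skewed)

print(f"representative controls: OR = {representative_or:.2f}")
print(f"biased controls:         OR = {biased_or:.2f}")
```

With these made-up numbers the odds ratio drops from 3.0 to 1.5 purely because of how the controls were chosen, not because the exposure-outcome relationship changed.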

In cohort studies:

Since exposed and unexposed groups are selected before the outcome occurs, selection bias is less likely.
However, it can still occur if there is variation in follow-up or case identification across different exposure groups.

In randomized trials:

Participants are randomly assigned to groups, reducing the likelihood of selection bias.
Still, withdrawals or refusals can introduce bias if the reasons for leaving the study are related to the exposure or outcome.

Information bias

Information bias, also known as measurement bias, is a type of error that occurs during the data collection phase of an epidemiological study.
It leads to deviations in the measurement of effects due to inaccurate measurement or misclassification of important variables such as exposure, outcome, or confounders.
This bias can distort the true relationship between variables and affect the validity of the study’s conclusions.

Misclassification bias

Misclassification bias occurs when individuals are placed into incorrect classification groups regarding their exposure or outcome status.
For example, exposed individuals may be misclassified as unexposed, and unexposed individuals as exposed, when the method used to measure exposure or outcome has imperfect sensitivity or specificity.
This bias can also result from missing data, errors in data entry, or recording mistakes.
The main types of misclassification bias are:

Differential misclassification bias

Occurs when misclassification of exposure or outcome differs between comparison groups.
Misclassification of one category (exposure or outcome) is related to the other category.
This can distort the true relationship, leading to either an overestimation or underestimation of the association.

Non-differential misclassification bias

Occurs when misclassification is similar across all groups being compared.
Misclassification of one category is unrelated to the other, meaning it affects both groups equally.
For a binary exposure or outcome, it typically biases results toward the null, making it harder to detect a true association.
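
The contrast between the two patterns above can be sketched numerically. The counts, sensitivities, and specificities are hypothetical assumptions, and the calculation uses expected counts rather than a simulation.

```python
def observed_counts(exposed, unexposed, sensitivity, specificity):
    """Expected observed (exposed, unexposed) counts after imperfect
    classification of a binary exposure."""
    obs_exposed = exposed * sensitivity + unexposed * (1 - specificity)
    obs_unexposed = exposed * (1 - sensitivity) + unexposed * specificity
    return obs_exposed, obs_unexposed


def odds_ratio(case_counts, control_counts):
    """Odds ratio from (exposed, unexposed) counts in cases and controls."""
    (a, b), (c, d) = case_counts, control_counts
    return (a * d) / (b * c)


# Hypothetical true exposure counts, given as (exposed, unexposed).
true_cases, true_controls = (200, 100), (100, 200)
true_or = odds_ratio(true_cases, true_controls)

# Non-differential: same sensitivity and specificity in cases and controls.
nondiff_or = odds_ratio(
    observed_counts(*true_cases, sensitivity=0.8, specificity=0.9),
    observed_counts(*true_controls, sensitivity=0.8, specificity=0.9),
)

# Differential: exposure is detected better in cases than in controls.
diff_or = odds_ratio(
    observed_counts(*true_cases, sensitivity=0.95, specificity=0.9),
    observed_counts(*true_controls, sensitivity=0.7, specificity=0.9),
)

print(f"true OR = {true_or:.2f}")
print(f"non-differential OR = {nondiff_or:.2f}")
print(f"differential OR = {diff_or:.2f}")
```

In this sketch the non-differential error pulls the odds ratio from 4.0 toward 1, while the differential error pushes it above 4.0. With other assumed values, differential error could just as well bias the estimate downward.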

Detection bias

Most likely in follow-up studies like cohort studies and clinical trials.
Arises due to differences in how outcomes are measured or verified between groups.
May inflate or deflate the size of an effect depending on assessment differences.
Example: Obese men tend to have larger prostates, in which prostate cancer is harder to detect by biopsy. Cancers are therefore missed more often in obese men, which can lead to underestimating the true link between obesity and prostate cancer risk.

Interviewer or observer bias

Occurs due to inconsistent data collection by researchers or observers.
May result from unequal assessment of exposure or outcomes, knowledge of the study hypothesis, or differences in interview technique.
Influencing factors include the medium of interviewing, prioritization of questions, and awareness of exposure/outcome status.
Example: A researcher may more thoroughly evaluate the group receiving a new diabetes treatment, introducing bias.

Recall bias

A type of information bias common in case-control studies.
Occurs when participants’ memory of exposure is influenced by disease status or treatment awareness.
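Cases may over-report exposures because they have searched their past for a cause of their illness, while controls may under-report the same exposures; likewise, individuals who know they were exposed may report symptoms more completely, or exaggerate them.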
This leads to inaccurate exposure histories and biased associations.

Reporting bias

Happens when participants’ answers are influenced by researcher expectations or social sensitivity of questions.
Topics like stigmatized diseases, undesirable behavior, or family issues may affect how participants respond.
This alters the accuracy of self-reported data, skewing study results.

What is Confounding in Epidemiology?

Confounding is a distortion that occurs when the observed relationship between an exposure and an outcome is mixed with the effect of a third variable, known as a confounder. This additional variable can exaggerate or mask the true association, and can even reverse its apparent direction. Confounding is categorized into two types:

Positive confounding: When the observed association is stronger than the true association (shifted away from the null).
Negative confounding: When the observed association is weaker than the true association (shifted toward the null).

Confounding Variable

A confounding variable is an external factor that is associated with both the dependent variable (outcome/disease) and the independent variable (exposure). It can influence the disease risk independently, leading to incorrect estimates of the true relationship between exposure and outcome. The presence of a confounder may either inflate or reduce the observed effect, obscuring the actual connection between the variables under study.

Common confounders include age, gender, lifestyle factors, socioeconomic status, and ethnicity, especially when they have known associations with the health outcome being investigated.
A variable is considered a confounder if it meets all three of the following criteria:
It is associated with the exposure in the source population, so its distribution differs between the groups being compared.
It is an independent risk factor for the outcome, including among unexposed individuals.
It is not part of the causal pathway between exposure and outcome (i.e., it does not result from the exposure).

Example

Consider a study examining whether coffee drinking increases the risk of heart disease. If coffee drinkers are more likely to be smokers than non-coffee drinkers, and smoking independently increases heart disease risk, then smoking acts as a confounder.
The association observed between coffee consumption and heart disease could actually be due to the effect of smoking. In this case, the study may falsely attribute the increased risk of heart disease to coffee, rather than identifying smoking as the real contributor.
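
The coffee example can be made concrete with a minimal sketch. The cohort counts are hypothetical and constructed so that coffee has no effect within either smoking stratum. The pooled estimate uses the Mantel-Haenszel risk ratio, a standard stratified estimator chosen here for illustration; the text above does not prescribe a specific method.

```python
def risk_ratio(cases_exp, n_exp, cases_unexp, n_unexp):
    """Risk in the exposed divided by risk in the unexposed."""
    return (cases_exp / n_exp) / (cases_unexp / n_unexp)


def mantel_haenszel_rr(strata):
    """Mantel-Haenszel pooled risk ratio over strata given as
    (cases_exp, n_exp, cases_unexp, n_unexp) tuples."""
    numerator = sum(a * n0 / (n1 + n0) for a, n1, c, n0 in strata)
    denominator = sum(c * n1 / (n1 + n0) for a, n1, c, n0 in strata)
    return numerator / denominator


# Hypothetical cohort: (heart disease in coffee drinkers, coffee drinkers,
#                       heart disease in non-drinkers, non-drinkers)
strata = {
    "smokers": (160, 800, 40, 200),      # 20% risk with or without coffee
    "non_smokers": (10, 200, 40, 800),   # 5% risk with or without coffee
}

# Crude analysis ignores smoking and simply adds the strata together.
totals = [sum(column) for column in zip(*strata.values())]
crude_rr = risk_ratio(*totals)

stratum_rr = {name: risk_ratio(*counts) for name, counts in strata.items()}
adjusted_rr = mantel_haenszel_rr(list(strata.values()))

print(f"crude RR = {crude_rr:.3f}")
print(f"stratum-specific RR = {stratum_rr}")
print(f"smoking-adjusted RR = {adjusted_rr:.3f}")
```

Here the crude risk ratio is above 2 even though the risk ratio is 1.0 within each smoking stratum and after adjustment, because coffee drinkers in this made-up cohort are far more likely to smoke.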

Effect Modification and Interaction

Unlike bias and confounding, effect modification is not an error to be removed: it is a real phenomenon, often biological, that should be described and reported.
It occurs when the effect of an exposure on an outcome differs across levels of another variable.
When effect modification is present, different population groups demonstrate different risk estimates.
The terms effect modification and interaction are often used interchangeably, although some authors define them as distinct concepts.
Interaction is usually described in statistical terms: the combined effect of two risk factors differs from what would be expected from their individual effects, being either greater or smaller.
Interaction takes place when the presence of a third variable influences the effect (magnitude or direction) of the association between two other variables.
For example, a drug effective in treating viral diseases in adults may become ineffective when used in children, indicating that age modifies the drug’s effectiveness.
Analyzing associations at each level of the third variable is a practical method for identifying and addressing interaction.
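
A minimal sketch of this stratum-by-stratum check, using hypothetical trial counts for the adult and child example (the numbers and names are assumptions for illustration only):

```python
def risk_ratio(events_treated, n_treated, events_untreated, n_untreated):
    """Risk of the event in the treated divided by risk in the untreated."""
    return (events_treated / n_treated) / (events_untreated / n_untreated)


# Hypothetical trial counts of treatment failure by age group:
# (failures on drug, participants on drug, failures on placebo, participants on placebo)
strata = {
    "adults": (20, 100, 40, 100),
    "children": (40, 100, 40, 100),
}

stratum_rr = {name: risk_ratio(*counts) for name, counts in strata.items()}

# A ratio of stratum-specific risk ratios far from 1 suggests that age
# modifies the effect of the drug (on the multiplicative scale).
rr_ratio = stratum_rr["children"] / stratum_rr["adults"]

for name, rr in stratum_rr.items():
    print(f"{name}: RR = {rr:.2f}")
print(f"ratio of RRs (children vs adults) = {rr_ratio:.2f}")
```

In this sketch the drug halves the risk of failure in adults and does nothing in children. Because the stratum-specific estimates genuinely differ, they would be reported separately rather than pooled into one adjusted figure. A real analysis would also consider sampling variability before concluding that the strata differ.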

Strategies to minimize bias and confounding

Errors and biases are common in epidemiological studies and can lead to inaccurate measurements of association.
These inaccuracies threaten both internal and external validity and, in the worst case, leave the findings unusable or usable only with caution.
To keep results valid, reliable, and accurate, various strategies should be applied to minimize bias and confounding.

Ways to minimize bias:

Develop well-standardized protocols handled by trained interviewers and researchers.
Use standard questionnaires with appropriate close-ended questions, specific response options, and consistent questioning across comparison groups.
Verify obtained data by checking against pre-existing documentation, records, or biomarkers.
Conduct pilot studies to identify and fix problems in questionnaires and other measurement tools.
Estimate the likelihood and magnitude of misclassification, for example by comparing the study measurement with a more accurate one in a subsample, to judge how far it could bias the results.
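
One way to gauge how much misclassification could matter is a simple sensitivity analysis that back-corrects the observed counts. The sketch below assumes a binary exposure, non-differential error, and sensitivity and specificity values that in practice would have to come from validation data; all numbers are hypothetical.

```python
def corrected_counts(obs_exposed, obs_unexposed, sensitivity, specificity):
    """Back-calculate expected true (exposed, unexposed) counts from observed
    counts, given assumed sensitivity and specificity of exposure measurement."""
    total = obs_exposed + obs_unexposed
    denominator = sensitivity + specificity - 1
    if denominator <= 0:
        raise ValueError("sensitivity + specificity must exceed 1")
    true_exposed = (obs_exposed - (1 - specificity) * total) / denominator
    return true_exposed, total - true_exposed


def odds_ratio(case_counts, control_counts):
    """Odds ratio from (exposed, unexposed) counts in cases and controls."""
    (a, b), (c, d) = case_counts, control_counts
    return (a * d) / (b * c)


# Hypothetical observed data, given as (exposed, unexposed).
observed_cases, observed_controls = (170, 130), (100, 200)
observed_or = odds_ratio(observed_cases, observed_controls)

# Assumed measurement properties, applied equally to cases and controls.
SENSITIVITY, SPECIFICITY = 0.8, 0.9
corrected_or = odds_ratio(
    corrected_counts(*observed_cases, SENSITIVITY, SPECIFICITY),
    corrected_counts(*observed_controls, SENSITIVITY, SPECIFICITY),
)

print(f"observed OR = {observed_or:.2f}")
print(f"corrected OR = {corrected_or:.2f}")
```

Repeating the calculation over a range of plausible sensitivity and specificity values shows how strongly the conclusion depends on measurement quality.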

Methods to reduce confounding at the design stage:

Randomization: Randomly allocate participants into groups so that known and unknown confounders are, on average, evenly distributed between them (used mainly in clinical trials).
Restriction: Limit study participation to individuals who are similar in terms of the confounding factor.
Matching: Select controls so that the distribution of confounders is similar to that of cases, using either pair matching or frequency matching.
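
The balancing effect of randomization, the first method in the list above, can be sketched with a small simulation. The participant list, the 30% smoking prevalence, and the seed are all hypothetical assumptions; smoking stands in for any potential confounder.

```python
import random


def randomize(participants, seed=0):
    """Shuffle participants and split them into two equal-sized arms."""
    rng = random.Random(seed)  # fixed seed so the sketch is reproducible
    shuffled = list(participants)
    rng.shuffle(shuffled)
    half = len(shuffled) // 2
    return shuffled[:half], shuffled[half:]


def proportion(group):
    """Share of the group with the characteristic (coded 1)."""
    return sum(group) / len(group)


# 10,000 hypothetical participants, 30% of whom smoke (1 = smoker).
participants = [1] * 3000 + [0] * 7000
treatment, control = randomize(participants)

difference = abs(proportion(treatment) - proportion(control))
print(f"smokers in treatment arm: {proportion(treatment):.3f}")
print(f"smokers in control arm:   {proportion(control):.3f}")
print(f"absolute difference:      {difference:.3f}")
```

With a sample this large the two arms end up with nearly the same share of smokers, even though smoking was never used in the allocation. In small trials, chance imbalances remain possible.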

Methods to reduce confounding at the analysis stage:

Stratification: Assess the exposure-outcome association within each level of the confounding variable, such as age or gender.
Multivariable analysis: Apply statistical models to adjust for multiple confounding variables simultaneously and evaluate their individual effects.
Standardization: Apply the stratum-specific rates of each study group to a common standard population, so that groups can be compared as if they had the same distribution of the confounder (for example, age).
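
As an illustration of the last method, the sketch below applies direct standardization to two hypothetical populations with identical age-specific rates but different age structures. All counts and the standard population are made up for the example.

```python
def crude_rate(events, population):
    """Overall rate ignoring age structure."""
    return sum(events) / sum(population)


def directly_standardized_rate(events, population, standard_population):
    """Weight each stratum-specific rate by the standard population's size."""
    rates = [e / p for e, p in zip(events, population)]
    weighted = sum(r * s for r, s in zip(rates, standard_population))
    return weighted / sum(standard_population)


# Age strata: [younger, older]. Both populations have the same age-specific
# rates (0.001 and 0.01) but very different age distributions.
population_a, events_a = [8000, 2000], [8, 20]
population_b, events_b = [2000, 8000], [2, 80]
standard = [5000, 5000]  # hypothetical standard population

crude_a, crude_b = crude_rate(events_a, population_a), crude_rate(events_b, population_b)
std_a = directly_standardized_rate(events_a, population_a, standard)
std_b = directly_standardized_rate(events_b, population_b, standard)

print(f"crude rates:        A = {crude_a:.4f}, B = {crude_b:.4f}")
print(f"standardized rates: A = {std_a:.4f}, B = {std_b:.4f}")
```

The crude rates differ almost threefold only because population B is older; once both are weighted to the same standard population, the rates coincide.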

References

Porche, D. J. (2024, January 6). Epidemiologic design bias, confounders, and interaction. Retrieved from https://connect.springerpub.com/content/book/978-0-8261-8514-3/part/part03/chapter/ch14
Chapter 4. Measurement error and bias. (2020, October 28). The BMJ. Retrieved from https://www.bmj.com/about-bmj/resources-readers/publications/epidemiology-uninitiated/4-measurement-error-and-bias
Bias. (n.d.). Boston University School of Public Health. Retrieved from https://sphweb.bumc.bu.edu/otlt/mph-modules/ep/ep713_bias/ep713_bias_print.html
Biases and confounding. (n.d.). Health Knowledge. Retrieved from https://www.healthknowledge.org.uk/public-health-textbook/research-methods/1a-epidemiology/biases
Baker, C. (2023, December 1). The Wrecking Ball: Bias, confounding, interaction and effect modification. Retrieved from https://pressbooks.lib.vt.edu/epidemiology/chapter/the-wrecking-ball-bias-confounding-interaction-and-effect-modification/
Detection bias – Catalog of Bias. (2023, April 17). Retrieved from https://catalogofbias.org/biases/detection-bias/
Delgado-Rodriguez, M. (2004). Bias. Journal of Epidemiology & Community Health, 58(8), 635–641. https://doi.org/10.1136/jech.2003.008466
Confounding in epidemiological studies. (n.d.). Health Knowledge. Retrieved from https://www.healthknowledge.org.uk/node/803
Thomas, L. (2023, June 22). Confounding variables: Definition, examples & controls. Scribbr. Retrieved from https://www.scribbr.com/methodology/confounding-variables/
Bovbjerg, M. L. (2020, October 1). Bias. In Epidemiology: The Basic Science of Public Health. Oregon State University. Retrieved from https://open.oregonstate.education/epidemiology/chapter/bias/
Tulchinsky, T. H., & Varavikova, E. A. (2014). Measuring, monitoring, and evaluating the health of a population. In The New Public Health (3rd ed., pp. 91–147). Elsevier. https://doi.org/10.1016/B978-0-12-415766-8.00003-3