Causal Inference
What is Causal Inference?
Causal inference is the process of identifying and quantifying the causal effect of one variable on another. It involves using statistical methods, study designs, and theoretical frameworks to establish causality while accounting for confounding factors, potential biases, and the limitations of observational data.
The Basic Idea
Many of the questions we ask have to do with cause and effect. What causes hiccups? Does raising the minimum wage cause unemployment? Will exposure to germs build immunity? Does social media contribute to anxiety? Can eating raw garlic help acne? We always want to understand why things happen or what causes certain phenomena. And our fascination with these questions is not unjustified! Identifying the source of an effect is critical to our understanding of the world. More than simply satisfying our curiosity, it informs decision-making in our individual lives and on a much larger, societal scale.
But how do we decipher these cause-and-effect relationships? This is where causal inference comes in. Causal inference is a methodological approach that spans various disciplines, including statistics, computer science, psychology, and social sciences. It involves using methods, processes, and theoretical frameworks for identifying causality or determining whether a cause-and-effect relationship occurs between two variables.1 Importantly, causal inference aims to eliminate all other possible causes for the observed effect, which is understandably challenging when exploring complex systems like healthcare, economics, and environmental science.
Causal inference is easiest to establish in clinical or laboratory settings. The gold-standard research method for identifying a causal relationship is the randomized controlled trial (RCT), in which a treatment (a single variable of interest) is randomly assigned to participants while all other factors are held constant. If the study shows a statistically significant effect from manipulating the variable, there is evidence of a causal relationship.
Unfortunately, it can be difficult—and even unethical—to conduct large-scale experiments that accurately reflect real-world phenomena. Complex systems like social programs or public health initiatives are not great candidates for RCTs. The good news is that researchers have developed tools and methods to infer causation in observational settings where RCTs are not possible.2
Example: Air Pollution and Respiratory Disease
Say researchers want to determine if air pollution causes an increase in respiratory diseases in a particular city. Simply running a traditional statistical analysis might identify a relationship between these variables, but this is not enough to infer causality. Perhaps you’ve heard the cautionary phrase, “correlation does not imply causation,” which sums up how a relationship between two variables does not necessarily mean that one variable causes the other.
In our example, researchers would use causal inference methods to estimate the effect of air pollution on the rate of respiratory disease while controlling for potential confounding variables that may influence the relationship. These methods might include:
- Directed Acyclic Graphs (DAGs): Visual tools that map the relationship between variables.3 For example, researchers might create a DAG to show connections between air pollution, respiratory disease, and other related variables like weather patterns, seasons, smoking rates, and cardiovascular health. This would help the researchers determine which variables need to be controlled for when estimating a causal effect.
- Propensity Score Analysis: A statistical matching technique used to estimate the effect of an intervention (in this case, air pollution) by attempting to isolate it from other variables. In our example, this might mean identifying people with similar likelihoods of being exposed to air pollution—based on characteristics like income and smoking habits—and then comparing their health outcomes.
- Time-Series Analysis: A method used to analyze data collected over time and identify patterns. For instance, historical data could help researchers identify if higher levels of air pollution were followed by higher rates of respiratory disease in the past.
- Difference in Differences: A statistical technique that compares outcomes between a treatment group and a control group over time. In our example, researchers might compare changes in respiratory disease rates in a city where air pollution levels spiked to a similar city where they stayed constant. If disease rates increased in the former city but stayed constant in the latter, it could suggest a causal relationship.
These methods—and many others—allow researchers to draw conclusions about causality in complex settings, providing us with a better understanding of real-world causal relationships.
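To make the difference-in-differences logic concrete, here is a minimal sketch in Python. The disease rates below are invented for illustration only; a real analysis would use regression with many observations and control variables, not four summary numbers:

```python
# Toy difference-in-differences (DiD) estimate for the air pollution example.
# All numbers are hypothetical, chosen only to illustrate the arithmetic.

# Mean respiratory disease rate per 100,000 residents, before and after
# the pollution spike, in each city.
treated_before, treated_after = 120.0, 150.0   # city where pollution spiked
control_before, control_after = 118.0, 122.0   # similar city, no spike

# Each city's change over time.
treated_change = treated_after - treated_before   # change in the treated city
control_change = control_after - control_before   # change we'd expect anyway

# The DiD estimate: the treated city's change minus the control city's
# change, which stands in for what would have happened without the spike.
did_estimate = treated_change - control_change
print(did_estimate)  # 26.0
```

The key assumption (often called "parallel trends") is that, absent the pollution spike, disease rates in both cities would have moved together, so the control city's change is a credible counterfactual for the treated city.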
Key Terms
Correlation: A statistical relationship between two variables in which a change in one variable is associated with a change in the other. Importantly, correlation does not imply that one variable caused the other to change.
Causation: Indicates that one variable directly affects another. Unlike correlation, causation suggests a cause-and-effect relationship between two variables.
Randomized Controlled Trial (RCT): A study design that involves randomly assigning participants to an experimental group that receives treatment or a control group that does not receive treatment. As a result, the only expected difference between the two groups is the variable being studied. This allows researchers to attribute differences in the outcome to the variable itself.
Quasi-Experiment: A study design used to estimate the causal effect of a variable on a target population without randomly assigning participants to treatment or control groups. Quasi-experiments offer an alternative to RCTs for estimating cause-and-effect relationships through observational data.4
The Counterfactual Model: Also called the potential outcome model, the counterfactual model allows researchers to predict how an outcome might change under different hypothetical circumstances and compare these differences to infer causality.5
Confounding Variables: Also called confounders, confounding variables are outside variables that influence both the presumed cause and the presumed effect, distorting the apparent relationship between them. In a study examining the effects of smoking on heart disease, confounders might include age and socioeconomic status.
Sensitivity Analysis: A process of assessing the sensitivity of study results to potential unobserved confounding variables.6 Sensitivity analysis techniques are used to determine how strong the effects of an unobserved confounder would have to be to change the outcome. This is key when using non-experimental methods where it’s difficult to control for confounders.
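The counterfactual (potential outcome) model defined above can be illustrated with a small simulation. Because the data here are simulated, we can "observe" both potential outcomes for every individual, which is exactly what is impossible in real studies; the numbers and the true effect of 1.5 are arbitrary choices for illustration:

```python
import random

random.seed(0)

# Simulated potential outcomes for 10,000 hypothetical individuals.
# y0 = outcome without treatment, y1 = outcome with treatment.
# In real data, only ONE of the two is ever observed per person.
n = 10_000
y0 = [random.gauss(10, 2) for _ in range(n)]
y1 = [v + 1.5 for v in y0]  # true individual treatment effect = 1.5

# The average treatment effect (ATE) compares the two potential outcomes.
ate = sum(a - b for a, b in zip(y1, y0)) / n
print(round(ate, 6))  # approximately 1.5
```

In practice, researchers never see both `y0` and `y1` for the same person, which is why RCTs (which balance the two groups by randomization) or the observational methods described earlier are needed to estimate the ATE.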
History
Causality has been discussed among scientific researchers for centuries. For instance, Koch’s postulates, a set of criteria for assessing whether a microorganism causes a disease, have been applied since the 19th century.7 These early criteria for establishing causality differ from modern, formal methods of causal inference, which rely on rigorous statistical approaches.
Sewall Wright was one of the first researchers to articulate causal assumptions mathematically.8 In 1921, Wright introduced a method called path diagrams to extract causation from correlational data and represent causal relationships on directed graphs. His work laid the foundation for future causal modeling techniques.
Around the 1970s, statistics professor Donald Rubin introduced a framework called the Rubin causal model (or the Neyman-Rubin causal model) built on earlier work by Jerzy Neyman. This model was designed to determine causality in observational and experimental studies. More specifically, it provides a method for researchers to compare potential outcomes of hypothetical scenarios, or counterfactuals, even though only one outcome can be observed in reality.
Later, in the late 20th century, computer scientist Judea Pearl introduced directed acyclic graphs (DAGs). As touched on earlier, these diagrams visually represent causal relationships and indicate the direction of causality between variables.8 DAGs allow researchers to identify confounding variables, which is essential for distinguishing causation from correlation and predicting the effects of interventions through counterfactual reasoning.
Consequences
Public Health
Introducing causal inference to scientific research has significant public health consequences. By uncovering causal connections between risk factors and diseases, researchers can explore potential interventions and reduce rates of illness.9 For example, identifying smoking as a cause of lung cancer led to public health interventions like cigarette taxes and smoking cessation programs which helped reduce lung cancer rates.
Causal inference is also valuable for evaluating how effective certain interventions might be. In drug discovery and development, causal inference techniques help researchers predict the outcomes of medical interventions and determine which interventions are most effective before implementing them at scale.10
Business Research
Today, many businesses conduct their own research, often in the form of randomized A/B tests. While A/B tests are well suited to determining which variant of a product or marketing strategy performs best, they aren’t great at capturing long-term effects or providing insight into complex interactions—like how combined changes to pricing and product features might impact customer engagement. This is where quasi-experiments and causal inference methods can benefit business.
Companies like Netflix, Amazon, and Uber have embraced causal inference methods to determine how certain business decisions or marketing moves might impact consumer behavior. For instance, Uber mentions using causal inference to determine how delivery delays might affect a customer’s future engagement with the platform.11 Causal inference can be a valuable tool to help businesses answer these kinds of questions by examining real-world user data rather than running experiments—which, in this case, would involve purposely delaying deliveries to study the impact on customer experience.
The Human Element
People naturally think in terms of cause and effect. We’re predisposed to seek evidence for why things occur as this helps us navigate the world and make predictions about the future—hence our fascination with “why” questions. Due to our tendency to infer cause, causal inference plays an incredibly important role in helping us distinguish between correlation and causation in research. The goal? Extract real truth from scientific studies.
Unfortunately, people still tend to make causal claims based on correlational evidence. We see this frequently in media reports of scientific findings, where headlines often claim causal relationships when there is only correlational evidence. In fact, a large body of research finds that causal theory errors are common when interpreting scientific findings.12
This is a significant problem. People often make decisions about what to do or what to believe based on research findings reported by the media. Can our reasoning be improved so we can avoid jumping to causal conclusions when we only have evidence of a correlational relationship?
One recent study explored this phenomenon among college students to identify potential educational interventions to improve our reasoning.12 The researchers noted that behavioral science students often interpret correlational findings with fairly low rates of error, particularly after taking psychology classes that expose them to correlational and experimental studies. The study found that a short and simple educational intervention about causal theory error significantly improved the students’ abilities to accurately distinguish between correlation and causation.
Being able to spot and understand true causal relationships is key to making decisions based on research. This is crucial for individual consumers, but also for business leaders and policymakers involved in designing and implementing interventions for business and societal issues.
Controversies
The Accuracy of Causal Inference
While causal inference is a valuable tool for understanding relationships between variables, it does come with a few key issues. For instance, debates often involve disagreements over the efficacy of causal inference methods. Critics point out weaknesses in determining causality in complex environments, such as when studying real-world observational data or when relying on the counterfactual model to make predictions about the future.
Controlling for confounding variables is particularly difficult when conducting quasi-experiments. In these experiments, causal inference often involves making assumptions when interpreting data, and this introduces the risk of bias.13 Some critics even worry that researchers could purposely manipulate statistical analyses to show causation where none exists. As with much of science, the field of causal inference is still developing new methodological tools to address the challenges and increase the validity of causal inference research.
Using AI to Predict Behavior
Artificial intelligence (AI) is often used to predict behavior. Unfortunately, like us humans, AI tools can fall into the same trap of equating correlation with causation. Here’s an example: machine learning programs are often used in risk prediction software to estimate people’s future medical needs. One widely used algorithm attempts to flag patients who would benefit from extra medical care now based on how likely they are to require medical care in the future.14 The problem? An analysis of this particular software showed that Black patients were flagged as needing extra care less often than white patients, even when they had more chronic illnesses.
This occurred because the algorithm used insurance claims to predict people’s future health needs but did not account for the fact that healthcare spending is typically lower for Black Americans than for white Americans. This could be due to confounding variables like lack of insurance or systemic barriers to healthcare access. The algorithm assumed that lower healthcare spending meant people had fewer health conditions, but this causal relationship was not accurate for all populations.
The ultimate issue with relying on AI for causal inference is that predictive AI recognizes patterns, but this can lead to incorrect conclusions about causation. This issue has led to the development of causal AI programs that rely on causal models to identify cause-and-effect relationships rather than just correlational evidence. These programs are currently being explored to simulate scenarios and compare the potential effect of different interventions on an outcome.
Case Study
Do Short-Term Rentals Affect Rent Prices?
This Airbnb case study is an excellent illustration of the benefits of causal inference for distinguishing causation from correlation. In October 2023, the Conference Board of Canada released a report analyzing the impact of short-term rentals on rental prices across the country.15 They tested for a causal link between Airbnb activity and rent increases between 2016 and 2022 across 330 neighborhoods in 19 Canadian cities.
As you might expect, they observed a correlational relationship—Airbnb activity and rental prices rose together during this period. But when testing for a causal link, researchers found no meaningful impact of Airbnbs on rent increases. Of the 30% increase in rents over this period, less than 1% could be attributed to Airbnb activity.
This research follows moves from several Canadian cities and provinces to implement policies to regulate short-term rentals. While these policies have significantly reduced Airbnb activity (by nearly 50%), there is no evidence that this has resulted in lower rents.
These results may be surprising. For years, people have been concerned that Airbnb activity is reducing the housing supply and causing rents to surge. This report highlights the importance of using causal inference to determine causation instead of making assumptions about causation based on correlational evidence.
Related TDL Content
Illusory Correlation
We often see causation when there is only a correlation between two variables, but sometimes we see a correlation when there is no real association between these two variables at all. This phenomenon is known as illusory correlation. This article explores the concept of illusory correlation, why it occurs, and what you can do to avoid it.
How to Predict Mental Illnesses: The Digital Future of Mental Healthcare
With rates of mental health conditions at an all-time high, taking a proactive approach to mental health treatment is becoming increasingly crucial. Health-tracking apps present a valuable opportunity to analyze real-world health data and uncover digital indicators that might predict mental illness, allowing healthcare practitioners to address concerns early. Check out this article to learn how this might work.
References
- Pearl, J. (2010). An Introduction to Causal Inference. The International Journal of Biostatistics, 6(2). https://doi.org/10.2202/1557-4679.1203
- Plümper, T., Troeger, V. E., & Neumayer, E. (2019). Case selection and causal inferences in qualitative comparative research. PLoS ONE, 14(7). https://doi.org/10.1371/journal.pone.0219727
- Tilden, E. L., & Snowden, J. M. (2018). The causal inference framework: A primer on concepts and methods for improving the study of well-woman childbearing processes. Journal of Midwifery & Women's Health, 63(6), 700. https://doi.org/10.1111/jmwh.12710
- Moss, H. A., Melamed, A., & Wright, J. D. (2019). Measuring cause-and-effect relationships without randomized clinical trials: Quasi-experimental methods for gynecologic oncology research. Gynecologic Oncology, 152(3), 533-539. https://doi.org/10.1016/j.ygyno.2018.11.006
- Höfler, M. (2005). Causal inference based on counterfactuals. BMC Medical Research Methodology, 5, 28. https://doi.org/10.1186/1471-2288-5-28
- Liu, W., Kuramoto, S. J., & Stuart, E. A. (2013). An Introduction to Sensitivity Analysis for Unobserved Confounding in Non-Experimental Prevention Research. Prevention Science : The Official Journal of the Society for Prevention Research, 14(6), 570. https://doi.org/10.1007/s11121-012-0339-5
- Segre, J. A. (2013). What does it take to satisfy Koch’s postulates two centuries later? Microbial genomics and Propionibacteria acnes. The Journal of Investigative Dermatology, 133(9), 2141. https://doi.org/10.1038/jid.2013.260
- Pearl, J. (2022). Causal Inference: History, Perspectives, Adventures, and Unification (An Interview with Judea Pearl). Observational Studies 8(2), 23-36. https://dx.doi.org/10.1353/obs.2022.0007
- Glass, T. A., Goodman, S. N., Hernán, M. A., & Samet, J. M. (2013). Causal Inference in Public Health. Annual Review of Public Health, 34, 61. https://doi.org/10.1146/annurev-publhealth-031811-124606
- Michoel, T., & Zhang, J. D. (2023). Causal inference in drug discovery and development. Drug Discovery Today, 28(10), 103737. https://doi.org/10.1016/j.drudis.2023.103737
- Harinen, T., & Li, B. (2019, June 19). Using Causal Inference to Improve the Uber User Experience. Uber. Retrieved August 20, 2024, from https://www.uber.com/en-CA/blog/causal-inference-at-uber/
- Seifert, C. M., Harrington, M., Michal, A. L., & Shah, P. (2022). Causal theory error in college students’ understanding of science studies. Cognitive Research: Principles and Implications, 7. https://doi.org/10.1186/s41235-021-00347-5
- Hammerton, G., & Munafò, M. R. (2021). Causal inference with observational data: The need for triangulation of evidence. Psychological Medicine, 51(4), 563-578. https://doi.org/10.1017/S0033291720005127
- Sgaier, S. K., Huang, V., & Charles, G. (2020). The Case for Causal AI. Stanford Social Innovation Review, 18(3), 50-55. https://doi.org/10.48558/KT81-SN73
- Conference Board of Canada (2023). Airbnb activity and rental markets in Canada: Analyzing the impact of short-term rentals. The Conference Board of Canada.
About the Author
Annika Steele
Annika completed her Masters at the London School of Economics in an interdisciplinary program combining behavioral science, behavioral economics, social psychology, and sustainability. Professionally, she’s applied data-driven insights in project management, consulting, data analytics, and policy proposal. Passionate about the power of psychology to influence an array of social systems, her research has looked at reproductive health, animal welfare, and perfectionism in female distance runners.