Causal Inference
What is Causal Inference?
Causal inference is the process of identifying and quantifying the causal effect of one variable on another. It involves using statistical methods, study designs, and theoretical frameworks to establish causality while accounting for confounding factors, potential biases, and the limitations of observational data.
The Basic Idea
Many of the questions we ask have to do with cause and effect. What causes hiccups? Does raising the minimum wage cause unemployment? Will exposure to germs build immunity? Does social media contribute to anxiety? Can eating raw garlic help clear acne? We are constantly trying to understand why things happen and what causes certain phenomena. And our fascination with these questions is not unjustified! Identifying the source of an effect is critical to our understanding of the world. More than simply satisfying our curiosity, it informs decision-making in our individual lives and on a much larger, societal scale.
But how do we decipher these cause-and-effect relationships? This is where causal inference comes in. Causal inference is a methodological approach that spans various disciplines, including statistics, computer science, psychology, and the social sciences. It draws on methods, processes, and theoretical frameworks to identify causality, that is, to determine whether a cause-and-effect relationship exists between two variables.1 Importantly, causal inference aims to rule out all other possible causes for the observed effect, which is understandably challenging when exploring complex systems like healthcare, economics, and environmental science.
Causal inference is easiest to establish in clinical or laboratory settings. The gold-standard research method for identifying a causal relationship is the randomized controlled trial (RCT), in which a treatment (a single variable of interest) is randomly assigned to participants while all other factors are held constant. If the study shows a statistically significant effect from manipulating the variable, there is evidence of a causal relationship.
Unfortunately, it can be difficult—and even unethical—to conduct large-scale experiments that accurately reflect real-world phenomena. Complex systems like social programs or public health initiatives are not great candidates for RCTs. The good news is that researchers have developed tools and methods to infer causation in observational settings where RCTs are not possible.2
Example: Air Pollution and Respiratory Disease
Say researchers want to determine if air pollution causes an increase in respiratory diseases in a particular city. Simply running a traditional statistical analysis might identify a relationship between these variables, but this is not enough to infer causality. Perhaps you’ve heard the cautionary phrase, “correlation does not imply causation,” which sums up how a relationship between two variables does not necessarily mean that one variable causes the other.
In our example, researchers would use causal inference methods to estimate the effect of air pollution on the rate of respiratory disease while controlling for potential confounding variables that may influence the relationship. These methods might include:
- Directed Acyclic Graphs (DAGs): Visual tools that map the relationship between variables.3 For example, researchers might create a DAG to show connections between air pollution, respiratory disease, and other related variables like weather patterns, seasons, smoking rates, and cardiovascular health. This would help the researchers determine which variables need to be controlled for when estimating a causal effect.
- Propensity Score Analysis: A statistical matching technique used to estimate the effect of an intervention (in this case, air pollution) by attempting to isolate it from other variables. In our example, this might mean identifying people with similar likelihoods of being exposed to air pollution—based on characteristics like income and smoking habits—and then comparing their health outcomes.
- Time-Series Analysis: A method used to analyze data collected over time and identify patterns. For instance, historical data could help researchers identify if higher levels of air pollution were followed by higher rates of respiratory disease in the past.
- Difference in Differences: A statistical technique that compares changes in outcomes between a treatment group and a control group over time. In our example, researchers might compare changes in respiratory disease rates in a city where air pollution levels spiked with those in a similar city where they stayed constant. If disease rates increased in the former city but stayed constant in the latter, it could suggest a causal relationship.
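To make the difference-in-differences logic concrete, here is a minimal sketch of the calculation. All numbers are invented for illustration, and the estimate is only valid under the assumption that the two cities' disease rates would have followed parallel trends without the pollution spike:

```python
# Hypothetical respiratory-disease rates per 10,000 residents.
# These numbers are illustrative assumptions, not real data.
treated_before = 42.0   # "City A", before its air-pollution spike
treated_after = 55.0    # City A, after the spike
control_before = 40.0   # "City B" (similar city, no spike), same periods
control_after = 44.0

# Change in each city over the same time window
treated_change = treated_after - treated_before   # 13.0
control_change = control_after - control_before   # 4.0

# The difference in those differences is the estimated effect of the
# pollution spike, under the parallel-trends assumption.
did_estimate = treated_change - control_change
print(did_estimate)  # 9.0
```

Subtracting the control city's change strips out whatever would have happened to disease rates anyway (seasonal trends, for example), leaving the portion attributable to the spike.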
These methods—and many others—allow researchers to draw conclusions about causality in complex settings, providing us with a better understanding of real-world causal relationships.
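As a closing illustration, the DAG step described in the list can also be sketched in code. Here the graph is stored as a plain dictionary of directed edges; the variable names and causal links are illustrative assumptions for the air-pollution example, not established facts:

```python
# A toy DAG for the air-pollution example: each key is a cause,
# each value lists its direct effects. Edges are assumptions
# chosen for illustration only.
dag = {
    "weather": ["air_pollution", "respiratory_disease"],
    "season": ["air_pollution", "respiratory_disease"],
    "smoking": ["respiratory_disease"],
    "air_pollution": ["respiratory_disease"],
}

def parents(node, dag):
    """Return the direct causes of `node` in the DAG."""
    return sorted(cause for cause, effects in dag.items() if node in effects)

# Variables that directly cause both the exposure and the outcome
# are confounders: they must be controlled for when estimating the
# effect of air pollution on respiratory disease.
confounders = sorted(
    set(parents("air_pollution", dag)) & set(parents("respiratory_disease", dag))
)
print(confounders)  # ['season', 'weather']
```

In this toy graph, weather and season influence both pollution levels and disease rates, so they would need to be adjusted for, while smoking affects only the outcome.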
About the Author
Annika Steele
Annika completed her Master's at the London School of Economics in an interdisciplinary program combining behavioral science, behavioral economics, social psychology, and sustainability. Professionally, she has applied data-driven insights in project management, consulting, data analytics, and policy proposals. Passionate about the power of psychology to influence an array of social systems, her research has looked at reproductive health, animal welfare, and perfectionism in female distance runners.