Causal Inference

What is Causal Inference?

Causal inference is the process of identifying and quantifying the causal effect of one variable on another. It involves using statistical methods, study designs, and theoretical frameworks to establish causality while accounting for confounding factors, potential biases, and the limitations of observational data.

The Basic Idea

Many of the questions we ask have to do with cause and effect. What causes hiccups? Does raising the minimum wage cause unemployment? Will exposure to germs build immunity? Does social media contribute to anxiety? Can eating raw garlic help acne? We always want to understand why things happen or what causes certain phenomena. And our fascination with these questions is not unjustified! Identifying the source of an effect is critical to our understanding of the world. More than simply satisfying our curiosity, it informs decision-making in our individual lives and on a much larger, societal scale.

But how do we decipher these cause-and-effect relationships? This is where causal inference comes in. Causal inference is a methodological approach that spans various disciplines, including statistics, computer science, psychology, and the social sciences. It uses methods, processes, and theoretical frameworks to determine whether a cause-and-effect relationship exists between two variables.1 Importantly, causal inference aims to rule out other plausible explanations for the observed effect, which is understandably challenging when exploring complex systems like healthcare, economics, and environmental science.

Causal inference is easiest to attain in clinical or laboratory settings. The gold-standard research method for identifying a causal relationship is the randomized controlled trial (RCT), in which a treatment (a single variable of interest) is randomly assigned to participants. Other factors are either held constant or balanced across the treatment and control groups by the randomization itself, so they cannot systematically explain a difference in outcomes. If the study shows a statistically significant effect from manipulating the variable, there is evidence of a causal relationship.
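
To make the logic of random assignment concrete, here is a minimal simulation sketch in Python. Everything in it is an illustrative assumption rather than data from a real trial: the function name `simulate_rct`, the sample size, the "baseline health" factor, and the true effect of 2.0 are all made up. It shows that when treatment is assigned by a coin flip, a simple difference in group means lands close to the true effect, even though a background factor also drives the outcome.

```python
import random
import statistics


def simulate_rct(n=2000, true_effect=2.0, seed=0):
    """Simulate a randomized trial and estimate the treatment effect.

    All names and numbers are hypothetical and chosen for illustration.
    """
    rng = random.Random(seed)
    treated, control = [], []
    for _ in range(n):
        # A background factor (say, baseline health) that also drives the outcome.
        baseline = rng.gauss(10.0, 3.0)
        # Random assignment: a coin flip that ignores the baseline entirely.
        gets_treatment = rng.random() < 0.5
        noise = rng.gauss(0.0, 1.0)
        outcome = baseline + (true_effect if gets_treatment else 0.0) + noise
        (treated if gets_treatment else control).append(outcome)

    # The difference in mean outcomes estimates the average treatment effect.
    estimate = statistics.mean(treated) - statistics.mean(control)
    # Standard error of a difference between two independent group means.
    se = (statistics.variance(treated) / len(treated)
          + statistics.variance(control) / len(control)) ** 0.5
    return estimate, se


estimate, se = simulate_rct()
print(f"estimated effect: {estimate:.2f} (standard error {se:.2f})")
```

Because the coin flip is unrelated to baseline health, both groups end up with similar baselines on average, which is why the plain difference in means works here.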

Unfortunately, it can be difficult—and even unethical—to conduct large-scale experiments that accurately reflect real-world phenomena. Complex systems like social programs or public health initiatives are not great candidates for RCTs. The good news is that researchers have developed tools and methods to infer causation in observational settings where RCTs are not possible.2

Example: Air Pollution and Respiratory Disease

Say researchers want to determine if air pollution causes an increase in respiratory diseases in a particular city. Simply running a traditional statistical analysis might identify a relationship between these variables, but this is not enough to infer causality. Perhaps you’ve heard the cautionary phrase, “correlation does not imply causation,” which sums up how a relationship between two variables does not necessarily mean that one variable causes the other.
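
A small simulation can show how a correlation arises without any causal effect. The sketch below is purely hypothetical: it assumes smokers are more likely to live in high-pollution areas and that, in this made-up world, disease depends only on smoking and not on pollution at all. The helper names (`simulate_population`, `risk_difference`) and all probabilities are invented for illustration.

```python
import random


def disease_rate(rows):
    """Share of people in `rows` who have the disease."""
    return sum(r["disease"] for r in rows) / len(rows)


def simulate_population(n=20000, seed=1):
    """Build a made-up population in which pollution has NO effect on disease."""
    rng = random.Random(seed)
    people = []
    for _ in range(n):
        smoker = rng.random() < 0.3
        # Assumption: smokers are more likely to live in high-pollution areas.
        exposed = rng.random() < (0.7 if smoker else 0.3)
        # Assumption: disease risk depends only on smoking, not on exposure.
        disease = rng.random() < (0.30 if smoker else 0.05)
        people.append({"smoker": smoker, "exposed": exposed, "disease": disease})
    return people


def risk_difference(rows):
    """Disease rate among the exposed minus the rate among the unexposed."""
    exposed = [r for r in rows if r["exposed"]]
    unexposed = [r for r in rows if not r["exposed"]]
    return disease_rate(exposed) - disease_rate(unexposed)


people = simulate_population()
naive = risk_difference(people)
within_smokers = risk_difference([r for r in people if r["smoker"]])
within_nonsmokers = risk_difference([r for r in people if not r["smoker"]])

print(f"naive comparison:    {naive:+.3f}")
print(f"among smokers:       {within_smokers:+.3f}")
print(f"among non-smokers:   {within_nonsmokers:+.3f}")
```

The naive comparison suggests exposed people are sicker, yet the gap largely disappears once smokers are compared with smokers and non-smokers with non-smokers. In this simulated world the correlation is produced entirely by the shared cause.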

In our example, researchers would use causal inference methods to estimate the effect of air pollution on the rate of respiratory disease while controlling for potential confounding variables that may influence the relationship. These methods might include:

  • Directed Acyclic Graphs (DAGs): Visual tools that map the relationship between variables.3 For example, researchers might create a DAG to show connections between air pollution, respiratory disease, and other related variables like weather patterns, seasons, smoking rates, and cardiovascular health. This would help the researchers determine which variables need to be controlled for when estimating a causal effect.
  • Propensity Score Analysis: A statistical technique, often based on matching, used to estimate the effect of an intervention or exposure (in this case, air pollution) by attempting to isolate it from other variables. In our example, this might mean identifying people with similar likelihoods of being exposed to air pollution (based on characteristics like income and smoking habits) and then comparing the health outcomes of those who were exposed with those who were not.
  • Time-Series Analysis: A method used to analyze data collected over time and identify patterns. For instance, historical data could help researchers identify if higher levels of air pollution were followed by higher rates of respiratory disease in the past.
  • Difference in Differences: A statistical technique that compares how outcomes change over time in a treatment group versus a control group. In our example, researchers might compare changes in respiratory disease rates in a city where air pollution levels spiked to a similar city where they stayed constant. If disease rates increased in the former city but stayed constant in the latter, it could suggest a causal relationship, provided the two cities would otherwise have followed similar trends.
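
The difference-in-differences comparison in the last item comes down to simple arithmetic, sketched below in Python. The disease rates are made-up numbers and the function name is hypothetical. The estimate is only meaningful under the assumption that both cities would have followed the same trend had pollution not spiked.

```python
def difference_in_differences(treated_before, treated_after,
                              control_before, control_after):
    """Change in the treated group minus change in the control group."""
    treated_change = treated_after - treated_before
    control_change = control_after - control_before
    return treated_change - control_change


# Hypothetical respiratory disease rates per 1,000 residents (made-up numbers).
city_with_spike = {"before": 12.0, "after": 17.0}
comparison_city = {"before": 10.0, "after": 11.0}

did_estimate = difference_in_differences(
    city_with_spike["before"], city_with_spike["after"],
    comparison_city["before"], comparison_city["after"],
)
print(did_estimate)  # 4.0
```

Subtracting the comparison city's change (1.0) removes whatever shifted disease rates in both cities, leaving 4.0 cases per 1,000 as the change associated with the pollution spike in this invented example.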

These methods—and many others—allow researchers to draw conclusions about causality in complex settings, providing us with a better understanding of real-world causal relationships.
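
As a rough illustration of the DAG idea from the list above, the sketch below stores a graph as a Python dictionary and flags variables that sit upstream of both the exposure and the outcome. The graph itself is an assumption invented for this example, not an established causal model, and the helper `common_causes` is a simplified heuristic. It is not a full implementation of formal adjustment criteria such as the backdoor criterion.

```python
# Edges point from cause to effect. This graph is a hypothetical example.
dag = {
    "season": ["weather", "air_pollution"],
    "weather": ["air_pollution", "respiratory_disease"],
    "smoking": ["respiratory_disease", "cardiovascular_health"],
    "air_pollution": ["respiratory_disease", "cardiovascular_health"],
    "respiratory_disease": [],
    "cardiovascular_health": [],
}


def ancestors(graph, node):
    """Return every variable with a directed path into `node`."""
    found = set()
    stack = [node]
    while stack:
        current = stack.pop()
        for parent, children in graph.items():
            if current in children and parent not in found:
                found.add(parent)
                stack.append(parent)
    return found


def common_causes(graph, exposure, outcome):
    """Variables that affect the exposure and reach the outcome some other way."""
    # Drop the exposure so that paths running through it do not count.
    pruned = {
        node: [child for child in children if child != exposure]
        for node, children in graph.items()
        if node != exposure
    }
    return ancestors(graph, exposure) & ancestors(pruned, outcome)


candidates = common_causes(dag, "air_pollution", "respiratory_disease")
print(sorted(candidates))  # ['season', 'weather']
```

In this made-up graph, season and weather are flagged as candidates to control for. Smoking is not flagged because the graph assumes it has no influence on pollution, and cardiovascular health is not flagged because it is a consequence of the exposure rather than a cause.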

“Data can tell you that the people who took a medicine recovered faster than those who did not take it, but they can’t tell you why. Maybe those who took the medicine did so because they could afford it and would have recovered just as fast without it.”

Judea Pearl, The Book of Why: The New Science of Cause and Effect

About the Author

Annika Steele

Annika completed her Master’s at the London School of Economics in an interdisciplinary program combining behavioral science, behavioral economics, social psychology, and sustainability. Professionally, she has applied data-driven insights in project management, consulting, data analytics, and policy proposals. Passionate about the power of psychology to influence an array of social systems, her research has looked at reproductive health, animal welfare, and perfectionism in female distance runners.
