Inferential Statistics
What are Inferential Statistics?
Inferential statistics is a branch of statistics that allows researchers to make generalizations about a larger population based on a sample of data. By using techniques such as hypothesis testing and confidence intervals, inferential statistics helps estimate population parameters, test relationships between variables, and make predictions beyond the immediate dataset. This approach is crucial when collecting data from an entire population is impractical or impossible.
The Basic Idea
Have you ever read a statement like, “2.6 million Europeans are now vegan” and wondered, Wow, how did they find the time to ask everyone in Europe about their dietary choices?1 Well, they likely didn’t. Instead, almost all research relies on some level of inferential statistics, a method where researchers take a representative sample from the population they want to study and draw conclusions or make predictions based on that data.
When we’re first introduced to statistics, we usually learn about descriptive statistics, which summarize the main features of a data set. Descriptive statistics report characteristics of your data such as the distribution (how frequently each value occurs), the central tendency (the average or typical value), and the variability (how spread out the values are). In this method, there is usually less uncertainty since the statistics aim to directly describe the sample, without assuming anything beyond the data at hand. For example, if you counted the number of chocolate chips in every cookie in one batch, you could use descriptive statistics to get a quick overview of all the cookies in that batch, such as the average number of chocolate chips per cookie. But keep in mind that the data you collect about this batch won’t necessarily help you make any inferences about another batch of cookies.
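To make the distinction concrete, here’s a minimal Python sketch (the chip counts are invented purely for illustration) of descriptive statistics for a single batch:

```python
import statistics

# Hypothetical chocolate chip counts for one batch of 12 cookies
chips_per_cookie = [9, 12, 10, 11, 14, 8, 10, 13, 11, 9, 12, 10]

# Descriptive statistics summarize this batch and nothing more
print("Mean:", statistics.mean(chips_per_cookie))                 # central tendency
print("Median:", statistics.median(chips_per_cookie))             # central tendency
print("Standard deviation:", statistics.stdev(chips_per_cookie))  # variability
print("Range:", max(chips_per_cookie) - min(chips_per_cookie))    # spread
```

Every one of these numbers describes only the cookies that were actually counted; none of them claims anything about other batches.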
However, descriptive statistics isn’t always the best option. After all, sometimes you can only acquire data from smaller samples because it is too difficult, expensive, or outright impossible to collect data from the entire population that you’re interested in. For example, what if one recipe made 250 cookies? That would be a lot of cookies to eat in one day. Instead, inferential statistics allows us to make inferences about an entire population using just a sample. Using this method, you would only need to taste-test a few of those cookies, collect data on them, and then extrapolate (or infer information) to learn more about the rest of the cookies in that batch. As you can see, this approach is incredibly helpful when studying larger groups of objects or people, as it’s often far too time- and resource-intensive to collect data from an entire population.
With inferential statistics, it’s important to use random and unbiased sampling methods. If your sample isn’t representative of your population, then you can’t make valid statistical inferences to generalize about the rest of the population. For example, if you only choose the best-looking cookies to test, they may be bigger or have more chocolate chips than the rest of the batch—which would skew your data on what the average cookie from that recipe truly looks like. Instead, you’d want to make sure the cookies you tested were randomly selected. The same is true for any other use of inferential statistics; researchers often use software programs to ensure the sample they study is truly chosen at random in order to limit bias.
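As a rough sketch of how that sampling and estimation step might look in Python (the batch is simulated, the sample size of 30 is arbitrary, and the 1.96 multiplier assumes a normal approximation for a 95% confidence interval):

```python
import math
import random
import statistics

random.seed(42)

# Simulated "population": chip counts for a full batch of 250 cookies
batch = [random.randint(8, 14) for _ in range(250)]

# Randomly select 30 cookies to taste-test, so every cookie has an equal chance
sample = random.sample(batch, k=30)

# Use the sample to estimate the (normally unknown) batch mean
sample_mean = statistics.mean(sample)
standard_error = statistics.stdev(sample) / math.sqrt(len(sample))

# Approximate 95% confidence interval for the batch mean
lower = sample_mean - 1.96 * standard_error
upper = sample_mean + 1.96 * standard_error

print(f"Sample mean: {sample_mean:.2f}")
print(f"Approximate 95% CI for the batch mean: ({lower:.2f}, {upper:.2f})")
print(f"True batch mean (unknown in practice): {statistics.mean(batch):.2f}")
```

In a real study you would never see the last line; the whole point of the interval is to quantify your uncertainty about a value you can’t measure directly.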
Beyond the realm of baking, inferential statistics has helped us make advances in countless fields. In areas like science, economics, and medicine, it’s allowed for informed, data-driven decision-making. Inferential statistics is especially key in designing experiments and analyzing the results, as these methods are applied by scientists to determine if their findings are statistically significant and can be generalized to a broader population. Businesses also use inferential statistics for market research, customer behavior analysis, and forecasting trends, improving strategic planning and targeting. Lastly, epidemiologists and healthcare researchers use inferential statistics when understanding relationships between treatments and outcomes, estimating population health trends, and working with policymakers to guide public health interventions.2
Key Terms
Population: The entire group of individuals or instances that the data is intended to describe. In inferential statistics, the overall goal is to make generalizations or inferences about the population based on data from a sample.
Sample: A subset of the population selected for measurement or observation. In inferential statistics, the sample serves as a representative part of the population from which inferences are drawn.
Parameter: A numerical characteristic of a population, such as a mean or standard deviation. Since population parameters are typically unknown, inferential statistics focuses on estimating these parameters using sample statistics. For example, a sample mean is used to estimate the population mean, and hypothesis tests may be conducted to determine the likelihood of certain parameter values based on the sample data.
Confidence Interval: A range of values derived from the sample that is constructed to cover the true population parameter with a specified level of confidence (for example, 95%).
Hypothesis Testing: A method for testing predictions about a parameter in a population using the data measured in a sample. Usually, researchers begin an experiment by forming two different types of hypotheses: a null hypothesis (H0) and an alternative hypothesis (H1).
Null Hypothesis (H0): In hypothesis testing, this is the default assumption that there is no relationship between two measured variables. In other words, researchers may start by assuming, ‘the variables we’re studying don’t affect each other’ or ‘the outcome of our hypothesis won’t be true.’
Alternative Hypothesis (H1): In hypothesis testing, this is the statement that contradicts the null hypothesis, indicating that there is a statistically significant effect or relationship between the variables you’re studying.
P-Value: In hypothesis testing, this is the number that helps us decide whether the results from a study are meaningful or not. More precisely, the p-value is the probability of obtaining results at least as extreme as those observed if the null hypothesis were true. A smaller p-value (usually less than 0.05) means the results are unlikely to be due to random chance alone, so we can be more confident that there’s a real effect or relationship in the data (a short worked sketch follows these definitions).
Type I Errors: A type I error (false-positive) occurs if an investigator rejects a null hypothesis that is actually true in the population.
Type II Errors: A type II error (false-negative) occurs if the investigator fails to reject a null hypothesis that is actually false in the population.
Probability Theory: A mathematical framework that quantifies the likelihood of events in random processes. It assigns probabilities between 0 (impossible) and 1 (certain) to outcomes based on the sample space and events. In inferential statistics, probability theory underpins methods for drawing conclusions about populations based on sample data, helping to measure uncertainty and assess the reliability of inferences.
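To tie several of these terms together, here’s a small, hedged sketch (the scores are made up) of a permutation test, one simple way to obtain a p-value: it asks how often a difference at least as large as the one observed would occur if the null hypothesis of ‘no group difference’ were true.

```python
import random
import statistics

random.seed(0)

# Hypothetical scores for a treatment group and a control group
treatment = [14, 15, 13, 17, 16, 15, 18, 14]
control = [12, 13, 11, 14, 12, 13, 15, 12]

observed_diff = statistics.mean(treatment) - statistics.mean(control)

# Null hypothesis (H0): the group labels don't matter, so reshuffling them
# should produce differences as large as the observed one just by chance
pooled = treatment + control
n_treatment = len(treatment)
n_permutations = 10_000
at_least_as_extreme = 0

for _ in range(n_permutations):
    random.shuffle(pooled)
    diff = statistics.mean(pooled[:n_treatment]) - statistics.mean(pooled[n_treatment:])
    if abs(diff) >= abs(observed_diff):
        at_least_as_extreme += 1

p_value = at_least_as_extreme / n_permutations
print(f"Observed difference: {observed_diff:.2f}")
print(f"p-value: {p_value:.4f}")

# At the conventional 0.05 threshold: rejecting a true H0 would be a Type I error,
# while failing to reject a false H0 would be a Type II error
print("Reject H0" if p_value < 0.05 else "Fail to reject H0")
```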
History
The roots of inferential statistics can be traced back to the 17th and 18th centuries with the development of the theory of probability. French mathematician Pierre-Simon Laplace and English statistician Thomas Bayes were pivotal in advancing probability theory, which later underpinned the methods of inferential statistics. Laplace's work in the early 19th century on what he called the “analytical theory of probabilities,” along with his central limit theorem, helped statisticians estimate population details from samples. He built on Bayes' theorem, which explains how to update the chances of something being true when new information is available. Together, Laplace and Bayes’ work shaped the theorems we still use today for making inferences about populations from sample data.3
In the 20th century came the formalization of inferential statistics, which is largely attributed to British statistician and geneticist Sir Ronald A. Fisher. He introduced several key concepts, including the analysis of variance and maximum likelihood estimation, along with his contributions to experimental design and various statistical tests. This formed the backbone of hypothesis testing, a method used to infer the properties of a larger population from sample data. All of these developments were crucial for researchers’ adoption of more robust large-scale sampling and statistical inference.
Around the same time, the Polish statistician Jerzy Neyman and the British statistician Egon Pearson expanded the framework of hypothesis testing, introducing Type I and Type II errors and establishing more rigorous criteria for testing hypotheses.3
British mathematician Karl Pearson, another key player in the field of statistics, contributed significantly to the development of the Pearson correlation coefficient, which measures relationships between variables, and the chi-squared test, which assesses the goodness-of-fit for models.3
When did behavioral science come into the picture? Well, in the mid-20th century psychologists and social scientists started adopting statistical methods to validate theories about human behavior and social structures. The field of psychometrics, for instance, heavily relies on inferential statistics to construct and validate tests that measure psychological attributes such as intelligence, personality, and aptitude.
In recent years, Large Language Models (LLMs) and other AI technologies have also transformed inferential statistics in several ways. Not only have LLMs improved our ability to process large datasets and automate complex statistical tasks like hypothesis testing, but they have also offered powerful tools for creating predictive models. These models can capture complex patterns in the data that traditional inferential methods might miss, leading to more accurate inferences. AI algorithms can also manage many more variables than statisticians could with traditional software, allowing researchers to handle much larger, more complex datasets.
People
Pierre-Simon Laplace (1749–1827): French mathematician and physicist who offered fundamental contributions to probability theory, particularly through the development of Bayesian inferences and the central limit theorem which are both still considered influential in modern inferential statistics.
Thomas Bayes (1702–1761): An English statistician and clergyman. He’s best known for Bayes' Theorem, which provides a method for updating probabilities based on new evidence and was foundational to Bayesian inference, a key approach in inferential statistics for estimating parameters and decision-making under uncertainty.
Sir Ronald A. Fisher (1890–1962): A British statistician and geneticist, Fisher is considered one of the greatest contributors to modern statistics. He developed the analysis of variance (ANOVA), maximum likelihood estimation, and laid the groundwork for hypothesis testing. His work helped formalize many of the key concepts in inferential statistics.
Jerzy Neyman (1894–1981): A Polish statistician who worked closely with Egon Pearson to develop the Neyman-Pearson framework.
Egon Pearson (1895–1980): A British statistician who co-developed the Neyman-Pearson framework for hypothesis testing, introducing the concepts of Type I and Type II errors and the likelihood ratio test. Their work formalized the logic behind decision-making in hypothesis testing and is a cornerstone of inferential statistics.
Karl Pearson (1857–1936): As a British mathematician and statistician, Pearson is considered the founder of modern statistics and introduced key concepts such as the Pearson correlation coefficient and chi-square test. His work established the foundation for statistical inference and the application of statistical methods in biological and social sciences.
Consequences
In the world of experimental and social psychology, inferential statistics offers indispensable tools for uncovering insights into human behavior. By applying some of the techniques discussed, like t-tests and ANOVA, researchers can analyze whether the differences between experimental and control groups are significant (and not due to random chance). This allows psychologists to rigorously test hypotheses, offering stronger evidence for understanding behavioral phenomena like memory retention, mental health risk factors, learning processes, or even responses to stress. The better we understand how the brain works in general, the better equipped we are to support individuals.
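As a hedged illustration of the kind of analysis described above (the condition labels and scores are invented, and SciPy is assumed to be installed), a one-way ANOVA across three groups might look like this:

```python
from scipy import stats

# Hypothetical memory-test scores under three experimental conditions
control = [12, 14, 11, 13, 15, 12, 14]
low_stress = [10, 11, 12, 10, 13, 11, 12]
high_stress = [8, 9, 10, 7, 9, 8, 10]

# One-way ANOVA: are the group means more different than chance alone would explain?
f_stat, p_value = stats.f_oneway(control, low_stress, high_stress)

print(f"F = {f_stat:.2f}, p = {p_value:.4f}")
if p_value < 0.05:
    print("At least one condition differs significantly from the others.")
else:
    print("No statistically significant difference between conditions.")
```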
Beyond psychology, inferential statistics are important in all kinds of clinical research. Randomized controlled trials (RCTs) are frequently used to assess the effectiveness of different therapeutic interventions, both in clinical psychology settings as well as pharmaceutical settings. By comparing treatment groups against control groups, researchers can infer whether observed changes in health outcomes are likely due to the intervention rather than external factors. This is true for clinicians looking at the impact of certain therapies on mental health or the effects of certain drugs on cancer cells; healthcare as a whole relies largely on inferential statistics when developing new treatments.
Inferential statistics permeates everyday life, too. In political science, you may be familiar with pre-election surveys. This use of sample data from pollsters allows analysts to make educated predictions about voter preferences and behavior, helping forecast election outcomes and draw conclusions about the voting population as a whole. In economics, inferential statistics help us make forecasts about macroeconomic indicators like unemployment rates, investment trends, and even Gross Domestic Product (GDP). Through statistical models, economists can use past data to infer future conditions, which allows them to shape economic strategies.
Controversies
Inferential statistics, while powerful, is not without its controversies and limitations. One common pitfall is the misuse and misinterpretation of p-values (the level of marginal significance within a statistical hypothesis test), which can lead to false conclusions. Critics argue that a p-value threshold (commonly 0.05) is arbitrary and can be misleading. Since the p-value is often misinterpreted by readers who aren’t familiar with inferential statistics, they may mistakenly think that, for example, a p-value of .05 means that 95% of participants acted a certain way (when that’s not at all what a p-value states). No matter your scientific background, it’s important to be careful when interpreting results from any study and to use statistical values appropriately.
Another common criticism is over-reliance on null hypothesis significance testing, where you either reject or fail to reject the null hypothesis. Basically, this is another way of asking: did what we thought was going to happen end up happening? This reduces complex data to a binary decision, which doesn’t always do the underlying question justice. However, with technological advances, we’re now able to model data in much more complex ways than simple yes-or-no hypothesis testing.4
The advent of computational methods has led to the use of more sophisticated statistical techniques like structural equation modeling and multilevel modeling in the behavioral sciences. Particularly when we’re studying topics as complex as human behavior, alternative approaches like Bayesian inference can be helpful, as they’re a bit more nuanced; these methods allow for a more sophisticated analysis of complex data sets, such as longitudinal data or data with nested structures. Many inferential statistical methods also rely on assumptions that might not hold in real-world data. One common assumption is normality, meaning the data follow a symmetrical, bell-shaped distribution. Another key assumption is independence, meaning the observations do not influence one another. If any of the assumptions we’ve made about our data or our data collection aren’t true, the validity of our inferences can be compromised.5
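As a brief example of checking one of these assumptions (the data are simulated, and SciPy is assumed), a Shapiro-Wilk test can flag departures from normality before you rely on a parametric method:

```python
import random
from scipy import stats

random.seed(1)

# One roughly normal sample and one clearly skewed sample
normal_data = [random.gauss(50, 5) for _ in range(40)]
skewed_data = [random.expovariate(1 / 10) for _ in range(40)]

for name, data in [("roughly normal", normal_data), ("skewed", skewed_data)]:
    statistic, p = stats.shapiro(data)
    # A small p-value suggests the data depart from normality,
    # which may call for a non-parametric or robust alternative
    print(f"{name}: Shapiro-Wilk W = {statistic:.3f}, p = {p:.4f}")
```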
If you’ve read any scientific papers, you’ve likely come across a discussion and limitations section where the authors talk about needing a bigger sample size. Small sample sizes can lead to unreliable inferences. Usually, the larger the sample size, the more the data ‘smooth out’ any major outliers or chance fluctuations that might otherwise be misleading or suggest differences that aren’t practically significant. As with most research, the more data we collect and the more trials we run, the more we can understand about our surroundings.
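A quick simulation makes the payoff visible (the values are invented, and a normal approximation is used for the interval): the estimate barely moves as the sample grows, but the uncertainty around it shrinks.

```python
import math
import random
import statistics

random.seed(2)

def ci_width_95(sample):
    """Width of a normal-approximation 95% confidence interval for the mean."""
    standard_error = statistics.stdev(sample) / math.sqrt(len(sample))
    return 2 * 1.96 * standard_error

# Simulated population of 100,000 measurements
population = [random.gauss(100, 15) for _ in range(100_000)]

for n in (10, 50, 200, 1000):
    sample = random.sample(population, k=n)
    print(f"n = {n:4d}: sample mean = {statistics.mean(sample):6.1f}, "
          f"95% CI width = {ci_width_95(sample):5.2f}")
```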
Case Study
Salk’s Polio Vaccine
In 1954, a scientist named Jonas Salk conducted a study looking at the effectiveness of the polio vaccine. This was a double-blind study involving 1.8 million children: roughly 400,000 received the vaccine, roughly 200,000 received a placebo, and the remainder served as unvaccinated observed controls.6 This dual protocol (placebo controls in some areas, observed controls in others) illustrates both the power and the limitations of a single statistical design for randomized clinical trials. Inferential statistical methods were used to compare the incidence of polio in the vaccinated and placebo groups. The vaccine group had significantly fewer cases of polio, leading to the conclusion that the vaccine was effective. Clearly, the use of inferential statistics to interpret the results of randomized controlled trials has been incredibly important for science (and public health). We’ve continued to rely on our ability to extrapolate from smaller samples in order to make sense of data, research, and the world around us.
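To show the kind of comparison involved (the counts below are purely illustrative, not the actual 1954 trial figures, and SciPy is assumed), comparing polio incidence between a vaccinated group and a placebo group can be framed as a test on a 2x2 table:

```python
from scipy import stats

# Purely illustrative counts (NOT the real trial data):
# rows = vaccinated / placebo, columns = developed polio / did not
observed = [
    [30, 199_970],   # vaccinated group
    [110, 199_890],  # placebo group
]

chi2, p_value, dof, expected = stats.chi2_contingency(observed)

print(f"chi-square = {chi2:.2f}, p = {p_value:.6f}")
if p_value < 0.05:
    print("Polio incidence differs significantly between the two groups.")
```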
Antihypertensive Drugs
Researchers conducted a systematic review of 208 randomized controlled trials to evaluate the effectiveness of new drugs designed to lower blood pressure.7 Most of the studies they looked at randomly divided participants into two groups: one receiving the new drug and the other receiving a placebo. In one of the studies analyzed, for example, the null hypothesis was that the new drug had no effect on blood pressure reduction compared to the placebo. The alternative hypothesis was that the drug would lead to a significant reduction in blood pressure. After a month, blood pressure levels were measured for both groups. The data showed that the average blood pressure in the drug group was lower than in the placebo group.
Here’s where the inferential statistics come in: the researchers used a t-test to compare the mean blood pressure reduction between the two groups. Inferential statistics, specifically p-values, were used to determine whether the observed difference was statistically significant. In these studies, if the p-value fell below a certain threshold (in this case, 0.05), the null hypothesis was rejected, leading to the conclusion that the new drug significantly reduced blood pressure compared to the placebo. If not, the null hypothesis was retained, suggesting no significant difference.
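A hedged sketch of that comparison (the reductions below are simulated rather than taken from the review, and SciPy is assumed) might look like this:

```python
import random
from scipy import stats

random.seed(3)

# Simulated reductions in systolic blood pressure (mmHg) after one month
drug_group = [random.gauss(9, 4) for _ in range(60)]     # new drug
placebo_group = [random.gauss(3, 4) for _ in range(60)]  # placebo

# Independent two-sample t-test
# H0: no difference in mean reduction; H1: the drug changes the mean reduction
t_stat, p_value = stats.ttest_ind(drug_group, placebo_group)

print(f"t = {t_stat:.2f}, p = {p_value:.4f}")
if p_value < 0.05:
    print("Reject H0: the drug appears to lower blood pressure more than placebo.")
else:
    print("Fail to reject H0: no statistically significant difference detected.")
```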
These types of studies, powered by inferential statistics, allowed researchers to generalize the results from the sample to the broader population of people with hypertension, guiding healthcare professionals in deciding whether the drug should be adopted in practice.
Related TDL Content
How to Predict Mental Illnesses: The Digital Future of Mental Healthcare
See another example of inferential statistics in action, in this case in understanding mental health, healthcare, and the digital future. This article discusses the potential for using predictive analytics and digital tools to improve mental healthcare by identifying early indicators of mental illness through data like online language patterns.
Taking a Hard Look at Democracy
Although we’ve discussed inferences in the formal setting, we constantly make inferences in our own minds. This interview with Tom Spiegler, a co-founder of TDL, discusses the inferences we make regarding politics and democracy.
References
- The Vegan Society. (2023). Worldwide growth of veganism. https://www.vegansociety.com/news/media/statistics/worldwide
- Al-Benna, S., Al-Ajam, Y., Way, B., & Steinstraesser, L. (2010). Descriptive and inferential statistical methods used in burns research. Burns, 36(3), 343-346. https://doi.org/10.1016/j.burns.2009.04.030
- Inferential statistics. (2008). In The Concise Encyclopedia of Statistics. Springer, New York, NY. https://doi.org/10.1007/978-0-387-32833-1_197
- Marshall, G., & Jonker, L. (2011). An introduction to inferential statistics: A review and practical guide. Radiography, 17(1), e1-e6. https://doi.org/10.1016/j.radi.2009.12.006
- Allua, S., & Thompson, C. B. (2009). Basics of research part 15: Inferential statistics. Air Medical Journal, 28(4), 168-171. https://doi.org/10.1016/j.amj.2009.04.013
- Meldrum, M. (1998). "A calculated risk": The Salk polio vaccine field trials of 1954. BMJ (Clinical Research Ed.), 317(7167), 1233–1236. https://doi.org/10.1136/bmj.317.7167.1233
- Paz, M. A., de-La-Sierra, A., Sáez, M., Barceló, M. A., Rodríguez, J. J., Castro, S., Lagarón, C., Garrido, J. M., Vera, P., & Coll-de-Tuero, G. (2016). Treatment efficacy of anti-hypertensive drugs in monotherapy or combination: ATOM systematic review and meta-analysis of randomized clinical trials according to PRISMA statement. Medicine, 95(30), e4071. https://doi.org/10.1097/MD.0000000000004071
About the Author
Annika Steele
Annika completed her Masters at the London School of Economics in an interdisciplinary program combining behavioral science, behavioral economics, social psychology, and sustainability. Professionally, she’s applied data-driven insights in project management, consulting, data analytics, and policy proposal. Passionate about the power of psychology to influence an array of social systems, her research has looked at reproductive health, animal welfare, and perfectionism in female distance runners.