Why do we rely on specific information over statistics?
Base Rate Fallacy, explained.
What is the Base Rate Fallacy?
When provided with both individuating information, which is specific to a certain person or event, and base rate information, which is objective, statistical information, we tend to assign greater value to the specific information and often ignore the base rate information altogether. This is referred to as the base rate fallacy, or base rate neglect.
Where this bias occurs
If you’ve ever been a college student, you probably know that there are certain stereotypes attached to different majors. For example, students in engineering are often viewed as hardworking but cocky, students in business are stereotypically preppy and aloof, and arts students are typecast as activists with an edgy fashion sense. Of course, these stereotypes are wide generalizations, which are often way off the mark. Yet, they are frequently used to make projections about how individuals might act.
Daniel Kahneman and Amos Tversky once conducted a study in which participants were presented with a personality sketch of a fictional graduate student referred to as Tom W. They were given a list of nine areas of graduate study and told to rank them by how likely each was to be Tom W.’s field. At the time the study was conducted, far more students were enrolled in education and the humanities than in computer science. However, 95% of participants said it was more likely that Tom W. was studying computer science than education or humanities. Their predictions were based purely on the personality sketch (the individuating information), with total disregard for the base rate information.1
As much as that one person in your History elective course might look and act like the stereotypical medical student, the odds that they are actually studying medicine are very low, since there are typically only 100 or so people in that program, compared to the thousands of students enrolled in other faculties, like Management or Science. While it can be easy to make these kinds of snap judgments about people, we can’t let specific information completely erase the base rate information.
The base rate fallacy can lead us to make inaccurate probability judgments in many different aspects of our lives. As demonstrated by Kahneman and Tversky in the aforementioned example, it can cause us to jump to conclusions about people based on our initial impressions of them.2 In turn, this can lead us to develop preconceived notions about people, as well as to perpetuate potentially harmful stereotypes. This fallacy can also impact our financial decisions, by prompting us to overreact to transient changes in our investments. If the base rate statistics show consistent growth, it is likely that any setbacks are only temporary and that things will get back on track. Yet, if we ignore the base rate information, we may feel inclined to sell, as we may predict that the value of our stocks will continue to decline.3
The individual effects of the base rate fallacy can add up to become significant challenges when the fallacy is committed by people who make probability judgments about others, such as a doctor diagnosing a patient. In their 1982 book, Judgment Under Uncertainty: Heuristics and Biases4, Kahneman and Tversky cited a study in which participants were given the following scenario: “If a test to detect a disease whose prevalence is 1/1000 has a false positive rate of 5%, what is the chance that a person found to have a positive result actually has the disease, assuming you know nothing about the person's symptoms or signs?” Half the participants responded 95%, the average answer was 56%, and only a handful of participants gave the correct response: 2%. The participants in this study were not physicians themselves, but this example demonstrates how important it is that medical professionals understand base rates, so as not to commit this fallacy. Not taking base rate information into account can take a significant toll on the patient’s mental wellbeing, and it may prevent physicians from examining other potential causes, as 95% odds seem pretty certain.
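To see where the 2% answer comes from, it can help to count people instead of juggling percentages. The short Python sketch below is only an illustration: the function name, the imagined population of 100,000 and the assumption that the test catches every true case (a sensitivity of 100%, which the scenario does not state) are ours, not part of the original study.

```python
def positive_predictive_value(prevalence, false_positive_rate,
                              sensitivity=1.0, population=100_000):
    """Share of positive test results that belong to people who are really sick.

    sensitivity=1.0 is an assumption: the scenario gives no miss rate,
    so we suppose the test flags every person who has the disease.
    """
    sick = population * prevalence            # people who have the disease
    healthy = population - sick               # people who do not
    true_positives = sick * sensitivity       # sick people who test positive
    false_positives = healthy * false_positive_rate  # healthy people who test positive
    return true_positives / (true_positives + false_positives)


# Numbers from the scenario: prevalence 1/1000, false positive rate 5%.
ppv = positive_predictive_value(prevalence=1 / 1000, false_positive_rate=0.05)
print(f"{ppv:.1%}")  # prints 2.0%
```

Out of 100,000 people, 100 are sick and test positive, while 4,995 of the 99,900 healthy people also test positive. Only 100 of the 5,095 positive results are genuine, which is about 2%.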
Why it happens
There have been a number of explanations proposed for why the base rate fallacy occurs. One of the main theories posits that it is a matter of relevance, such that we ignore base rate information because we classify it as irrelevant and therefore feel that it should be ignored. It has also been suggested that the base rate fallacy results from the representativeness heuristic.
Maya Bar-Hillel’s 1980 paper, “The base-rate fallacy in probability judgments”5 addresses the limitations of previous theories of base rate fallacy and presents an alternate explanation: relevance. Specifically, we ignore base rate information because we believe it to be irrelevant to the judgment we are making. Bar-Hillel contends that, prior to making a judgment, we categorize the information given to us into different levels of relevance. If something is deemed irrelevant, we discard it and do not factor it into the conclusion we draw. Thus, it is not that we are incapable of integrating different types of information; if two types of information are assigned equal relevance, we will give them equal consideration. It is misattributions of relevance that cause us to ignore vital information, value certain information more than we should, or focus on one source of information when we should be integrating multiple.
Furthermore, Bar-Hillel explains that part of what makes us view certain pieces of information as more relevant than others is specificity. The more specific information is to the situation at hand, the more relevant it seems. Individuating information is, by nature, incredibly specific. As such, we denote it as highly relevant. Base rate information, on the other hand, is very general. We categorize it as low relevance information. In making a judgment, we take into consideration the information we consider to be relevant and ignore that which has been deemed irrelevant. To us, this may feel like an effective strategy, but it can actually compromise the accuracy of our judgments.
Bar-Hillel contends that representativeness is not a sufficient explanation for why the base rate fallacy occurs, as it cannot account for this fallacy in all contexts.6 That being said, representativeness may be one of the factors that contributes to the base rate fallacy, specifically in cases like the Tom W. study described by Kahneman and Tversky.7
Heuristics are mental shortcuts we use to facilitate judgment and decision-making. The representativeness heuristic, which was introduced by Kahneman and Tversky, describes our tendency to judge the probability of something based on the extent to which the object or event in question is similar to the prototypical exemplar of the category it falls into. We mentally categorize objects and events, grouping them based on similar features. Each category has a prototype, which is the average example of all the objects and events sorted into that category. The more the object or event resembles that prototype, the more representative of that category we judge it to be. The more representative it is, the more likely we believe its outcomes will align with those of the prototype.8
The representativeness heuristic can give rise to the base rate fallacy, as we may view an event or object as extremely representative and make a probability judgment based solely on that, without stopping to consider base rate values. To refer back to Tom W., judgments about his field of study were inferred from his personality sketch. He was deemed to be representative of a computer science graduate student, thereby leading participants to rank him as more likely to be pursuing studies in that field than in programs with far greater enrolment rates. Since there were far more students in both education and humanities than in computer science, it was more probable that he was studying one of the former, rather than the latter. Yet, representativeness caused participants to overlook the base rate information, which proved to be essential.
Why it is important
Having at least a basic knowledge of statistics is useful, as it allows you to interpret information more accurately. It equips you to understand the results of new research, to assess whether or not a study was well-designed, among other things. Knowledge of base rates will allow you to better understand the likelihood of certain events occurring in your life, whether it’s the odds of winning the lottery or developing a certain condition.
How to avoid it
To avoid committing the base rate fallacy, we need to work on paying more attention to the base rate information available to us, as well as recognizing that personality and past behaviors are not as reliable predictors of future behavior as we think they are. This requires us to be more effortful when assessing the probability that a given event will occur. It is tempting to fall back on effortless, automatic processes, which make decision-making much easier, but doing so increases the risk of error. By being aware of this fallacy and taking an active approach to combating it, we can reduce the frequency with which we commit it.
How it all started
One cannot discuss the base rate fallacy without mentioning Kahneman and Tversky. Their 1973 paper, “On the Psychology of Prediction”9 described how the representativeness heuristic can lead us to commit the base rate fallacy. They illustrated this through the previously mentioned example of the Tom W. study, in which participants made their predictions based on the personality sketch and failed to account for the number of graduate students enrolled in each program.
Another early explanation of the base rate fallacy can be found in Maya Bar-Hillel’s 1980 paper, “The base-rate fallacy in probability judgments”.10 Here, this fallacy is described as “people’s tendency to ignore base rates in favor of, e.g., individuating information (when such is available), rather than integrate the two” (p. 211). This paper points out the limitations of Kahneman and Tversky’s representativeness explanation, and provides an alternate theory explaining the base rate fallacy. Specifically, Bar-Hillel pinpointed perceived relevance as the underlying factor of this fallacy. She suggested that the more specific information is, the more relevance we assign to it. As such, we attend to individuating information because it is specific, and therefore considered relevant, and ignore base rate information because it is general, and therefore deemed less relevant to the topic at hand.
Example 1 - The cab problem
This classic example of the base rate fallacy is presented in Bar-Hillel’s foundational paper on the topic.11 First, participants are given the following base rate information. There are two cab companies in a city: one is the “Green” company, the other is the “Blue” company. The names of the companies refer to the colors of their respective taxis. It is specified that, of all the cabs in the city, 85% are blue and 15% are green. Then, a scenario is described in which a cab, which a witness later identifies as green, was involved in a hit and run one night. In order to assess the reliability of the witness, the court ordered that their ability to discriminate between blue and green cabs at nighttime be tested. It is shown that the witness can accurately distinguish the colors 80% of the time but confuses them 20% of the time. Participants are then asked to give the likelihood that the cab involved in the hit and run was actually green.
Many people are inclined to respond that the probability that the witness correctly identified a green cab at night is 80%. However, everyone who gives that answer is committing the base rate fallacy. When taking into account the base rate information, which tells us that only 15% of the cabs in the city are green, the actual probability that the witness was correct is 41%. This figure comes from Bayes’ theorem, which combines the percentage of each color of cab in the city with the likelihood that the witness correctly discriminated between the colors at night. The witness would report “green” for 80% of the green cabs (0.80 × 0.15 = 0.12 of all cabs) and for 20% of the blue cabs (0.20 × 0.85 = 0.17 of all cabs), so a “green” report is correct only 0.12 / (0.12 + 0.17) ≈ 41% of the time.
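For readers who would like to check the 41% figure or try other numbers, here is a minimal Python sketch of Bayes’ theorem applied to the cab problem. The function and variable names are our own, and the sketch assumes, as the problem implies, that the witness is equally accurate for blue and green cabs.

```python
def probability_cab_was_green(share_green, witness_accuracy):
    """Probability the cab really was green, given that the witness said "green".

    Assumes the witness is equally accurate for both colors.
    """
    share_blue = 1 - share_green
    # Green cab, correctly reported as green.
    green_and_reported_green = share_green * witness_accuracy
    # Blue cab, mistakenly reported as green.
    blue_and_reported_green = share_blue * (1 - witness_accuracy)
    return green_and_reported_green / (
        green_and_reported_green + blue_and_reported_green
    )


# The cab problem: 15% of cabs are green, the witness is right 80% of the time.
cab_problem = probability_cab_was_green(share_green=0.15, witness_accuracy=0.80)
print(f"{cab_problem:.0%}")  # prints 41%

# If the two companies were the same size, the intuitive answer would be right.
equal_fleets = probability_cab_was_green(share_green=0.50, witness_accuracy=0.80)
print(f"{equal_fleets:.0%}")  # prints 80%
```

The second call shows why the base rate matters: the witness’s 80% accuracy equals the probability of being right only when green and blue cabs are equally common.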
Example 2 - How much will you donate?
In their 2000 paper, “Feeling “holier than thou”: are self-serving assessments produced by errors in self- or social prediction?”12, Nicholas Epley and David Dunning found that we have a tendency to commit the base rate fallacy when predicting our own behavior because we have access to ample individuating information about ourselves. In their study, university students were given five dollars and asked to predict how much of that money they would donate to one of three charities, as well as how much the average peer would donate. After their initial predictions, the donations of 13 of their peers were revealed, one by one. Participants were allowed to revise their predictions after the donations of three of their peers were revealed, then again after seven were revealed and once more after the thirteenth was revealed. In general, the initial predictions were generous, although people did think their own generosity to be superior to that of their peers: at the start of the study, the average prediction for one’s own donation was about $2.75, while the average prediction for their peers was about $2.25. The actual amount donated was $1.50. At the three time points where they were given the chance to revise their predictions, participants adjusted their predictions of their peers’ donations to match the base rate information they had acquired. After seeing all 13 donations made by their peers, the average prediction of peers’ donations closely resembled the actual average donation amount of $1.50. Interestingly enough, participants’ predictions for themselves did not change, even as they gained more base rate information.
The reason why participants took base rate information into consideration when making predictions about their peers is that they did not have access to individuating information about any of these people. As a result, they had to rely on base rate information. However, this was not the case when making predictions about themselves. Participants used their own personality and past behaviors as individuating information in making the prediction about how much money they would donate. Since we tend to value individuating information more than base rate information, they did not adjust their predictions for themselves as they gained access to more base rate information.13
This demonstrates that, when no specific individuating information is available, we will use base rate information in making predictions. However, as soon as we have access to that individuating information, we latch onto it and use it instead, thereby committing base rate fallacy.
What it is
The base rate fallacy refers to our tendency to rely more on specific information than on statistics when making probability judgments.
Why it happens
There are multiple factors that contribute to the occurrence of the base rate fallacy. One is the representativeness heuristic, which states that the extent to which an event or object is representative of its category influences our probability judgments, with little regard for base rates. Another is relevance, which suggests that we consider specific information to be more relevant than general information, and therefore selectively attend to individuating information over base rate information.
Example 1 - The cab problem
A classic illustration of the base rate fallacy involves a scenario in which 85% of cabs in a city are blue and the rest are green. One night, a cab is involved in a hit and run accident. A witness claims the cab was green; however, later tests show that they correctly identify the color of a cab at night only 80% of the time. When asked what the probability is that the cab involved in the hit and run was green, people tend to answer that it is 80%. However, this ignores the base rate information that only 15% of the cabs in the city are green. When taking all the information into consideration, crunching the numbers shows that the likelihood that the witness was correct is actually 41%.
Example 2 - How much will you donate?
Participants in a study were asked how much of the five dollars they were given they would donate to a given charity. They were asked to make the same prediction about their average peer. Next, they were presented with the actual donations of 13 other donors and given the chance to adjust their predictions. They adjusted their predictions of their peers to match the base rate information but did not change their predictions for themselves. When we have access to individuating information, we assign it greater value than base rate information, which is why their predictions for themselves stayed the same. However, participants did not have access to individuating information about their peers and therefore relied on base rate information instead.
How to avoid it
To avoid committing the base rate fallacy, we need to take a more active approach to assessing probability, by working on paying more attention to the base rate information available to us and by recognizing that personality and past behaviors are not as reliable predictors of future behavior as we think they are.