To understand the look-elsewhere effect, we will first need to have a very basic understanding of what it means to have a “statistically significant” finding. When researchers want to test a hypothesis, they’ll typically run an experiment, where they compare the outcomes of different groups—for instance, one group that receives the treatment the researcher is studying, and a control group that just gets a placebo. As long as all other factors are carefully controlled for, if we find that there’s a difference between how these groups fare, then it’s safe to say that the difference was caused by the treatment. Right?
The problem is, even when researchers have controlled for other variables, there is still the possibility that any differences between groups are due to random coincidence. This is because, although we are trying to make generalizations about how a treatment would affect a whole population, we have to test it on a much smaller sample of individuals. If, for some reason, our sample turns out not to be representative of the whole population, then our results would be misleading.
To illustrate, imagine you’re working at an ice cream parlor, where people are allowed to sample the flavors. One day a huge group of people comes in, about one hundred of them, all wanting to sample the mint chocolate chip. Obviously, there are lots of chocolate chips in the mint chocolate chip, but they’re not totally evenly distributed throughout the bucket. So, as you’re giving people their samples, the vast majority of the time, the samples contain some chocolate—but every now and then, an unlucky person gets a sample that’s just mint ice cream, a sample that doesn’t properly represent the flavor.
In science, sampling poses a similar problem: there’s always a chance that our experimental sample, just through bad luck, has characteristics that make its members respond differently to the treatment than the rest of the population would. In that case, our findings would be the result of chance (also known as sampling error) and would lead us to the wrong conclusion about our treatment.
We can never fully escape this problem, but we can try to get around it using statistics. There are many statistical tests available to help scientists judge whether their result is actually significant. In many cases, scientists use statistical tests to calculate a p-value: the probability of seeing a difference at least as large as the one observed if the treatment actually had no effect and chance alone were at work. For example, a p of 0.1 would mean that chance alone would produce a difference this large about 10% of the time. Researchers in a given field agree on a threshold that p has to fall below for a result to be considered significant. Often, this line is drawn at 0.05, meaning scientists accept that, when a treatment truly has no effect, about 5% of experiments will still look significant by coincidence. These bogus significant results are known as alpha errors, or Type I errors.
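To make the idea of a p-value concrete, here is a minimal sketch of one way to estimate it, a permutation test. This is only an illustration, not the method any particular study uses: the function name `permutation_p_value` and the example scores are hypothetical. The idea is to shuffle the group labels many times and count how often chance alone produces a difference at least as large as the one actually observed.

```python
import random
from statistics import mean


def permutation_p_value(treatment, control, n_permutations=5_000, seed=0):
    """Estimate a two-sided p-value for a difference in group means.

    Assumption for this sketch: if the treatment had no effect, the group
    labels are interchangeable, so shuffling them shows what chance can do.
    """
    rng = random.Random(seed)
    observed = abs(mean(treatment) - mean(control))
    pooled = list(treatment) + list(control)
    n_treat = len(treatment)

    at_least_as_extreme = 0
    for _ in range(n_permutations):
        rng.shuffle(pooled)
        diff = abs(mean(pooled[:n_treat]) - mean(pooled[n_treat:]))
        if diff >= observed:
            at_least_as_extreme += 1

    # Add-one correction so the estimate is never exactly zero.
    return (at_least_as_extreme + 1) / (n_permutations + 1)


# Hypothetical outcome scores, invented purely for illustration.
treated_scores = [7.1, 6.8, 7.9, 8.2, 7.5, 6.9, 8.0, 7.7]
placebo_scores = [6.9, 6.5, 7.2, 7.0, 6.8, 7.4, 6.6, 7.1]
p = permutation_p_value(treated_scores, placebo_scores)
print(f"estimated p-value: {p:.4f}")
```

A result would then be called significant if the estimated p fell below the agreed threshold, such as 0.05.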
With that out of the way, we can get back to the look-elsewhere effect.
More statistical tests, more problems
One of the reasons that the look-elsewhere effect happens is purely mathematical. It is known in statistics as the problem of multiple comparisons. As the name suggests, this problem arises when scientists perform many statistical tests on the same dataset. While this might not seem like it should be an issue, it actually inflates the chances of committing an alpha error.3 The more times a researcher goes looking for a result in the same dataset, the more likely they are to hit on something that looks interesting on the surface but is actually just the result of noise, or random fluctuations in the data.4
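The size of this inflation is easy to work out under one simplifying assumption: that the tests are independent of each other, which real analyses of a single dataset often are not. If each test has a 5% chance of a false positive, the chance of at least one false positive across m tests is 1 − (1 − 0.05)^m. The sketch below, with the hypothetical helper name `familywise_error_rate`, computes that figure for a few values of m.

```python
def familywise_error_rate(n_tests, alpha=0.05):
    """Chance of at least one false positive across n_tests tests.

    Assumes the tests are independent and that no real effect exists,
    so each test has probability alpha of a false alarm.
    """
    return 1 - (1 - alpha) ** n_tests


for m in (1, 5, 20, 100):
    print(f"{m:>3} tests -> {familywise_error_rate(m):.1%} chance of a false positive")
```

With 20 tests, the chance of at least one spurious "significant" result is already close to two in three, and with 100 tests it is almost a certainty.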
This, in a nutshell, is the statistical explanation for the look-elsewhere effect. However, this doesn’t quite tell the whole story. After all, researchers are trained in statistics—they should know better than to just conduct a bunch of tests willy-nilly. Moreover, there are ways to statistically correct for the problem of multiple comparisons, in cases when it’s really necessary to carry out a lot of different tests.3 So why does this problem persist in scientific research? That answer comes down to unconscious cognitive biases.
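The text above does not say which statistical corrections it has in mind. As an illustration only, here is a sketch of two standard ones, the Bonferroni correction and the Holm step-down procedure. The function names and the example p-values are hypothetical. Both make the significance threshold stricter as the number of tests grows.

```python
def bonferroni(p_values, alpha=0.05):
    """Flag each test as significant only if p <= alpha / number of tests."""
    threshold = alpha / len(p_values)
    return [p <= threshold for p in p_values]


def holm(p_values, alpha=0.05):
    """Holm step-down procedure.

    Walk through the p-values from smallest to largest, comparing the
    k-th smallest (k starting at 0) against alpha / (m - k), and stop
    at the first one that fails.
    """
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    significant = [False] * m
    for rank, index in enumerate(order):
        if p_values[index] <= alpha / (m - rank):
            significant[index] = True
        else:
            break  # every larger p-value also fails
    return significant


# Hypothetical p-values from five tests on the same dataset.
example = [0.001, 0.012, 0.03, 0.04, 0.2]
print("uncorrected:", [p <= 0.05 for p in example])
print("bonferroni: ", bonferroni(example))
print("holm:       ", holm(example))
```

In this made-up example, four of the five results pass the usual 0.05 threshold without correction, but only one survives Bonferroni and two survive Holm.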
Humans are fallible—even scientists
People are prone to a whole suite of biases and heuristics that distort their thinking. What’s more, unconscious biases are just that: unconscious. Even when we have been taught about the flaws in our own thinking, it is often still very difficult to avoid falling into the same cognitive traps. An even more difficult pill to swallow: this truth applies as much to experts as it does to laypeople. Although many of us tend to see scientists as somehow above making the same errors in judgment as the rest of us, evidence has shown that this is not the case. Even more surprisingly, the formal education that scientists have in statistics doesn’t insulate them from biased reasoning when it comes to estimating probabilities.
One famous demonstration of this fact is about sample sizes. It’s a basic principle in statistics that, all else being equal, larger samples give more reliable results; smaller samples produce noisier estimates and make it more difficult to detect a possible effect. And yet, research has shown that even highly trained researchers sometimes fail to account for sample size.
In a paper titled “Belief in the Law of Small Numbers,” the psychologists Amos Tversky and Daniel Kahneman (Kahneman later received the Nobel Prize in economics) had experienced research scientists, including two authors of statistics textbooks, fill out a questionnaire describing hypothetical research scenarios. The experts were asked to choose sample sizes, estimate the risk of failure, and give advice to a hypothetical graduate student conducting the project. The results showed that a large majority of respondents made errors in their judgments because they didn’t pay enough attention to sample size.5
In short, it’s clear that even the most erudite among us are vulnerable to cognitive bias. And on top of our lack of intuition for statistics, there are other biases, such as optimism bias and effort justification, that likely play a role in the look-elsewhere effect.
We are optimistic to a fault
Optimism bias describes how we are generally more oriented towards positivity: we pay more attention to positive information, we remember happy events better than upsetting ones, and we have positive expectations of the people and world around us.6 This “bias” isn’t necessarily a bad thing: on the contrary, our general optimism clearly enhances our wellbeing. Sometimes, however, optimism bias can lead us to suppress negative information, ignoring facts that make us feel bad, in favor of ones that brighten our mood.7 When it comes to the look-elsewhere effect, the determination to seek out positive information might lead some researchers to disregard their initial insignificant results, and keep looking for a more exciting finding.
We hate to see our hard work go to waste
By the time a researcher gets to the analysis stage of an experiment, it’s likely that they’ve invested a considerable amount of time and energy into designing the experiment, acquiring all the necessary materials, and collecting data. Research requires a whole lot of effort, and we never want to feel like our hard work has gone to waste. And when it starts to seem like maybe it was for nothing, we start doing some cognitive gymnastics to avoid having to confront that unpleasant truth. This phenomenon is known as effort justification.
Often, effort justification causes people to ascribe higher value to the object or project that they’ve been hard at work on. In a classic study by Elliot Aronson and Judson Mills, female college students were told that they would be participating in a group discussion about sexuality. However, some of them were first put through an embarrassing initiation process, supposedly in order to prove that they wouldn’t be too uncomfortable to participate in the conversation. The women who had to put in the extra effort later rated the contents of the discussion as more interesting, and their fellow group mates more intelligent, compared to those who hadn’t done the initiation.8
When it comes to the look-elsewhere effect, researchers’ unwillingness to let go of projects that they’ve sunk a lot of effort into might drive them to continue running statistical tests past the point where they should probably give up. It is difficult to accept that a hypothesis hasn’t panned out, and many people adopt the attitude that finding any significant result is better than coming away with nothing, even if that result isn’t what they were originally looking for.
Academia’s “rat race”
While flawed human reasoning may lead individuals to fall for the look-elsewhere effect, it is undeniable that there are also many structural forces at play that drive this problem. With the replication crisis still ongoing, many have pointed the finger of blame at the culture of modern academia, where researchers are incentivized to publish as many scholarly papers as they can and new graduates are locked into fierce competition for a dwindling number of jobs. According to a 2013 study, there were only enough academic positions for 12.8% of Ph.D. graduates in the United States to find employment,9 and the problem has only shown signs of worsening since then. This kind of job market puts tremendous pressure on people to perform.
Another issue here has to do with how performance is gauged, and the type of research that is seen as publishable. Generally speaking, only statistically significant results are considered interesting enough to merit publication. As a result, many researchers perceive statistically insignificant results to be “failures”—even though an insignificant result still conveys valuable information. This dynamic motivates scientists to “look elsewhere,” and try to reach statistical significance wherever possible.