A Psi Test for the Health of Science
by Alex Holcombe

Science is sick. How will we know when it's been cured?
Meta-analysis quantitatively combines the evidence from multiple experiments, across different papers and laboratories. It's the best way we have to determine the upshot of a spate of studies.
Published studies of psi (telepathy, psychokinesis, and other parapsychological phenomena) have been submitted to meta-analysis. The verdict of these meta-analyses is that the evidence for the existence of psi is close to overwhelming. Bosch, Steinkamp, & Boller (2006, Psychological Bulletin), for example, meta-analyzed studies of the ability of participants to affect the output of random number generators. These experiments stemmed from an older tradition in which participants attempted to influence a throw of dice to yield a particular target number. As with the old dice experiments, many of the studies found that the number spat out by the random number generator was more often the target number the participant was gunning for than one would expect by chance. In their heroic effort, Bosch et al. combined the results of 380 published experiments and calculated that, if psychokinesis does not in fact exist, the probability of obtaining the published evidence was less than one in a thousand (for one of their measures, z = 3.67). In other words, it is extremely unlikely that so much evidence in favor of psychokinesis would have resulted if psychokinesis actually does not exist.
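To make the combining step concrete, here is a minimal sketch of one simple pooling method, Stouffer's combined z. The per-study z values below are made up, and Bosch et al.'s actual analysis was considerably more elaborate (weighting studies and modeling effect sizes), so this is only an illustration of the general idea, not their procedure.

```python
# Minimal sketch of pooling independent study results with Stouffer's method.
# The per-study z-scores are hypothetical, purely for illustration.
import math
from scipy.stats import norm

def stouffer_combined_z(z_scores):
    """Combine independent z-scores: sum them and rescale by sqrt(k)."""
    return sum(z_scores) / math.sqrt(len(z_scores))

# Hypothetical z-scores from individual (equally weighted) studies.
study_z = [0.4, 1.1, -0.2, 0.9, 1.5, 0.3, 0.8]

z_combined = stouffer_combined_z(study_z)
p_one_tailed = norm.sf(z_combined)  # chance of a combined result this extreme under the null

print(f"combined z = {z_combined:.2f}, one-tailed p = {p_one_tailed:.4f}")
```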
Like many others, I suspect that this evidence stems not from the existence of psi, but rather from various biases in the way science today is typically conducted.
"Publication bias" refers to the tendency for a study to be published if it is interesting, while boring results rarely make it out of the lab. "P-hacking" - equally insidious - is the tendency of scientists to try many different statistical analyses until they find a statistically significant result. If you try enough analyses or tests, you're nearly guaranteed to find a statistically significant although spurious result. But despite scientists' suspicion that the seemingly-overwhelming evidence for psi is a result of publication bias and p-hacking, there is no way to prove this, or to establish it beyond a reasonable doubt (we shouldn't expect proof, as that may be a higher standard than is feasible for empirical studies of a probabilistic phenomenon).
Fortunately these issues have received plenty of attention, and new measures are being adopted (albeit slowly) to address them. Researchers have been encouraged to publicly announce (simply by posting on a website) a single, specific statistical analysis plan prior to collecting data. This can eliminate p-hacking. Other positive steps, like sharing of code and data, help other scientists to evaluate the evidence more deeply and to spot signs of p-hacking, as well as inappropriate analyses and simple errors. In the case of a recent study of psi by Tressoldi et al., Sam Schwarzkopf has been able to wade into the arcane details of the study, revealing possible problems. But even if the Tressoldi et al. study is shown to be seriously flawed, Sam's efforts won't overturn all the previous evidence for psi, nor will they combat publication bias in future studies. We need a combination of measures to address the maladies that afflict science.
OK, so let's say that preregistration, open science, and other measures are implemented, and together fully remedy the unhealthy traditions that hold back efforts to know the truth. How will we know science has been cured?
A Psi Test for the health of science might be the answer. According to the Psi Test, until it can be concluded that psi does not exist using the same meta-analysis standards as are applied to any other phenomenon in the biomedical or psychological literature, science has not yet been cured.
Do we really need to eliminate publication bias to pass the Psi Test, or can meta-analyses deal with it? Funnel plots can provide evidence for publication bias. But given that most areas of science are rife with publication bias, if we use publication bias to overturn the evidence for psi, to be consistent we'd end up disbelieving countless more-legitimate phenomena. And my reading of medicine’s standard meta-analysis guide, by the Cochrane Collaboration, is that in Cochrane reviews, evidence for publication bias raises concerns but is not used to overturn the verdict indicated by the evidence.
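For readers curious what "evidence from a funnel plot" amounts to in practice, here is a rough sketch of the idea behind Egger's regression test. The effect sizes and standard errors are made up, not data from any real psi meta-analysis, and a full Egger analysis would include a formal test of whether the intercept differs from zero.

```python
# Rough sketch of the idea behind Egger's regression test for funnel-plot
# asymmetry, using hypothetical effect sizes and standard errors.
import numpy as np

effects = np.array([0.30, 0.25, 0.10, 0.45, 0.05, 0.38, 0.22])     # hypothetical study effect sizes
std_errors = np.array([0.20, 0.15, 0.05, 0.25, 0.04, 0.22, 0.12])  # hypothetical standard errors

precision = 1.0 / std_errors
standardized_effect = effects / std_errors

# Regress standardized effect on precision; the intercept is what Egger's
# test examines. An intercept far from zero hints at small-study effects
# of the kind publication bias produces.
slope, intercept = np.polyfit(precision, standardized_effect, 1)
print(f"Egger intercept = {intercept:.2f}")
```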
Of course, instead of concluding that science is sick, we might instead conclude that psi actually exists. But I think this is not the case - mainly because of what I hear from physicists. And I think if psi did exist, there’d likely be even more overwhelming evidence for it by now than we have. Still, I want us to be able to dismiss psi using the same meta-analysis techniques we use for the run-of-the-mill. Others have made similar points.
The Psi Test for the health of science, even if valid, won't tell us right away that science has been fixed. But in retrospect we'll know: after the year in which science is cured, applying the standard meta-analysis technique to the psi studies published that year and later will yield the conclusion that psi does not exist.
Below, I consider two objections to this Psi Test.
Objection 1: Some say that we already can conclude that psi does not exist, based on Bayesian evaluation of the psi proposition. To evaluate the evidence from psi studies, a Bayesian first assigns a probability that psi exists, prior to seeing the studies' data. Most physicists and neuroscientists would say that our knowledge of how the brain works and of physical law very strongly suggests that psychokinesis is impossible. To overturn this Bayesian prior, one would need much stronger evidence than even the one-in-a-thousand chance derived from psi studies that I mentioned above. I agree; it's one reason I don't believe in psi. However, such a prior is hard to pin down quantitatively, which partly explains why Bayesian analysis hasn't taken over the scientific literature more rapidly. Also, there may be expert physicists out there who think some sort of quantum interaction could underlie psi, and it's hard to know how to quantitatively combine the opinions of dissenters with those of the majority.
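To make the arithmetic concrete, here is a toy Bayesian updating calculation with numbers chosen purely for illustration: a skeptical prior of one in a million that psi exists, and a Bayes factor of 1000 in favor of psi (loosely inspired by the one-in-a-thousand figure above, although a p-value is not the same thing as a Bayes factor).

```python
# Toy Bayesian updating: both numbers below are assumptions for illustration,
# not values from any published analysis.
prior_psi = 1e-6         # assumed prior probability that psi exists
bayes_factor = 1000      # assumed evidence: data 1000x more likely if psi exists

prior_odds = prior_psi / (1 - prior_psi)
posterior_odds = prior_odds * bayes_factor
posterior_psi = posterior_odds / (1 + posterior_odds)

print(f"posterior probability that psi exists: {posterior_psi:.4f}")  # about 0.001
```

Even evidence that strong leaves the posterior probability of psi at roughly one in a thousand, which is the intuition behind this objection.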
Rather than relying on a Bayesian argument (although Bayesian analysis is still useful, even with a neutral prior), I'd prefer that our future scientific practice, with preregistration, unbiased publishing, replication protocols, and so on, reach the point where, if hundreds of experiments on a topic are available, their combined evidence is fairly definitive. Do you think we will get there?
Objection 2: Some will say that science can never eliminate publication bias. While publication bias is reduced by the advent of journals like PLoS ONE that accept null results, and by the growing number of journals that accept papers prior to the data being collected, it may forever remain a significant problem. But there are further steps one could take: in open notebook science, all data is posted on the net as soon as it is collected, eliminating all opportunity for publication bias. But open notebook science might never become standard practice, and publication bias may remain strong enough that substantial doubt will persist for many scientific issues. In that case, the only solution may be a pre-registered, confirmatory, large-scale replication of an experiment, similar to what we are doing at Perspectives on Psychological Science (I'm an associate editor for the new Registered Replication Report article track). Will science always need that to pass the Psi Test?