Jul 10, 2014

What Jason Mitchell's 'On the emptiness of failed replications' gets right

Jason Mitchell's essay 'On the emptiness of failed replications' is notable for being against the current effort to publish replication attempts. Commentary on the essay that I saw was pretty negative (e.g. "awe-inspiringly clueless", "defensive pseudo-scientific, anti-Bayesian academic ass-covering", "Do you get points in social psychology for publicly declaring you have no idea how science works?").

Although I reject his premises, and disagree with his conclusion, I don't think Mitchell's arguments are incomprehensibly mad. This seems to put me in a minority, so I thought I'd try and explain the value in what he's saying. I'd like to walk through his essay assuming he is a thoughtful rational person. Why would a smart guy come to the views he has? What is he really trying to say, and what are his assumptions about the world of psychology that might, perhaps, illuminate our own assumptions?

Experiments as artefacts, not samples

First off, key to Mitchell's argument is a view that experiments are complex artefacts, in the construction of which errors are very likely. Effects, in this view, are hard won, eventually teased out via a difficult process of refinement and validation. The value of replication is self-evident to anyone who thinks statistically: sampling error and publication bias will produce lots of false positives, and you improve your estimate of the true effect by taking independent samples (= replications). Mitchell seems to be saying that the experiments are so complex that replications by other labs aren't independent samples of the same effect. Although they are called replications, they are, he claims, most likely to be botched, and so informative of nothing more than the incompetence of the replicators.
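
To make the statistical logic concrete, here is a minimal simulation of that argument (my own sketch, not something from Mitchell's essay; the effect size, sample size and study counts are arbitrary illustrative assumptions). With a weak true effect and a publish-only-if-significant filter, published effect sizes overestimate the truth, while averaging unfiltered independent replications pulls the estimate back towards it:

    # Sketch: publication bias inflates published effects; independent replications correct it.
    # All numbers below are illustrative assumptions, not taken from the essay.
    import numpy as np
    from scipy import stats

    rng = np.random.default_rng(2014)
    true_effect, n, n_studies, n_replications = 0.2, 30, 1000, 20

    published = []
    for _ in range(n_studies):
        sample = rng.normal(true_effect, 1.0, n)        # one underpowered study
        res = stats.ttest_1samp(sample, 0.0)
        if res.pvalue < 0.05 and res.statistic > 0:     # only 'significant' results get published
            published.append(sample.mean())

    # Replications are reported whatever they find, so their average is not filtered.
    replications = [rng.normal(true_effect, 1.0, n).mean() for _ in range(n_replications)]

    print(f"true effect:              {true_effect}")
    print(f"mean published effect:    {np.mean(published):.2f}")     # inflated upwards
    print(f"mean of the replications: {np.mean(replications):.2f}")  # close to the truth

Mitchell's counter, as I read him, is that the replication attempts in this picture aren't really draws from the same distribution at all if the replicators botch the experiment.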

When teaching our students, many of us will have deployed the saying "The plural of anecdote is not data". What we mean by this is that many weak observations - of ghosts, aliens or psychic powers - do not combine multiplicatively to make strong evidence in favour of these phenomena. If I've read him right, Mitchell is saying the same thing about replication experiments - many weak experiments are uninformative about real effects.

Tacit practical knowledge

Part of Mitchell's argument rests on the importance of tacit knowledge in running experiments (see his section "The problem with recipe-following"). We all know that tacit knowledge about experimental procedures exists in science; Mitchell puts heavy weight on its importance. This is a position that would presumably draw a lot of sympathy from Daniel Kahneman, who suggested that all replication attempts should involve the original authors.

There's a tension here between how science should be and how it is. Obviously our job is to make things explicit, to explain how to run experiments successfully so that anyone can run them. But the truth is that full explanations aren't always possible. Sure, anyone can try to replicate based on a methods section, but - says Mitchell - you will probably be wasting your time, generating noise rather than data, and shouldn't be allowed to commit this to the scientific record.

Most of us would be comfortable with the idea that if a non-psychologist ran our experiments they might make some serious errors (one thinks of the hash some physical scientists made of psi experiments, failing completely to account for things like demand effects). Mitchell's line of thought here seems to take this one step further: you can't run a social psychologist's experiments without special training in social psychology. Or even, maybe, you can't successfully run another lab's experiment without training from that lab.

I happen to think he's wrong on this, and that he neglects to mention the harm of assuming that successful experiments have a "special sauce" which cannot be easily communicated (it seems to me a road to elitism and mysticism, completely contrary to the goals science should have). Nonetheless, there's definitely some truth to the idea, and I think it is useful to consider the errors we will make if we assume the contrary: that methods sections are complete records and no special background is required to run experiments.

Innuendo

Mitchell makes the claim that targeting an effect for replication amounts to the innuendo that the effects under inspection are unreliable, which is a slur on the scientists who originally published them. Isn't this correct? Several people on twitter admitted, or tacitly admitted, that their prior belief was that many of these effects aren't real. There is something disingenuous about claiming, on the one hand, that all effects should be replicated, but, on the other, targeting particular effects for attention. If you bought Mitchell's view that experiments are delicate artefacts which render most replications uninformative, you can see how the result is a situation which isn't just uninformative but actively harmful to the hard-working psychologists whose work is impugned. Even if you don't buy that view, you might note that the selection of which effects become the focus of something like the Many Labs project is an active decision made by a small number of people, one which targets particular individuals. How this process works out in practice deserves careful consideration, even if everyone agrees that it is a Good Thing overall.

Caveats

There are a number of issues in Mitchell's essay I haven't touched on - this isn't meant to be a complete treatment, just an explanation of some of the reasonable arguments I think he makes. Even if I disagree with them, I think they are reasonable; they aren't as obviously wrong as some have suggested and should be countered rather than dismissed.

Stepping back, my take on the 'replication crisis' in psychology is that it really isn't about replication. Instead, this is what digital disruption looks like in a culture organised around scholarly kudos rather than profit. We now have the software tools to coordinate data collection, share methods and data, analyse data, and interact with non-psychologists, both directly and via the media, in unprecedented ways and at an unprecedented rate. Established scholarly communities are threatened as "the way things are done" is challenged. Witness John Bargh's incredulous reaction to having his work challenged (and note that this was a 'replicate and explain via alternate mechanism' type of study, which Mitchell says is a valid way of doing replication). Witness the recent complaint of medical researcher Jonathan S. Nguyen-Van-Tam when a journalist included critique of his analysis technique in a report on his work. These guys obviously believe in a set of rules concerning academic publishing which many of us either aren't fully aware of or believe no longer apply.

By looking at other disrupted industries, such as music or publishing, we can discern morals for both sides. Those who can see the value in the old way of doing things, like Mitchell, need to articulate that value, and fast. There's no way of going back, but we need to salvage the good things about tight-knit, slow-moving scholarly communities. The moral for the progressives is that we shouldn't let the romance of change blind us to the way that the same old evils will reassert themselves in new forms, by hiding behind a facade of being new, improved and more equitable.