Aug 7, 2014

What we talk about when we talk about replication


If I said, “Researcher A replicated researcher B’s work”, what would you take me to mean?

There are many possible interpretations. I could mean that A had repeated precisely the methods of researcher B, and obtained similar results. Or I could be saying that A had repeated precisely the methods of researcher B, and obtained very different results. I could be saying that A had repeated only those methods which were theorized to influence the results. I could mean that A had devised new methods which were meant to explore the same phenomenon. Or I could mean that researcher A had copied everything down to the last detail.

We do have terms for these different interpretations. A replication of precise methods is a direct replication, while a replication which uses new methods but gets at the same phenomenon is a conceptual replication. Once a replication has been completed, you can look at the results and call it a “successful replication” if the results are the same, and a “failed replication” if the results are different.

Unfortunately, these terms are not always used, and the result is that recent debates over replication have become not only heated, but confused.

Take, for instance, Nobel laureate Daniel Kahneman’s open letter to the scientific community, A New Etiquette for Replication. He writes:

“Even rumors of a failed replication cause immediate reputational damage by raising a suspicion of negligence (if not worse). The hypothesis that the failure is due to a flawed replication comes less readily to mind – except for authors and their supporters, who often feel wronged.”

Here he uses the common phrasing, “failed replication”, to indicate a replication where different results were obtained. The cause of those different results is unknown, and he suggests that one option is that the methods used in the direct replication were not correct, which he calls a “flawed replication”. What, then, is the term for a replication where the methods are known to be correct but different results were still found?

Further on in his letter, Kahneman adds:

“In the myth of perfect science, the method section of a research report always includes enough detail to permit a direct replication. Unfortunately, this seemingly reasonable demand is rarely satisfied in psychology, because behavior is easily affected by seemingly irrelevant factors.”

We take “direct replication” to mean copying the original researcher’s methods. As Kahneman points out, perfect copying is impossible. When a factor that once seemed irrelevant may have influenced the results, is that a “flawed replication”, or simply no longer a “direct replication”? How can we distinguish between replications which copy as much of the methods as possible, and those which copy only those elements of the methods which the original author hypothesizes should influence the result?

This terminology is not only imprecise, it differs from what others use. In their Registered Reports: A Method to Increase the Credibility of Published Results, Brian Nosek and Daniel Lakens write:

“There is no such thing as an exact replication. Any replication will differ in innumerable ways from the original. A direct replication is the attempt to duplicate the conditions and procedure that existing theory and evidence anticipate as necessary for obtaining the effect (Open Science Collaboration, 2012, 2013; Schmidt, 2009). Successful replication bolsters evidence that all of the sample, setting, and procedural differences presumed to be irrelevant are, in fact, irrelevant.”

This statement contains an admirably clear definition of “direct replication”, which the authors use here to mean a replication copying only those elements of the methods considered relevant. This is distinct from Kahneman’s usage of the term “direct replication”. Kahneman, instead, may be conflating “direct replication” with “literal replication”, a much less common term meaning “the precise duplication of the specific design and results of a previous study” (Heiman, 2002).

Nosek and Lakens also use the term “successful replication” in a way which implies that not only were the results replicated, the methods were as well, as they take the replication’s success to be a commentary on the methods. However, even “successful replications” may not successfully replicate methods, as pointed out by Simone Schnall in her critique of the special issue edited by Nosek and Lakens:

“Various errors in several of the replications (e.g., in the ‘Many Labs’ paper) became only apparent once original authors were allowed to give feedback. Errors were uncovered even for successfully replicated findings.”

Whether or not there were methodological errors in these particular cases, such errors remain possible even when results are replicated, a possibility which is elided by the terminology of “successful replication”. This is not merely a point of semantics, as “successful replications” may be checked less carefully for methodological errors than “failed replications”.

There are many other examples of researchers using replication terminology in ways that are not maximally clear. So far I have only quoted from social psychologists. When we attempt to speak across disciplines we face even greater potential for confusion.

As such, I propose:

1) That we resurrect the term “literal replication”, meaning “the precise duplication of the specific design of a previous study”, rather than overloading the term “direct replication”. Direct replication can then mean only the duplication of those methods deemed to be relevant. Of course, a perfect literal replication is impossible, but using this terminology implies that duplication of as much of the previous study as possible is the goal.

2) That we retire the phrases “failed replication” and “successful replication”, which do not distinguish between procedure and results. In their place, we can describe the results with “replication with different results” or “replication with similar results”, and describe the procedure with “flawed replication” or “sound replication”.

Thus, a replication attempt where the goal was to precisely duplicate materials and where this was successfully done, but different results were found, would be a sound literal replication with different results. An attempt only to duplicate elements of the design hypothesized to be relevant, leading to some methodological questions, yet where similar results were found, would be a flawed direct replication with similar results.

These terms may seem unnecessarily wordy, and indeed may not always be needed, but I encourage everyone to use them when precision is important, for instance in published articles or in debates with those who disagree with you. I know that from now on, when I hear someone use the bare term “replication”, I will ask, “What kind?”

Thanks to JP de Ruiter, Etienne LeBel, and Sheila Miguez for their feedback on this post.
