Expectations of replicability and variability in priming effects, Part II: When should we expect replication, how does this relate to variability, and what do we do when we fail to replicate?
by Joseph Cesario, Kai Jonas
Continued from Part 1.
Now that some initial points and clarifications have been offered, we can move to the meat of the argument. Direct replication is essential to science. What does it mean to replicate an effect? All effects require a set of contingencies to be in place. To replicate an effect is to set up those same contingencies that were present in the initial investigation and observe the same effect, whereas to fail to replicate an effect is to set up those same contingencies and fail to observe the same effect. Putting aside what we mean by "same effect" (i.e., directional consistency versus magnitude), we don't see any way in which people can reasonably disagree on this point. This is a general point true of all domains of scientific inquiry.
The real question becomes, how can we know what contingencies produced the effect in the original investigation? Or more specifically, how can we separate the important contingencies from the unimportant contingencies? There are innumerable contingencies present in a scientific investigation that are totally irrelevant to obtaining the effect: the brand of the light bulb in the room, the sock color of the experimenter, whether the participant got a haircut last Friday morning or Friday afternoon. Common sense can provide some guidance, but in the end the theory used to explain the effect specifies the necessary contingencies and, by omission, the unnecessary contingencies. Therefore, if one is operating under the wrong theory, one might think some contingencies are important when really they are unimportant, and more interestingly, one might miss some necessary contingencies because the theory did not mention them as being important.
Before providing an example, it might be useful to note that, as far as we can tell, no one has offered any criticism of the logic outlined above. Many sarcastic comments have been made along the lines of, "apparently we can never learn anything because of all these mysterious moderators." And it is true that the argument can be misused to defend poor research practices. But at core, there is no criticism about the basic point that contingencies are necessary for all effects and a theory establishes those contingencies.
As an example, consider research showing that social category priming produces behavioral effects, such as increased hostility following priming of the category "young black male." If one is operating under a classic spreading activation model, then the theory dictates the following contingencies: a prime event, an association between the prime and the target behavior, and the opportunity to express the target behavior. This is the account given by Bargh et al. (1996) in their earliest article on the topic: the pictures activate the category, the associated concept "hostile" is activated, and being provoked by the experimenter is the opportunity to express hostile behavior. Under this model, if you provide those contingencies to participants, then there is no reason to fail to obtain the effect other than the effect not being real. Hence, it makes sense to question the original effect following a failure if you primed participants (who associate young black males with hostility [2]) and provoked them. Note also that the theory tells you not only what is needed, but also that everything else is irrelevant to obtaining the effect.
But what if this model is wrong? If "direct expression" models are wrong, then there may be additional contingencies needed to produce the effect that were present in the original investigation but went unidentified by the researcher. For instance, if you understand this priming effect not as a simple expression of the activated concept "hostile" but instead as a self-regulatory response to a physically formidable outgroup male, then you can look to the large literature on defensive threat regulation to identify many other contingencies that should influence the results. For example, the availability of escape is a known moderator of threat responses in rodents: when escape is available, rodents are more likely to flee, whereas when escape is not available, defensive attack is more likely. The contingency, then, concerns the animal's ability to escape, and research in defensive threat regulation has found the same contingency to be important for human threat responses (e.g., Blanchard et al., 2001). When we manipulated this variable while priming the category "young black male" (Cesario et al., 2010), we found that flight responses were more likely when participants were in an open space, whereas fight responses were more likely when participants were in a sound-attenuating booth.
As another example, Macrae & Johnston (1998) studied the importance of competing goals on the effects of priming helpfulness, and found that when people had a competing goal, behavioral priming effects were eliminated. Hence, prior tasks in the study might activate competing goals (perhaps perspective taking or establishing a social connection with outgroups) that could eliminate the aggressive response. Or sampling from a population of people high in agreeableness, who might have the competing goal of getting along chronically accessible, could eliminate the aggressive response.
Of course, as many people have correctly pointed out, if a replication failure occurs it is completely inappropriate to defensively yell, "unknown moderator!" Indeed, this was stated explicitly in the very article that many priming researchers have cited when engaging in exactly this kind of defensive behavior: "priming researchers cannot appeal endlessly to “unknown moderation” without doing the work to provide evidence for such moderation" (Cesario, 2014, p. 44).
To send a clear message to researchers, then: Please stop reflexively citing this article when responding to failures to replicate! You are missing the point of that article. Instead, the correct response is to productively work with other researchers to systematically establish that such moderation did in fact occur! When a failure occurs, the best course forward is to register a replication involving both locations/populations of interest, ideally with an extension of the original study that measures or manipulates the hypothesized moderator.
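The logic of such a registered replication can be illustrated with a small simulation. The following is only a sketch under stated assumptions: the moderator levels, the effect sizes, and the function names are all hypothetical and chosen for illustration, not taken from any study. It shows that when a priming effect exists only at one level of a moderator, the evidence for moderation is the difference between the two priming effects (the prime by moderator interaction), which can only be estimated when both levels appear in the same design.

```python
import random
import statistics


def simulate_study(n_per_cell, prime_effect_by_level, sd=1.0, seed=0):
    """Simulate a 2 (control vs. prime) x K (moderator level) design.

    prime_effect_by_level maps each hypothetical moderator level to the
    assumed true priming effect (in outcome units) at that level.
    """
    rng = random.Random(seed)  # fixed seed so the sketch is reproducible
    data = {}
    for level, effect in prime_effect_by_level.items():
        data[(level, "control")] = [rng.gauss(0.0, sd) for _ in range(n_per_cell)]
        data[(level, "prime")] = [rng.gauss(effect, sd) for _ in range(n_per_cell)]
    return data


def prime_effect(data, level):
    """Observed priming effect at one moderator level (prime minus control)."""
    return statistics.mean(data[(level, "prime")]) - statistics.mean(
        data[(level, "control")]
    )


def interaction_contrast(data, level_a, level_b):
    """Difference between the priming effects at two moderator levels."""
    return prime_effect(data, level_a) - prime_effect(data, level_b)


if __name__ == "__main__":
    # Assumed (made-up) true effects: present at one level, absent at the other.
    assumed_effects = {"moderator_present": 0.5, "moderator_absent": 0.0}
    data = simulate_study(n_per_cell=20000, prime_effect_by_level=assumed_effects)
    for level in assumed_effects:
        print(level, round(prime_effect(data, level), 3))
    print(
        "interaction",
        round(interaction_contrast(data, "moderator_present", "moderator_absent"), 3),
    )
```

A study run only at the "moderator_absent" level would look like a failed replication. Only a design that includes both levels can show whether the two results actually differ from each other.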
To take a final example, consider that there is now low evidentiary value for the original elderly priming/slow walking effect, with evidence that some of these effects have been p-hacked (Lakens, 2014). Given this, the proper course of action is to try to replicate the effect with preregistered replications, to provide more evidence relevant to this judgment. If priming researchers will not do this, then we forfeit the right to continue to talk about this as if it were an established, reliable effect. Of course there are limits: practical considerations prevent most of us from spending the next 20 years priming the same category in order to obtain the most accurate population effect size estimate. But we cannot continue to direct attention to the one or two "successful" publications as if getting something published establishes its truth once and for all.
[2] Note that the assumption of an association between black males and hostility is critical, and while it might be a reasonable assumption in some regions it may not hold in others, pointing to yet another reason why variability in priming effects may exist across regions. For instance, this assumption would most likely not hold in a country such as the Netherlands, where the black population has a Caribbean background and is mostly associated with parties, fun, and good food. At the same time, lighter skin-toned North African males (e.g., from Morocco) would most likely elicit the effect, since they are associated with hostility (see also Dotsch & Wigboldus, 2008).
Bargh, J.A., Chen, M., & Burrows, L. (1996). Automaticity of social behavior: Direct effects of trait construct and stereotype activation on action. Journal of Personality and Social Psychology, 71, 230-244. doi: 10.1037/0022-3514.71.2.230
Blanchard, D.C., Hynd, A.L., Minke, K.A., Minemoto, T., & Blanchard, R.J. (2001). Human defensive behaviors to threat scenarios show parallels to fear- and anxiety-related defense patterns of non-human mammals. Neuroscience and Biobehavioral Reviews, 25, 761-770.
Cesario, J. (2014). Priming, replication, and the hardest science. Perspectives on Psychological Science, 9, 40-48. doi: 10.1177/1745691613513470
Cesario, J., Plaks, J.E., Hagiwara, N., Navarrete, C.D., & Higgins, E.T. (2010). The ecology of automaticity: How situational contingencies shape action semantics and social behavior. Psychological Science, 21, 1311-1317. doi: 10.1177/0956797610378685
Dotsch, R., & Wigboldus, D.H.J. (2008). Virtual prejudice. Journal of Experimental Social Psychology, 44, 1194-1198. doi: 10.1016/j.jesp.2008.03.003
Lakens, D. (2014). Professors are not elderly: Evaluating the evidential value of two social priming effects through p-curve analyses. Available at SSRN: http://ssrn.com/abstract=2381936 or http://dx.doi.org/10.2139/ssrn.2381936
Macrae, C.N., & Johnston, L. (1998). Help, I need somebody: Automatic action and inaction. Social Cognition, 16, 400-417.