In 1959, Festinger and Carlsmith reported the results of an experiment that spawned a voluminous body of research on cognitive dissonance. In that experiment, all subjects performed a boring task. Some participants were paid $1 or $20 to tell the next subject the task was interesting and fun whereas participants in a control condition did no such thing. All participants then indicated how enjoyable they felt the task was, their desire to participate in another similar experiment, and the scientific importance of the experiment. The authors hypothesized that participants paid $1 to tell the next participant that the boring task they just completed was interesting would experience internal dissonance, which could be reduced by altering their attitudes on the three outcomes measures. A little known fact about the results of this experiment, however, is that only on one of these outcome measures did a statistically significant effect emerge across conditions. The authors reported that subjects paid $1 enjoyed the task more than those paid $20 (or the control participants), but no statistically significant differences emerged on the other two measures.
In another highly influential paper, Word, Zanna, and Cooper (1974) documented the self-fulfilling prophecies of racial stereotypes. The researchers had white subjects interview trained white and black applicants and coded for six non-verbal behaviors of immediacy (distance, forward lean, eye contact, shoulder orientation, interview length, and speech error rate). They found that that white interviewers treated black applicants with lower levels of non-verbal immediacy than white applicants. In a follow-up study involving white subjects only, applicants treated with less immediate non-verbal behaviors were judged to perform less well during the interview than applicants treated with more immediate non-verbal behaviors. A fascinating result, however, a little known fact about this paper is that only three of the six measures of non-verbal behaviors assessed in the first study (and subsequently used in the second study) were statistically significant.
What do these two examples make salient in relation to how psychologists report their results nowadays? Regular readers of prominent journals like Psychological Science and Journal of Personality and Social Psychology may see what I’m getting at: It is very rare these days to see articles in these journals wherein half or most of the reported dependent variables (DVs) fail to show statistically significant effects. Rather, one typically sees squeaky-clean looking articles where all of the DVs show statistically significant effects across all of the studies, with an occasional mention of a DV achieving “marginal significance” (Giner-Sorolla, 2012).
In this post, I want us to consider the possibility that psychologists’ reporting practices may have changed in the past 50 years. This then raises the question as to how this came about. One possibility is that as incentives became increasingly more perverse in psychology (Nosek,Spies, & Motyl, 2012), some researchers realized that they could out-compete their peers by reporting “cleaner” looking results which would appear more compelling to editors and reviewers (Giner-Sorolla, 2012). For example, decisions were made to simply not report DVs that failed to show significant differences across conditions or that only achieved “marginal significance”. Indeed, nowadays sometimes even editors or reviewers will demand that such DVs not be reported (see PsychDisclosure.org; LeBel et al., 2013). A similar logic may also have contributed to researchers’ deciding not to fully report independent variables that failed to yield statistically significant effects and not fully reporting the exclusion of outlying participants due to fear that this information may raise doubts among the editor/reviewers and hurt their chance of getting their foot in the door (i.e., at least getting a revise-and-resubmit).
An alternative explanation is that new tools and technology have given us the ability to measure a greater number of DVs, which makes it more difficult to report on all them. For example, neuroscience (e.g., EEG, fMRI) and eye-tracking methods yield multitudes of analyzable data that were not previously available. Though this is undeniably true, the internet and online article supplements gives us the ability to fully report our methods and results and use the article to draw attention to the most interesting data.
Considering the possibility that psychologists’ reporting practices have changed in the past 50 years has implications for how to construe recent calls for the need to raise reporting standards in psychology (LeBel et al., 2013; Simmons, Nelson, & Simonsohn, 2011; Simmons, Nelson, & Simonsohn, 2012). Rather than seeing these calls as rigid new requirements that might interfere with exploratory research and stifle our science, one could construe such calls as a plea to revert back to the fuller reporting of results that used to be the norm in our science. From this perspective, it should not be viewed as overly onerous or authoritarian to ask researchers to disclose all excluded observations, all tested experimental conditions, all analyzed measures, and their data collection termination rule (what I’m now calling the BASIC 4 methodological categories covered by PsychDisclosure.org and Simmons et al.’s, 2012 21-word solution). It would simply be the way our forefathers used to do it.
Festinger, L. & Carlsmith, J. M. (1959). Cognitive consequences of forced compliance. The Journal of Abnormal and Social Psychology, 58(2), Mar 1959, 203-210. doi: 10.1037/h0041593
Giner-Sorolla, R. (2012). Science or art? How esthetic standards grease the way through the publication bottleneck but undermine science. Perspectives on Psychological Science, 7(6), 562–571. 10.1177/1745691612457576
LeBel, E. P., Borsboom, D., Giner-Sorolla, R., Hasselman, F., Peters, K. R., Ratliff, K. A., & Smith, C. T. (2013). PsychDisclosure.org: Grassroots support for reforming reporting standards in psychology. Perspectives on Psychological Science, 8(4), 424-432. doi: 10.1177/1745691613491437
Nosek, B. A., Spies, J. R., & Motyl, M. (2012). Scientific utopia: II. Restructuring incentives and practices to promote truth over publishability. Perspectives on Psychological Science, 7, 615-631. doi: 10.1177/1745691612459058
Simmons J., Nelson L. & Simonsohn U. (2011) False-Positive Psychology: Undisclosed Flexibility in Data Collection and Analysis Allow Presenting Anything as Significant. Psychological Science, 22(11), 1359-1366.
Simmons J., Nelson L. & Simonsohn U. (2012) A 21 Word Solution Dialogue: The Official Newsletter of the Society for Personality and Social Psychology, 26(2), 4-7.
Word, C. O., Zanna, M. P., & Cooper, J. (1974). The nonverbal mediation of self-fulfilling prophecies in interracial interaction. Journal of Experimental Social Psychology, 10(2), 109–120. doi: 10.1016/0022-1031(74)90059-6