Researcher Degrees of Freedom in Data Analysis
by Sean Mackinnon
The enormous number of options available for modern data analysis is both a blessing and a curse. On one hand, researchers have specialized tools for any number of complex questions. On the other hand, we’re also faced with a staggering number of equally viable choices, often without any clear-cut guidelines for deciding between them. For instance, I just popped open the SPSS statistical software and counted 18 different ways to conduct post hoc tests for a one-way ANOVA. Some choices are clearly inferior (e.g., the LSD test doesn’t adjust p-values for multiple comparisons), but it’s possible to defend the use of many of the available options. These ambiguous choice points are sometimes referred to as researcher degrees of freedom.
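To see why skipping the adjustment matters, here is a minimal Python sketch of how the chance of at least one false positive grows with the number of unadjusted comparisons. It assumes the comparisons are independent (pairwise post hoc tests share groups, so they are not), which makes it a rough illustration rather than an exact error rate. The Bonferroni correction shown is just one common adjustment, not a claim about what any particular SPSS option does.

```python
def familywise_error_rate(n_comparisons, alpha=0.05):
    """Chance of at least one false positive across n tests run at level alpha
    with no adjustment, assuming (for illustration) the tests are independent."""
    return 1 - (1 - alpha) ** n_comparisons


def bonferroni_alpha(n_comparisons, alpha=0.05):
    """Per-comparison threshold that keeps the familywise rate at or below alpha."""
    return alpha / n_comparisons


# Four groups give 4 * 3 / 2 = 6 pairwise post hoc comparisons.
k = 6
print(round(familywise_error_rate(k), 3))  # 0.265
print(round(bonferroni_alpha(k), 4))       # 0.0083
```

So with only four groups, an unadjusted test like LSD gives roughly a one-in-four chance of at least one spurious “significant” difference, rather than the nominal one in twenty.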
In theory, researcher degrees of freedom shouldn’t be a problem. More choice is better, right? The problem arises from two interconnected issues: (a) ambiguity about which statistical test is most appropriate and (b) an incentive system that rewards scientists with publications, grants, and career stability when their p-values fall below the revered p < .05 criterion. So, perhaps unsurprisingly, when faced with a host of ambiguous options for data analysis, most people settle on the one that yields statistically significant results. Simmons, Nelson, and Simonsohn (2011) argue that this undisclosed flexibility in data analysis allows researchers to present almost any data as “significant,” and they propose 10 simple guidelines for authors and reviewers so that this flexibility is disclosed in every paper. If you haven’t read them yet, they are worth checking out. In this post, I discuss a few guidelines of my own for analyzing data in a way that counteracts our inherent tendency to be self-serving.
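The cost of that flexibility is easy to demonstrate with a quick simulation. The Python sketch below is loosely modelled on one of the scenarios Simmons et al. (2011) describe (two correlated dependent variables), but the sample size, correlation, seed, and the large-sample z-test (used instead of a t-test to stay within the standard library) are my own assumptions. Both groups are drawn from the same population, so every “significant” result is a false positive.

```python
import random
from statistics import NormalDist


def mean_and_variance(xs):
    """Sample mean and unbiased sample variance."""
    m = sum(xs) / len(xs)
    v = sum((x - m) ** 2 for x in xs) / (len(xs) - 1)
    return m, v


def z_test_p(a, b):
    """Two-sided p-value for a difference in means, using a large-sample
    normal approximation (assumed here in place of a t-test)."""
    mean_a, var_a = mean_and_variance(a)
    mean_b, var_b = mean_and_variance(b)
    z = (mean_a - mean_b) / (var_a / len(a) + var_b / len(b)) ** 0.5
    return 2 * (1 - NormalDist().cdf(abs(z)))


def simulate_group(n, r, rng):
    """Draw n participants with two dependent variables correlated at r."""
    dv1, dv2 = [], []
    for _ in range(n):
        x = rng.gauss(0, 1)
        dv1.append(x)
        dv2.append(r * x + (1 - r ** 2) ** 0.5 * rng.gauss(0, 1))
    return dv1, dv2


def false_positive_rates(n_sims=2000, n=50, r=0.5, alpha=0.05, seed=1):
    """Compare one pre-planned test with 'pick whichever of three tests works'.
    Both groups come from the same population, so the null is always true."""
    rng = random.Random(seed)
    single_hits = flexible_hits = 0
    for _ in range(n_sims):
        a1, a2 = simulate_group(n, r, rng)
        b1, b2 = simulate_group(n, r, rng)
        avg_a = [(x + y) / 2 for x, y in zip(a1, a2)]
        avg_b = [(x + y) / 2 for x, y in zip(b1, b2)]
        p_values = [z_test_p(a1, b1), z_test_p(a2, b2), z_test_p(avg_a, avg_b)]
        single_hits += p_values[0] < alpha       # the analysis planned in advance
        flexible_hits += min(p_values) < alpha   # report whichever analysis "works"
    return single_hits / n_sims, flexible_hits / n_sims


single, flexible = false_positive_rates()
print(f"pre-planned test: {single:.3f}")
print(f"pick what works:  {flexible:.3f}")
```

With the defaults, the “pick what works” rate should come out noticeably above the rate for the single pre-planned test, even though only three closely related analyses are on the menu.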

Make as many data analytic decisions as possible before looking at your data. Review the statistical literature and decide which statistical test(s) will be best before looking at the data you’ve collected. Continue to use those tests until enough evidence emerges to change your mind. The important thing is that you make these decisions before looking at your data: once you start playing with the actual data, your self-serving biases will kick in. Do not underestimate your capacity for self-deception; self-serving biases are powerful, pervasive, and apply to virtually everyone. Consider preregistering your data analysis plan (perhaps using the Open Science Framework) to keep yourself honest and to convince future reviewers that you aren’t exploiting researcher degrees of freedom.
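Preregistration on a platform like the Open Science Framework is the real safeguard, but the same idea can be mirrored locally. The sketch below is hypothetical: the `AnalysisPlan` class and its fields are my own illustration, not part of any OSF tool. Write the plan down as data before collecting anything, record its SHA-256 fingerprint, and check later that the analysis you actually ran still matches it.

```python
from __future__ import annotations

import hashlib
import json
from dataclasses import asdict, dataclass


@dataclass(frozen=True)
class AnalysisPlan:
    """Hypothetical record of analysis decisions fixed before data collection."""
    hypothesis: str
    primary_test: str
    alpha: float
    outlier_rule: str
    covariates: tuple[str, ...] = ()

    def fingerprint(self) -> str:
        """SHA-256 digest of the plan; any change to any field changes it."""
        payload = json.dumps(asdict(self), sort_keys=True)
        return hashlib.sha256(payload.encode("utf-8")).hexdigest()


plan = AnalysisPlan(
    hypothesis="Group B scores higher than Group A",
    primary_test="Welch t-test",
    alpha=0.05,
    outlier_rule="exclude values more than 3 SD from the group mean",
)
registered = plan.fingerprint()  # record this alongside the preregistration
print(registered)
```

Later, rebuilding the plan you actually used and comparing its fingerprint with the recorded one makes any quiet, after-the-fact change to the outlier rule or covariates obvious.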

When faced with too many equally viable choices, run a small number of the best ones and report all of them. In this case, decide on two to five different tests ahead of time, report the results of every one, and draw a conclusion based on how well they agree. For instance, there are many different methods for assessing model fit in structural equation modeling. If reviewing the statistical literature doesn’t tell you which method is best (often it won’t; statisticians disagree about as often as any other group of scientists), report the results of all the tests and draw a firm conclusion only if they converge on the same answer. When they disagree, make a tentative conclusion based on the majority (e.g., 2 of 3 tests reach the same conclusion). For the record, I currently use CFI, TLI, RMSEA, and SRMR in my own work, and I stick with these even when other fit indices would give more favorable results.
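Here is a minimal Python sketch of the report-everything, majority-rules approach applied to fit indices. The cutoffs (CFI and TLI at least .95, RMSEA at most .06, SRMR at most .08) are commonly cited rules of thumb that I am assuming for illustration; they are not universally agreed upon, and the function only tallies verdicts that you would still report in full.

```python
# Assumed rule-of-thumb cutoffs; substitute whatever you committed to in advance.
CUTOFFS = {
    "CFI": (">=", 0.95),
    "TLI": (">=", 0.95),
    "RMSEA": ("<=", 0.06),
    "SRMR": ("<=", 0.08),
}


def judge_fit(indices):
    """Return each index's verdict (True = acceptable fit) plus an overall
    conclusion: firm if all agree, tentative if a majority agrees, none if tied."""
    verdicts = {}
    for name, value in indices.items():
        direction, cutoff = CUTOFFS[name]
        verdicts[name] = value >= cutoff if direction == ">=" else value <= cutoff
    n_good = sum(verdicts.values())
    n_bad = len(verdicts) - n_good
    if n_bad == 0:
        conclusion = "good fit (all indices agree)"
    elif n_good == 0:
        conclusion = "poor fit (all indices agree)"
    elif n_good > n_bad:
        conclusion = "tentatively good fit (majority)"
    elif n_bad > n_good:
        conclusion = "tentatively poor fit (majority)"
    else:
        conclusion = "no conclusion (indices split evenly)"
    return verdicts, conclusion


verdicts, conclusion = judge_fit({"CFI": 0.96, "TLI": 0.94, "RMSEA": 0.05, "SRMR": 0.07})
print(verdicts)    # {'CFI': True, 'TLI': False, 'RMSEA': True, 'SRMR': True}
print(conclusion)  # tentatively good fit (majority)
```

Because a tie returns no conclusion at all, an even split among four indices forces you to report the disagreement rather than pick whichever side looks better.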

When deciding on a data analysis plan after you’ve seen the data, keep in mind that most researcher degrees of freedom have minimal impact on strong results. For any number of reasons, you might find yourself settling on an analysis plan after you’ve played around with the data for a while. Strong data, however, will not be influenced much by researcher degrees of freedom. For instance, in a study with high statistical power, results should look much the same whether you exclude outliers, transform them, or leave them in the data. Simmons et al. (2011) specifically recommend presenting results (a) with and without covariates and (b) with and without any data points that were excluded. Again, the general idea is that strong results will not change much when you vary researcher degrees of freedom. Thus, I again recommend analyzing the data in a few different ways and looking for convergence across all methods when you develop an analysis plan after seeing the data. This raises the bar and helps combat the natural tendency to report only the one analysis that “works.” When minor analytic choices drastically change the conclusions, treat this as a warning sign that your solution is unstable and the results are probably not trustworthy. The most likely cause of an unstable solution is low statistical power. Because you hopefully set a strict end date for data collection, simply adding participants is not an option; the only viable remedy for unstable results is to replicate them in a second, more highly powered study that uses the same data analytic approach.
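A minimal Python sketch of this kind of robustness check follows. The three ways of handling outliers (keeping them, excluding values beyond 3 SD, and winsorizing them to the 3 SD boundary as one possible transformation), the z-test, and the simulated data are all assumptions for illustration. The point is the structure: run every specification, report all of them, and call the result stable only when they agree on direction and significance.

```python
import random
import statistics
from statistics import NormalDist


def z_test_p(a, b):
    """Two-sided p-value for a difference in means (large-sample normal approximation)."""
    se = (statistics.variance(a) / len(a) + statistics.variance(b) / len(b)) ** 0.5
    z = (statistics.fmean(a) - statistics.fmean(b)) / se
    return 2 * (1 - NormalDist().cdf(abs(z)))


def exclude_outliers(xs, k=3.0):
    """Drop values more than k SDs from the sample mean (hypothetical rule)."""
    m, s = statistics.fmean(xs), statistics.stdev(xs)
    return [x for x in xs if abs(x - m) <= k * s]


def winsorize(xs, k=3.0):
    """Pull values beyond k SDs back to the k-SD boundary (one possible transform)."""
    m, s = statistics.fmean(xs), statistics.stdev(xs)
    lo, hi = m - k * s, m + k * s
    return [min(max(x, lo), hi) for x in xs]


SPECIFICATIONS = {
    "keep outliers": lambda xs: list(xs),
    "exclude outliers": exclude_outliers,
    "winsorize outliers": winsorize,
}


def robustness_check(a, b, alpha=0.05):
    """Run the same comparison under every specification and return all results,
    plus whether they agree on both the sign of the difference and p < alpha."""
    results = {}
    for name, handle in SPECIFICATIONS.items():
        a2, b2 = handle(a), handle(b)
        diff = statistics.fmean(b2) - statistics.fmean(a2)
        results[name] = (diff, z_test_p(a2, b2))
    signs = {diff > 0 for diff, _ in results.values()}
    decisions = {p < alpha for _, p in results.values()}
    return results, len(signs) == 1 and len(decisions) == 1


rng = random.Random(7)
control = [rng.gauss(0, 1) for _ in range(100)]
treatment = [rng.gauss(1, 1) for _ in range(100)]  # simulated large effect
results, stable = robustness_check(control, treatment)
for name, (diff, p) in results.items():
    print(f"{name}: difference = {diff:.2f}, p = {p:.4f}")
print("stable across specifications:", stable)
```

With a large simulated effect like this one, all three specifications should agree; rerunning it with a small effect and a small sample is a quick way to see how fragile the conclusions can become.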
At the end of the day, there is no “quick fix” for self-serving biases in data analysis so long as the incentive system continues to reward novel, statistically significant results. However, by following the tips in this article (and elsewhere), researchers can minimize the natural human tendency to be self-serving and focus on finding strong, replicable results.
References
Simmons, J. P., Nelson, L. D., & Simonsohn, U. (2011). False-positive psychology: Undisclosed flexibility in data collection and analysis allows presenting anything as significant. Psychological Science, 22, 1359–1366. doi:10.1177/0956797611417632