Nov 5, 2014

Inflated Grades, Inflated Science

by

This year, I’ve been teaching a lot more than usual. With that extra teaching comes a lot more grading, and more students with concerns about their grades. With all the recent talk about grade inflation, I’ve been thinking about HOW grades come to be inflated. Political pressure from governments and institutions to produce “successful” students certainly contributes to grade inflation, but I’m approaching the problem as a psychologist (a data analyst, really). Working in the open science movement has taught me a lot about the psychological processes that underlie questionable research practices and effect size inflation. In this post, I want to draw parallels between grade inflation at universities and effect size inflation in scientific research. To some extent, I think similar psychological processes contribute to both problems.

Grades have monetary value

High university grades have pragmatic, monetary value to students. At the low end, avoiding failures means not having to pay extra tuition to retake a course. At the high end, good grades increase the odds that students will receive scholarships to pay for their education and provide access to graduate-level education, which for many people means a higher-paid, more prestigious job. It makes sense, then, that students will act in their own best interest to improve their grades and avoid situations that will hurt them.

Measurement of academic success contains error

University grades are meant to measure a psychological construct: (theoretically) a student’s capacity for critical thought, knowledge of the subject matter, and analytical skill. As with any psychological construct, teachers need to operationalize it with specific measures; in university, that usually means essays, exams, quizzes, and assignments. Classical test theory suggests that all measurement is imperfect:

Measured Score = True Score + Error

So, whenever we grade a student’s exam or essay, the grade approximates their true level of competence with some degree of inaccuracy. Maybe you add up the final grade incorrectly, write a poor multiple-choice question, or are just in a bad mood when you grade that essay. All of this is error.

Classical test theory also assumes that this error is random, averaging out to zero. So, sometimes you give students grades that are too high and sometimes grades that are too low, but across a whole class these errors tend to cancel each other out, leaving the class average close to the true average.
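
To make the cancelling out concrete, here is a minimal Python sketch. It assumes, purely for illustration, a hypothetical class of 1,000 students whose true grades are normally distributed around 75, and a grading error drawn from a normal distribution with mean zero and a standard deviation of 5 points. The function name and all the numbers are made up for the example; they are not real grading data.

```python
import random
import statistics


def simulate_class_average(n_students=1000, error_sd=5.0, seed=1):
    """Return (mean true grade, mean measured grade) for a hypothetical class.

    Illustrative assumptions: true grades ~ Normal(75, 10) and
    measured grade = true grade + error, where error ~ Normal(0, error_sd).
    """
    rng = random.Random(seed)  # fixed seed so the run is reproducible
    true_grades = [rng.gauss(75, 10) for _ in range(n_students)]
    # Each grade is off by a random amount that is equally likely
    # to be positive or negative.
    measured_grades = [t + rng.gauss(0, error_sd) for t in true_grades]
    return statistics.mean(true_grades), statistics.mean(measured_grades)


true_mean, measured_mean = simulate_class_average()
print(f"true class average:     {true_mean:.2f}")
print(f"measured class average: {measured_mean:.2f}")
```

Individual grades can be off by several points, but the two class averages should land within a fraction of a point of each other, because the positive and negative errors offset one another.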

Stacking the deck to get an advantage

The thing is, despite these statistical truisms, most students think that getting a lower grade than they deserve is terribly unfair. Students tend to work in their own self-interest to get better grades and the tangible rewards that come with them. Through a few consistently applied tactics, students bias the errors in a positive direction, and in doing so tend to inflate grades overall. There are at least three main ways students might do this.

Contesting grades: When students receive a grade lower than they deserve (which will happen occasionally because of measurement error in grading), they will often try to convince their professor to change it. However, I have yet to have a student argue for a lower grade, even though I have probably made errors in that direction too.

Dropping classes: When students expect to fail a course, they tend to drop it so it won’t count against their GPA. Thus, their worst performances can be wiped from their record. Of course, students drop classes for all kinds of reasons, but I think it’s reasonable to say that poor grades increase the odds that a student will drop a class (in fact, in one survey, poor grades were the #1 reason students gave for withdrawing from courses).

Taking easy classes: It’s easier to get a good grade in some classes than in others. Sometimes the material is easier; other times the student has prior experience (e.g., a French immersion student taking introductory French in university). While I don’t have hard evidence for this, I think most teachers were students long enough to know that plenty of students weigh it when selecting classes.

Because students selectively contest poor grades, drop courses more often when they are performing poorly, and actively seek out classes where good grades come easily, overall grades acquire a selective positive bias.
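
A similar toy simulation shows how contesting only low grades pushes the average error away from zero. This is a hedged sketch, not data: it assumes grading errors follow Normal(0, 5), plus a hypothetical rule under which any student under-graded by more than 2 points contests and has the error fully corrected, while over-graded students never complain. The threshold, the function name, and the all-or-nothing correction are illustrative assumptions.

```python
import random
import statistics


def average_grading_error(n_students=1000, error_sd=5.0,
                          contest_threshold=2.0, seed=2):
    """Return (mean error before contesting, mean error after contesting).

    Hypothetical model: a student whose grade is too low by more than
    contest_threshold points contests it and the error is corrected to 0;
    students whose grade is too high never contest.
    """
    rng = random.Random(seed)
    before, after = [], []
    for _ in range(n_students):
        error = rng.gauss(0, error_sd)  # symmetric, mean-zero grading error
        before.append(error)
        if error < -contest_threshold:
            error = 0.0  # only negative errors are ever challenged
        after.append(error)
    return statistics.mean(before), statistics.mean(after)


mean_before, mean_after = average_grading_error()
print(f"average error before contesting: {mean_before:+.2f} points")
print(f"average error after contesting:  {mean_after:+.2f} points")
```

Under these assumptions the average error starts near zero but ends up positive (about +1.8 points in expectation). Dropping a course works in a similar way, since the lowest grades simply vanish from the record.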

The parallel with research effect sizes

I think there are clear parallels between grade inflation and inflated effect sizes in science. Just as good grades do for students, statistically significant results have pragmatic, monetary value for scientists. Statistically significant results are much more likely to be published in scientific journals, and publications are the currency of academia: numerous, high-impact publications are needed to win grants and tenure, and sometimes even to keep your job. It’s not surprising, then, that scientists will go to great lengths to get statistically significant results they can publish, sometimes creating “manufactured beauties” in the process.

In a previous post, I talked about how sampling variation can lead researchers to find null results even when they have done everything right. Imagine you get grant money to run an experiment. It’s well grounded in theory, and you even go to great lengths to ensure you have 95% power. You run the experiment, collect the data, and eagerly check your results … only to find that your hypotheses were not supported. With 95% power, 1 in 20 studies will end up like this even when the effect is real and as large as you planned for (assuming a frequentist approach, which, for better or worse, is dominant in psychology right now). This is a frightening fact of our line of work: there is an element of random chance that permeates statistics. Psychologically speaking, this feels dreadfully unfair. You did good scientific work, but because of outside pressure to prioritize “statistically significant” work with ps < .05, your research will face significant challenges getting published. With this in mind, it is easy to see why many scientists engage in behaviours that “game” this potentially unfair system; the rationale is not that different from that of our students trying to make their way through university with good grades.
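
The 1-in-20 figure can be checked with a quick simulation. This is a minimal sketch under illustrative assumptions that do not come from the post: a real effect of d = 0.5 standard deviations, two groups with a known standard deviation of 1, and a two-sided z-test at α = .05 standing in for the t-test a psychologist would normally run. The per-group sample size comes from the usual normal-approximation formula for 95% power; the function names are hypothetical.

```python
import math
import random
from statistics import NormalDist

STD_NORMAL = NormalDist()  # standard normal: mean 0, SD 1


def n_per_group(d, alpha=0.05, power=0.95):
    """Per-group n for a two-sided, two-sample z-test (normal approximation)."""
    z_alpha = STD_NORMAL.inv_cdf(1 - alpha / 2)
    z_beta = STD_NORMAL.inv_cdf(power)
    return math.ceil(2 * ((z_alpha + z_beta) / d) ** 2)


def share_of_null_results(d=0.5, n_studies=2000, alpha=0.05, seed=3):
    """Fraction of simulated studies that miss p < alpha although the effect is real."""
    rng = random.Random(seed)
    n = n_per_group(d, alpha=alpha)
    z_crit = STD_NORMAL.inv_cdf(1 - alpha / 2)
    misses = 0
    for _ in range(n_studies):
        control = [rng.gauss(0, 1) for _ in range(n)]
        treatment = [rng.gauss(d, 1) for _ in range(n)]
        # z statistic for the difference in means, treating SD = 1 as known
        diff = math.fsum(treatment) / n - math.fsum(control) / n
        z = diff / math.sqrt(2 / n)
        if abs(z) < z_crit:
            misses += 1  # a "failed" study despite a true effect
    return misses / n_studies


print("participants per group:", n_per_group(0.5))
print("share of studies with p >= .05:", share_of_null_results())
```

With d = 0.5 the formula asks for 104 participants per group, and roughly 5% of the simulated studies still come back non-significant, even though every one of them tested a real effect.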

Like many in the Open Science Collaboration (OSC), I think that major structural changes are needed at the level of scientific journals to incentivize good research practices, rather than simply rewarding novel, statistically significant results. Looking at the issue through the lens of grades, asking scientists to abandon questionable research practices without fixing the underlying structural problems seems a lot like asking students to simply accept a bad grade: good for the credibility of the institution, but bad for the individual.

Thinking about these two issues together has been an interesting thought experiment, both for empathizing with student concerns and for understanding precisely what it is about the current publishing climate that feels so unfair sometimes. Like my students, I’d like to believe that hard work should lead to career success; unfortunately, there is a bit of luck involved in science. Even though I disagree with questionable research practices, I think I can understand why many people engage in them. Probability can be cruel.
