Feb 5, 2014

Open Data and IRBs


Among other things the open science movement encourages “open data” practices, that is, researchers making data freely available on personal/lab websites or institutional repositories for others to use. For some, open data is a necessity as the NIH and NSF have adopted data-sharing policies and require some grant applications to include data management and dissemination plans. According to the NIH:

“...all data should be considered for data sharing. Data should be made as widely and freely available as possible while safeguarding the privacy of participants, and protecting confidential and proprietary data.” (emphasis theirs)

Before making human subject data open several issues must be considered. First, data should be de-identified to maintain subject confidentiality so responses cannot be linked to identities and data are seemingly anonymous. Second, researchers should consider Institutional Review Board’s (IRB) policies about data sharing. (Disclosure: I have been a member of my university's IRB for 6 years and chair of my Departmental Review Board, DRB, for 7 years.)

Unfortunately, while the policies and procedure of all IRBs require researchers to obtain consent, disclose study procedures to subjects, and maintain confidentiality, it is unknown how many IRBs have policies and procedures for open data dissemination. Thus, a conflict may arise between researchers who want to adopt open data practices or need to disseminate data (those with NIH or NSF grants) and judgements of IRBs.

This is an especially important issue for those who want to share data that are already collected: can use data be openly disseminated without IRB review? (I address this below when I offer recommendations.) What can researchers do when they want or need to share data freely, but their IRB does not have a clear policy? And what say does an IRB have in open data practices?

While IRBs should be consulted and informed about open data, as I delineate below IRBs are not now and were never intended to be data-monitoring groups (Bankert & Amdur, 2000). IRBs are regulated and have little say in whether a researcher can share data, based on the purview, scope, and responsibilities of IRBs.

IRBs in the United States are regulated under US Health and Human Services (HHS) guidelines for Protection of Human Subjects. The guidelines describe the composition of IRBs, record keeping, define levels of risk, and list specific duties of IRBs and hint at their limits.

When they function appropriately IRBs review research protocols to (1) evaluate risks; (2) determine whether subject confidentiality is maintained, that is, whether responses are linked to identities (‘confidentiality’ differs from ‘privacy’, which means others will not know a person participated in a study); and (3) evaluate whether subjects are given sufficient information about risks, procedures, privacy, and confidentiality. HHS Regulations Part 46, Subpart A, Section 111 ("Criteria for IRB Approval of Research") (a)(2), is very specific on the purview of IRBs in evaluating protocols:

"In evaluating risks and benefits, the IRB should consider only those risks and benefits that may result from the research (as distinguished from risks and benefits of therapies subjects would receive even if not participating in the research). The IRB should not consider possible long-range effects of applying knowledge gained in the research (for example, the possible effects of the research on public policy) as among those research risks that fall within the purview of its responsibility." [emphasis added]

And regulations §46.111 (a)(6) and (a)(7) state that IRBs are to evaluate the safety, privacy, and confidentiality of subjects in proposed research:

(a)(6) "When appropriate, the research plan makes adequate provision for monitoring the data collected to ensure the safety of subjects.” (a)(7) “When appropriate, there are adequate provisions to protect the privacy of subjects and to maintain the confidentiality of data."

The regulations make it clear that IRBs should consider only risks directly related to the study, and explicitly forbid IRBs from evaluating potential long-range effects of new knowledge gained from the study, as in new knowledge resulting from data sharing. Thus, IRBs should concern themselves with evaluating a study for safety, confidentiality, and that information is disclosed; reviewing existing data for dissemination is not under the purview of the IRB. The only issue that should concern IRBs about open data is whether the data are de-identified to “...protect the privacy of subjects and to maintain the confidentiality of data." It is not the responsibility of the IRB to monitor data, that responsibility falls to the researcher.

Nonetheless, IRBs may take the position that they are data monitors and deny a researcher’s request to openly disseminate data. In denying a request an IRB may use the argument ‘subjects would not have participated if they knew the data would be openly shared.’ In this case, IRBs would be playing mind-readers; there is no way an IRB can assume subjects would not have participated if they knew data would be openly shared. However, whether a person would decline to participate if they were informed about a researcher’s intent to openly disseminate data is an empirical question.

Also, with this argument the IRB is implicitly suggesting subjects would need to have been informed about open data dissemination in the consent form. But, such a requirement for consent forms neglects other federal guidelines. The Belmont Report provides responsibilities for human researchers, much like the APA's ethical principles, and describes what information should be included in the consent process:

“Most codes of research establish specific items for disclosure intended to assure that subjects are given sufficient information. These items generally include: the research procedure, their purposes, risks and anticipated benefits, alternative procedures (where therapy is involved), and a statement offering the subject the opportunity to ask questions and to withdraw at any time from the research.”

The Belmont Report does not even mention that subjects should be informed about the potential long-range plans or uses of the data they provide. Indeed, researchers do not have to tell subjects what analyses will be used, and for good reason. All the Belmont requires is for subjects be informed about the purpose of the study, the procedures, and be informed about their privacy and confidentiality of responses.

Another argument an IRB could make is the data could be used maliciously. For example, a researcher could make a data set open that included ethnicity and test scores and someone else could use that data to show certain ethnic groups are smarter than others. (This example is based on a recent Open Science Framework post that is the basis for this post.)

Although it is more likely that open data would be used as intended, someone could use data as they were not intended and may find a relationship between ethnicity and test scores. So what? The data are not malicious or problematic, it is the person using (misusing?) the data, and IRBs should not be in the habit of allowing only politically correct research to proceed (Lilienfeld, 2010). Also, by considering what others might do with open data, IRBs would be mind-reading and overstepping its purview by considering “...long-range effects of applying knowledge gained in the research (for example, the possible effects of the research on public policy).”

The bottom line is IRBs cannot know whether subjects would not have participated in a project if they knew the data would be openly disseminated, or potential findings by others. Federal regulations inform IRBs of their specific duties, which do not include data monitoring or making judgments on open data dissemination; those duties are the responsibilities of the researcher.

So what should you do if you want to make your data open? First, don't fear the IRB, but don’t forget the IRB. Perhaps re-examine IRB policies any time you plan a new project to remind yourself of the IRB requirements.

Second, making your data open does depend on what subjects agree to on the consent form, and this is especially important if you want to make existing data open. If subjects are told their participation will remain private (identities not disclosed) and responses will remain confidential (identities not linked to responses), openly disseminating de-identified data would not violate the agreement. However, if subjects were told the data would ‘not be disseminated’, the researcher may violate the agreement if they openly share data. In this case the IRB would need to be involved, subjects may need to re-consent to allow their responses to be disseminated, and new IRB approval may be needed as the original consent agreement may change.

Third, de-identify data sets you plan to make open. This includes removing names, student IDs, the subject numbers, timestamps, and anything else that could be used to uniquely identify a person.

Fourth, inform your IRB and department of your intentions. Describe your de-identification process and that you are engaging in open data practices as you see appropriate while maintaining subject confidentiality and privacy. (If someone objects, direct them toward federal IRB regulations.)

Finally, work with your IRB to develop guidelines and policies for data sharing. Given the speed and recency of the open science and open data movements, it is unlikely many IRBs have considered such policies.

We want greater transparency in science, and open data is one practice the can help. The IRB should not be seen as a hurdle or barrier to disseminating data, but as a reminder that one of the best practices in science is to ensure the integrity of our data and information communications by responsibly maintaining the confidence and privacy of our research subjects.


Bankert, E., & Amdur, R. (2000). The IRB is not a data and safety monitoring board. IRB: Ethics and Human Research, 22(6), 9-11.

De Wolfe, V. A., Sieber, J. E., Steel, P. M., & Zarate, A. O. (2005). Part I: What is the requirement for data sharing? IRB: Ethics and Human Research, 27(6), 12-16.

De Wolfe, V. A., Sieber, J. E., Steel, P. M., & Zarate, A. O. (2006). Part III: Meeting the challenge when data sharing is required. IRB: Ethics and Human Research, 28(2), 10-15.

Lilienfeld, S.O. (2010). Can psychology become a science? Personality and Individual Differences, 49, 281-288.