Jun 5, 2014

Open Projects - Open Humans


This article is the second in a series highlighting open science projects around the community. You can read the interview this article was based on: edited for clarity, unedited.

While many researchers encounter no privacy-based barriers to releasing data, those working with human participants, such as doctors, psychologists, and geneticists, have a difficult problem to surmount. How do they reconcile their desire to share data, allowing their analyses and conclusions to be verified, with the need to protect participant privacy? It's a dilemma we've talked about before on the blog (see: Open Data and IRBs, Privacy and Open Data). A new project, Open Humans, seeks to resolve the issue by finding patients who are willing - even eager - to share their personal data.

Open Humans, which recently won a $500,000 grant from the Knight Foundation, grew out of the Personal Genome Project. Founded in 2005 by Harvard genetics professor George Church, the Personal Genome Project sought to solve a problem that many genetics researchers had yet to recognize. "At the time people didn't really see genomes as inherently identifiable," Madeleine Price Ball explains. Ball is co-founder of Open Humans, Senior Research Scientist at PersonalGenomes.org, and Director of Research at the Harvard Personal Genome Project. She quotes from 1000 Genomes' informed consent form: "'Because of these measures, it will be very hard for anyone who looks at any of the scientific databases to know which information came from you, or even that any information in the scientific databases came from you.'"

"So that's sort of the attitude scientists had towards genomes at the time. Also, the Genetic Information Nondiscrimination Act didn't exist yet. And there was GATTACA. Privacy was still this thing everyone thought they could have, and genomes were this thing people thought would be crazy to share in an identifiable manner. I think the scientific community had a bit of unconscious blindness, because they couldn't imagine an alternative."

Church found an initial ten participants - the list includes university professors, health care professionals, and Church himself. The IRB interviewed each of the participants to make sure they truly understood the project and, satisfied, allowed it to move forward. The Personal Genome Project now boasts over 3,400 participants, each of whom has passed an entrance exam showing that they understand what will happen to their data, and the risks involved. Most participants are enthusiastic about sharing. One participant described it as "donating my body to science, but I don't have to die first".

The Personal Genome Project's expansion hasn't been without growing pains. "We've started to try to collect data beyond genomes." Personal health information, including medical history, procedures, test results, prescriptions, has been provided by a subset of participants. "Every time one of these new studies was brought before the IRB they'd be like ‘what? that too?? I don't understand what are you doing???' It wasn't scaling, it was confusing, the PGP was trying to collect samples and sequence genomes and it was trying to let other groups collect samples and do other things."

Thus, Open Humans was born. "Open Humans is an abstraction that takes part of what the PGP was doing (the second part) and make it scalable," Ball explains. "It's a cohort of participants that demonstrate an interest in public data sharing, and it's researchers that promise to return data to participants."

Open Humans will start out with a number of participants and an array of public data sets, thanks to collaborating projects American Gut, Flu Near You, and of course, the Harvard Personal Genome Project. Participants share data and, in return, researchers promise to share results. What precisely "sharing results" means has yet to be determined. "We're just starting out and know that figuring out how this will work is a learning process," Ball explains. But she's already seen what can happen when participants are brought into the research process - and brought together:

"One of the participants made an online forum, another a Facebook group, and another maintains a LinkedIn group… before this happened it hadn't occurred to me that abandoning the privacy-assurance model of research could empower participants in this manner. Think about the typical study - each participant is isolated, they never see each other. Meeting each other could breach confidentiality! Here they can talk to each other and gasp complain about you. That's pretty empowering." Ball and her colleague Jason Bobe, Open Humans co-founder and Executive Director of PersonalGenomes.org, hope to see all sorts of collaborations between participants and researchers. Participants could help researchers refine and test protocols, catch errors, and even provide their own analyses.

Despite these dreams, Ball is keeping the project grounded. When asked whether Open Humans will require articles published using their datasets to be made open access, she replies that, "stacking up a bunch of ethical mandates can sometimes do more harm than good if it limits adoption". Asked about the effect of participant withdrawals on datasets and reproducibility, she responds, "I don't want to overthink it and implement things to protect researchers at the expense of participant autonomy based on just speculation." (It is mostly speculation. Less than 1% of Personal Genome Project users have withdrawn from the study, and none of the participants who've provided whole genome or exome data have done so.)

It's clear that Open Humans is focused on the road directly ahead. And what does that road look like? "Immediately, my biggest concern is building our staff. Now that we won funding, we need to hire a good programmer... so if you are or know someone that seems like a perfect fit for us, please pass along our hiring opportunities". She adds that anyone can join the project's mailing list to get updates and find out when Open Humans is open to new participants - and new researchers. "And just talk about us. Referring to us is an intangible but important aspect for helping promote awareness of participant-mediated data sharing as a participatory research method and as a method for creating open data."

In other words: start spreading the news. Participant-mediated data sharing isn't the only solution to privacy issues, but it's an enticing one - and the more people who embrace it, the better a solution it will be.

May 20, 2014

Support Publication of Clinical Trials for International Clinical Trials Day


Today is International Clinical Trials Day, held on May 20th in honor of James Lind, the famous Scottish physician who began one of the world's first clinical trials on May 20th, 1747. This trial showed that citrus fruit could cure scurvy, a disease now known to be caused by vitamin C deficiency. While it and the other life-saving trials that have been conducted in the last two hundred and sixty-seven years are surely worth celebration, International Clinical Trials Day is also a time to reflect on the problems that plague the clinical trials system. In particular, the lack of reporting on nearly half of all clinical trials has potentially deadly consequences.

The AllTrials campaign, launched in January 2013, aims to have all past and present clinical trials registered and reported. From the AllTrials campaign website:

Doctors and regulators need the results of clinical trials to make informed decisions about treatments.

But companies and researchers can withhold the results of clinical trials even when asked for them. The best available evidence shows that about half of all clinical trials have never been published, and trials with negative results about a treatment are much more likely to be brushed under the carpet.

This is a serious problem for evidence based medicine because we need all the evidence about a treatment to understand its risks and benefits. If you tossed a coin 50 times, but only shared the outcome when it came up heads and you didn’t tell people how many times you had tossed it, you could make it look as if your coin always came up heads. This is very similar to the absurd situation that we permit in medicine, a situation that distorts the evidence and exposes patients to unnecessary risk that the wrong treatment may be prescribed.

It also affects some very expensive drugs. Governments around the world have spent billions on a drug called Tamiflu: the UK alone spent £500 million on this one drug in 2009, which is 5% of the total £10bn NHS drugs budget. But Roche, the drug’s manufacturer, published fewer than half of the clinical trials conducted on it, and continues to withhold important information about these trials from doctors and researchers. So we don’t know if Tamiflu is any better than paracetamol. (Author's note: in April 2014 a review based on full clinical trial data determined that Tamiflu was almost entirely ineffective.)

Initiatives have been introduced to try to fix this problem, but they have all failed. Since 2008 in the US the FDA has required results of all trials to be posted within a year of completion of the trial. However an audit published in 2012 has shown that 80% of trials failed to comply with this law. Despite this fact, no fines have ever been issued for non-compliance. In any case, since most currently used drugs came on the market before 2008, the trial results that are most important for current medical practice would not have been released even if the FDA’s law was fully enforced.

We believe that this situation cannot go on. The AllTrials initiative is campaigning for the publication of the results (that is, full clinical study reports) from all clinical trials – past, present and future – on all treatments currently being used.

We are calling on governments, regulators and research bodies to implement measures to achieve this.

And we are calling for all universities, ethics committees and medical bodies to enact a change of culture, recognise that underreporting of trials is misconduct and police their own members to ensure compliance.
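The coin-toss analogy in the AllTrials statement above can be made concrete with a short simulation. This is only an illustrative sketch: the function name, the seed, and the choice of a fair coin are assumptions made here, not part of the AllTrials material.

```python
import random

def reported_heads_rate(n_tosses=50, seed=0):
    """Toss a fair coin and compare the true heads rate with the rate
    seen by a reader who is only shown the tosses that came up heads."""
    rng = random.Random(seed)  # fixed seed so the run is repeatable
    tosses = [rng.random() < 0.5 for _ in range(n_tosses)]  # True = heads
    true_rate = sum(tosses) / n_tosses
    # Selective reporting: only the heads are ever "published".
    reported = [t for t in tosses if t]
    reported_rate = sum(reported) / len(reported) if reported else float("nan")
    return true_rate, reported_rate

true_rate, reported_rate = reported_heads_rate()
print(f"true heads rate: {true_rate:.2f}")
print(f"heads rate among reported tosses: {reported_rate:.2f}")
```

Whatever the true rate turns out to be, the rate among the reported tosses is always 1.0, because the tails were never shown. A reader who sees only the reported tosses has no way to recover the true rate without also knowing how many tosses were made.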

You can learn more about the problem of missing clinical trial data in this brief. AllTrials also provides slides on this issue to incorporate into talks and presentations as well as a petition you can sign.

May 7, 2014

When Science Selects for Fraud


This post is in response to Jon Grahe's recent article in which he invited readers to propose metaphors that might help us understand why fraud occurs and how to prevent it.

Natural selection is the process by which populations change as individual organisms succeed or fail to adapt to their environments. It is also an apt metaphor for how human cultures form and thrive. The scientific community, broadly speaking, selects for a number of personality traits, and those traits are more common among scientists than in the general population. In some cases, this is necessary and beneficial. In other cases, it is tragic.

The scientific community selects for curiosity. Not every scientist is driven by a deep desire to understand the natural world, but so many are. How boring would endless conferences, lab meetings, and lectures be if one didn’t delight in asking questions and figuring out answers? It also selects for a certain kind of analytical thinking. Those who can spot a confound or design a carefully controlled experiment are more likely to succeed. And it selects for perseverance. Just ask the researchers who work late into the night running gels, observing mice, or analyzing data.

The scientific community, like the broader culture of which it is a part, sometimes selects unjustly. It selects for the well-off: those who can afford the kind of schools where a love of science is cultivated rather than ignored or squashed, those who can volunteer in labs because they don’t need to work to support themselves and others, those who can pay $30 to read a journal article. It selects for white men: those who don’t have to face conscious and unconscious discrimination, cultural stereotyping, and microaggressions.

Of particular relevance right now is the way the scientific community selects for fraud. If asked, most scientists would say that the ideal scientist is honest, open-minded, and able to accept being wrong. But we do not directly reward these attributes. Instead, success - publication of papers, grant funding, academic positions and tenure, the approbation of our peers - is too often based on a specific kind of result. We reward those who can produce novel and positive results. We don’t reward based on how they produce them.

This does give an advantage to those with good scientific intuitions, which is a reasonable thing to select for. It also gives an advantage to risk-takers, those willing to risk their careers on being right. The risk-averse? They have two options: to drop out of scientific research, as I did, or to commit fraud in order to ensure positive results, as Diederik Stapel, Marc Hauser and Jens Förster did. Among the risk-averse, those who are unwilling to do shoddy or unethical science are selected against. Those who are willing are selected for, and often reach the tops of their fields.

One of the more famous examples of natural selection is the peppered moth of England. Before the Industrial Revolution, these moths were lightly colored, allowing them to blend in with the light gray bark of the average tree. During the Industrial Revolution, extreme pollution painted the trees of England black with soot. To adapt, peppered moths evolved dark, soot-colored wings.

We can censure the individuals who commit fraud, but this is like punishing the peppered moth for its dirty wings. As long as success in the scientific community is measured by results and not process, we will continue to select for those willing to violate process in order to ensure results. Our species, the scientists, need to change our environment if we want to evolve past fraud.

Photo of Biston betularia by Donald Hobern, CC BY 2.0

Apr 23, 2014

Memo From the Office of Open Science


Dear Professor Lucky,

Congratulations on your new position as assistant professor at Utopia University. We look forward to your joining our community and are eager to aid you in your transition from Antiquated Academy. It’s our understanding that Antiquated Academy does not have an Office of Open Science, so you may be unfamiliar with who we are and what we do.

The Office of Open Science was created to provide faculty, staff and students with the technical, educational, social and logistical support they need to do their research openly. We recognize that the fast pace of research and the demands placed on scientists to be productive make it difficult to prioritize open science. We collaborate with researchers at all levels to make it easier to do this work.

Listed below are some of the services we offer.


Jan 22, 2014

Open Projects - Wikipedia Project Medicine


This article is the first in a series highlighting open science projects around the community. You can read the interview this article was based on: edited for clarity, unedited.

Six years ago, Doctor James Heilman was working a night shift in the ER when he came across an error-ridden article on Wikipedia. Someone else might have used the article to dismiss the online encyclopedia, which was then less than half the size it is now. Instead, Heilman decided to improve the article. “I noticed an edit button and realized that I could fix it. Sort of got hooked from there. I’m still finding lots of articles that need a great deal of work before they reflect the best available medical evidence.”

Heilman, who goes by the username Jmh649 on Wikipedia, is now the president of the board of Wiki Project Med. A non-profit corporation created to promote medical content on Wikipedia, WPM contains over a dozen different initiatives aimed at adding and improving articles, building relationships with schools, journals and other medical organizations, and increasing access to research.

One of the initiatives closest to Heilman’s heart is the Translation Task Force, an effort to identify key medical articles and translate them into as many languages as they can. These articles cover common and potentially deadly medical circumstances, such as gastroenteritis (diarrhea), birth control, HIV/AIDS, and burns. With the help of Translators Without Borders, over 3 million words have been translated into about 60 languages. One of these languages is Yoruba, a West African language. Although Yoruba is spoken by nearly 30 million people, there are only a few editors working to translate medical articles into it.

“The first two billion people online by and large speak/understand at least one of the wealthy languages of the world. With more and more people getting online via cellphones that is not going to be true for the next 5 billion coming online. Many of them will find little that they can understand.” Wikipedia Zero, a program which provides users in some developing countries access to Wikipedia without mobile data charges, is increasing access to the site.

“People are, for better or worse, learning about life and death issues through Wikipedia. So we need to make sure that content is accurate, up to date, well-sourced, comprehensive, and accessible. For readers with no native medical literature, Wikipedia may well be the only option they have to learn about health and disease.”

That’s Jake Orlowitz (Ocaasi), WPM’s outreach coordinator. He and Heilman stress that there’s a lot of need for volunteer help, and not just with translating. Of the 80+ articles identified as key, only 31 are ready to be translated. The rest need citations verified, jargon simplified, content updated and restructured, and more.

In an effort to find more expert contributors, WPM has launched a number of initiatives to partner with medical schools and other research organizations. Orlowitz was recently a course ambassador to the UCSF medical school, where students edited Wikipedia articles for credit. He also set up a partnership with the Cochrane Collaboration, a non-profit made up of over 30,000 volunteers, mostly medical professionals, who conduct reviews of medical interventions. “We arranged a donation of 100 full access accounts to The Cochrane Library, and we are currently coordinating a Wikipedian in Residence position with them. That person will teach dozens of Cochrane authors how to incorporate their findings into Wikipedia,” explains Orlowitz.

Those who are familiar with how Wikipedia is edited might balk at the thought of contributing. Won’t they be drawn in to “edit wars”, endless battles with people who don’t believe in evolution or who just enjoy conflict? “There are edit wars,” admits Heilman. “They are not that common though. 99% of articles can be easily edited without problems.”

Orlowitz elaborates on some of the problems that arise. “We have a lot of new editors who don't understand evidence quality.” The medical experts they recruit face a different set of challenges. “One difficulty many experts have is that they wish to reference their own primary sources. Or write about themselves. Both those are frowned upon. We also have some drug and device companies that edit articles in their area of business--we discourage this strongly and it's something we keep an eye on.”

And what about legitimate differences of opinion about as yet unsettled medical theories, facts and treatments?

“Wikipedia 'describes debates rather than engaging in them'. We don't take sides, we just summarize the evidence on all sides--in proportion to the quality and quantity of that evidence,” says Orlowitz. Heilman continues: “For example Cochrane reviews state it is unclear if the risk versus benefits of breast cancer screening are positive or negative. The USPSTF is supportive. We state both.” Wikipedia provides detailed guidelines for evaluating sources and dealing with conflicting evidence.

Another reason academics might hesitate before contributing is the poor reputation Wikipedia has in academic circles. Another initiative, the Wikipedia-journal collaboration, states: “One reason some academics express for not contributing to Wikipedia is that they are unable to get the recognition they require for their current professional position. A number of medical journals have agreed in principle to publishing high quality Wikipedia articles under authors' real names following formal peer review.” A pilot paper, adapted from the Wikipedia article on Dengue Fever, is to be published in the Journal of Open Medicine, with more publications hopefully to come.

The stigma against Wikipedia itself is also decreasing. “The usage stats for the lay public, medical students, junior physicians, and doctors, and pharmacists are just mindbogglingly high. It's in the range of 50-90%, even for clinical professionals. We hear a lot that doctors 'jog their memory' with Wikipedia, or use it as a starting point,” says Orlowitz. One 2013 study found that a third or more of general practitioners, specialists and medical professors had used Wikipedia, with over half of physicians in training accessing it. As more diverse kinds of scientific contributions begin to be recognized, Wikipedia edits may make their way onto CVs.

Open science activists may be disappointed to learn that Wikipedia doesn’t require or even prefer open access sources for its articles. “Our policy simply states that our primary concern is article content, and verifiability. That standard is irrespective of how hard or easy it is to verify,” explains Orlowitz. Both Wikipedians personally support open access, and would welcome efforts to supplement closed access citations with open ones. “If there are multiple sources of equal quality that come to the same conclusions we support using the open source ones,” says Heilman. A new project, the Open Access Signalling project, aims to help readers quickly distinguish what sources they’ll be able to access.

So what are the best ways for newcomers to get involved? Heilman stresses that editing articles remains one of the most important tasks of the project. This is especially true of people affiliated with universities. “Ironically, since these folks have access to high quality paywalled sources, one great thing they could do would be to update articles with them. We also could explore affiliating a Wikipedia editor with a university as a Visiting Scholar, so they'd have access to the library's catalogue to improve Wikipedia, in the spirit of research affiliates,” says Orlowitz.

Adds Heilman, “If there are institutions who would be willing to donate library accounts to Wikipedians we would appreciate it. This would require having the Wikipedian register in some manner with the university. There are also a number of us who may be willing / able to speak to Universities that wish to learn more about the place of Wikipedia in Medicine.” The two also speak at conferences and other events.

Wiki Project Med, like Wikipedia itself, is an open community - a “do-ocracy”, as Orlowitz calls it. If you’re interested in learning more, or in getting involved, you can check out their project page, which details their many initiatives, or reach out to Orlowitz or the project as a whole on Twitter (@JakeOrlowitz, @WikiProjectMed) or via email (jorlowitz@gmail.com, wikiprojectmed@gmail.com).

Jan 8, 2014

When Open Science is Hard Science


When it comes to opening up your work there is, ironically, a bit of a secret. Here it is: being open - in open science, open source software, or any other open community - can be hard. Sometimes it can be harder than being closed.

In an effort to attract more people to the cause, advocates of openness tend to tout its benefits. Said benefits are bountiful: increased collaboration and dissemination of ideas, transparency leading to more frequent error checking, improved reproducibility, easier meta-analysis, and greater diversity in participation, just to name a few.

But there are downsides, too. One of those is that it can be difficult to do your research openly. (Note here that I mean well and openly. Taking the full contents of your hard drive and dumping it on a server somewhere might be technically open, but it’s not much use to anyone.)

How is it hard to open up your work? And why?

Closed means privacy.

In the privacy of my own home, I seldom brush my hair. Sometimes I spend all day in my pajamas. I leave my dirty dishes on the table and eat ice cream straight out of the tub. But when I have visitors, or when I’m going out, I make sure to clean up.

In the privacy of a closed access project, you might take shortcuts. You might recruit participants from your own 101 class, or process your data without carefully documenting which steps you took. You’d never intentionally do something unethical, but you might get sloppy.

Humans are social animals. We try to be more perfect for each other than we do for ourselves. This makes openness better, but it also makes it harder.

Two heads need more explanation than one.

As I mentioned above, taking all your work and throwing it online without organization or documentation is not very helpful. There’s a difference between access and accessibility. To create a truly open project, you need to be willing to explain your research to those trying to understand it.

There are numerous routes towards sharing your work, and the most open projects take more than one. You can create stellar documentation of your project. You can point people towards background material, finding good explanations of the way your research methodology was developed or the math behind your data analysis or how the code that runs your stimulus presentation works. You can design tutorials or trainings for people who want to run your study. You can encourage people to ask questions about the project, and reply publicly. You can make sure to do all the above for people at all levels - laypeople, students, and participants as well as colleagues.

Even closed science is usually collaborative, so hopefully your project is decently well documented. But making it accessible to everyone is a project in itself.

New ideas and tools need to be learned.

As long as closed is the default, we’ll need to learn new skills and tools in the process of becoming open, such as version control, format conversion and database management.

These skills aren’t unique to working openly. And if you have a good network of friends and colleagues, you can lean on them to supplement your own expertise. But the fact remains that “going open” isn’t as easy as flipping a switch. Unless you’re already well-connected and well-informed, you’ll have a lot to learn.

People can be exhausting.

Making your work open often means dealing with other people - and not always the people you want to deal with. There are the people who mean well, but end up confusing, misleading, or offending you. There are the people who don’t mean well at all. There are the discussions that go off in unproductive directions, the conversations that turn into conflicts, the promises that get forgotten.

Other people are both a joy and a frustration, in many areas of life beyond open science. But the nature of openness assures you’ll get your fair share. This is especially true of open science projects that are explicitly trying to build community.

It can be all too easy to overlook this emotional labor, but it’s work - hard work, at that.

There are no guarantees.

For all the effort you put into opening up your research, you may find no one else is willing to engage with it. There are plenty of open source software projects with no forks or new contributors, open science articles that are seldom downloaded or science wikis that remain mostly empty, open government tools or datasets that no one uses.

Open access may increase impact on the whole, but there are no promises for any particular project. It’s a sobering prospect to someone considering opening up their research.

How can we make open science easier?

We can advocate for open science while acknowledging the barriers to achieving it. And we can do our best to lower those barriers:

Forgive imperfections. We need to create an environment where mistakes are routine and failures are expected - only then will researchers feel comfortable exposing their work to widespread review. That’s a tall order in the cutthroat world of academia, but we can begin with our own roles as teachers, mentors, reviewers, and internet commentators. Be a role model: encourage others to review your work and point out your mistakes.

Share your skills as well as your research. Talk about your experiences opening up your research with colleagues. Host lab meetings, department events, and conference panels to discuss the practical difficulties. If a training, website, or individual helped you understand some skill or concept, recommend widely. Talking about the individual steps will help the journey seem less intimidating - and will give others a map for how to get there.

Recognize the hard work of others with words and, if you can, financial support. Organization, documentation, mentorship, community management. These are areas that often get overlooked when it comes to celebrating scientific achievement - and allocating funding. Yet many open science projects would fail without leadership in these areas. Contribute what you can and support others who take on these roles.

Collaborate. Open source advocates have been creating tools to help share the work involved in opening research - there’s Software Carpentry, the Open Science Framework, Sage Bionetworks, and Research Compendia, just to name a few. But beyond sharing tools, we can share time and resources. Not every researcher will have the skillset, experience, or personality to quickly and easily open up their work. Sharing efforts across labs, departments and even schools can lighten the load. So can open science specialists, if we create a scientific culture where these specialists are trained, utilized and valued.

We can and should demand open scientific practices from our colleagues and our institutions. But we can also provide guidelines, tools, resources and sympathy. Open science is hard. Let’s not make it any harder.

Dec 13, 2013

Chasing Paper, Part 3


This is part three of a three-part post brainstorming potential improvements to the journal article format. Part one is here, part two is here.

The classic journal article is only readable by domain experts.

Journal articles are currently written for domain experts. While novel concepts or terms are usually explained, a vast array of background knowledge is assumed, and jargon is the rule, not the exception. While this leads to quick reading for domain experts, it can make for a difficult slog for everyone else.

Why is this a problem? For one thing, it prevents interdisciplinary collaboration. Researchers will not make a habit of reading outside their field if it takes hours of painstaking, self-directed work to comprehend a single article. It also discourages public engagement. While science writers do admirable work boiling hard concepts down to their comprehensible cores, many non-scientists want to actually read the articles, and get discouraged when they can’t.

While opaque scientific writing exists in every format, technologies present new options to translate and teach. Jargon could be linked to a glossary or other reference material. You could be given a plain English explanation of a term when your mouse hovers over it. Perhaps each article could have multiple versions - for domain experts, other scientists, and for laypeople.
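As a rough sketch of the hover idea, the snippet below wraps glossary terms in HTML abbr tags, whose title attribute most browsers show as a tooltip. The glossary entries and the function name annotate_jargon are hypothetical, chosen only for illustration, and the glossary keys are assumed to be lowercase.

```python
import html
import re

# Hypothetical glossary; keys are assumed to be lowercase.
GLOSSARY = {
    "confound": "a variable that influences both the cause and the effect being studied",
    "meta-analysis": "a study that statistically combines the results of several studies",
}

def annotate_jargon(text, glossary=GLOSSARY):
    """Return HTML in which each glossary term is wrapped in an <abbr> tag
    whose title attribute holds a plain-English definition."""
    if not glossary:
        return html.escape(text)
    # Try longer terms first so multi-word terms win over their substrings.
    terms = sorted(glossary, key=len, reverse=True)
    pattern = re.compile(
        r"\b(" + "|".join(re.escape(t) for t in terms) + r")\b",
        re.IGNORECASE,
    )
    parts = []
    last = 0
    for match in pattern.finditer(text):
        # Escape the plain text between matches so it is safe to embed in HTML.
        parts.append(html.escape(text[last:match.start()]))
        definition = html.escape(glossary[match.group(0).lower()], quote=True)
        parts.append(f'<abbr title="{definition}">{html.escape(match.group(0))}</abbr>')
        last = match.end()
    parts.append(html.escape(text[last:]))
    return "".join(parts)

print(annotate_jargon("We checked for a confound before running the meta-analysis."))
```

A real publishing platform would need more than this, such as handling plurals and terms that already sit inside markup, but the core idea is a lookup table plus a text substitution.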

Of course, the ability to write accessibly is a skill not everyone has. Luckily, any given paper would mostly use terminology already introduced in previous papers. If researchers could easily credit the teaching and popularization work done by others, they could acknowledge the value of those contributions while at the same time making their own work accessible.

The classic journal article has no universally agreed-upon standards.

Academic publishing, historically, has been a distributed system. Currently, the top three publishers still account for less than half (42%) of all published articles (McGuigan and Russell, 2008). While certain format and content conventions are shared among publishers, generally speaking it’s difficult to propagate new standards, and even harder to enforce them. Not only do standards vary, they are frequently hidden, with most of the review and editing process taking place behind closed doors.

There are benefits to decentralization, but the drawbacks are clear. Widespread adoption of new standards, such as Simmons et al’s 21 Word Solution or open science practices, depends on the hard work and high status of those advocating for them. How can the article format be changed to better accommodate changing standards, while still retaining individual publishers’ autonomy?

One option might be to create a new section of each journal article, a free-form field where users could record whether an article met this or that standard. Researchers could then independently decide what standards they wanted to pay attention to. While this sounds messy, if properly implemented this feature could be used very much like a search filter, yet would not require the creation or maintenance of a centralized database.

A different approach is already being embraced: an effort to make the standards that currently exist more transparent by bringing peer review out into the open. Open peer review allows readers to view an article’s pre-publication history, including the authorship and content of peer reviews, while public peer review allows the public to participate in the review process. However, these methods have yet to be generally adopted.


It’s clear that journal articles are already changing. But they may not be changing fast enough. It may be better to forgo the trappings of the journal article entirely, and seek a new system that more naturally encourages collaboration, curation, and the efficient use of the incredible resources at our disposal. With journal articles commonly costing more than $30 each, some might jump at the chance to leave them behind.

Of course, it’s easy to play “what if” and imagine alternatives; it’s far harder to actually implement them. And not all innovations are improvements. But with over a billion dollars spent on research each day in the United States, with over 25,000 journals in existence, and over a million articles published each year, surely there is room to experiment.


Budd, J.M., Coble, Z.C. and Anderson, K.M. (2011) Retracted Publications in Biomedicine: Cause for Concern.

Wright, K. and McDaid, C. (2011). Reporting of article retractions in bibliographic databases and online journals. J Med Libr Assoc. 2011 April; 99(2): 164–167.

McGuigan, G.S. and Russell, R.D. (2008). The Business of Academic Publishing: A Strategic Analysis of the Academic Journal Publishing Industry and its Impact on the Future of Scholarly Publishing. Electronic Journal of Academic and Special Librarianship. Winter 2008; 9(3).

Simmons, J.P., Nelson, L.D. and Simonsohn, U.A. (2012) A 21 Word Solution.

Dec 12, 2013

Chasing Paper, Part 2


This is part two of a three-part post brainstorming potential improvements to the journal article format. Part one is here, part three is here.

The classic journal article format is not easily updated or corrected.

Scientific understanding is constantly changing as phenomena are discovered and mistakes uncovered. The classic journal article, however, is static. When a serious flaw in an article is found, the best a paper-based system can do is issue a retraction, and hope that a reader going through past issues will eventually come across the change.

Surprisingly, retractions and corrections continue to go mostly unnoticed in the digital era. Studies have shown that retracted papers go on to receive, on average, more than 10 post-retraction citations, with fewer than 10% of those citations acknowledging the retraction (Budd et al, 2011). Why is this happening? While many article databases such as PubMed provide retraction notices, the articles themselves are often not amended. Readers accessing papers directly from publishers’ websites, or from previously saved copies, can easily miss the notice. A case study of 18 retracted articles found several that the authors classified as “high risk of missing [the] notice”, with no notice given in the text of the PDF or HTML copies themselves (Wright and McDaid, 2011). It seems likely that corrections have even more difficulty being seen and acknowledged by subsequent researchers.

There are several technological solutions which can be tried. One promising avenue would be the adoption of version control. Also called revision control, this is a way of tracking all changes made to a project. This technology has been used for decades in computer science and is becoming more and more popular - Wikipedia and Google Docs, for instance, both use version control. Citations for a paper could reference the version of the paper then available, but subsequent readers would be notified that a more recent version could be viewed. In addition to making it easy to see how articles have been changed, adopting such a system would acknowledge the frequency of retractions and corrections and the need to check for up to date information.
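As a rough illustration of how version-aware citations might behave, here is a minimal Python sketch. The `Article` and `Citation` classes and their methods are hypothetical names invented for this example; no existing publisher system is being described.

```python
from dataclasses import dataclass, field


@dataclass
class Article:
    """A paper that keeps its full revision history (hypothetical model)."""
    title: str
    versions: list = field(default_factory=list)  # (text, note) tuples, oldest first

    def revise(self, text, note):
        """Record a new version and return its 1-based version number."""
        self.versions.append((text, note))
        return len(self.versions)

    def latest(self):
        """Return the number of the most recent version."""
        return len(self.versions)


@dataclass
class Citation:
    """A citation pinned to the version that was available when it was made."""
    article: Article
    cited_version: int

    def status(self):
        """Tell a reader whether the cited version has been superseded."""
        newer = self.article.versions[self.cited_version:]
        if not newer:
            return "current"
        notes = "; ".join(note for _, note in newer)
        return f"cited v{self.cited_version}, latest is v{self.article.latest()} ({notes})"


paper = Article("Example study")
v1 = paper.revise("Original text", "initial publication")
cite = Citation(paper, v1)
print(cite.status())  # current

paper.revise("Original text with Table 2 fixed", "correction to Table 2")
print(cite.status())  # cited v1, latest is v2 (correction to Table 2)
```

The citation never changes what it points to, but anyone following it learns that a newer version exists and why.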

Another potential tool would be an alert system. When changes are made to an article, the authors of all articles which cite it could be notified. However, this would require the maintenance of up-to-date contact information for authors, and the adoption of communications standards across publishers (something that has been accomplished before with initiatives like CrossRef). A more transformative approach would be to view papers not as static documents but as ongoing projects that can be updated and contributed to over time. Projects could be tracked through version control from their very inception, allowing for a kind of pre-registration. Replications and new analyses could be added to the project as they’re completed. The most insightful questions and critiques from the public could lead to changes in new versions of the article.
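The alert system described above could be sketched roughly as follows, assuming a shared index of who cites what and a registry of contact addresses. Both structures, the function names, and the article IDs and addresses are made up for illustration; a real system would depend on the cross-publisher standards mentioned above.

```python
from collections import defaultdict

# Hypothetical citation index: cited article ID -> IDs of the articles citing it.
cited_by = defaultdict(set)
# Hypothetical contact registry: article ID -> contact author's address.
contacts = {}


def register(article_id, contact, references):
    """Record an article, its contact author, and the articles it cites."""
    contacts[article_id] = contact
    for ref in references:
        cited_by[ref].add(article_id)


def alerts_for_change(article_id, change):
    """Return (recipient, message) pairs for every article citing article_id."""
    return [
        (contacts[citing], f"{article_id} was updated: {change}")
        for citing in sorted(cited_by[article_id])
        if citing in contacts
    ]


register("paper-B", "b@example.org", ["paper-A"])
register("paper-C", "c@example.org", ["paper-A", "paper-B"])

for recipient, message in alerts_for_change("paper-A", "retracted"):
    print(recipient, "<-", message)
```

The sketch only builds the list of notices; keeping the contact registry current is the hard part, as noted above.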

The classic journal article only recognizes certain kinds of contributions.

When journal articles were first developed in the 1600s, the idea of crediting an author or authors must have seemed straightforward. After all, most research was being done by individuals or very small groups, and there were no such things as curriculum vitae or tenure committees. Over time, academic authorship has become the single most important factor in determining career success for individual scientists. The limitations of authorship can therefore have an incredible impact on scientific progress.

There are two major problems with authorship as it currently functions, and they are sides of the same coin. Authorship does not tell you what, precisely, each author did on a paper. And authorship does not tell you who, precisely, is responsible for each part of a paper. Currently, the authorship model provides only a vague idea of who is responsible for a paper. While this is sometimes elaborated upon briefly in the footnotes, or mentioned in the article, more often readers employ simple heuristics. In psychology, the first author is believed to have led the work, the last author to have provided physical and conceptual resources for the experiment, and any middle authors to have contributed in an unknown but significant way. This is obviously not an ideal way to credit people, and often leads to disputes, with first authorship sometimes misattributed. It has grown increasingly impractical as multiauthor papers have become more and more common. What does authorship on a 500-author paper even mean?

The situation is even worse for people whose contributions are not rewarded with authorship. While contributions may be mentioned in the acknowledgements or cited in the body of the paper, neither of these has much impact when scientists are applying for jobs or up for tenure. This gives them little motivation to do work which will not be recognized with authorship. And such work is greatly needed. The development of tools, the collection and release of open data sets, the creation of popularizations and teaching materials, and the deep and thorough review of others’ work - these are all done as favors or side projects, even though they are vital to the progress of research.

How can new technologies address these problems? There have been few changes made in this area, perhaps due to the heavy weight of authorship in scientific life, although there are some tools like Figshare which allow users to share non-traditional materials such as datasets and posters in citable (and therefore creditable) form. A more transformative change might be to use the version control system mentioned above. Instead of tracking changes to the article from publishing onwards, it could follow the article from its beginning stages. In that way, each change could be attributed to a specific person.

Another option might simply be to describe contributions in more detail. Currently if I use your methodology wholesale, or briefly mention a finding of yours, I acknowledge you in the same way - a citation. What if, instead, all significant contributions were listed? Although space is not a constraint with digital articles, the human attention span remains limited, and so it might be useful to create common categories for contribution, such as reviewing the article, providing materials, doing analyses, or coming up with an explanation for discussion.
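To make the idea of common contribution categories concrete, here is a small Python sketch. The four categories are the ones suggested above; the `Role` enum, the `credit_lines` helper, and the contributor names are hypothetical.

```python
from collections import defaultdict
from enum import Enum


class Role(Enum):
    """Illustrative contribution categories, taken from the examples above."""
    REVIEW = "reviewed the article"
    MATERIALS = "provided materials"
    ANALYSIS = "performed analyses"
    DISCUSSION = "contributed an explanation to the discussion"


def credit_lines(contributions):
    """Group (person, role) pairs into one readable credit line per person."""
    by_person = defaultdict(list)
    for person, role in contributions:
        by_person[person].append(role.value)
    return [f"{person}: {', '.join(roles)}" for person, roles in sorted(by_person.items())]


# Placeholder names, not real contributors.
contributions = [
    ("R. Lee", Role.ANALYSIS),
    ("M. Ortiz", Role.MATERIALS),
    ("R. Lee", Role.REVIEW),
]

for line in credit_lines(contributions):
    print(line)
# M. Ortiz: provided materials
# R. Lee: performed analyses, reviewed the article
```

A fixed set of categories keeps the credit statement short enough to read at a glance while still saying what each person did.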

Two other problems are worth mentioning in brief. The first is the phenomenon of ghost authorship, where substantial contributions to the running of a study or preparation of a manuscript go unacknowledged. This is frequently done in industry-sponsored research to hide conflicts of interest. If journal articles used a format where every contribution was tracked, ghost authorship would be impossible. The second is the assignment of contact authors, the researchers on a paper to whom readers are invited to direct questions. Contact information can become outdated fairly quickly, causing access to data and materials to be lost; if contact information can be changed, or responsibility passed on to a new person, such loss can be prevented.

Dec 11, 2013

Chasing Paper, Part 1


This is part one of a three-part post. Parts two and three have now been posted.

The academic paper is old - older than the steam engine, the pocket watch, the piano, and the light bulb. The first journal, Philosophical Transactions, was published on March 6th, 1665. Now that doesn’t mean that the journal article format is obsolete - many inventions much older are still in wide use today. But after a third of a millennium, it’s only natural that the format needs some serious updating.

When brainstorming changes, it may be useful to think of the limitations of ink and paper. From there, we can consider how new technologies can improve or even transform the journal article. Some of these changes have already been widely adopted, while others have never even been debated. Some are adaptive, using the greater storage capacity of computing to extend the functions of the classic journal article, while others are transformative, creating new functions and features only available in the 21st century.

The ideas below are suggestions, not recommendations - it may be that some aspects of the journal article format are better left alone. But we all benefit from challenging our assumptions about what an article is and ought to be.

The classic journal article format cannot convey the full range of information associated with an experiment.

Until the rise of modern computing, there was simply no way for researchers to share all the data they collected in their experiments. Researchers were forced to summarize: to gloss over the details of their methods and the reasoning behind their decisions and, of course, to provide statistical analyses in the place of raw data. While fields like particle physics and genetics continue to push the limits of memory, most experimenters now have the technical capacity to share all of their data.

Many journals have taken to publishing supplemental materials, although this rarely encompasses the entirety of data collected, or enough methodological detail to allow for independent replication. There are plenty of explanations for this slow adoption, including ethical considerations around human subjects data, the potential to patent methods, or the cost to journals of hosting these extra materials. But these are obstacles to address, not reasons to give up. The potential benefits are enormous: What if every published paper contained enough methodological detail that it could be independently replicated? What if every paper contained enough raw data that it could be included in a meta-analysis? How much meta-scientific work is never undertaken because it depends on getting dozens or hundreds of contact authors to return your emails, and on universities to properly store data and materials?

Providing supplemental material, no matter how extensive, is still an adaptive change. What might a transformative change look like? Elsevier’s Article of the Future project attempts to answer that question with new, experimental formats that include videos, interactive models, and infographics. These designs are just the beginning. What if articles allowed readers to actually interact with the data and perform their own analyses? Virtual environments could be set up, lowering the barrier to independent verification of results. What if authors reported when they made questionable methodological decisions, and allowed readers, where possible, to see the results when a variable was not controlled for, or a sample was not excluded?

The classic journal article format is difficult to organize, index or search.

New technology has already transformed the way we search the scientific literature. Where before researchers were reliant on catalogues and indexes from publishers, and used abstracts to guess at relevance, databases such as PubMed and Google Scholar allow us to find all mentions of a term, tool, or phenomenon across vast swathes of articles. While searching databases is itself a skill, it’s one that allows us to search comprehensively and efficiently, and gives us more opportunities to explore.

Yet old issues of organization and curation remain. Indexes used to speed the slow process of skimming through physical papers. Now they’re needed to help researchers sort through the abundance of articles constantly being published. With tens of millions of journal articles out there, how can we be sure we’re really accessing all the relevant literature? How can we compare and synthesize the thousands of results one might get on a given search?

Special kinds of articles - reviews and meta-analyses - have traditionally helped us synthesize and curate information. As discussed above, new technologies can help make meta-analyses more common by making it easier for researchers to access information about past studies. We can further improve the search experience by creating more detailed metadata. Metadata, in this context, is the information attached to an article which lets us categorize it without having to read the article itself. Currently, fields like title, author, date, and journal are quite common in databases. More complicated fields are less often adopted, but you can find metadata on study type, population, level of clinical trial (where applicable), and so forth. What would truly comprehensive metadata look like? Is it possible to store the details of experimental structure or analysis in machine-readable format - and is that even desirable?
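For a sense of what machine-readable metadata makes possible, here is a minimal Python sketch that filters article records by the kinds of fields mentioned above. The field names, the sample records, and the `find` helper are all invented for illustration and do not follow any particular database's schema.

```python
import json

# Made-up records using the kinds of fields discussed above.
records = json.loads("""
[
  {"title": "Study A", "author": "Doe", "date": "2012-05-01", "journal": "Journal X",
   "study_type": "randomized controlled trial", "population": "adults"},
  {"title": "Study B", "author": "Roe", "date": "2013-01-15", "journal": "Journal Y",
   "study_type": "case study", "population": "children"},
  {"title": "Study C", "author": "Poe", "date": "2013-09-30", "journal": "Journal X",
   "study_type": "randomized controlled trial", "population": "children"}
]
""")


def find(records, **criteria):
    """Return the records whose metadata matches every given field exactly."""
    return [r for r in records if all(r.get(k) == v for k, v in criteria.items())]


matches = find(records, study_type="randomized controlled trial", population="children")
print([r["title"] for r in matches])  # ['Study C']
```

Exact matching like this only works if everyone fills in the fields the same way, which is one reason richer metadata is harder to adopt than title and author.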

What happens when we reconsider not the metadata but the content itself? Most articles are structurally complex, containing literature reviews, methodological information, data, and analysis. Perhaps we might be better served by breaking those articles down into their constituent parts. What if methods, data, and analysis were always published separately, creating a network of papers that were linked but discrete? Would that be easier or harder to organize? It may be that what we need here is not a better kind of journal article, but a new way of curating research entirely.

Nov 27, 2013

The State of Open Access


To celebrate Open Access Week last month, we asked people four questions about the state of open access and how it's changing. Here are some in-depth answers from two people working on open access: Peter Suber, Director of the Harvard Office for Scholarly Communication and the Harvard Open Access Project, and Elizabeth Silva, associate editor at the Public Library of Science (PLOS).

How is your work relevant to the changing landscape of Open Access? What would be a successful outcome of your work in this area?

Elizabeth: PLOS is now synonymous with open access publishing, so it’s hard to believe that 10 years ago, when PLOS was founded, most researchers were not even aware that availability of research was a problem. We all published our best research in the best journals. We assumed our colleagues could access it, and we weren’t aware of (or didn’t recognize the problem with) the inability of people outside of the ivory tower to see this work. At that time it was apparent to the founders of PLOS, who were among the few researchers who recognized the problem, that the best way to convince researchers to publish open access would be for PLOS to become an open access publisher, and prove that OA could be a viable business model and an attractive publishing venue at the same time. I think that we can safely say that the founders of PLOS succeeded in this mission, and they did it decisively.

We’re now at an exciting time, where open access in the natural sciences is all but inevitable. We now get to work on new challenges, trying to solve other issues in research communication.

Peter: My current job has two parts. I direct the Harvard Office for Scholarly Communication (OSC), and I direct the Harvard Open Access Project (HOAP). The OSC aims to provide OA to research done at Harvard University. We implement Harvard's OA policies and maintain its OA repository. We focus on peer-reviewed articles by faculty, but are expanding to other categories of research and researchers. In my HOAP work, I consult pro bono with universities, scholarly societies, publishers, funding agencies, and governments, to help them adopt effective OA policies. HOAP also maintains a guide to good practices for university OA policies, manages the Open Access Tracking Project, writes reference pages on federal OA-related legislation, such as FASTR, and makes regular contributions to the Open Access Directory and the catalog of OA journals from society publishers.

To me success would be making OA the default for new research in every field and language. However, this kind of success is more like a new plateau than a finish line. We often focus on the goal of OA itself, or the goal of removing access barriers to knowledge. But that's merely a precondition for an exciting range of new possibilities for making use of that knowledge. In that sense, OA is closer to the minimum than the maximum of how to take advantage of the internet for improving research. Once OA is the default for new research, we can give less energy to attaining it and more energy to reaping the benefits, for example, integrating OA texts with open data, improving the methods of meta-analysis and reproducibility, and building better tools for knowledge extraction, text and data mining, question answering, reference linking, impact measurement, current awareness, search, summary, translation, organization, and recommendation.

From the researcher's side, making OA the new default means that essentially all the new work they write, and essentially all the new work they want to read, will be OA. From the publisher's side, making OA the new default means that sustainability cannot depend on access barriers that subtract value, and must depend on creative ways to add value to research that is already and irrevocably OA.

How do you think the lack of Open Access is currently impacting how science is practiced?

Peter: The lack of OA slows down research. It distorts inquiry by making the retrievability of research a function of publisher prices and library budgets rather than author consent and internet connectivity. It hides results that happen to sit in journals that exceed the affordability threshold for you or your institution. It limits the correction of scientific error by limiting the number of eyeballs that can examine new results. It prevents the use of text and data mining to supplement human analysis with machine analysis. It hinders the reproducibility of research by excluding many who would want to reproduce it. At the same time, and ironically, it increases the inefficient duplication of research by scholars who don't realize that certain experiments have already been done.

It prevents journalists from reading the latest developments, reporting on them, and providing direct, usable links for interested readers. It prevents unaffiliated scholars and the lay public from reading new work in which they may have an interest, especially in the humanities and medicine. It blocks research-driven industries from creating jobs, products, and innovations. It prevents taxpayers from maximizing the return on their enormous investment in publicly-funded research.

I assume we're talking about research that authors publish voluntarily, as opposed to notes, emails, and unfinished manuscripts, and I assume we're talking about research that authors write without expectation of revenue. If so, then the lack of OA harms research and researchers without qualification. The lack of OA benefits no one except conventional publishers who want to own it, sell it, and limit the audience to paying customers.

Elizabeth: There is a prevailing idea that those that need access to the literature already have it; that those that have the ability to understand the content are at institutions that can afford the subscriptions. First, this ignores the needs of physicians, educators, science communicators, and smaller institutions and companies. More fundamentally, limiting access to knowledge, so that it rests in the hands of an elite 1%, is archaic, backwards, and counterproductive. There has never been a greater urgency to find solutions to problems that fundamentally threaten human existence – climate change, disease transmission, food security – and in the face of this why would we advocate limited dissemination of knowledge? Full adoption of open access has the potential to fundamentally change the pace of scientific progress, as we make this information available to everyone, worldwide.

When it comes to issues of reproducibility, fraud or misreporting, all journals face similar issues regardless of the business model. Researchers design their experiments and collect their data long before they decide the publishing venue, and the quality of the reporting likely won’t change based on whether the venue is OA. I think that these issues are better tackled by requirements for open data and improved reporting. Of course these philosophies are certainly intrinsically linked – improved transparency and access can only improve matters.

What do you think is the biggest reason that people resist Open Access? Do you think there are good reasons for not making a paper open access?

Elizabeth: Of course there are many publishers who resist open access, which reflects a need to protect established revenue streams. In addition to large commercial publishers, there are a lot of scholarly societies whose primary sources of income are the subscriptions for the journals they publish.

Resistance from authors, in my experience, comes principally in two forms. The first is linked to the impact factor, rather than the business model. Researchers are stuck in a paradigm that requires them to publish as ‘high’ as possible to achieve career advancement. While there are plenty of high impact OA publications with which people choose to publish, it just so happens that the highest are subscription journals. We know that open access increases utility, visibility and impact of individual pieces of research, but the fallacy that a high impact journal is equivalent to high impact research persists.

The second reason cited is that the cost is prohibitory. This is a problem everyone at PLOS can really appreciate, and we very much sympathize with authors who do not have the money in their budget to pay author publication charges (APCs). However, it’s a problem that should really be a lot easier to overcome. If research institutions were to pay publication fees, rather than subscription fees, they would save a fortune; a few institutions have realized this and are paying the APCs for authors who choose to go OA. It would also help if funders could recognize publishing as an intrinsic part of the research, folding the APC into the grant. We are also moving the technology forward in an effort to reduce costs, so that savings can be passed onto authors. PLOS ONE has been around for nearly 7 years, and the fees have not changed. This reflects efforts to keep costs as low as we can. Ironically, the biggest of the pay-walled journals already charge authors to publish: for example, it can be between $500 and $1000 for the first color figure, and a few hundred for each additional one; on top of this there are page charges and reprint costs. Not only is the public paying for the research and the subscription, they are paying for papers that they can’t read.

Peter: There are no good reasons for not making a paper OA, or at least for not wanting to.

There are sometimes reasons not to publish in an OA journal. For example, the best journals in your field may not be OA. Your promotion and tenure committee may give you artificial incentives to limit yourself to a certain list of journals. Or the best OA journals in your field may charge publication fees which your funder or employer will not pay on your behalf. However, in those cases you can publish in a non-OA journal and deposit the peer-reviewed manuscript in an OA repository.

The resistance of non-OA publishers is easier to grasp. But if we're talking about publishing scholars, not publishers, then the largest cause of resistance by far is misunderstanding. Far too many researchers still accept false assumptions about OA, such as these 10:

--that the only way to make an article OA is to publish it in an OA journal
--that all or most OA journals charge publication fees
--that all or most publication fees are paid by authors out of pocket
--that all or most OA journals are not peer reviewed
--that peer-reviewed OA journals cannot use the same standards and even the same people as the best non-OA journals
--that publishing in a non-OA journal closes the door on lawfully making the same article OA
--that making work OA makes it harder rather than easier to find
--that making work OA limits rather than enhances author rights over it
--that OA mandates are about submitting new work to OA journals rather than depositing it in OA repositories, or
--that everyone who needs access already has access.

In a recent article in The Guardian I corrected six of the most widespread and harmful myths about OA. In a 2009 article, I corrected 25. And in my 2012 book, I tried to take on the whole legendarium.

How has the Open Access movement changed in the last five years? How do you think it will change in the next five years?

Peter: OA has been making unmistakable progress for more than 20 years. Five years ago we were not in a qualitatively different place. We were just a bit further down the slope from where we are today.

Over the next five years, I expect more than just another five years' worth of progress as usual. I expect five years' worth of progress toward the kind of success I described in my answer to your first question. In fact, insofar as progress tends to add cooperating players and remove or convert resisting players, I expect five years' worth of compound interest and acceleration.

In some fields, like particle physics, OA is already the default. In the next five years we'll see this new reality move at an uneven rate across the research landscape. Every year more and more researchers will be able to stop struggling for access against needless legal, financial, and technical barriers. Every year, those still struggling will have the benefit of a widening circle of precedents, allies, tools, policies, best practices, accommodating publishers, and alternatives to publishers.

Green OA mandates are spreading among universities. They're also spreading among funding agencies, for example, in the US, the EU, and the global south. This trend will definitely continue, especially with the support it has received from the Global Research Council, Science Europe, the G8 Science Ministers, and the World Bank.

With the exception of the UK and the Netherlands, countries adopting new OA policies are learning from the experience of their predecessors and starting with green. I've argued in many places that mandating gold OA is a mistake. But it's a mistake mainly for historical reasons, and historical circumstances will change. Gold OA mandates are foolish today in part because too few journals are OA, and there's no reason to limit the freedom of authors to publish in the journals of their choice. But the percentage of peer-reviewed journals that are OA is growing and will continue to grow. (Today it's about 30%.) Gold OA mandates are also foolish today because gold OA is much more expensive than green OA, and there's no reason to compromise the public interest in order to guarantee revenue for non-adaptive publishers. But the costs of OA journals will decline, as the growing number of OA journals compete for authors, and the money to pay for OA journals will grow as libraries redirect money from conventional journals to OA.

We'll see a rise in policies linking deposit in repositories with research assessment, promotion, and tenure. These policies were pioneered by the University of Liege, and since adopted at institutions in nine countries, and recommended by the Budapest Open Access Initiative, the UK House of Commons Select Committee on Business, Innovation and Skills, and the Mediterranean Open Access Network. Most recently, this kind of policy has been proposed at the national level by the Higher Education Funding Council for England. If it's adopted, it will mitigate the damage of a gold-first policy in the UK. A similar possibility has been suggested for the Netherlands.

I expect we'll see OA in the humanities start to catch up with OA in the sciences, and OA for books start to catch up with OA for articles. But in both cases, the pace of progress has already picked up significantly, and so has the number of people eager to see these two kinds of progress accelerate.

The recent decision that Google's book scanning is fair use means that a much larger swath of print literature will be digitized, if not in every country, then at least in the US, and if not for OA, then at least for searching. This won't open the doors to vaults that have been closed, but it will open windows to help us see what is inside.

Finally, I expect to see evolution in the genres or containers of research. Like most people, I'm accustomed to the genres I grew up with. I love articles and books, both as a reader and author. But they have limitations that we can overcome, and we don't have to drop them to enhance them or to create post-articles and post-books alongside them. The low barriers to digital experimentation mean that we can try out new breeds until we find some that carry more advantages than disadvantages for specific purposes. Last year I sketched out one idea along these lines, which I call an evidence rack, but it's only one in an indefinitely large space constrained only by the limits on our imagination.

Elizabeth: It’s starting to feel like universal open access is no longer “if” but “when”. In the next five years we will see funders and institutions recognize the importance of access and adopt policies that mandate and financially support OA; resistance will fade away, and it will simply be the way research is published. As that happens, I think the OA movement will shift towards tackling other issues in research communication: providing better measures of impact in the form of article level metrics, decreasing the time to publication, and improving reproducibility and utility of research.