Jan 22, 2014

Open Projects - Wikipedia Project Medicine

by

This article is the first in a series highlighting open science projects around the community. You can read the interview this article was based on, either edited for clarity or unedited.

Six years ago, Doctor James Heilman was working a night shift in the ER when he came across an error-ridden article on Wikipedia. Someone else might have used the article to dismiss the online encyclopedia, which was then less than half the size it is now. Instead, Heilman decided to improve the article. “I noticed an edit button and realized that I could fix it. Sort of got hooked from there. I’m still finding lots of articles that need a great deal of work before they reflect the best available medical evidence.”

Heilman, who goes by the username Jmh649 on Wikipedia, is now the president of the board of Wiki Project Med. A non-profit corporation created to promote medical content on Wikipedia, WPM contains over a dozen different initiatives aimed at adding and improving articles, building relationships with schools, journals and other medical organizations, and increasing access to research.

One of the initiatives closest to Heilman’s heart is the Translation Task Force, an effort to identify key medical articles and translate them into as many languages as possible. These articles cover common and potentially deadly medical conditions, such as gastroenteritis (diarrhea), birth control, HIV/AIDS, and burns. With the help of Translators Without Borders, over 3 million words have been translated into about 60 languages. One of these languages is Yoruba, a West African language. Although Yoruba is spoken by nearly 30 million people, there are only a few editors working to translate medical articles into it.

“The first two billion people online by and large speak/understand at least one of the wealthy languages of the world. With more and more people getting online via cellphones that is not going to be true for the next 5 billion coming online. Many of them will find little that they can understand.” Wikipedia Zero, a program which provides users in some developing countries access to Wikipedia without mobile data charges, is increasing access to the site.

“People are, for better or worse, learning about life and death issues through Wikipedia. So we need to make sure that content is accurate, up to date, well-sourced, comprehensive, and accessible. For readers with no native medical literature, Wikipedia may well be the only option they have to learn about health and disease.”

That’s Jake Orlowitz (Ocaasi), WPM’s outreach coordinator. He and Heilman stress that there’s a lot of need for volunteer help, and not just with translating. Of the 80+ articles identified as key, only 31 are ready to be translated. The rest need citations verified, jargon simplified, content updated and restructured, and more.

In an effort to find more expert contributors, WPM has launched a number of initiatives to partner with medical schools and other research organizations. Orlowitz was recently a course ambassador to the UCSF medical school, where students edited Wikipedia articles for credit. He also set up a partnership with the Cochrane Collaboration, a non-profit made up of over 30,000 volunteers, mostly medical professionals, who conduct reviews of medical interventions. “We arranged a donation of 100 full access accounts to The Cochrane Library, and we are currently coordinating a Wikipedian in Residence position with them. That person will teach dozens of Cochrane authors how to incorporate their findings into Wikipedia,” explains Orlowitz.

Those who are familiar with how Wikipedia is edited might balk at the thought of contributing. Won’t they be drawn into “edit wars”, endless battles with people who don’t believe in evolution or who just enjoy conflict? “There are edit wars,” admits Heilman. “They are not that common though. 99% of articles can be easily edited without problems.”

Orlowitz elaborates on some of the problems that arise. “We have a lot of new editors who don't understand evidence quality.” The medical experts they recruit face a different set of challenges. “One difficulty many experts have is that they wish to reference their own primary sources. Or write about themselves. Both those are frowned upon. We also have some drug and device companies that edit articles in their area of business--we discourage this strongly and it's something we keep an eye on.”

And what about legitimate differences of opinion about as yet unsettled medical theories, facts and treatments?

“Wikipedia 'describes debates rather than engaging in them'. We don't take sides, we just summarize the evidence on all sides--in proportion to the quality and quantity of that evidence,” says Orlowitz. Heilman continues: “For example, Cochrane reviews state it is unclear if the risks versus benefits of breast cancer screening are positive or negative. The USPSTF is supportive. We state both.” Wikipedia provides detailed guidelines for evaluating sources and dealing with conflicting evidence.

Another reason academics might hesitate before contributing is the poor reputation Wikipedia has in academic circles. Another initiative, the Wikipedia-journal collaboration, states: "One reason some academics express for not contributing to Wikipedia is that they are unable to get the recognition they require for their current professional position. A number of medical journals have agreed in principle to publishing high quality Wikipedia articles under authors' real names following formal peer review.” A pilot paper, adapted from the Wikipedia article on Dengue Fever, is to be published in the Journal of Open Medicine, with more publications hopefully to come.

The stigma against Wikipedia itself is also decreasing. “The usage stats for the lay public, medical students, junior physicians, doctors, and pharmacists are just mind-bogglingly high. It's in the range of 50-90%, even for clinical professionals. We hear a lot that doctors 'jog their memory' with Wikipedia, or use it as a starting point,” says Orlowitz. One 2013 study found that a third or more of general practitioners, specialists and medical professors had used Wikipedia, with over half of physicians in training accessing it. As more diverse kinds of scientific contributions begin to be recognized, Wikipedia edits may make their way onto CVs.

Open science activists may be disappointed to learn that Wikipedia doesn’t require or even prefer open access sources for its articles. “Our policy simply states that our primary concern is article content, and verifiability. That standard is irrespective of how hard or easy it is to verify,” explains Orlowitz. Both Wikipedians personally support open access, and would welcome efforts to supplement closed access citations with open ones. “If there are multiple sources of equal quality that come to the same conclusions we support using the open source ones,” says Heilman. A new project, the Open Access Signalling project, aims to help readers quickly distinguish which sources they’ll be able to access.

So what are the best ways for newcomers to get involved? Heilman stresses that editing articles remains one of the most important tasks of the project. This is especially true of people affiliated with universities. “Ironically, since these folks have access to high quality paywalled sources, one great thing they could do would be to update articles with them. We also could explore affiliating a Wikipedia editor with a university as a Visiting Scholar, so they'd have access to the library's catalogue to improve Wikipedia, in the spirit of research affiliates,” says Orlowitz.

Adds Heilman, “If there are institutions that would be willing to donate library accounts to Wikipedians, we would appreciate it. This would require having the Wikipedian register in some manner with the university. There are also a number of us who may be willing / able to speak to universities that wish to learn more about the place of Wikipedia in medicine.” The two also speak at conferences and other events.

Wiki Project Med, like Wikipedia itself, is an open community - a “do-ocracy”, as Orlowitz calls it. If you’re interested in learning more, or in getting involved, you can check out their project page, which details their many initiatives, or reach out to Orlowitz or the project as a whole on Twitter (@JakeOrlowitz, @WikiProjectMed) or via email (jorlowitz@gmail.com, wikiprojectmed@gmail.com).

Jan 15, 2014

The APA and Open Data: one step forward, two steps back?

by

Photo of Denny Borsboom

I was pleasantly surprised when, last year, I was approached with the request to become Consulting Editor for a new APA journal called Archives of Scientific Psychology. The journal, as advertised on its website upon launch, had a distinct Open Science signature. As its motto said, it was an “Open Methodology, Open Data, Open Access journal”. That’s a lot of openness indeed.

When the journal started, the website not only touted the Open Access feature of the journal, but went on to say that "[t]he authors have made available for use by others the data that underlie the analyses presented in the paper". This was an incredibly daring move by APA - or so it seemed. Of course, I happily accepted the position.

After a few months, the first papers in Archives were published. Open Data enthusiast Jelte Wicherts of Tilburg University immediately tried to retrieve data for reanalysis. Then it turned out that the APA holds a quite idiosyncratic definition of the word “open”: upon his request, Wicherts was referred to a website that presented a daunting list of requirements for data requests to fulfill. That was quite a bit more intimidating than the positive tone struck in the editorial that accompanied the launch of the journal.

This didn’t seem open to me at all. So: I approached the editors and said that I could not subscribe to this procedure, given the fact that the journal is supposed to have open data. The editors then informed me that their choice to implement these procedures was an entirely conscious one, and that they stood by it. Their point of view is articulated in their data sharing guidelines. For instance, "next-users of data must formally agree to offer co-authorship to the generator(s) of the data on any subsequent publications" since "[i]t is the opinion of the Archives editors that designing and conducting the original data collection is a scientific contribution that cannot be exhausted after one use of the data; it resides in the data permanently."

Well, that's not my opinion at all. In fact it's quite directly opposed to virtually everything I think is important about openness in scientific research. So I chose to resign my position.

In October 2013, I learned that Wicherts had taken the initiative of exposing the Archives’ policy in an open letter to the editorial board, in which he says:

“[…] I recently learned that data from empirical articles published in the Archives are not even close to being “open”.

In fact, a request for data published in the Archives involves not only a full-blown review committee but also the filling in and signing of an extensive form: http://www.apa.org/pubs/journals/features/arc-data-access-request-form.pdf

This 15-page form asks for the sending of professional resumes, descriptions of the policies concerning academic integrity at one’s institution, explicit research plans including hypotheses and societal relevance, specification of the types of analyses, full ethics approval of the reanalysis by the IRB, descriptions of the background of the research environment, an indication of the primary source of revenue of one’s institution, dissemination plans of the work to be done with the data, a justification for the data request, manners of storage, types of computers and storage media being used, ways of transmitting data between research team members, whether data will be encrypted, and signatures of institutional heads.

The requester of the data also has to sign that (s)he provides an “Offer [of] co-authorship to the data generators on any subsequent publications” and that (s)he will offer to the review committee an “annual data use report that outlines what has been done, that the investigator remains in compliance with the original research proposal, and provide references of any resulting publications.”

In case of non-compliance with any of these stipulations, the requester can face up to a $10,000 fine as well as a future prohibition of data access from work published in the Archives.”

A fine? Seriously? Kafkaesque!

Wicherts also notes that “the guidelines with respect to data sharing in the Archives considerably exceed APA’s Ethical Standard 8.14”. Ethical Standard 8.14 is a default that applies to all APA journals, and says:

“After research results are published, psychologists do not withhold the data on which their conclusions are based from other competent professionals who seek to verify the substantive claims through reanalysis and who intend to use such data only for that purpose, provided that the confidentiality of the participants can be protected and unless legal rights concerning proprietary data preclude their release.”

Since this guideline says nothing about fines and co-authorship requirements, we indeed have to conclude that it’s harder to get data from APA’s open science journal than it is to get data from its regular journals. Picture that!

In response to my resignation and Wicherts' letter, the editors have taken an interesting course of action. Rather than change their policy such that their deeds match their name, they have changed their name to match their deeds. The journal is now no longer an "Open Methodology, Open Data, Open Access Journal" but an "Open Methodology, Collaborative Data Sharing, Open Access Journal".

The APA and open data. One step forward, two steps back.

Jan 8, 2014

When Open Science is Hard Science

by

When it comes to opening up your work there is, ironically, a bit of a secret. Here it is: being open - in open science, open source software, or any other open community - can be hard. Sometimes it can be harder than being closed.

In an effort to attract more people to the cause, advocates of openness tend to tout its benefits. Said benefits are bountiful: increased collaboration and dissemination of ideas, transparency leading to more frequent error checking, improved reproducibility, easier meta-analysis, and greater diversity in participation, just to name a few.

But there are downsides, too. One of those is that it can be difficult to do your research openly. (Note here that I mean well and openly. Taking the full contents of your hard drive and dumping it on a server somewhere might be technically open, but it’s not much use to anyone.)

How is it hard to open up your work? And why?

Closed means privacy.

In the privacy of my own home, I seldom brush my hair. Sometimes I spend all day in my pajamas. I leave my dirty dishes on the table and eat ice cream straight out of the tub. But when I have visitors, or when I’m going out, I make sure to clean up.

In the privacy of a closed access project, you might take shortcuts. You might recruit participants from your own 101 class, or process your data without carefully documenting which steps you took. You’d never intentionally do something unethical, but you might get sloppy.

Humans are social animals. We try to be more perfect for each other than we do for ourselves. This makes openness better, but it also makes it harder.

Two heads need more explanation than one.

As I mentioned above, taking all your work and throwing it online without organization or documentation is not very helpful. There’s a difference between access and accessibility. To create a truly open project, you need to be willing to explain your research to those trying to understand it.

There are numerous routes towards sharing your work, and the most open projects take more than one. You can create stellar documentation of your project. You can point people towards background material, finding good explanations of the way your research methodology was developed or the math behind your data analysis or how the code that runs your stimulus presentation works. You can design tutorials or trainings for people who want to run your study. You can encourage people to ask questions about the project, and reply publicly. You can make sure to do all the above for people at all levels - laypeople, students, and participants as well as colleagues.

Even closed science is usually collaborative, so hopefully your project is decently well documented. But making it accessible to everyone is a project in itself.

New ideas and tools need to be learned.

As long as closed is the default, we’ll need to learn new skills and tools in the process of becoming open, such as version control, format conversion and database management.

These skills aren’t unique to working openly. And if you have a good network of friends and colleagues, you can lean on them to supplement your own expertise. But the fact remains that “going open” isn’t as easy as flipping a switch. Unless you’re already well-connected and well-informed, you’ll have a lot to learn.

People can be exhausting.

Making your work open often means dealing with other people - and not always the people you want to deal with. There are the people who mean well, but end up confusing, misleading, or offending you. There are the people who don’t mean well at all. There are the discussions that go off in unproductive directions, the conversations that turn into conflicts, the promises that get forgotten.

Other people are both a joy and a frustration, in many areas of life beyond open science. But the nature of openness ensures you’ll get your fair share. This is especially true of open science projects that are explicitly trying to build community.

It can be all too easy to overlook this emotional labor, but it’s work - hard work, at that.

There are no guarantees.

For all the effort you put into opening up your research, you may find no one else is willing to engage with it. There are plenty of open source software projects with no forks or new contributors, open science articles that are seldom downloaded or science wikis that remain mostly empty, open government tools or datasets that no one uses.

Open access may increase impact on the whole, but there are no promises for any particular project. It’s a sobering prospect to someone considering opening up their research.

How can we make open science easier?

We can advocate for open science while acknowledging the barriers to achieving it. And we can do our best to lower those barriers:

Forgive imperfections. We need to create an environment where mistakes are routine and failures are expected - only then will researchers feel comfortable exposing their work to widespread review. That’s a tall order in the cutthroat world of academia, but we can begin with our own roles as teachers, mentors, reviewers, and internet commentators. Be a role model: encourage others to review your work and point out your mistakes.

Share your skills as well as your research. Talk about your experiences opening up your research with colleagues. Host lab meetings, department events, and conference panels to discuss the practical difficulties. If a training, website, or individual helped you understand some skill or concept, recommend it widely. Talking about the individual steps will help the journey seem less intimidating - and will give others a map for how to get there.

Recognize the hard work of others with words and, if you can, financial support. Organization, documentation, mentorship, community management. These are areas that often get overlooked when it comes to celebrating scientific achievement - and allocating funding. Yet many open science projects would fail without leadership in these areas. Contribute what you can and support others who take on these roles.

Collaborate. Open source advocates have been creating tools to help share the work involved in opening research - there’s Software Carpentry, the Open Science Framework, Sage Bionetworks, and Research Compendia, just to name a few. But beyond sharing tools, we can share time and resources. Not every researcher will have the skillset, experience, or personality to quickly and easily open up their work. Sharing efforts across labs, departments and even schools can lighten the load. So can open science specialists, if we create a scientific culture where these specialists are trained, utilized and valued.

We can and should demand open scientific practices from our colleagues and our institutions. But we can also provide guidelines, tools, resources and sympathy. Open science is hard. Let’s not make it any harder.

Jan 1, 2014

Timeline of Notable Open Science Events in 2013 - Psychology

by

Happy New Year! New Year’s is a great time for reflection and resolution, and when I reflect on 2013, I view it with an air of excitement and promise. As a social psychologist, I celebrated with many of my colleagues in Washington, DC, at the 25th anniversary of the Association for Psychological Science. There were many celebrations, including an ‘80s-themed dance night at the Convention. However, this year was also marred by the “Crisis of Confidence” in psychological and broader sciences that has been percolating since the turn of the 21st century. Our timeline begins the year with Perspectives on Psychological Science’s special issue dedicated to addressing this Crisis. Rather than focusing on the problems, papers in this issue suggested solutions, and many of those suggestions emerged as projects in 2013. This timeline focuses on these many Open Science Collaboration successes and initiatives and offers a glimpse at the activity directed at reaching the Scientific Utopia envisioned by so many in the OSC.

Maybe when APS celebrates its 50th anniversary, it will also mark the 25th anniversary of the year that the tide turned on the bad practices that had led to the “Crisis of Confidence”. Perhaps in addition to a ‘13-themed dance band playing Lorde’s “Royals” or Imagine Dragons’ “Demons”, there will be a theme reflecting on changing science practices. With the COS celebrating a 25th anniversary of its own, you can share your memories of the important events from 2013.

These posts reflect a limited list of psychology-related events that one person noticed. We invite you to add other notable events that you feel are missing from this list, particularly in other scientific areas. Add a comment below with information about any research projects aimed at replication across institutions or initiatives directed at making science practices more transparent.

View the timeline!

Dec 18, 2013

Researcher Degrees of Freedom in Data Analysis

by

The enormous number of options available for modern data analysis is both a blessing and a curse. On one hand, researchers have specialized tools for any number of complex questions. On the other hand, we’re also faced with a staggering number of equally viable choices, many times without any clear-cut guidelines for deciding between them. For instance, I just popped open SPSS statistical software and counted 18 different ways to conduct post-hoc tests for a one-way ANOVA. Some choices are clearly inferior (e.g., the LSD test doesn’t adjust p-values for multiple comparisons), but it’s possible to defend the use of many of the available options. These ambiguous choice points are sometimes referred to as researcher degrees of freedom.
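To make that concrete, here is a minimal sketch - not from the original post, and with invented p-values - of how the same set of comparisons can clear or miss the p < .05 bar depending on which correction is applied (an unadjusted, LSD-style approach versus Bonferroni, Holm, or Benjamini-Hochberg):

```python
# Illustration only: made-up p-values for six pairwise comparisons.
# The point is that the choice of correction changes which results
# end up labelled "significant".
from statsmodels.stats.multitest import multipletests

pvals = [0.001, 0.008, 0.012, 0.026, 0.049, 0.34]

print("no adjustment (LSD-style):", sum(p < 0.05 for p in pvals), "significant")
for method in ["bonferroni", "holm", "fdr_bh"]:
    reject, adjusted, _, _ = multipletests(pvals, alpha=0.05, method=method)
    print(f"{method}:", int(reject.sum()), "significant")
```

Each line of output reports a different count of “significant” comparisons from the very same data, which is exactly the kind of ambiguity the term researcher degrees of freedom describes.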

In theory, researcher degrees of freedom shouldn’t be a problem. More choice is better, right? The problem arises from two interconnected issues: (a) ambiguity as to which statistical test is most appropriate and (b) an incentive system where scientists are rewarded with publications, grants, and career stability when their p-values fall below the revered p < .05 criterion. So, perhaps unsurprisingly, when faced with a host of ambiguous options for data analysis, most people settle on the one that achieves statistically significant results. Simmons, Nelson, and Simonsohn (2011) argue that this undisclosed flexibility in data analysis allows people to present almost any data as “significant,” and they call for 10 simple guidelines for reviewers and authors to improve disclosure in every paper - which, if you haven’t read them yet, are worth checking out. In this post, I will discuss a few guidelines of my own for conducting data analysis in a way that strives to overcome our inherent tendency to be self-serving.

  1. Make as many data analytic decisions as possible before looking at your data. Review the statistical literature and decide on which statistical test(s) will be best before looking at your collected data. Continue to use those tests until enough evidence emerges to change your mind. The important thing is that you make these decisions before looking at your data. Once you start playing with the actual data, your self-serving biases will start to kick in. Do not underestimate your ability for self-deception: Self-serving biases are powerful, pervasive, and apply to virtually everyone. Consider pre-registering your data analysis plan (perhaps using the Open Science Framework) to keep yourself honest and to convince future reviewers that you aren’t exploiting researcher degrees of freedom.

  2. When faced with a situation where there are too many equally viable choices, run a small number of the best choices, and report all of them. In this case, decide on 2-5 different tests ahead of time. Report the results of all choices, and make a tentative conclusion if the majority of these tests agree. For instance, when determining model fit in structural equation modeling, there are many different methods you might use. If you can’t figure out which method is best by reviewing the statistical literature - it’s often not entirely clear, and statisticians disagree about as often as any other group of scientists - then report the results of all tests, and make a conclusion if they all converge on the same solution. When they disagree, make a tentative conclusion based on the majority of tests that agree (e.g., 2 of 3 tests come to the same conclusion). For the record, I currently use CFI, TLI, RMSEA, and SRMR in my own work, and use these even if other fit indices provide more favorable results.

  3. When deciding on a data analysis plan after you’ve seen the data, keep in mind that most researcher degrees of freedom have minimal impact on strong results. For any number of reasons, you might find yourself deciding on a data analysis plan after you’ve played around with the data for a while. At the end of the day, strong data will not be influenced much by researcher degrees of freedom. For instance, in a study with high statistical power, results should look much the same regardless of whether you exclude outliers, transform them, or leave them in the data. Simmons et al. (2011) specifically recommend that results be presented (a) with and without covariates, and (b) with and without specific data points excluded, if any were removed. Again, the general idea is that strong results will not change much when you alter researcher degrees of freedom. Thus, I again recommend analyzing the data in a few different ways and looking for convergence across all methods when you’re developing a data analysis plan after seeing the data (see the sketch below). This sets the bar higher and helps combat your natural tendency to report just the one analysis that “works.” When minor data analytic choices drastically change the conclusions, this should be a warning sign that your solution is unstable and the results are probably not trustworthy. The most likely reason for an unstable solution is low statistical power. Since you hopefully had a strict data collection end date, the only viable alternative when results are unstable is to replicate them in a second, more highly powered study using the same data analytic approach.
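Here is a rough sketch of that kind of convergence check - the data, the outlier rule, and the choice of tests are all invented for illustration, not taken from the post:

```python
# Sketch of a convergence check: run the same comparison under several
# defensible data-handling choices and see whether the conclusion survives.
# Data, outlier rule, and tests are invented purely for illustration.
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
group_a = np.append(rng.normal(0.0, 1.0, 40), 6.5)   # one extreme value
group_b = rng.normal(0.6, 1.0, 40)

def drop_outliers(x, z=3.0):
    """Drop points more than z standard deviations from the group mean."""
    return x[np.abs(x - x.mean()) <= z * x.std()]

results = {
    "t-test, outliers kept":       stats.ttest_ind(group_a, group_b).pvalue,
    "t-test, outliers dropped":    stats.ttest_ind(drop_outliers(group_a), group_b).pvalue,
    "rank-based (Mann-Whitney U)": stats.mannwhitneyu(group_a, group_b).pvalue,
}

for label, p in results.items():
    print(f"{label}: p = {p:.3f}")
```

If the variants all point the same way, the conclusion is reasonably robust to this particular degree of freedom; if they disagree, treat the result as unstable rather than reporting only the variant that “works.”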

At the end of the day, there is no “quick-fix” for the problem of self-serving biases during data analysis so long as the incentive system continues to reward novel, statistically significant results. However, by using the tips in this article (and elsewhere), researchers can focus on finding strong, replicable results by minimizing the natural human tendency to be self-serving.

References

Simmons, J. P., Nelson, L. D., & Simonsohn, U. (2011). False-positive psychology: Undisclosed flexibility in data collection and analysis allows presenting anything as significant. Psychological Science, 22, 1359-1366. doi:10.1177/0956797611417632

Dec 13, 2013

Chasing Paper, Part 3

by

This is part three of a three part post brainstorming potential improvements to the journal article format. Part one is here, part two is here.

The classic journal article is only readable by domain experts.

Journal articles are currently written for domain experts. While novel concepts or terms are usually explained, a vast array of background knowledge is assumed, and jargon is the rule, not the exception. While this makes for quick reading for domain experts, it can make for a difficult slog for everyone else.

Why is this a problem? For one thing, it prevents interdisciplinary collaboration. Researchers will not make a habit of reading outside their field if it takes hours of painstaking, self-directed work to comprehend a single article. It also discourages public engagement. While science writers do admirable work boiling hard concepts down to their comprehensible cores, many non-scientists want to actually read the articles, and get discouraged when they can’t.

While opaque scientific writing exists in every format, new technologies present new options to translate and teach. Jargon could be linked to a glossary or other reference material. You could be given a plain-English explanation of a term when your mouse hovers over it. Perhaps each article could have multiple versions - one for domain experts, one for other scientists, and one for laypeople.

Of course, the ability to write accessibly is a skill not everyone has. Luckily, any given paper would mostly use terminology already introduced in previous papers. If researchers could easily credit the teaching and popularization work done by others, they could acknowledge the value of those contributions while at the same time making their own work accessible.

The classic journal article has no universally agreed-upon standards.

Academic publishing, historically, has been a distributed system. Currently, the top three publishers still account for less than half (42%) of all published articles (McGuigan and Russell, 2008). While certain format and content conventions are shared among publishers, generally speaking it’s difficult to propagate new standards, and even harder to enforce them. Not only do standards vary, they are frequently hidden, with most of the review and editing process taking place behind closed doors.

There are benefits to decentralization, but the drawbacks are clear. Widespread adoption of new standards, such as Simmons et al.’s 21 Word Solution or open science practices, depends on the hard work and high status of those advocating for them. How can the article format be changed to better accommodate changing standards, while still retaining individual publishers’ autonomy?

One option might be to create a new section of each journal article, a free-form field where users could record whether an article met this or that standard. Researchers could then independently decide what standards they wanted to pay attention to. While this sounds messy, if properly implemented this feature could be used very much like a search filter, yet would not require the creation or maintenance of a centralized database.

A different approach is already being embraced: an effort to make the standards that currently exist more transparent by bringing peer review out into the open. Open peer review allows readers to view an article’s pre-publication history, including the authorship and content of peer reviews, while public peer review allows the public to participate in the review process. However, these methods have yet to be generally adopted.

*

It’s clear that journal articles are already changing. But they may not be changing fast enough. It may be better to forgo the trappings of the journal article entirely, and seek a new system that more naturally encourages collaboration, curation, and the efficient use of the incredible resources at our disposal. With journal articles commonly costing more than $30 each, some might jump at the chance to leave them behind.

Of course, it’s easy to play “what if” and imagine alternatives; it’s far harder to actually implement them. And not all innovations are improvements. But with over a billion dollars spent on research each day in the United States, with over 25,000 journals in existence, and over a million articles published each year, surely there is room to experiment.

Bibliography

Budd, J.M., Coble, Z.C. and Anderson, K.M. (2011) Retracted Publications in Biomedicine: Cause for Concern.

Wright, K. and McDaid, C. (2011). Reporting of article retractions in bibliographic databases and online journals. J Med Libr Assoc. 2011 April; 99(2): 164–167.

McGuigan, G.S. and Russell, R.D. (2008). The Business of Academic Publishing: A Strategic Analysis of the Academic Journal Publishing Industry and its Impact on the Future of Scholarly Publishing. Electronic Journal of Academic and Special Librarianship. Winter 2008; 9(3).

Simmons, J.P., Nelson, L.D. and Simonsohn, U.A. (2012) A 21 Word Solution.

Dec 12, 2013

Chasing Paper, Part 2

by

This is part two of a three part post brainstorming potential improvements to the journal article format. Part one is here, part three is here.

The classic journal article format is not easily updated or corrected.

Scientific understanding is constantly changing as phenomena are discovered and mistakes uncovered. The classic journal article, however, is static. When a serious flaw in an article is found, the best a paper-based system can do is issue a retraction, and hope that a reader going through past issues will eventually come across the change.

Surprisingly, retractions and corrections continue to go mostly unnoticed in the digital era. Studies have shown that retracted papers go on to receive, on average, more than 10 post-retraction citations, with less than 10% of those citations acknowledging the retraction (Budd et al, 2011). Why is this happening? While many article databases such as PubMed provide retraction notices, the articles themselves are often not amended. Readers accessing papers directly from publishers’ websites, or from previously saved copies, can sometimes miss the notice. A case study of 18 retracted articles found several that were classified as “high risk of missing [the] notice”, with no notice given in the text of the PDF or HTML copies themselves (Wright et al, 2011). It seems likely that corrections have even more difficulty being seen and acknowledged by subsequent researchers.

There are several technological solutions which can be tried. One promising avenue would be the adoption of version control. Also called revision control, this is a way of tracking all changes made to a project. This technology has been used for decades in computer science and is becoming more and more popular - Wikipedia and Google Docs, for instance, both use version control. Citations for a paper could reference the version of the paper then available, but subsequent readers would be notified that a more recent version could be viewed. In addition to making it easy to see how articles have been changed, adopting such a system would acknowledge the frequency of retractions and corrections and the need to check for up to date information.
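As a toy sketch of what a version-aware citation might look like - the record format and the identifier below are invented for illustration and do not follow any existing standard:

```python
# Toy sketch: a citation that knows which version of an article it points to,
# and can warn readers when a newer version (e.g. a correction) exists.
# The DOI and record format are invented for illustration.
from dataclasses import dataclass

@dataclass
class ArticleVersion:
    version: int
    date: str
    note: str   # e.g. "original", "correction", "retraction"

history = {
    "doi:10.9999/example.123": [
        ArticleVersion(1, "2012-03-01", "original"),
        ArticleVersion(2, "2013-06-15", "correction to Table 2"),
    ]
}

def check_citation(doi: str, cited_version: int) -> str:
    """Report whether the cited version is still the latest one."""
    latest = max(history[doi], key=lambda v: v.version)
    if cited_version < latest.version:
        return (f"Cited v{cited_version}, but v{latest.version} "
                f"({latest.note}, {latest.date}) is now available.")
    return "Citation points to the latest version."

print(check_citation("doi:10.9999/example.123", cited_version=1))
```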

Another potential tool would be an alert system. When changes are made to an article, the authors of all articles which cite it could be notified. However, this would require the maintenance of up-to-date contact information for authors, and the adoption of communications standards across publishers (something that has been accomplished before with initiatives like CrossRef). A more transformative approach would be to view papers not as static documents but as ongoing projects that can be updated and contributed to over time. Projects could be tracked through version control from their very inception, allowing for a kind of pre-registration. Replications and new analyses could be added to the project as they’re completed. The most insightful questions and critiques from the public could lead to changes in new versions of the article.

The classic journal article only recognizes certain kinds of contributions.

When journal articles were first developed in the 1600s, the idea of crediting an author or authors must have seemed straightforward. After all, most research was being done by individuals or very small groups, and there were no such things as curriculum vitae or tenure committees. Over time, academic authorship has become the single most important factor in determining career success for individual scientists. The limitations of authorship can therefore have an incredible impact on scientific progress.

There are two major problems with authorship as it currently functions, and they are sides of the same coin. Authorship does not tell you what, precisely, each author did on a paper. And authorship does not tell you who, precisely, is responsible for each part of a paper. Currently, the authorship model provides only a vague idea of who is responsible for a paper. While this is sometimes elaborated upon briefly in the footnotes, or mentioned in the article, more often readers employ simple heuristics. In psychology, the first author is believed to have led the work, the last author to have provided physical and conceptual resources for the experiment, and any middle authors to have contributed in an unknown but significant way. This is obviously not an ideal way to credit people, and often leads to disputes, with first authorship sometimes misattributed. It has grown increasingly impractical as multiauthor papers have become more and more common. What does authorship on a 500-author paper even mean?

The situation is even worse for people whose contributions are not awarded with authorship. While contributions may be mentioned in the acknowledgements or cited in the body of the paper, neither of these has much impact when scientists are applying for jobs or up for tenure. This gives them little motivation to do work which will not be recognized with authorship. And such work is greatly needed. The development of tools, the collection and release of open data sets, the creation of popularizations and teaching materials, and the deep and thorough review of others’ work - these are all done as favors or side projects, even though they are vital to the progress of research.

How can new technologies address these problems? There have been few changes made in this area, perhaps due to the heavy weight of authorship in scientific life, although there are some tools like Figshare which allow users to share non-traditional materials such as datasets and posters in citable (and therefore creditable) form. A more transformative change might be to use the version control system mentioned above. Instead of tracking changes to the article from publishing onwards, it could follow the article from its beginning stages. In that way, each change could be attributed to a specific person.

Another option might simply be to describe contributions in more detail. Currently if I use your methodology wholesale, or briefly mention a finding of yours, I acknowledge you in the same way - a citation. What if, instead, all significant contributions were listed? Although space is not a constraint with digital articles, the human attention span remains limited, and so it might be useful to create common categories for contribution, such as reviewing the article, providing materials, doing analyses, or coming up with an explanation for discussion.
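A minimal sketch of what such a structured contribution record might look like - the names and role categories below are invented for illustration, not an established taxonomy:

```python
# Illustration only: a structured record of who did what on a paper.
# The people and role names are invented, not an established taxonomy.
contributions = {
    "A. Researcher":   ["designed study", "wrote first draft"],
    "B. Statistician": ["planned analyses", "ran analyses"],
    "C. Labmate":      ["collected data"],
    "D. Toolmaker":    ["built stimulus-presentation software"],
    "E. Reviewer":     ["reviewed methods section in depth"],
}

# A record like this can be queried the way a search filter works,
# e.g. find everyone who contributed to the analyses:
analysts = [person for person, roles in contributions.items()
            if any("analys" in role for role in roles)]
print(analysts)   # -> ['B. Statistician']
```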

There are two other problems worth mentioning in brief. First is the phenomenon of ghost authorship, where substantial contributions to the running of a study or preparation of a manuscript go unacknowledged. This is frequently done in industry-sponsored research to hide conflicts of interest. If journal articles used a format where every contribution was tracked, ghost authorship would be impossible. Another issue is the assignment of contact authors, the researchers on a paper to whom readers are invited to direct questions. Contact information can become outdated fairly quickly, causing access to data and materials to be lost; if contact information can be changed, or responsibility passed on to a new person, such loss can be prevented.

Dec 11, 2013

Chasing Paper, Part 1

by

This is part one of a three part post. Parts two and three have now been posted.

The academic paper is old - older than the steam engine, the pocket watch, the piano, and the light bulb. The first journal, Philosophical Transactions, was published on March 6th, 1665. Now that doesn’t mean that the journal article format is obsolete - many inventions much older are still in wide use today. But after a third of a millennium, it’s only natural that the format needs some serious updating.

When brainstorming changes, it may be useful to think of the limitations of ink and paper. From there, we can consider how new technologies can improve or even transform the journal article. Some of these changes have already been widely adopted, while others have never even been debated. Some are adaptive, using the greater storage capacity of computing to extend the functions of the classic journal article, while others are transformative, creating new functions and features only available in the 21st century.

The ideas below are suggestions, not recommendations - it may be that some aspects of the journal article format are better left alone. But we all benefit from challenging our assumptions about what an article is and ought to be.

The classic journal article format cannot convey the full range of information associated with an experiment.

Until the rise of modern computing, there was simply no way for researchers to share all the data they collected in their experiments. Researchers were forced to summarize: to gloss over the details of their methods and the reasoning behind their decisions and, of course, to provide statistical analyses in the place of raw data. While fields like particle physics and genetics continue to push the limits of memory, most experimenters now have the technical capacity to share all of their data.

Many journals have taken to publishing supplemental materials, although this rarely encompasses the entirety of data collected, or enough methodological detail to allow for independent replication. There are plenty of explanations for this slow adoption, including ethical considerations around human subjects data, the potential to patent methods, or the cost to journals of hosting these extra materials. But these are obstacles to address, not reasons to give up. The potential benefits are enormous: What if every published paper contained enough methodological detail that it could be independently replicated? What if every paper contained enough raw data that it could be included in a meta-analysis? How much meta-scientific work is never undertaken because it depends on getting dozens or hundreds of contact authors to return your emails, and on universities to properly store data and materials?

Providing supplemental material, no matter how extensive, is still an adaptive change. What might a transformative change look like? Elsevier’s Article of the Future project attempts to answer that question with new, experimental formats that include videos, interactive models, and infographics. These designs are just the beginning. What if articles allowed readers to actually interact with the data and perform their own analyses? Virtual environments could be set up, lowering the barrier to independent verification of results. What if authors reported when they made questionable methodological decisions, and allowed readers, where possible, to see the results when a variable was not controlled for, or a sample was not excluded?

The classic journal article format is difficult to organize, index or search.

New technology has already transformed the way we search the scientific literature. Where before researchers were reliant on catalogues and indexes from publishers, and used abstracts to guess at relevance, databases such as PubMed and Google Scholar allow us to find all mentions of a term, tool, or phenomenon across vast swathes of articles. While searching databases is itself a skill, it’s one that allows us to search comprehensively and efficiently, and gives us more opportunities to explore.

Yet old issues of organization and curation remain. Indexes used to speed the slow process of skimming through physical papers. Now they’re needed to help researchers sort through the abundance of articles constantly being published. With tens of millions of journal articles out there, how can we be sure we’re really accessing all the relevant literature? How can we compare and synthesize the thousands of results one might get on a given search?

Special kinds of articles - reviews and meta-analyses - have traditionally helped us synthesize and curate information. As discussed above, new technologies can help make meta-analyses more common by making it easier for researchers to access information about past studies. We can further improve the search experience by creating more detailed metadata. Metadata, in this context, is the information attached to an article which lets us categorize it without having to read the article itself. Currently, fields like title, author, date, and journal are quite common in databases. More complicated fields are less often adopted, but you can find metadata on study type, population, level of clinical trial (where applicable), and so forth. What would truly comprehensive metadata look like? Is it possible to store the details of experimental structure or analysis in machine-readable format - and is that even desirable?
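As a rough sketch of what richer, machine-readable metadata could look like for a single article - every field name and value below is invented for illustration rather than drawn from an existing schema:

```python
# Sketch of richer article metadata; fields and values are invented for
# illustration, not taken from any real database or standard.
article_metadata = {
    "title": "An Example Study of Example Effects",
    "authors": ["A. Researcher", "B. Statistician"],
    "journal": "Journal of Illustrative Results",
    "year": 2013,
    "study_type": "randomized experiment",
    "population": {"n": 120, "description": "undergraduate volunteers"},
    "design": {"conditions": 2, "measures": ["reaction time", "accuracy"]},
    "analysis": ["two-sample t-test", "ANCOVA with age as covariate"],
    "data_available": True,
    "preregistered": False,
}

# Fields like these could be queried across thousands of papers,
# e.g. "all randomized experiments with n >= 100 and open data".
```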

What happens when we reconsider not the metadata but the content itself? Most articles are structurally complex, containing literature reviews, methodological information, data, and analysis. Perhaps we might be better served by breaking those articles down into their constituent parts. What if methods, data, and analysis were always published separately, creating a network of papers that were linked but discrete? Would that be easier or harder to organize? It may be that what we need here is not a better kind of journal article, but a new way of curating research entirely.

Dec 9, 2013

New “Reviewer Statement” Initiative Aims to (Further) Improve Community Norms Toward Disclosure

by

Photo of Etienne LeBel

An Open Science Collaboration -- made up of Uri Simonsohn, Etienne LeBel, Don Moore, Leif D. Nelson, Brian Nosek, and Joe Simmons -- is glad to announce a new initiative aiming to improve community norms toward the disclosure of basic methodological information during the peer-review process. Endorsed by the Center for Open Science, the initiative involves a standard reviewer statement that any peer reviewer can include in their review requesting that authors add a statement to the paper confirming that they have disclosed all data exclusions, experimental conditions, assessed measures, and how they determined their sample sizes (following from the 21-word solution; Simmons, Nelson, & Simonsohn, 2012, 2013; see also PsychDisclosure.org; LeBel et al., 2013). Here is the statement, which is available on the Open Science Framework:

"I request that the authors add a statement to the paper confirming whether, for all experiments, they have reported all measures, conditions, data exclusions, and how they determined their sample sizes. The authors should, of course, add any additional text to ensure the statement is accurate. This is the standard reviewer disclosure request endorsed by the Center for Open Science (see http://osf.io/project/hadz3). I include it in every review."

The idea originated from the realization that, as peer reviewers, we typically lack fundamental information regarding how the data were collected and analyzed, which prevents us from being able to properly evaluate the claims made in a submitted manuscript. Some reviewers interested in requesting such information, however, were concerned that such requests would make them appear selective and/or compromise their anonymity. Discussions ensued, and contributors developed a standard reviewer disclosure request statement that overcomes these concerns and allows the community of reviewers to improve community norms toward the disclosure of such methodological information across all journals and articles.

Some of the contributors, including myself, were hoping for a reviewer statement with a bit more teeth - for instance, making the disclosure of such information a requirement before agreeing to review an article, or requiring the re-review of a revised manuscript once the requested information has been disclosed. The team of contributors, however, ultimately decided that it would be better to start small to gain acceptance, in order to maximize the probability that the initiative has an impact in shaping community norms.

Hence, next time you are invited to review a manuscript for publication at any journal, please remember to include the reviewer disclosure statement!

References

LeBel, E. P., Borsboom, D., Giner-Sorolla, R., Hasselman, F., Peters, K. R., Ratliff, K. A., & Smith, C. T. (2013). PsychDisclosure.org: Grassroots support for reforming reporting standards in psychology. Perspectives on Psychological Science, 8(4), 424-432. doi: 10.1177/1745691613491437

Simmons J., Nelson L. & Simonsohn U. (2011) False-Positive Psychology: Undisclosed Flexibility in Data Collection and Analysis Allow Presenting Anything as Significant. Psychological Science, 22(11), 1359-1366.

Simmons J., Nelson L. & Simonsohn U. (2012) A 21 Word Solution. Dialogue: The Official Newsletter of the Society for Personality and Social Psychology, 26(2), 4-7.

Nov 27, 2013

The State of Open Access

by

To celebrate Open Access Week last month, we asked people four questions about the state of open access and how it's changing. Here are some in-depth answers from two people working on open access: Peter Suber, Director of the Harvard Office for Scholarly Communication and the Harvard Open Access Project, and Elizabeth Silva, associate editor at the Public Library of Science (PLOS).

How is your work relevant to the changing landscape of Open Access? What would be a successful outcome of your work in this area?

Elizabeth: PLOS is now synonymous with open access publishing, so it’s hard to believe that 10 years ago, when PLOS was founded, most researchers were not even aware that availability of research was a problem. We all published our best research in the best journals. We assumed our colleagues could access it, and we weren’t aware of (or didn’t recognize the problem with) the inability of people outside of the ivory tower to see this work. At that time it was apparent to the founders of PLOS, who were among the few researchers who recognized the problem, that the best way to convince researchers to publish open access would be for PLOS to become an open access publisher, and prove that OA could be a viable business model and an attractive publishing venue at the same time. I think that we can safely say that the founders of PLOS succeeded in this mission, and they did it decisively.

We’re now at an exciting time, where open access in the natural sciences is all but inevitable. We now get to work on new challenges, trying to solve other issues in research communication.

Peter: My current job has two parts. I direct the Harvard Office for Scholarly Communication (OSC), and I direct the Harvard Open Access Project (HOAP). The OSC aims to provide OA to research done at Harvard University. We implement Harvard's OA policies and maintain its OA repository. We focus on peer-reviewed articles by faculty, but are expanding to other categories of research and researchers. In my HOAP work, I consult pro bono with universities, scholarly societies, publishers, funding agencies, and governments, to help them adopt effective OA policies. HOAP also maintains a guide to good practices for university OA policies, manages the Open Access Tracking Project, writes reference pages on federal OA-related legislation, such as FASTR, and makes regular contributions to the Open Access Directory and the catalog of OA journals from society publishers.

To me, success would be making OA the default for new research in every field and language. However, this kind of success is more like a new plateau than a finish line. We often focus on the goal of OA itself, or the goal of removing access barriers to knowledge. But that's merely a precondition for an exciting range of new possibilities for making use of that knowledge. In that sense, OA is closer to the minimum than the maximum of how to take advantage of the internet for improving research. Once OA is the default for new research, we can give less energy to attaining it and more energy to reaping the benefits, for example, integrating OA texts with open data, improving the methods of meta-analysis and reproducibility, and building better tools for knowledge extraction, text and data mining, question answering, reference linking, impact measurement, current awareness, search, summary, translation, organization, and recommendation.

From the researcher's side, making OA the new default means that essentially all the new work they write, and essentially all the new work they want to read, will be OA. From the publisher's side, making OA the new default means that sustainability cannot depend on access barriers that subtract value, and must depend on creative ways to add value to research that is already and irrevocably OA.

How do you think the lack of Open Access is currently impacting how science is practiced?

Peter: The lack of OA slows down research. It distorts inquiry by making the retrievability of research a function of publisher prices and library budgets rather than author consent and internet connectivity. It hides results that happen to sit in journals that exceed the affordability threshold for you or your institution. It limits the correction of scientific error by limiting the number of eyeballs that can examine new results. It prevents the use of text and data mining to supplement human analysis with machine analysis. It hinders the reproducibility of research by excluding many who would want to reproduce it. At the same time, and ironically, it increases the inefficient duplication of research by scholars who don't realize that certain experiments have already been done.

It prevents journalists from reading the latest developments, reporting on them, and providing direct, usable links for interested readers. It prevents unaffiliated scholars and the lay public from reading new work in which they may have an interest, especially in the humanities and medicine. It blocks research-driven industries from creating jobs, products, and innovations. It prevents taxpayers from maximizing the return on their enormous investment in publicly-funded research.

I assume we're talking about research that authors publish voluntarily, as opposed to notes, emails, and unfinished manuscripts, and I assume we're talking about research that authors write without expectation of revenue. If so, then the lack of OA harms research and researchers without qualification. The lack of OA benefits no one except conventional publishers who want to own it, sell it, and limit the audience to paying customers.

Elizabeth: There is a prevailing idea that those that need access to the literature already have it; that those that have the ability to understand the content are at institutions that can afford the subscriptions. First, this ignores the needs of physicians, educators, science communicators, and smaller institutions and companies. More fundamentally, limiting access to knowledge, so that it rests in the hands of an elite 1%, is archaic, backwards, and counterproductive. There has never been a greater urgency to find solutions to problems that fundamentally threaten human existence – climate change, disease transmission, food security – and in the face of this, why would we advocate limited dissemination of knowledge? Full adoption of open access has the potential to fundamentally change the pace of scientific progress, as we make this information available to everyone, worldwide.

When it comes to issues of reproducibility, fraud, or misreporting, all journals face similar challenges regardless of business model. Researchers design their experiments and collect their data long before they decide on a publishing venue, and the quality of the reporting is unlikely to change based on whether the venue is OA. I think these issues are better tackled by requirements for open data and improved reporting. Of course, the two philosophies are intrinsically linked – improved transparency and access can only improve matters.

What do you think is the biggest reason that people resist Open Access? Do you think there are good reasons for not making a paper open access?

Elizabeth: Of course there are many publishers who resist open access, which reflects a need to protect established revenue streams. In addition to large commercial publishers, there are a lot of scholarly societies whose primary sources of income are the subscriptions for the journals they publish.

Resistance from authors, in my experience, comes principally in two forms. The first is linked to the impact factor rather than the business model. Researchers are stuck in a paradigm that requires them to publish as ‘high’ as possible to achieve career advancement. While there are plenty of high-impact OA publications with which people choose to publish, it just so happens that the highest-impact journals are subscription journals. We know that open access increases the utility, visibility, and impact of individual pieces of research, but the fallacy that a high-impact journal is equivalent to high-impact research persists.

The second reason cited is that the cost is prohibitive. This is a problem everyone at PLOS can appreciate, and we very much sympathize with authors who do not have the money in their budgets to pay author publication charges (APCs). However, it’s a problem that should really be a lot easier to overcome. If research institutions were to pay publication fees rather than subscription fees, they would save a fortune; a few institutions have realized this and are paying the APCs for authors who choose to go OA. It would also help if funders recognized publishing as an intrinsic part of the research, folding the APC into the grant. We are also moving the technology forward in an effort to reduce costs, so that savings can be passed on to authors. PLOS ONE has been around for nearly 7 years, and the fees have not changed. This reflects efforts to keep costs as low as we can. Ironically, the biggest of the pay-walled journals already charge authors to publish: for example, it can be between $500 and $1000 for the first color figure, and a few hundred dollars for each additional one; on top of this there are page charges and reprint costs. Not only is the public paying for the research and the subscription, they are also paying for papers that they can’t read.

Peter: There are no good reasons for not making a paper OA, or at least for not wanting to.

There are sometimes reasons not to publish in an OA journal. For example, the best journals in your field may not be OA. Your promotion and tenure committee may give you artificial incentives to limit yourself to a certain list of journals. Or the best OA journals in your field may charge publication fees which your funder or employer will not pay on your behalf. However, in those cases you can publish in a non-OA journal and deposit the peer-reviewed manuscript in an OA repository.

The resistance of non-OA publishers is easier to grasp. But if we're talking about publishing scholars, not publishers, then the largest cause of resistance by far is misunderstanding. Far too many researchers still accept false assumptions about OA, such as these 10:

-- that the only way to make an article OA is to publish it in an OA journal
-- that all or most OA journals charge publication fees
-- that all or most publication fees are paid by authors out of pocket
-- that all or most OA journals are not peer reviewed
-- that peer-reviewed OA journals cannot use the same standards and even the same people as the best non-OA journals
-- that publishing in a non-OA journal closes the door on lawfully making the same article OA
-- that making work OA makes it harder rather than easier to find
-- that making work OA limits rather than enhances author rights over it
-- that OA mandates are about submitting new work to OA journals rather than depositing it in OA repositories, or
-- that everyone who needs access already has access.

In a recent article in The Guardian I corrected six of the most widespread and harmful myths about OA. In a 2009 article, I corrected 25. And in my 2012 book, I tried to take on the whole legendarium.

How has the Open Access movement changed in the last five years? How do you think it will change in the next five years?

Peter: OA has been making unmistakable progress for more than 20 years. Five years ago we were not in a qualitatively different place. We were just a bit further down the slope from where we are today.

Over the next five years, I expect more than just another five years' worth of progress as usual. I expect five years' worth of progress toward the kind of success I described in my answer to your first question. In fact, insofar as progress tends to add cooperating players and remove or convert resisting players, I expect five years' worth of compound interest and acceleration.

In some fields, like particle physics, OA is already the default. In the next five years we'll see this new reality move at an uneven rate across the research landscape. Every year more and more researchers will be able to stop struggling for access against needless legal, financial, and technical barriers. Every year, those still struggling will have the benefit of a widening circle of precedents, allies, tools, policies, best practices, accommodating publishers, and alternatives to publishers.

Green OA mandates are spreading among universities. They're also spreading among funding agencies, for example, in the US, the EU, and the global south. This trend will definitely continue, especially with the support it has received from the Global Research Council, Science Europe, the G8 Science Ministers, and the World Bank.

With the exception of the UK and the Netherlands, countries adopting new OA policies are learning from the experience of their predecessors and starting with green. I've argued in many places that mandating gold OA is a mistake. But it's a mistake mainly for historical reasons, and historical circumstances will change. Gold OA mandates are foolish today in part because too few journals are OA, and there's no reason to limit the freedom of authors to publish in the journals of their choice. But the percentage of peer-reviewed journals that are OA is growing and will continue to grow. (Today it's about 30%.) Gold OA mandates are also foolish today because gold OA is much more expensive than green OA, and there's no reason to compromise the public interest in order to guarantee revenue for non-adaptive publishers. But the costs of OA journals will decline, as the growing number of OA journals compete for authors, and the money to pay for OA journals will grow as libraries redirect money from conventional journals to OA.

We'll see a rise in policies linking deposit in repositories with research assessment, promotion, and tenure. These policies were pioneered by the University of Liege, have since been adopted at institutions in nine countries, and have been recommended by the Budapest Open Access Initiative, the UK House of Commons Select Committee on Business, Innovation and Skills, and the Mediterranean Open Access Network. Most recently, this kind of policy has been proposed at the national level by the Higher Education Funding Council for England. If it's adopted, it will mitigate the damage of a gold-first policy in the UK. A similar possibility has been suggested for the Netherlands.

I expect we'll see OA in the humanities start to catch up with OA in the sciences, and OA for books start to catch up with OA for articles. But in both cases, the pace of progress has already picked up significantly, and so has the number of people eager to see these two kinds of progress accelerate.

The recent decision that Google's book scanning is fair use means that a much larger swath of print literature will be digitized, if not in every country, then at least in the US, and if not for OA, then at least for searching. This won't open the doors to vaults that have been closed, but it will open windows to help us see what is inside.

Finally, I expect to see evolution in the genres or containers of research. Like most people, I'm accustomed to the genres I grew up with. I love articles and books, both as a reader and as an author. But they have limitations that we can overcome, and we don't have to drop them to enhance them or to create post-articles and post-books alongside them. The low barriers to digital experimentation mean that we can try out new breeds until we find some that carry more advantages than disadvantages for specific purposes. Last year I sketched out one idea along these lines, which I call an evidence rack, but it's only one point in an indefinitely large space constrained only by the limits of our imagination.

Elizabeth: It’s starting to feel like universal open access is no longer a question of “if” but “when”. In the next five years we will see funders and institutions recognize the importance of access and adopt policies that mandate and financially support OA; resistance will fade away, and it will simply become the way research is published. As that happens, I think the OA movement will shift toward tackling other issues in research communication: providing better measures of impact in the form of article-level metrics, decreasing the time to publication, and improving the reproducibility and utility of research.
