Dec 13, 2013

Chasing Paper, Part 3


This is part three of a three-part post brainstorming potential improvements to the journal article format. Part one is here, part two is here.

The classic journal article is only readable by domain experts.

Journal articles are currently written for domain experts. While novel concepts or terms are usually explained, a vast array of background knowledge is assumed, and jargon is the rule, not the exception. While this makes for quick reading for domain experts, it can make for a difficult slog for everyone else.

Why is this a problem? For one thing, it prevents interdisciplinary collaboration. Researchers will not make a habit of reading outside their field if it takes hours of painstaking, self-directed work to comprehend a single article. It also discourages public engagement. While science writers do admirable work boiling hard concepts down to their comprehensible cores, many non-scientists want to actually read the articles, and get discouraged when they can’t.

While opaque scientific writing exists in every format, new technologies present new options to translate and teach. Jargon could be linked to a glossary or other reference material. You could be given a plain-English explanation of a term when your mouse hovers over it. Perhaps each article could have multiple versions - one for domain experts, one for other scientists, and one for laypeople.
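As a rough illustration of the hover-to-explain idea, here is a minimal Python sketch that wraps known jargon terms in HTML abbr tags carrying a plain-language definition, so a browser shows the explanation on hover. The glossary entries and the function are invented for illustration; a real system would need curated definitions and smarter matching than simple string replacement.

```python
import re

# Hypothetical plain-language glossary; a real one would be curated and much larger.
GLOSSARY = {
    "p-value": "the probability of seeing data at least this extreme if there were no real effect",
    "confound": "a hidden factor that could explain the result instead of the thing being studied",
}

def annotate_jargon(html_text: str) -> str:
    """Wrap each known jargon term in an <abbr> tag so browsers show the
    plain-language definition when the reader hovers over it."""
    for term, definition in GLOSSARY.items():
        pattern = re.compile(re.escape(term), flags=re.IGNORECASE)
        html_text = pattern.sub(
            lambda m, d=definition: f'<abbr title="{d}">{m.group(0)}</abbr>', html_text
        )
    return html_text

print(annotate_jargon("We report a p-value of 0.03 after adjusting for one confound."))
```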

Of course, the ability to write accessibly is a skill not everyone has. Luckily, any given paper would mostly use terminology already introduced in previous papers. If researchers could easily credit the teaching and popularization work done by others, they could acknowledge the value of those contributions while at the same time making their own work accessible.

The classic journal article has no universally-agreed upon standards.

Academic publishing, historically, has been a distributed system. Currently, the top three publishers still account for less than half (42%) of all published articles (McGuigan and Russell, 2008). While certain format and content conventions are shared among publishers, generally speaking it’s difficult to propagate new standards, and even harder to enforce them. Not only do standards vary, they are frequently hidden, with most of the review and editing process taking place behind closed doors.

There are benefits to decentralization, but the drawbacks are clear. Widespread adoption of new standards, such as Simmons et al.'s 21 Word Solution or open science practices, depends on the hard work and high status of those advocating for them. How can the article format be changed to better accommodate changing standards, while still retaining individual publishers' autonomy?

One option might be to create a new section of each journal article, a free-form field where users could record whether an article met this or that standard. Researchers could then independently decide what standards they wanted to pay attention to. While this sounds messy, if properly implemented this feature could be used very much like a search filter, yet would not require the creation or maintenance of a centralized database.
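To make the free-form standards field concrete, here is a small Python sketch showing how such self-reported tags could be filtered like a search facet, with no central database of standards required. The article records and standard labels are invented for illustration.

```python
# Invented example records; the "standards" field is free-form, so any label can appear.
articles = [
    {"title": "Priming and memory", "standards": {"21-word-solution", "open-data"}},
    {"title": "A replication of X", "standards": {"preregistered", "open-data"}},
    {"title": "Survey of Y", "standards": set()},
]

def meets(articles, required):
    """Return only the articles whose self-reported standards include all required labels."""
    required = set(required)
    return [a for a in articles if required <= a["standards"]]

for article in meets(articles, ["open-data"]):
    print(article["title"])
```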

A different approach is already being embraced: an effort to make the standards that currently exist more transparent by bringing peer review out into the open. Open peer review allows readers to view an article’s pre-publication history, including the authorship and content of peer reviews, while public peer review allows the public to participate in the review process. However, these methods have yet to be generally adopted.

*

It’s clear that journal articles are already changing. But they may not be changing fast enough. It may be better to forgo the trappings of the journal article entirely, and seek a new system that more naturally encourages collaboration, curation, and the efficient use of the incredible resources at our disposal. With journal articles commonly costing more than $30 each, some might jump at the chance to leave them behind.

Of course, it’s easy to play “what if” and imagine alternatives; it’s far harder to actually implement them. And not all innovations are improvements. But with over a billion dollars spent on research each day in the United States, with over 25,000 journals in existence, and over a million articles published each year, surely there is room to experiment.

Bibliography

Budd, J.M., Coble, Z.C. and Anderson, K.M. (2011). Retracted Publications in Biomedicine: Cause for Concern.

Wright, K. and McDaid, C. (2011). Reporting of article retractions in bibliographic databases and online journals. J Med Libr Assoc. 2011 April; 99(2): 164–167.

McGuigan, G.S. and Russell, R.D. (2008). The Business of Academic Publishing: A Strategic Analysis of the Academic Journal Publishing Industry and its Impact on the Future of Scholarly Publishing. Electronic Journal of Academic and Special Librarianship. Winter 2008; 9(3).

Simmons, J.P., Nelson, L.D. and Simonsohn, U. (2012). A 21 Word Solution.

Dec 12, 2013

Chasing Paper, Part 2


This is part two of a three-part post brainstorming potential improvements to the journal article format. Part one is here, part three is here.

The classic journal article format is not easily updated or corrected.

Scientific understanding is constantly changing as phenomena are discovered and mistakes uncovered. The classic journal article, however, is static. When a serious flaw in an article is found, the best a paper-based system can do is issue a retraction, and hope that a reader going through past issues will eventually come across the change.

Surprisingly, retractions and corrections continue to go mostly unnoticed in the digital era. Studies have shown that retracted papers go on to receive, on average, more than 10 post-retraction citations, with less than 10% of those citations acknowledging the retraction (Budd et al., 2011). Why is this happening? While many article databases such as PubMed provide retraction notices, the articles themselves are often not amended. Readers accessing papers directly from publishers' websites, or from previously saved copies, can miss such notices entirely. A case study of 18 retracted articles classified several as "high risk of missing [the] notice", with no notice given in the text of the PDF or HTML copies themselves (Wright and McDaid, 2011). It seems likely that corrections have even more difficulty being seen and acknowledged by subsequent researchers.

There are several technological solutions that could be tried. One promising avenue would be the adoption of version control. Also called revision control, this is a way of tracking all changes made to a project. This technology has been used for decades in computer science and is becoming more and more popular - Wikipedia and Google Docs, for instance, both use version control. Citations for a paper could reference the version of the paper then available, but subsequent readers would be notified that a more recent version could be viewed. In addition to making it easy to see how articles have been changed, adopting such a system would acknowledge the frequency of retractions and corrections and the need to check for up-to-date information.
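A minimal sketch of what version-aware citation might look like, using an invented in-memory version history rather than a real version-control system: a citation pins the version that was read, and a lookup warns when a newer version exists.

```python
# Invented version history for a single article; a real system might use git or a database.
history = [
    {"version": 1, "date": "2013-01-10", "note": "original publication"},
    {"version": 2, "date": "2013-06-02", "note": "correction to Table 2"},
]

def cite(article_id, version):
    """Build a citation string that pins a specific version of the article."""
    return f"{article_id} (v{version})"

def check_citation(version):
    """Warn if the cited version is no longer the latest one."""
    latest = max(h["version"] for h in history)
    if version < latest:
        print(f"Cited v{version}, but v{latest} exists: {history[-1]['note']}")

citation = cite("doi:10.0000/example", 1)
check_citation(1)
```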

Another potential tool would be an alert system. When changes are made to an article, the authors of all articles which cite it could be notified. However, this would require the maintenance of up-to-date contact information for authors, and the adoption of communications standards across publishers (something that has been accomplished before with initiatives like CrossRef). A more transformative approach would be to view papers not as static documents but as ongoing projects that can be updated and contributed to over time. Projects could be tracked through version control from their very inception, allowing for a kind of pre-registration. Replications and new analyses could be added to the project as they’re completed. The most insightful questions and critiques from the public could lead to changes in new versions of the article.
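The alert idea could be prototyped with nothing more than a citation graph and contact addresses. The sketch below uses invented paper identifiers and email addresses, and simply prints the notifications a real system would send.

```python
# Invented data: which articles cite which, and how to reach their authors.
cites = {
    "paper-B": ["paper-A"],            # paper-B cites paper-A
    "paper-C": ["paper-A", "paper-B"],
}
contacts = {"paper-B": "author-b@example.org", "paper-C": "author-c@example.org"}

def notify_citers(changed_paper, change_note):
    """Find every paper citing the changed one and alert its contact author."""
    for paper, refs in cites.items():
        if changed_paper in refs:
            print(f"To {contacts[paper]}: '{changed_paper}' was updated ({change_note}).")

notify_citers("paper-A", "retraction issued")
```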

The classic journal article only recognizes certain kinds of contributions.

When journal articles were first developed in the 1600s, the idea of crediting an author or authors must have seemed straightforward. After all, most research was being done by individuals or very small groups, and there were no such things as curricula vitae or tenure committees. Over time, academic authorship has become the single most important factor in determining career success for individual scientists. The limitations of authorship can therefore have an incredible impact on scientific progress.

There are two major problems with authorship as it currently functions, and they are two sides of the same coin. Authorship does not tell you what, precisely, each author did on a paper. And authorship does not tell you who, precisely, is responsible for each part of a paper. Currently, the authorship model provides only a vague idea of who is responsible for a paper. While this is sometimes elaborated upon briefly in the footnotes, or mentioned in the article, more often readers employ simple heuristics. In psychology, the first author is believed to have led the work, the last author to have provided physical and conceptual resources for the experiment, and any middle authors to have contributed in an unknown but significant way. This is obviously not an ideal way to credit people, and often leads to disputes, with first authorship sometimes misattributed. It has grown increasingly impractical as multi-author papers have become more and more common. What does authorship on a 500-author paper even mean?

The situation is even worse for people whose contributions are not awarded with authorship. While contributions may be mentioned in the acknowledgements or cited in the body of the paper, neither of these has much impact when scientists are applying for jobs or up for tenure. This gives researchers little motivation to do work that will not be recognized with authorship. And such work is greatly needed. The development of tools, the collection and release of open data sets, the creation of popularizations and teaching materials, and the deep and thorough review of others' work - these are all done as favors or side projects, even though they are vital to the progress of research.

How can new technologies address these problems? There have been few changes made in this area, perhaps due to the heavy weight of authorship in scientific life, although tools like Figshare allow users to share non-traditional materials such as datasets and posters in citable (and therefore creditable) form. A more transformative change might be to use the version control system mentioned above. Instead of tracking changes to the article from publishing onwards, it could follow the article from its beginning stages. In that way, each change could be attributed to a specific person.

Another option might simply be to describe contributions in more detail. Currently, if I use your methodology wholesale, or briefly mention a finding of yours, I acknowledge you in the same way - a citation. What if, instead, all significant contributions were listed? Although space is not a constraint with digital articles, the human attention span remains limited, and so it might be useful to create common categories for contribution, such as reviewing the article, providing materials, doing analyses, or coming up with an explanation for discussion.
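One way to picture listing all significant contributions is a simple structured record per paper, with a small set of shared category labels. The names and categories below are invented for illustration, echoing the examples suggested above.

```python
# Invented contribution record; the category labels echo the ones suggested above.
contributions = [
    {"person": "A. Researcher", "category": "designed study"},
    {"person": "B. Analyst",    "category": "performed analyses"},
    {"person": "C. Reviewer",   "category": "reviewed the article"},
    {"person": "D. Supplier",   "category": "provided materials"},
]

def contributions_by(person):
    """List every recorded contribution made by one person, for CV or tenure purposes."""
    return [c["category"] for c in contributions if c["person"] == person]

print(contributions_by("B. Analyst"))
```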

Two other problems are worth mentioning in brief. The first is ghost authorship, where substantial contributions to the running of a study or the preparation of a manuscript go unacknowledged. This is frequently done in industry-sponsored research to hide conflicts of interest. If journal articles used a format where every contribution was tracked, ghost authorship would be impossible. The second is the assignment of contact authors, the researchers on a paper to whom readers are invited to direct questions. Contact information can become outdated fairly quickly, causing access to data and materials to be lost; if contact information can be updated, or responsibility passed on to a new person, such loss can be prevented.

Dec 11, 2013

Chasing Paper, Part 1


This is part one of a three-part post. Parts two and three have now been posted.

The academic paper is old - older than the steam engine, the pocket watch, the piano, and the light bulb. The first journal, Philosophical Transactions, was published on March 6th, 1665. Now that doesn’t mean that the journal article format is obsolete - many inventions much older are still in wide use today. But after a third of a millennium, it’s only natural that the format needs some serious updating.

When brainstorming changes, it may be useful to think of the limitations of ink and paper. From there, we can consider how new technologies can improve or even transform the journal article. Some of these changes have already been widely adopted, while others have never even been debated. Some are adaptive, using the greater storage capacity of computing to extend the functions of the classic journal article, while others are transformative, creating new functions and features only available in the 21st century.

The ideas below are suggestions, not recommendations - it may be that some aspects of the journal article format are better left alone. But we all benefit from challenging our assumptions about what an article is and ought to be.

The classic journal article format cannot convey the full range of information associated with an experiment.

Until the rise of modern computing, there was simply no way for researchers to share all the data they collected in their experiments. Researchers were forced to summarize: to gloss over the details of their methods and the reasoning behind their decisions and, of course, to provide statistical analyses in the place of raw data. While fields like particle physics and genetics continue to push the limits of memory, most experimenters now have the technical capacity to share all of their data.

Many journals have taken to publishing supplemental materials, although these rarely encompass the entirety of data collected, or enough methodological detail to allow for independent replication. There are plenty of explanations for this slow adoption, including ethical considerations around human subjects data, the potential to patent methods, or the cost to journals of hosting these extra materials. But these are obstacles to address, not reasons to give up. The potential benefits are enormous: What if every published paper contained enough methodological detail that it could be independently replicated? What if every paper contained enough raw data that it could be included in meta-analyses? How much meta-scientific work is never undertaken because it depends on getting dozens or hundreds of contact authors to return your emails, and on universities to properly store data and materials?

Providing supplemental material, no matter how extensive, is still an adaptive change. What might a transformative change look like? Elsevier’s Article of the Future project attempts to answer that question with new, experimental formats that include videos, interactive models, and infographics. These designs are just the beginning. What if articles allowed readers to actually interact with the data and perform their own analyses? Virtual environments could be set up, lowering the barrier to independent verification of results. What if authors reported when they made questionable methodological decisions, and allowed readers, where possible, to see the results when a variable was not controlled for, or a sample was not excluded?
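As a toy example of letting readers rerun an analysis under different choices, the sketch below computes a group difference on an invented dataset with and without a hypothetical exclusion rule, so the effect of that single decision becomes visible.

```python
# Invented raw data: (group, score) pairs; one extreme score the authors chose to exclude.
data = [("treatment", 5.1), ("treatment", 4.8), ("treatment", 12.0),
        ("control", 4.9), ("control", 5.0), ("control", 4.7)]

def mean_difference(rows, exclude_above=None):
    """Mean treatment-minus-control difference, optionally excluding extreme scores."""
    if exclude_above is not None:
        rows = [r for r in rows if r[1] <= exclude_above]
    treat = [score for group, score in rows if group == "treatment"]
    ctrl = [score for group, score in rows if group == "control"]
    return sum(treat) / len(treat) - sum(ctrl) / len(ctrl)

print("With exclusion:   ", round(mean_difference(data, exclude_above=10), 2))
print("Without exclusion:", round(mean_difference(data), 2))
```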

The classic journal article format is difficult to organize, index or search.

New technology has already transformed the way we search the scientific literature. Where before researchers were reliant on catalogues and indexes from publishers, and used abstracts to guess at relevance, databases such as PubMed and Google Scholar allow us to find all mentions of a term, tool, or phenomenon across vast swathes of articles. While searching databases is itself a skill, it's one that allows us to search comprehensively and efficiently, and gives us more opportunities to explore.

Yet old issues of organization and curation remain. Indexes used to speed the slow process of skimming through physical papers. Now they’re needed to help researchers sort through the abundance of articles constantly being published. With tens of millions of journal articles out there, how can we be sure we’re really accessing all the relevant literature? How can we compare and synthesize the thousands of results one might get on a given search?

Special kinds of articles - reviews and meta-analyses - have traditionally helped us synthesize and curate information. As discussed above, new technologies can help make meta-analyses more common by making it easier for researchers to access information about past studies. We can further improve the search experience by creating more detailed metadata. Metadata, in this context, is the information attached to an article which lets us categorize it without having to read the article itself. Currently, fields like title, author, date, and journal are quite common in databases. More complicated fields are less often adopted, but you can find metadata on study type, population, level of clinical trial (where applicable), and so forth. What would truly comprehensive metadata look like? Is it possible to store the details of experimental structure or analysis in machine-readable format - and is that even desirable?
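Here is one sketch of what richer, machine-readable metadata might contain, using invented field names and values; whether fields like experimental design can really be standardized this cleanly is exactly the open question raised above.

```python
import json

# Invented metadata record extending the usual title/author/date/journal fields.
record = {
    "title": "An example study",
    "authors": ["A. Researcher", "B. Analyst"],
    "year": 2013,
    "journal": "Example Journal",
    "study_type": "randomized controlled trial",
    "population": {"species": "human", "n": 120, "age_range": [18, 65]},
    "design": {"conditions": ["treatment", "control"], "blinding": "double"},
    "data_available": True,
}

print(json.dumps(record, indent=2))
```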

What happens when we reconsider not the metadata but the content itself? Most articles are structurally complex, containing literature reviews, methodological information, data, and analysis. Perhaps we might be better served by breaking those articles down into their constituent parts. What if methods, data, and analysis were always published separately, creating a network of papers that were linked but discrete? Would that be easier or harder to organize? It may be that what we need here is not a better kind of journal article, but a new way of curating research entirely.