Dec 11, 2013

Chasing Paper, Part 1


This is part one of a three part post. Parts two and three have now been posted.

The academic paper is old - older than the steam engine, the pocket watch, the piano, and the light bulb. The first journal, Philosophical Transactions, was published on March 6th, 1665. Now that doesn’t mean that the journal article format is obsolete - many inventions much older are still in wide use today. But after a third of a millennium, it’s only natural that the format needs some serious updating.

When brainstorming changes, it may be useful to think of the limitations of ink and paper. From there, we can consider how new technologies can improve or even transform the journal article. Some of these changes have already been widely adopted, while others have never even been debated. Some are adaptive, using the greater storage capacity of computing to extend the functions of the classic journal article, while others are transformative, creating new functions and features only available in the 21st century.

The ideas below are suggestions, not recommendations - it may be that some aspects of the journal article format are better left alone. But we all benefit from challenging our assumptions about what an article is and ought to be.

The classic journal article format cannot convey the full range of information associated with an experiment.

Until the rise of modern computing, there was simply no way for researchers to share all the data they collected in their experiments. Researchers were forced to summarize: to gloss over the details of their methods and the reasoning behind their decisions and, of course, to provide statistical analyses in the place of raw data. While fields like particle physics and genetics continue to push the limits of memory, most experimenters now have the technical capacity to share all of their data.

Many journals have taken to publishing supplemental materials, although this rarely encompasses the entirety of data collected, or enough methodological detail to allow for independent replication. There are plenty of explanations for this slow adoption, including ethical considerations around human subjects data, the potential to patent methods, or the cost to journals of hosting this extra materials. But these are obstacles to address, not reasons to give up. The potential benefits are enormous: What if every published paper contained enough methodological detail that it could be independently replicated? What if every paper contained enough raw data that it could be included in meta-analysis? How much of meta-scientific work is never undertaken, because it's dependent on getting dozens or hundreds of contact authors to return your emails, and on universities to properly store data and materials?

Providing supplemental material, no matter how extensive, is still an adaptive change. What might a transformative change look like? Elsevier’s Article of the Future project attempts to answer that question with new, experimental formats that include videos, interactive models, and infographics. These designs are just the beginning. What if articles allowed readers to actually interact with the data and perform their own analyses? Virtual environments could be set up, lowering the barrier to independent verification of results. What if authors reported when they made questionable methodological decisions, and allowed readers, where possible, to see the results when a variable was not controlled for, or a sample was not excluded?

The classic journal article format is difficult to organize, index or search.

New technology has already transformed the way we search the scientific literature. Where before researchers were reliant on catalogues and indexes from publishers, and used abstracts to guess at relevance, databases such as PubMed and Google Scholar allow us to find all mentions of a term, tool, or phenomena across vast swathes of articles. While searching databases is itself a skill, its one that allows us to search comprehensively and efficiently, and gives us more opportunities to explore.

Yet old issues of organization and curation remain. Indexes used to speed the slow process of skimming through physical papers. Now they’re needed to help researchers sort through the abundance of articles constantly being published. With tens of millions of journal articles out there, how can we be sure we’re really accessing all the relevant literature? How can we compare and synthesize the thousands of results one might get on a given search?

Special kinds of articles - reviews and meta-analyses - have traditionally helped us synthesize and curate information. As discussed above, new technologies can help make meta-analyses more common by making it easier for researchers to access information about past studies. We can further improve the search experience by creating more detailed metadata. Metadata, in this context, is the information attached to an article which lets us categorize it without having to read the article itself. Currently, fields like title, author, date, and journal are quite common in databases. More complicated fields less often adopted, but you can find metadata on study type, population, level of clinical trial (where applicable), and so forth. What would truly comprehensive metadata look like? Is it possible to store the details of experimental structure or analysis in machine-readable format - and is that even desirable?

What happens when we reconsider not the metadata but the content itself? Most articles are structurally complex, containing literature reviews, methodological information, data, and analysis. Perhaps we might be better served by breaking those articles down into their constituent parts. What if methods, data, analysis were always published separately, creating a network of papers that were linked but discrete? Would that be easier or harder to organize? It may be that what we need here is not a better kind of journal article, but a new way of curating research entirely.

Share on: TwitterFacebookGoogle+Email