Jul 30, 2014

Open-source software for science


A little more than three years ago I started working on OpenSesame, a free program for the easy development of experiments, aimed mainly at psychologists and neuroscientists. The first version of OpenSesame was the result of a weekend-long hacking sprint. By now, OpenSesame has grown into a substantial project, with a small team of core developers, dozens of occasional contributors, and about 2500 active users.

Because of my work on OpenSesame, I've become increasingly interested in open-source software in general. How is it used? Who makes it? Who is crazy enough to invest time in developing a program, only to give it away for free? Well ... quite a few people, because open source is everywhere. Browsers like Firefox and Chrome. Operating systems like Ubuntu and Android. Programming languages like Python and R. Media players like VLC. These are all examples of open-source programs that many people use on a daily basis.

But what about specialized scientific software? More specifically: Which programs do experimental psychologists and neuroscientists use? Although this varies from person to person, a number of expensive, closed-source programs come to mind first: E-Prime, SPSS, MATLAB, Presentation, Brainvoyager, etc. Le psychonomist moyen is not really into open source.

In principle, there are open-source alternatives to all of the above programs. Think of PsychoPy, R, Python, or FSL. But I can imagine the frown on the reader's face: Come on, really? These freebies are not nearly as good as 'the real thing', are they? But this, although true to some extent, merely raises another question: Why doesn't the scientific community invest more effort in the development of open-source alternatives? Why do we keep accepting inconvenient licenses (no SPSS license at home?), high costs ($995 for E-Prime 2 professional), and scripts written in proprietary languages that cannot easily be shared between labs? This last point has become particularly relevant with the recent focus on replication and transparency. How do you perform a direct replication of an experiment if you do not have the required software? And what does transparency even mean if we cannot run each other's scripts?

Despite widespread skepticism, I suspect that most scientists feel that open source is ideologically preferable to proprietary scientific software. But open source suffers from an image problem. For example, a widely shared misconception is that open-source software is buggy, whereas proprietary software is solid and reliable. But even though quality is subjective--and, due to cognitive dissonance, strongly biased in favor of expensive software!--this belief is not consistent with reality: Reports have shown that open-source software contains about half as many errors per line of code as proprietary software.

Another misconception is that developing (in-house) open-source software is expensive and inefficient. This is essentially a prisoner's dilemma. Of course, for an individual organization it is often more expensive to develop software than to purchase a commercial license. But what if scientific organizations worked together to develop the software that they all need: You write this for me, I write this for you? Would open source still be inefficient then?

Let's consider this by first comparing a few commercial packages: E-Prime, Presentation, and Inquisit. These are all programs for developing experiments. Yet the wheel has been re-invented for each program. All overlapping functionality has been re-designed and re-implemented anew, because vendors of proprietary software dislike few things as much as sharing code and ideas. (This is made painfully clear by numerous patent wars.) Now, let's compare a few open-source programs: Expyriment, OpenSesame, and PsychoPy. These too are all programs for developing experiments. And these too have overlapping functionality. But you can use these programs together. Moreover, they build on each other's functionality, because open-source licenses allow developers to modify and re-use each other's code. The point that I'm trying to make is not that open-source programs are better than their proprietary counterparts. Everyone can decide that for him or herself. The crucial point is that the development process of open-source software is collaborative and therefore efficient. Certainly in theory, but often in practice as well.

So it is clear that open-source software has many advantages, also--maybe even especially so--for science. Therefore, development of open-source software should be encouraged. How could universities and other academic organizations contribute to this?

A necessary first step is to acknowledge that software needs time to mature. There are plenty of young researchers, technically skilled and brimming with enthusiasm, who start a software project. Typically, this is software that they developed for their own research, and subsequently made freely available. If you are lucky, your boss allows this type of frivolous fun, as long as the 'real' work doesn't suffer. And maybe you can even get a paper out of it, for example in Behavior Research Methods, Journal of Neuroscience Methods, or Frontiers in Neuroinformatics. But it is often forgotten that software needs to be maintained. Bugs need to be fixed. Changes in computers and operating systems require software updates. Unmaintained software spoils like an open carton of milk.

And this is where things get awkward, because universities don't like maintenance. Developing new software is one thing. That's innovation, and somewhat resembles doing research. But maintaining software after the initial development stage is over is not interesting at all. You cannot write papers about maintenance, and maintenance does not make an attractive grant proposal. Therefore, a lot of software ends up as 'abandonware': unmaintained ghost pages on development sites like GitHub, SourceForge, or Google Code.

Ideally, universities would encourage maintenance of open-source scientific software. The message should be: Once you start something, go through with it. They should recognize that the development of high-quality software requires stamina. This would be an attitude change, and would require that universities get over their publication fetish. Because the value of a program is not in the papers that have been written about it, but in the scientists that use it. Open-source scientific software has a very concrete and self-evident impact for which developers should be rewarded. Without incentives, they won't make the high-quality software that we all need!

In other words, developers could use a bit of encouragement and support, and this is currently lacking. I recently attended the APS convention, where I met Jeffrey Spies, one of the founders of the Center for Open Science (COS). As readers of this blog probably know, the COS is an American organization that (among many other things) facilitates development of open-source scientific software. They provide advice, support promising projects, and build networks. (Social, digital, and a mix of both, like this blog!) A related organization that focuses more specifically on software development is the Mozilla Science Lab (MSL). I think that the COS and MSL do great work, and provide models that could be adopted by other organizations. For example, I currently work for the CNRS, the French organization for fundamental research. The CNRS is very large, and could easily provide sustained support for the development of high-quality open-source projects. And the European Research Council could easily do so as well. However, these large research organizations do not appear to recognize the importance of software development. They prefer to invest all of their budget in individual research projects, rather than invest a small part of it in the development and maintenance of the software that these research projects need.

In summary, a little systematic support would do wonders for the quality and availability of open-source scientific software. Investing in the future, is that not what science is about?

A Dutch version of this article initially appeared in De Psychonoom, the magazine of the Dutch psychonomic society. This article has been translated and updated for the OSC blog.
