I hope that you all know that MEE publishes Applications papers, which we make freely available for everyone to read, and the software is (of course) downloadable too.
A couple of times over the last few weeks we have been asked to update some of the software code in the Applications. This presents us with a problem: whilst the occasional update is OK, if we commit to adding every update that is released, we could end up spending significant time putting up updates, which is inefficient all round. On the other hand we obviously want to provide access to the latest, best software.
So, what to do? Should we allow all updates, only some (e.g. bug fixes), or none at all? We need a policy on this, so everyone knows the situation, but we have not decided what the policy should be. Now, we could come up with something half-baked on our own, but we thought it would be better to ask for some help, so we can come up with a plan that will suit everyone.
There are a number of issues:
- As a journal, MEE needs to have some level of permanence and stability: when you cite something you need to be confident that it’s the same when someone reads the citation.
- Permanence and stability of links is also important: if we link to a page we want to be reasonably confident it will still be there in 5 years’ time.
- We don’t want to be constantly changing code on the website: it could take up a lot of our time, and we aren’t set up to be a software repository like SourceForge.
- There are different levels of changes: from bug fixes through minor tweaks to total re-writes of the code. We don’t have to deal with them in the same way, but we should obviously be consistent.
So, what exactly should we do? How can we balance our desire to provide a stable record with providing access to up-to-date code? My current thought is that we should provide bug fixes, but perhaps not bigger changes. But we should also provide links to data repositories, and provide access that way. My worry with this is that the web pages we link to need to be stable, so they don’t disappear 6 months after we put the link up.
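One way to make the "bug fixes, but perhaps not bigger changes" idea concrete is to sort proposed updates by their version numbers. The sketch below is only an illustration: it assumes authors use three-part version numbers (major.minor.patch), which MEE does not currently require, and the function names and the "host bug fixes only" rule are hypothetical rather than an agreed policy.

```python
def parse_version(text):
    """Split a 'major.minor.patch' string into a tuple of three integers."""
    parts = text.strip().split(".")
    if len(parts) != 3:
        raise ValueError(f"expected major.minor.patch, got {text!r}")
    return tuple(int(p) for p in parts)


def classify_update(published, proposed):
    """Return 'rewrite', 'tweak', 'bug fix' or 'none' for a proposed update."""
    old = parse_version(published)
    new = parse_version(proposed)
    if new <= old:
        return "none"      # not newer than the version already published
    if new[0] != old[0]:
        return "rewrite"   # major number changed
    if new[1] != old[1]:
        return "tweak"     # minor number changed
    return "bug fix"       # only the patch number changed


def journal_should_host(published, proposed):
    """Hypothetical policy: host bug fixes only, link out for anything bigger."""
    return classify_update(published, proposed) == "bug fix"


if __name__ == "__main__":
    for proposed in ("1.0.1", "1.1.0", "2.0.0"):
        print(proposed, classify_update("1.0.0", proposed),
              journal_should_host("1.0.0", proposed))
```

A rule like this only works if authors number their releases honestly, so it would still need a human to check what an update actually changes.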
I just googled “OpenBUGS”, and the first link I got was to the now-defunct Helsinki pages that I set up years ago – luckily I was prodded to put in a link to the newer pages (and – even better – those pages are current). How can we ensure our links remain current for some time (several years, at least) after an application is published? Or do we have to worry about this?
I’m sure these questions have been raised before, and other journals have found their own solutions. If you have any thoughts, or links, or wise words then please comment and help us reach the right decision.
My opinion would be to require the paper to point at a proper software repository as well as to the specific version of the software that was referenced in the paper. That repository will then always contain both the exact version referred to in the paper and the most up-to-date version, and it provides clear options for people to document what has changed.
For me, the paper is a static description of a snapshot at a particular time. Unless it contains serious errors (of the type that merit a formal correction), it shouldn’t really change. The software is a different kind of object that should be changing and improving, so it needs a different type of repository, best served by things like SourceForge, GitHub, Bitbucket etc. If the journal provides a comment functionality then it is possible to provide pointers to updates. A related policy question is when the degree of change merits a new paper.
FWIW on ORC I would fork the software repository and place it in a journal-specific repo, but point back to the “proper” repo as we don’t want to split the community. We talked at some length about the concept of “update papers” that might have a light-touch peer review process but didn’t reach any specific conclusions on that. BMC provides commenting facilities so that covered a lot of what was needed.
The Journal of Statistical Software might be a useful guide to follow. Papers submitted there contain the version of code used to write the paper; the stuff being documented *in* the paper is provided as a tarball etc. That should never change because of the need to have a fixed point-in-time reference to code, just as we have a version of record of a paper. Any serious errors in the paper, and hence in the package as published in the version of record, should be handled as if they were an error in a paper, with a retraction, addendum or corrigendum published to update the published version of record.
If authors use a repository where development takes place or they deposit new versions of code in online archives like CRAN, then they should be encouraged to include such details in manuscripts. If those go stale then that is an issue with any citation of a web-based resource and we shouldn’t worry too much. At least you have access to the version of record.
I don’t see how MEE can have its own repo for code that tracks bug fixes. That would place too high a burden on developers and the journal staff to maintain, and whatever choice you made in terms of type of repo, whether it did version control etc., would inevitably not please everyone. If I fix bugs in my svn sources for one of my R packages, I don’t want to have to use a different system to correct the same bugs in the package I documented in a paper to MEE some months or years ago. Most third-party repos require permissive licences and there is no requirement (that I could see) that Applications papers had to supply code – a non-free binary would suffice?
It would help if the journal had some minimal requirements that authors had to meet to publish code in an Applications paper: that a version-of-record tarball (or binary?) is archived as Supplementary material, and that authors are encouraged to document where development takes place or where new versions of the package can be obtained.
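To illustrate what a fixed version of record could mean in practice, here is a minimal sketch that checks whether a downloaded archive is byte-for-byte the one that was archived with the paper. It assumes the journal records a SHA-256 digest alongside the Supplementary material, which is my assumption and not something MEE currently does; the function names are hypothetical.

```python
import hashlib


def sha256_of_file(path, chunk_size=65536):
    """Return the SHA-256 hex digest of the file at `path`."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        # Read in chunks so large tarballs do not need to fit in memory.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_version_of_record(path, recorded_digest):
    """True if the file's digest equals the digest recorded at publication."""
    return sha256_of_file(path) == recorded_digest.strip().lower()
```

A reader would call `matches_version_of_record` with the path to their downloaded archive and the digest printed with the paper; a mismatch means the file is not the version of record, whatever its file name says.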
I agree! I don’t think MEE should be updating software that has been published with a paper. Any software published with a journal article should be at a stable link which gives reproducible results, rather than being something which might change as software gets updated. And there should also be a semi-reliable link to where a current version of the software can hopefully be found. Most authors would actually include such a link already – if you write a paper on some new software but forget to tell readers how they can get it you’ve missed an opportunity!
Why not encourage authors to deposit the code in repositories that already exist (GitHub, GoogleCode) and, in the manuscript, reference and link to the particular version of the program that was used? Then the reader/user would be automatically taken to the original published software version and be able to see whether any further changes to the code have been committed (whether major or minor). Although it is unlikely that GoogleCode will disappear, just in case, MEE could automate the mirroring of the published code from GitHub and GoogleCode to ensure that nothing gets lost.
That would be fine *if* Applications were limited to open source code, which they don’t appear to be. What if people don’t want to release open code? Should they be barred from writing a paper? Also, the list of acceptable repos would have to be quite large; there are few things that engender as fierce a flame war as asking what is the best version control system 😉
I don’t know, perhaps MEE could say they would only publish software descriptions that are of open source software, and that the code has to be on GitHub/Bitbucket/etc.
I would require authors to deposit code ideally in GitHub, or SourceForge, or Bitbucket, etc. Then MEE could simply include a prominent link to the code repository so that interested parties could go get it. Dryad just added versioning capability, but versioning of code has been solved more elegantly on sites like GitHub.