The “Friedrich principles” for bioinformatics software

I’ve just come back from Biohackathon 2012 in Toyama, an annual event, traditionally hosted in Japan, where users of semantic web technologies (such as RDF and SPARQL) in biology and bioinformatics come together to work on projects. This was a nice event with an open and productive atmosphere, and I got a lot out of attending. I participated in a little project that is not quite ready to be released to the wider public yet. More on that in the future.

Recently I’ve also had a paper accepted at the PRIB (Pattern Recognition in Bioinformatics) conference, jointly with Gabriel Keeble-Gagnère. The paper is a slight mismatch for the conference, as it focuses more on software engineering than on pattern recognition as such. In this paper, titled “An Open Framework for Extensible Multi-Stage Bioinformatics Software” (arXiv), we make a case for a new set of software development principles for experimental software in bioinformatics, and for big data sciences in general. We provide a software framework that supports application development with these principles – Friedrich – and illustrate its application by describing a de novo genome assembler we have developed.

The paper actually came together in the reverse order from the above. In 2010, we began development on the genome assembler, at the time a toy project. As it grew, it became a software framework, and eventually something of a design philosophy. We hope to keep building on these ideas and demonstrate their potential more thoroughly in the near future.

For the time being, these are the “Friedrich principles” in no particular order.

Expose internal structure.
Conserve dimensionality maximally. (“Preserve intermediate data”)
Multi-stage applications. (Experimental and “production”, and moving between the two)
Flexibility with performance.
Minimal finality.
Ease of use.

Particularly striking here is (I think) the idea that internal structure should be exposed. This is the opposite of encapsulation, an important principle in software engineering. We believe that when the users are researchers, they are better served by transparent software, since the workflows are almost never final but subject to constant revision. But of course, the real trick is knowing what to make transparent and what to hide – an economy is still needed.
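
To make the contrast with encapsulation a little more concrete, here is a toy sketch in Python. It is not Friedrich's actual API: the names Phase, Pipeline, kmers, count and frequent are invented for this illustration, and the "analysis" is a deliberately trivial k-mer count. The only point is that the list of phases and every intermediate result are public, so a researcher can inspect them or re-run part of the workflow.

```python
from dataclasses import dataclass, field
from typing import Any, Callable, Dict, List


@dataclass
class Phase:
    """One named step of a pipeline (hypothetical, for illustration only)."""
    name: str
    run: Callable[[Any], Any]


@dataclass
class Pipeline:
    """A pipeline whose phases and intermediate results are deliberately public."""
    phases: List[Phase]
    # The output of every phase is kept here instead of being thrown away.
    intermediate: Dict[str, Any] = field(default_factory=dict)

    def execute(self, data: Any) -> Any:
        for phase in self.phases:
            data = phase.run(data)
            self.intermediate[phase.name] = data
        return data


def kmers(reads, k=3):
    """Split each read into its overlapping substrings of length k."""
    return [r[i:i + k] for r in reads for i in range(len(r) - k + 1)]


def count(items):
    """Count how often each item occurs."""
    counts = {}
    for item in items:
        counts[item] = counts.get(item, 0) + 1
    return counts


def frequent(counts, minimum=2):
    """Keep the items seen at least `minimum` times, in sorted order."""
    return sorted(key for key, n in counts.items() if n >= minimum)


pipeline = Pipeline([
    Phase("kmers", kmers),
    Phase("count", count),
    Phase("frequent", frequent),
])
result = pipeline.execute(["ACGTA", "CGTAC"])
print(result)  # ['CGT', 'GTA']

# Because the intermediate result is preserved, the last step can be revised
# and re-run on its own, without recomputing the earlier phases.
relaxed = frequent(pipeline.intermediate["count"], minimum=1)
print(relaxed)  # ['ACG', 'CGT', 'GTA', 'TAC']
```

A fully encapsulated design would expose only `execute` and its final result. Here the price of openness is that users can depend on internals that may later change, which is where the economy mentioned above comes in.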
