Tag: platforms

The “Friedrich principles” for bioinformatics software

September 13th, 2012 — 12:51am

I’ve just come back from Biohackathon 2012 in Toyama, an annual event, traditionally hosted in Japan, where users of semantic web technologies (such as RDF and SPARQL) in biology and bioinformatics come together to work on projects. This was a nice event with an open and productive atmosphere, and I got a lot out of attending. I participated in a little project that is not quite ready to be released to the wider public yet. More on that in the future.

Recently I’ve also had a paper accepted at the PRIB (Pattern Recognition in Bioinformatics) conference, jointly with Gabriel Keeble-Gagnère. The paper is a slight mismatch for the conference, as it really focuses on software engineering more than pattern recognition as such. In this paper, titled “An Open Framework for Extensible Multi-Stage Bioinformatics Software” (arxiv), we make a case for a new set of software development principles for experimental software in bioinformatics, and for big data sciences in general. We provide a software framework that supports application development with these principles – Friedrich – and illustrate its application by describing a de novo genome assembler we have developed.

The actual gestation of this paper occurred in the reverse order from the above. In 2010, we began development on the genome assembler, at the time a toy project. As it grew, it became a software framework, and eventually something of a design philosophy. We hope to keep building on these ideas and demonstrate their potential more thoroughly in the near future.

For the time being, these are the “Friedrich principles” in no particular order.

  • Expose internal structure.
  • Conserve dimensionality maximally. (“Preserve intermediate data”)
  • Multi-stage applications. (Experimental and “production”, and moving between the two)
  • Flexibility with performance.
  • Minimal finality.
  • Ease of use.

Particularly striking here, I think, is the idea that internal structure should be exposed. This is the opposite of encapsulation, an important principle in software engineering. We believe that when the users are researchers, they are better served by transparent software, since the workflows are almost never final but subject to constant revision. But of course, the real trick is knowing what to make transparent and what to hide – some economy is still needed.
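To make the “expose internal structure” and “preserve intermediate data” ideas concrete, here is a minimal sketch in Java. All names and structure here are invented for illustration and are not Friedrich’s actual API: the point is only that a pipeline stage’s result type deliberately keeps its intermediate data visible rather than hiding it behind a final answer.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical pipeline stage: count k-mers in a set of reads.
// A conventional, encapsulated design would return only the final
// product; here the Result record also exposes the intermediate
// k-mer table, so a researcher can inspect it or branch the
// workflow at this stage.
final class KmerCounting {
    record Result(Map<String, Integer> kmerTable, List<String> contigs) {}

    static Result run(List<String> reads, int k) {
        Map<String, Integer> table = new HashMap<>();
        for (String read : reads) {
            for (int i = 0; i + k <= read.length(); i++) {
                table.merge(read.substring(i, i + k), 1, Integer::sum);
            }
        }
        // A real assembler would build contigs from the table; here we
        // just pass the reads through to keep the sketch small.
        return new Result(table, reads);
    }
}
```

The encapsulated alternative would return only `contigs`; keeping `kmerTable` in the result is what makes the stage transparent to a researcher revising the workflow.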

Bioinformatics, Computer science, Software development

Doing generality right

April 2nd, 2010 — 11:00am

Many software developers, while making a tool to solve a specific problem, heed the siren call of generality. By making a few specific changes, they can turn the tool into a general framework for solving a larger class of problems. And then, with a few more changes, an even larger class of problems, and so on. This often turns into a trap, and there is a risk that the end of the line is an over-generalised tool that isn’t very good at solving any problem, because the specificity that was present in the first place was part of why it was powerful. In this way, constraints can equal freedom.

Sometimes, though, the generalizers get it right. These are often moments of exceptional and lasting innovation. One example of such a system is the fabulously influential (but today, not that widely used) programming language Smalltalk. Invented by the former jazz guitarist and subsequent Turing award winner Alan Kay, Smalltalk was released as one of the first true object-oriented programming languages in 1980. It is probably still ahead of its time. It runs on a virtual machine, it has reflection, everything is an object, and the separation between applications is blurred in favour of a big object box. On running Squeak, a popular Smalltalk implementation, with its default system image today, users discover that all the objects on the screen, including the IDE to develop and debug objects, appear to follow the same rules. No objects seem to have special privileges.

Another such system is an application that used to be shipped on Mac computers in the distant past, HyperCard. HyperCard enabled ordinary users to create highly customized software using the idea of filing cards in a drawer as the underlying model, blurring the line between end users and developers through its accessibility. I haven’t had the privilege of using it myself, but it seems it was as powerful as it was because it served up a homogeneous and familiar model, where everything was a card, and yet the cards had considerable scope for modification and special features. Even though, in some ways, this system appears to be a database, the cards didn’t need to have the same format, for instance. (Are we seeing this particular idea being recycled in a more enterprisey form in CouchDB?)

There are more examples of successful, highly general designs: the Unix file system, TCP/IP sockets and so on. They all have in common that they are easy to think about as a mental model, since a universal set of rules applies to all objects; they scale well in different directions when used for different purposes; and they give the user a satisfying sense of empowerment, blurring the line between work and play to draw on the user’s natural creativity. Successful general systems are the ones that can be easily applied in quite varied situations without tearing at the seams.

While not widely used by industrial programmers today, Smalltalk was incredibly influential. In the early 1980s Objective-C was created by Brad Cox and Tom Love, directly inspired by what the Smalltalk designers had done. Objective-C was subsequently used as the language of choice for NeXTStep, and later for Apple’s Mac OS X when Apple bought NeXT. Today it’s seeing a big surge in popularity thanks to devices like the iPhone, on which it is also used. In 1995 Java was introduced, owing a great deal of its design to Objective-C, but also introducing features such as a universal virtual machine and garbage collection, which Objective-C didn’t have at the time. In some sense, both Objective-C and Java are blends of the C-family languages and Smalltalk. Tongue in cheek, we might say that evolution in industrial programming these days consists of finding blends that contain less of the C model and more of Smalltalk or functional programming.

Philosophy, Software development

Where is Java going?

November 22nd, 2009 — 2:51pm


Today, Java is one of the most popular programming languages. Introduced in 1995, it rests on a tripod of the language itself, its libraries, and the JVM. In the TIOBE programming language league charts, it has been at the top for as long as the measurements have been made (since 2002), overtaken by C only for a brief period due to measurement irregularities.

Yet not all is Sun-shine in the Java world. Sun Microsystems is about to be taken over by Oracle, pending EU approval. (The EU is really dragging its feet in this matter, but it seems unlikely it would actually reject the merger.) Larry Ellison has voiced strong support for Java and for Sun’s way of developing software, so maybe this is not a threat by itself. But how far can the language itself go?

The Java language was carefully designed to be relatively easy to understand and work with. James Gosling, its creator, has called it a blue collar language, meaning it was designed for industrial, real-world use. In a world where C++ was the de facto standard for OO programming, Java was a big step forward in terms of ease of development, with its lack of pointers and its strong type system – to say nothing of its garbage collection. Many classes of common programming errors were removed altogether. However, in the interests of simplicity and clarity, some tradeoffs were made. The language’s detractors today point to problems such as excessive verbosity, the lack of closures, the limited generics, and the checked exceptions.
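As a small illustration of the verbosity and checked-exception complaints (my own example, not one from the debates themselves): reading a file line by line in the Java of this era forces explicit resource handling and exception plumbing at every level of the call chain.

```java
import java.io.BufferedReader;
import java.io.FileReader;
import java.io.IOException;

// Count the lines in a text file. The checked IOException must be
// declared (or handled) by every caller, and the reader must be
// closed by hand in a finally block.
class LineCounter {
    static int countLines(String path) throws IOException {
        BufferedReader reader = new BufferedReader(new FileReader(path));
        try {
            int count = 0;
            while (reader.readLine() != null) {
                count++;
            }
            return count;
        } finally {
            reader.close(); // manual cleanup; no try-with-resources yet
        }
    }
}
```

In a language with closures and less ceremony, this would be a one- or two-line expression; here the essential logic is a small fraction of the code.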

For some time there have been a number of exciting alternative languages available on the JVM. Clojure is a Lisp dialect. Scala, the only non-Java JVM language I have used extensively, mixes the functional and object-oriented paradigms. Languages like Jython and JRuby exist mainly to allow scripting and interoperability with popular scripting languages on the JVM.

Today it seems as if the JVM and the standardized libraries will be Java’s most prominent legacy. The language itself will not go away for a long time either: considering that many companies still maintain or develop in languages like Cobol and Fortran, we will probably be maintaining Java code 30 years from now (what a sad thought!). But newer and more modern JVM languages will probably take turns being number one. The JVM and the libraries guarantee that we will be able to mix them relatively easily, as long as they don’t stray too far from the standard with their custom features.

So in hindsight, developing this intermediate layer, this virtual machine – and disseminating it so widely – was a stroke of genius. Will future programming models give us even more standardized middle layers, and not just one?

Meanwhile, there’s a lot of debate about the process being used to shape and define Java. For a long time, Sun has employed something called the Java Community Process (JCP), which was supposed to ensure openness. Some people proclaim that this openness has ended. To take one example: very recently, Sun announced that there will be support for closures in Java 7, after first announcing that there would be none. The process by which this decision has been managed has been described as not being a community effort. Some aspects of Java are definitely up in the air these days.
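For context on what the closures debate is about: in Java as it stands, passing a small piece of behaviour – say, a sort criterion – requires an anonymous inner class, while a closure proposal would shrink the whole thing to roughly one expression. A minimal example of the status quo:

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.Comparator;
import java.util.List;

// Sort strings by length. The Comparator is the only interesting part,
// yet it takes an entire anonymous class to express it.
class SortByLength {
    static List<String> sorted(List<String> words) {
        List<String> copy = new ArrayList<>(words);
        Collections.sort(copy, new Comparator<String>() {
            public int compare(String a, String b) {
                return a.length() - b.length();
            }
        });
        return copy;
    }
}
```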

Software development

The future of the web browser

August 6th, 2009 — 12:19am

The web browser, it is safe to say, has gone from humble origins to being the single most widely used piece of desktop software (based on my own usage, but I don’t think I’m untypical). This development continues today. The battles being fought and the tactical decisions being made here reach a very large audience and have a big impact.

When exactly did the web browser make the transition from hypertext viewer to application platform? In retrospect, this transition seems to have been a very fluid affair. Forms with buttons, combo boxes and lists were supported very early. Javascript came in not long after. When XMLHttpRequest was introduced, it wasn’t long until AJAX took off, paving the way for today’s “rich” web browser applications.

A couple of years ago I had a personal project ongoing for some time. I had decided that web browsers weren’t designed for the kind of tasks they were being made to do (displaying applications), and I wanted to make a new kind of application platform for delivering applications over the web. Today I’m convinced that this would never have succeeded. Even if I had gotten the technology right (which I don’t think I was ever close to), I would have had no way of achieving mass adoption. Incremental developments of the web browser have, however, placed a new kind of application platform in the hands of the masses. Today the cutting edge seems to be browsers like Google’s Chrome, aggressively optimised for application delivery. But some new vegetables have been added to the browser soup.


Google’s GWT (Google Web Toolkit) has been available for some time. This framework makes it easier to develop AJAX applications. Some hardcore AJAX developers may consider it immature, but I think frameworks like this are going to be increasingly popular, since they bridge the differences between browsers very smoothly. What’s interesting is that the same company is developing both GWT and Chrome. The two sides of the browser–application equation have a common creator. This helps both: GWT can become more popular if Chrome is a popular browser, and Chrome can become more popular if GWT is a popular framework. Google can make, and has made, GWT apps run very fast in the Chrome browser (I tested this personally with some things I’ve been hacking on). The sky is the limit here; they can easily add special native features in the browser that GWT alone can hook into.

Microsoft have something a little similar with Silverlight, which, while not playing quite the same role, has a co-beneficial relationship with Internet Explorer.


Everyone’s favorite browser, Firefox, recently passed 1 billion downloads. Firefox doesn’t really have a web development kit of its own, as I understand it; it just tries to implement the standards well. This is fair and good, but in some sense it demotes FF from the league of agenda setters to those who play catch-up. Though, it must be said, the rich variety of plugins available for FF might go a long way towards remedying this.

All this, and I haven’t even touched on Google’s recent foray into the OS market with “Chrome OS”…

Uncategorized
