Category: Software development

Covid-19 and time

May 12th, 2021 — 12:11pm

I can now conclusively answer the question raised at the end of my blog post from December 2019: the 2020s are not a decade of orderly peace. What a strange year. But weren’t years always strange?

Time passes not only quantitatively but also qualitatively. A year spent with Covid-19 seems to have passed differently from a year without it. Maybe boredom has increased (possibly for the better), maybe focus has increased. For many there has been, and continues to be, untold suffering. The mRNA vaccine, based on a technique that was overlooked by academia for a long time, at this moment looks set to become one of the great success stories of science. Possibly, and hopefully. Many countries have aggressively and successfully vaccinated large swathes of their populations; Japan is unfortunately still in the early stages of this process. The pandemic year almost seems to have removed us physically from the world before, opening up a widening chasm that we may or may not be able to cross again. Looking across that chasm from 2021, both the present time and that time before the pandemic seem alien and strange.

Around this time last year, when lockdowns were starting to happen in some countries, the prospect of a two-week lockdown seemed to me an unbearable burden. (To date Japan has not had a hard lockdown.) Today, of course, that seems like it would have been a small price to pay, given what some other countries, like Taiwan, have achieved. I must not be as good at deferring present benefits for future rewards as I thought.

On a different note, for the technically minded: during the past year I was able to channel a lot of energy into research on k-mer counting, a genomic data problem. A Medium post (in the early stages of this work) and eventually a paper were published. The main achievement was to find a novel way of grouping k-mers (short genomic subsequences of length k) into evenly sized bins, which greatly speeds up many kinds of processing. This would not have been possible without previous work on FastKmer (Ferraro Petrillo et al.) and the new randomized algorithm PASHA for generating compact universal hitting sets (Ekim et al.). This work may also be an interesting case study in the low-hanging fruit available when aggressive engineering, specifically aimed at improving parallelism, is applied to existing algorithms.
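For readers unfamiliar with k-mer binning, here is a minimal sketch in Python of the standard minimizer technique that this kind of work builds on: each k-mer is assigned to a bin keyed by its smallest m-mer, so identical k-mers always land in the same bin and each bin can be counted independently, for example in parallel. This is not the method from the paper. It uses plain lexicographic order (an assumption chosen for simplicity), ignores reverse complements, and makes no attempt at even bin sizes, which is exactly the hard part. The function names `kmers`, `minimizer` and `bin_and_count` are hypothetical.

```python
from collections import Counter, defaultdict
from typing import Dict, Iterable, Iterator


def kmers(seq: str, k: int) -> Iterator[str]:
    """Yield every k-mer (substring of length k) of seq, left to right."""
    for i in range(len(seq) - k + 1):
        yield seq[i:i + k]


def minimizer(kmer: str, m: int) -> str:
    """Return the lexicographically smallest m-mer inside kmer.

    Lexicographic order is an illustrative assumption; it is known to
    produce skewed bins, which better orderings try to avoid.
    """
    return min(kmer[j:j + m] for j in range(len(kmer) - m + 1))


def bin_and_count(reads: Iterable[str], k: int, m: int) -> Dict[str, Counter]:
    """Group k-mers into bins keyed by minimizer, then count each bin separately."""
    bins = defaultdict(list)
    for read in reads:
        for km in kmers(read, k):
            bins[minimizer(km, m)].append(km)
    # Each bin is independent, so these counts could run on different workers.
    return {key: Counter(group) for key, group in bins.items()}
```

Calling `bin_and_count(["ACGTACGT", "CGTACG"], k=4, m=2)` puts six of the eight k-mers in the `AC` bin and only two in the `CG` bin, a small example of the skew that an even binning scheme has to overcome.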

Comment » | Bioinformatics, Computer science, Life, Philosophy, Software development

Reactive software and the outer world

February 12th, 2016 — 11:13am

At Scala Matsuri a few weeks ago (incidentally, an excellent conference), I was fortunate to be able to attend Jonas Bonér’s impassioned talk about resilience and reactive software. His theme: “without resilience, nothing else matters”.

At the core of it is a particular way of thinking about how complex systems fail. Importantly, complex systems are not the same as complicated systems, although in everyday speech we tend to confuse the two. Perhaps a related, or even identical, question is: how do composite systems fail?

Using terminology that originates with the Erlang language, Bonér talked about the “error kernel”: the part of a software system that must never fail, no matter what. As long as this innermost part stays alive, other parts are allowed to fail, and there are mechanisms to replace, restart or route around failures in the outer parts.

This style of design leads to a well-structured failure and supervision hierarchy. Maybe this style of thinking itself is the most important contribution. In most software systems being designed today, the possibility of errors or failures is a second-class citizen, swept under the carpet, and certainly not part of a carefully considered structure of possible failures. What if this structure became a primary concern?

Once errors are well structured and organised in a hierarchy, it also becomes easy to decide what to do when errors occur. The hierarchy structure clearly indicates which parts of a system have become defunct and need to be replaced or bypassed. Recoverability – being able to crash safely – at every level takes the software system a little bit closer, it seems, to biological systems.
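To make the idea of a supervision hierarchy concrete, here is a minimal sketch in Python. It is not Akka's or Erlang/OTP's API; the names `supervise`, `error_kernel`, `Escalation`, `flaky` and `run_tree` are hypothetical. A supervisor restarts a failing child a fixed number of times and then escalates by failing itself, so that its own parent can restart the whole subtree; the error kernel at the root absorbs the final escalation and falls back to a degraded answer instead of crashing. Real OTP supervisors also limit restarts within a time window and offer several restart strategies, which this sketch leaves out.

```python
from typing import Callable, List, TypeVar

T = TypeVar("T")


class Escalation(Exception):
    """Raised by a supervisor that has used up its restarts for a child."""


def supervise(name: str, task: Callable[[], T], max_restarts: int, log: List[str]) -> T:
    """Run task; on failure restart it up to max_restarts times, then escalate."""
    for attempt in range(1, max_restarts + 2):  # initial run plus restarts
        try:
            return task()
        except Exception as exc:  # any child failure, including a child supervisor's Escalation
            log.append(f"{name} failed on attempt {attempt}: {exc}")
    raise Escalation(f"{name} gave up after {max_restarts} restart(s)")


def error_kernel(task: Callable[[], T], fallback: T, log: List[str]) -> T:
    """The innermost part that must never fail: it absorbs escalations."""
    try:
        return task()
    except Escalation as exc:
        log.append(f"kernel: {exc}; using fallback")
        return fallback


def flaky(failures: int) -> Callable[[], str]:
    """Hypothetical worker that fails a given number of times, then succeeds."""
    remaining = [failures]

    def task() -> str:
        if remaining[0] > 0:
            remaining[0] -= 1
            raise RuntimeError("transient fault")
        return "ok"

    return task


def run_tree(worker: Callable[[], str], log: List[str]) -> str:
    """Two-level tree: the kernel supervises a service, which supervises a worker."""

    def service() -> str:
        return supervise("worker", worker, 1, log)

    def root() -> str:
        return supervise("service", service, 1, log)

    return error_kernel(root, "degraded", log)
```

With `flaky(2)` the worker exhausts its own restart, the service restarts the whole subtree once, and the call returns `"ok"`. With `flaky(10)` both levels give up and the kernel returns `"degraded"`; the log records which part of the hierarchy failed at each step, which is what makes the decision about what to replace or bypass easy.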

Biological systems, Bonér pointed out, usually operate with some degree of inherent failure, be it disease, weakness, mutations or environmental stress. Perfect functioning is not typical, and it seems to me that for most organisms such a state may not even exist.

Recoverability at every level, resilience, and error hierarchies – “let it fail” – is truly a significant and very humble way of thinking about software. It means that as the developer, I acknowledge that the software I am writing does not control the universe (although as a developer I often fall prey to that illusion). The active principle, the “prime mover”, is somewhere outside the scope that I control. When it produces some unforeseen circumstance, we must respond properly. Reactive software to me seems to quietly acknowledge this order of things.

I have only had a very brief opportunity to try out Akka, Typesafe’s actor framework, in my projects so far, but I felt inspired by Bonér’s talk and hope to use it more extensively in the future.

Comment » | Bioinformatics, Computer science, Philosophy, Software development

The bounded infinity of language

August 9th, 2014 — 5:48pm

Works of art, including film, painting, sculpture, literature and poetry, have a seemingly inexhaustible quality. As we keep confronting them, renewing our relationship with them over time, we continually extract more meaning from them. Some works truly appear to be bottomless. Reaching the bottom easily is, of course, a sure sign that a work will not have much lasting value.

Of the forms listed above, (written) poetry and literature have the particular property that they are crafted out of a demonstrably finite medium: text. A finite alphabet, a finite vocabulary, and a finite number of pages. As long as one disregards the effect of details such as paper quality, typography and binding, perfect copies can be made; the text can indeed be transcribed in its entirety without information loss. Somehow, reading Goethe on a Kindle is an experience that still holds power, although he presumably never intended his books to be read on Kindles (and some might argue that reading him in this way is ignoble).

How is it then that the evocative power of something finite can seem to be boundless? This curious property is what we might call the poetic or metaphorical quality of a text. (Works of film, painting, sculpture and so on most likely also have this power, but it is trickier to demonstrate that they are grounded in a finite medium.) Through this mysterious evocative power, the elements that make up a work of art allow us to enter into an infinity that has been enclosed in a finite space. It might be objected that what is evoked comes as much from the reader as from the text, but this duality applies to all sensation.

With this in mind we turn, once again, to programming and formal “languages”. Terms in programming languages receive their meaning through a formal semantics that describes, mathematically, how the language is to be translated into an underlying, simpler language. This process takes place on a number of levels, and eventually the lowest underlying language is machinery. This grounds the power of a program to command electrons. But this is something different from the meaning of words in a natural language. The evocative power described above is clearly absent, and computer programs today do not transcend their essential finitude. With brute force, we could train ourselves to read source code metaphorically or poetically, but in most languages I know, this would result in strained, awkward and limited metaphors. (Perhaps mostly because programming languages to a large extent reference a world different from the human world.)
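As a toy illustration of one step in that chain of translations, here is a sketch in Python of a tiny expression language given meaning in two ways: directly, by an evaluator, and indirectly, by compiling it into a simpler stack-machine language that a separate loop executes. Everything here (`Num`, `Add`, `Mul`, `evaluate`, `compile_expr`, `run`) is invented for the example; real compilers and formal semantics are far more elaborate. The point is only that the translation preserves meaning, in the sense that `run(compile_expr(e))` and `evaluate(e)` agree.

```python
from dataclasses import dataclass
from typing import List, Tuple, Union


@dataclass
class Num:
    value: int


@dataclass
class Add:
    left: "Expr"
    right: "Expr"


@dataclass
class Mul:
    left: "Expr"
    right: "Expr"


Expr = Union[Num, Add, Mul]
Instr = Tuple[str, int]  # (opcode, argument); the argument is unused for ADD and MUL


def evaluate(e: Expr) -> int:
    """Direct semantics: the meaning of an expression is a number."""
    if isinstance(e, Num):
        return e.value
    if isinstance(e, Add):
        return evaluate(e.left) + evaluate(e.right)
    return evaluate(e.left) * evaluate(e.right)


def compile_expr(e: Expr) -> List[Instr]:
    """Translate the tree into a flat, simpler language of stack instructions."""
    if isinstance(e, Num):
        return [("PUSH", e.value)]
    op = "ADD" if isinstance(e, Add) else "MUL"
    return compile_expr(e.left) + compile_expr(e.right) + [(op, 0)]


def run(program: List[Instr]) -> int:
    """The 'machine': execute instructions against a stack of integers."""
    stack: List[int] = []
    for op, arg in program:
        if op == "PUSH":
            stack.append(arg)
        else:
            b, a = stack.pop(), stack.pop()  # right operand is on top
            stack.append(a + b if op == "ADD" else a * b)
    return stack.pop()
```

For `Add(Num(2), Mul(Num(3), Num(4)))` both routes give 14, and the compiled form is the flat list PUSH 2, PUSH 3, PUSH 4, MUL, ADD. The meaning is carried across intact, but it remains entirely literal: there is nothing in either language for a metaphor to attach to.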

Consider how this inability to transcend finitude impacts our ability to model a domain in a given programming language. With an already formal domain, such as finance or classical mechanics, modelling is simple, since what needs to happen is a mere translation. Other domains, such as biology, resist formalisation, and perhaps this is one of their essential properties. Here we would like to draw on the evocative, poetic, and metaphorical capacities of natural language, for the sake of program comprehension and perhaps also to support effective user interfaces, while also writing practical programs. But we have yet to invent a formal language that is both practical and evocative to the point that works of art could be created in it.

an ancient pond / a frog jumps in / the splash of water

(Bashou, 1686)

1 comment » | Computer science, Philosophy, Software development

Small Tools for Bioinformatics

February 21st, 2014 — 3:04pm

Pjotr Prins has published a Small Tools Manifesto for Bioinformatics, which is well worth a read for anyone who develops bioinformatics software.

In essence it’s about increased adoption of the Unix design philosophy. I fully support the manifesto, which in many ways is reminiscent of the ideas that Gabriel Keeble-Gagnere and I presented in our Friedrich paper at PRIB2012. The idea of designing software as small parts that can be recombined freely, instead of as a huge black box with a glossy surface, is an extremely powerful one, particularly in the research space.
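The flavour of the small-tools approach can be sketched in a few lines of Python, with generators standing in for Unix pipes: each function does one job, consumes a stream and produces a stream, so stages can be recombined freely. This illustrates the general idea only; it is not code from the manifesto or from Friedrich. The names `parse_fasta`, `min_length` and `gc_content` are hypothetical, and the FASTA parser is deliberately simplistic.

```python
from typing import Iterable, Iterator, List, Optional, Tuple

Record = Tuple[str, str]  # (header, sequence)


def parse_fasta(lines: Iterable[str]) -> Iterator[Record]:
    """Turn FASTA text lines into (header, sequence) records."""
    header: Optional[str] = None
    chunks: List[str] = []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line[1:], []
        else:
            chunks.append(line)  # sequences may span several lines
    if header is not None:
        yield header, "".join(chunks)


def min_length(records: Iterable[Record], n: int) -> Iterator[Record]:
    """Pass through only records whose sequence has at least n bases."""
    return ((h, s) for h, s in records if len(s) >= n)


def gc_content(records: Iterable[Record]) -> Iterator[Tuple[str, float]]:
    """Emit (header, fraction of G and C bases) for each record."""
    for h, s in records:
        gc = sum(1 for c in s.upper() if c in "GC")
        yield h, (gc / len(s) if s else 0.0)


# Composed like a pipeline: parse | filter | measure.
fasta = ">a\nACGT\nGG\n>b\nAT\n>c\nGGCC\n"
results = list(gc_content(min_length(parse_fasta(fasta.splitlines()), 4)))
```

Swapping the filter or adding a new stage touches nothing else in the chain; wrapping each stage as a separate program reading `sys.stdin` would bring it closer still to the Unix model.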

Comment » | Bioinformatics, Computer science, Software development

Equipmental visibility and barriers to understanding

July 12th, 2013 — 9:28pm

The following is an excerpt from a text I am currently writing, which may or may not be published in this form. The text is concerned with the role of software in the scientific research process: what happens when researchers must interact with software instead of hardware equipment, and the constraints that this places on the software development process.

Technological development since the industrial revolution has made equipment more intricate. Where we originally had gears, levers and pistons, we progressed via tape, vacuum tubes and punch cards to solid state memory, CPUs and wireless networks. The elaboration of technology has also been the process of its hiding from public view. An increasing amount of complexity is packed into compact volumes and literally sealed into “black boxes”. This does not render the equipment inaccessible, but it does make it harder to understand and manipulate as soon as one wants to go outside the operating constraints that the designers foresaw. As we have already noted, this poses problems for the scientific method. Scientists are human, and they engage with their equipment through their five senses. Let us suggest a simple rule of thumb: the more difficult equipment is to see, touch, hear etc., the more difficult it becomes to understand it and modify its function. The evolution of technology has happened at the expense of its visibility. The user-friendly interface that provides a simple means of interacting with a complex piece of machinery, which initially is very valuable, can often become a local maximum that is difficult to escape if one wants to put the equipment to new and unforeseen uses.

We may note two distinct kinds of user-friendly interfaces: interfaces where the simplified view closely approximates the genuine internals of the machinery, and interfaces where the simplified view uses concepts and metaphors that have no similarity to those internals. The former kind of interface we will call an authentic simplification, the latter an inauthentic simplification.

Of course, software represents a very late stage in the progression from simple and visible to complex and hidden machinery. Here again we see how software can both accelerate and retard scientific studies. Software can perform complex information processing, but it is much harder to interrogate than physical equipment: its workings are hidden from view. The inner workings of software, which reside in source code, are notoriously hard to communicate. A programmer watching another programmer at work for hours may not be able to fully understand what kind of work is being done, even if both are highly skilled, unless a disciplined coding style and development methodology are being used. Software is by its very nature hidden away from human eyes: from the very beginning it is written in artificial languages, which are then compiled, in stages, into even more artificial languages for the benefit of the processor that is to interpret them. Irreversible, one-way transformations are essential to the process of developing and executing software. This leads to what might be called a nonlinearity when software is used as equipment in an experimental setup. Visible, tangible equipment generally yields more information about itself when inspected, and investigators generally have a clear idea of how hard it is to inspect or modify. Software equipment, by contrast, often requires an unknown expenditure of effort to inspect or modify. That effort is unknown to all except the programmers who have experience working with the relevant source code, and even they are sometimes poorly placed to judge how hard a given change would be (software projects often finish over time and over budget, but almost never under time or under budget). This becomes a severe handicap for investigators. A linear amount of time, effort and resources spent understanding or modifying ordinary equipment will generally have clear payoffs, but the inspection and modification of software equipment will be a dark area that investigators, unless they are able to collaborate well with programmers, will instinctively avoid.

To some degree these problems are inescapable, but we suggest the maximal use of authentic simplification in interfaces as a remedy. In addition, it is desirable to have access to multiple levels of detail in the interface, so that each level is an authentic simplification of the level below. In such interface strata, layers have the same structure and only differ in the level of detail. Thus, investigators are given, as far as possible, the possibility of smooth progression from minimal understanding to full understanding of the software. The bottom level interface should in its conceptual structure be very close to the source code itself.
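One possible reading of such interface strata, sketched in Python for a hypothetical k-mer counting task: the bottom level consists of individual functions, the middle level is an explicit, inspectable list of those same steps, and the top level is a one-call interface that does nothing except run that list. The top level is therefore an authentic simplification in the sense above, since a user who looks beneath it finds exactly the structure it summarised. All names here are invented for illustration.

```python
from collections import Counter
from typing import Any, Callable, List, Optional, Tuple


# Bottom level: individual operations, close to the source code itself.
def normalise(seq: str) -> str:
    """Strip surrounding whitespace and use upper-case bases."""
    return seq.strip().upper()


def to_kmers(seq: str, k: int = 3) -> List[str]:
    """All overlapping substrings of length k."""
    return [seq[i:i + k] for i in range(len(seq) - k + 1)]


def count(kmers: List[str]) -> Counter:
    """Tally how often each k-mer occurs."""
    return Counter(kmers)


# Middle level: the same steps, as data that can be listed, traced or edited.
Step = Tuple[str, Callable[[Any], Any]]
PIPELINE: List[Step] = [("normalise", normalise), ("to_kmers", to_kmers), ("count", count)]


def run_steps(data: Any, steps: List[Step] = PIPELINE,
              trace: Optional[List[Tuple[str, Any]]] = None) -> Any:
    """Apply each step in order, optionally recording every intermediate result."""
    for name, fn in steps:
        data = fn(data)
        if trace is not None:
            trace.append((name, data))
    return data


# Top level: a single call, which is nothing more than the middle level run end to end.
def analyse(seq: str) -> Counter:
    return run_steps(seq)
```

A user can start with `analyse`, then pass a `trace` list to `run_steps` to see each intermediate result, and finally open `to_kmers` itself; at no point does the structure change, only the level of detail.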

Comment » | Bioinformatics, Computer science, Philosophy, Software development
