Category: Bioinformatics


The minimal genome of Craig Venter’s Syn3.0

March 28th, 2016 — 6:21pm

The J. Craig Venter Institute has published a paper detailing the genome of their new Syn3.0 synthetic organism. The major accomplishment was to construct a viable cell with a synthetic, extremely small genome: only 473 genes and about 531 kbp.

Even though it is considered to be fully “synthetic”, this genome is not built from scratch. Instead, the starting point is the Mycoplasma mycoides genome of JCVI’s earlier Syn1.0 – the first synthetic, self-replicating cell, produced in 2010 – from which genes and regions are deleted to produce something that is much smaller, but still viable. This means that even this fully synthetic genome still contains regions and functionalities that are not fully understood. The genome of Syn3.0 is the smallest so far – “smaller than that of any autonomously replicating cell found in nature”. Syn3.0 should be a very valuable starting point for developing an explicit understanding of the basic set of genes needed by any cell for its survival – the “operating system of the cell”, in the words of the authors.

Since so many genes are still basically not understood, the authors could not rely entirely on logic and common sense when choosing which genes to remove. They used an approach that introduced random transposon insertions into the starting organism, and then checked which insertions were tolerated and which were not. This allowed them to classify genes as essential, non-essential or quasi-essential (!). The deletion of essential genes would cause the cell to simply die. The deletion of quasi-essential genes would not kill it, but would dramatically slow its replication rate, severely crippling it. The final Syn3.0 organism has a doubling time of about three hours.
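
To make the classification idea concrete, here is a conceptual sketch in Scala – the field names, thresholds and example values are all invented for illustration and are not taken from the paper:

```scala
// A conceptual sketch only: names, thresholds and data are made up.
object GeneClassificationSketch extends App {
  sealed trait GeneClass
  case object Essential extends GeneClass
  case object QuasiEssential extends GeneClass
  case object NonEssential extends GeneClass

  // Hypothetical per-gene summary of a transposon mutagenesis experiment:
  // how many insertions the gene tolerated, and how well those mutants grew
  // relative to the unmodified strain (1.0 = normal growth).
  case class GeneObservation(name: String, insertions: Int, relativeGrowth: Double)

  def classify(g: GeneObservation): GeneClass =
    if (g.insertions == 0) Essential                // no viable insertion mutants: the cell dies without it
    else if (g.relativeGrowth < 0.5) QuasiEssential // viable, but growth is severely slowed
    else NonEssential                               // tolerated at little cost: a candidate for deletion

  val observations = Seq(
    GeneObservation("geneA", 0, 0.0),
    GeneObservation("geneB", 12, 0.3),
    GeneObservation("geneC", 40, 0.95)
  )

  observations.foreach(g => println(s"${g.name}: ${classify(g)}"))
}
```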

Some of the points I took away from this readable and interesting paper were:

Synthetic biology methods are starting to resemble software development methods. The authors describe a design-build-test (DBT) cycle that involves several nontrivial steps, such as in silico design, oligonucleotide synthesis, yeast cloning, insertion into the bacteria, testing, and then (perhaps) sequencing to go back to the computers and figure out what went wrong or what went well. Thus, a feedback loop is set up between the cells and the in silico design space. (A toy sketch of this loop follows after these points.)

A very small genome needs a very tightly controlled environment to survive. The medium (nutrient solution) that Syn3.0 lives in apparently contains almost all the nutrients and raw materials it could possibly need from its environment. This means that many genes that would normally be useful for overcoming adverse conditions, perhaps for synthesising nutrients that are not available from the environment, are now redundant and can be removed. So when thinking about genome design, it seems we really have to think about how everything relates to a specific environment.

The mechanics of getting a synthetic genome into a living cell are still complex. A huge amount of wet-lab (and, presumably, dry-lab) processes are still needed to get the genome from the computer into something viable in a cell culture. However, things are going much faster than in 2008, and it’s interesting to think about where this field might be in 2021.
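
Coming back to the first point: purely as a caricature – with made-up gene names and a trivially fake viability test standing in for months of synthesis, cloning and growth experiments – the shape of such a design-build-test loop might look like this:

```scala
object DbtSketch extends App {
  type Genome = Set[String]

  // Pretend that these four functions are the only ones a cell truly needs.
  val essential: Genome = Set("replication", "transcription", "translation", "membrane")

  // "Build and test": here just a predicate; in reality, the expensive wet-lab step.
  def viable(g: Genome): Boolean = essential.subsetOf(g)

  // One DBT iteration: design a reduced genome, test it, and keep the change only
  // if the cell survives - either way, the outcome feeds back into the next design.
  def dbt(genome: Genome, candidate: String): Genome = {
    val proposed = genome - candidate
    if (viable(proposed)) proposed else genome
  }

  val start: Genome = essential ++ Set("nutrientSynthesis", "stressResponse", "unknownA")
  val reduced = List("nutrientSynthesis", "membrane", "unknownA").foldLeft(start)(dbt)

  // "membrane" resists removal; "nutrientSynthesis" and "unknownA" are deleted.
  println(reduced)
}
```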

 

Comment » | Bioinformatics

Method and object. Horizons for technological biology

March 22nd, 2016 — 10:32pm

(This post is an attempt at elaborating the ideas I outlined in my talk at Bio-Pitch in February.)

The academic and investigative relationship to biology – our discourse about biology – is becoming increasingly technological. In fields such as bioinformatics and computational biology, the technological/instrumental relationship to nature is always at work, constructing deterministic models of phenomena. By using these models, we may repeatedly extract predictable results from nature. An example would be a cause-effect relationship like: exposing a cell to heat causes “heat shock proteins” to be transcribed and translated.

The implicit understanding in all of these cases is that nature can be turned into engineering. Total success, in this understanding, would amount to one or both of the following:

  1. Replacement/imitation as success. If we can replace the phenomenon under study with its model (concretely, a machine or a simulation), we have achieved success.
  2. Control as success. If we can consistently place the phenomena under study in verifiable, fully defined states, we have achieved success. (Note that this ideal implies that we also possess perfect powers of observation, down to a hypothetical “lowest level”).

These implicitly held ideals are not problematic as long as we acknowledge that they are mere ideals. They are very well suited as horizons for these fields to work under, since they stimulate the further development of scientific results. But if we forget that they are ideals and begin to think that they really can become realities, or if we prematurely think that biology really must be like engineering, we might be in trouble. Such a belief conflates the object of study with our relatedness to that object. It misunderstands the role of the equipment-based relationship. The model – and the associated machines, software, formulae, et cetera – is equipment that constitutes our relatedness to the phenomena. It cannot be the phenomena themselves.

Closely related to the ideals of replacement and control is the widespread application of abstraction and equality in engineering-like fields (and their application to new fields that are presently being clad in the trappings of engineering, such as biology). Abstraction and equality – the notion that two entities, instances, moments, etc., are in some way the same – allow us to introduce an algebra, to reason in general terms rather than about specifics. And this is of course what computers do. It also means that two sequences of actions (laboratory protocols, for example), although they are different sequences, or the same sequence at different instances in time, can lead to the same result, just as 3+1 and 2+2 both “equal” 4. In other words, history becomes irrelevant; the specific path taken no longer means very much. But it is not clear that this can ever truly be the case outside of an algebra, and that is what risks being forgotten.
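
As a trivial code illustration of the point: two computations with different histories produce results that the algebra cannot tell apart.

```scala
// Two different sequences of steps ("protocols") whose histories differ,
// but whose results the algebra treats as identical.
object EqualityDemo extends App {
  val protocolA = List(1, 2, 3).map(_ * 2).sum  // double each element, then sum: 12
  val protocolB = List(3, 2, 1).sum * 2         // sum first, then double: 12
  println(protocolA == protocolB)               // true - only the value remains, not the path
}
```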

We might call all this the emergence of technological biology, or technological nature, the conquest of biology by λόγος, et cetera. The principal danger seems to be the conflation of method with object, of abstraction with the specific. And here we see clearly how something apparently simple – studying RNA expression levels in the software package R, for example – opens up the deepest metaphysical abysses. One of the most important tasks right now, then, would be the development of a scientific and technological culture that keeps the benefits of the technological attitude without losing sight of a more basic non-technological relatedness. The path lies open…

Comment » | Bioinformatics, Computer science, Philosophy

Is bioinformatics possible?

February 21st, 2016 — 5:43pm

I recently gave a talk at the Bio-Pitch event at the French-Japanese institute. I was fortunate to be able to present some of the ideas I’ve been developing here, alongside so many interesting projects (MetaPhorest, HTGAA, Yoko Shimizu, Tupac Bio, Bento Lab, etc.).

The topic of my talk was “Is bioinformatics possible?” – a deliberate provocation, since of course many people, including myself, work on this every day. I simply mean to suggest that there are intrinsic problems in the field that are not usually discussed or thought about, and that it might be valuable to confront those problems.

The slides are available, if anyone is interested.

The bigger topic that is hinted at, but not discussed, might be the instrumental relationship of humans to nature. I hope to return to this problem soon.

1 comment » | Bioinformatics, Computer science, Philosophy

Reactive software and the outer world

February 12th, 2016 — 11:13am

At Scala Matsuri a few weeks ago (incidentally, an excellent conference), I was fortunate to be able to attend Jonas Bonér’s impassioned talk about resilience and reactive software. His theme: “without resilience, nothing else matters”.

At the core of it is a certain way of thinking about the ways that complex systems fail. Importantly, complex systems are not the same as complicated systems, although in everyday speech we tend to confuse the two. Perhaps a related or even identical question is: how do composite systems fail?

Using terminology that originates with the Erlang language, Bonér talked about the “error kernel”, which is the part of a software system that must never fail, no matter what. As long as this innermost part stays alive, other parts are allowed to fail. There are mechanisms to replace, restart or route around failures in the outer parts.

This style of design leads to a well-structured failure and supervision hierarchy. Maybe this style of thinking is itself the most important contribution. In most software systems being designed today, the possibility of errors or failures is a second-class citizen, swept under the carpet, and certainly not part of a carefully considered structure of possible failures. What if this structure becomes a primary concern?
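
As a minimal sketch of what such a hierarchy can look like in code – using Akka’s actor API, with the messages and the failure condition invented purely for illustration – a supervisor can delegate all risky work to a child and declare what should happen when that child fails:

```scala
import akka.actor.{Actor, ActorSystem, OneForOneStrategy, Props, SupervisorStrategy}
import akka.actor.SupervisorStrategy.{Escalate, Restart}
import scala.concurrent.duration._

// A worker that does the risky, outward-facing work and is allowed to fail.
class Worker extends Actor {
  def receive = {
    case msg: String if msg.startsWith("bad") => throw new IllegalArgumentException(msg)
    case msg                                  => println(s"processed: $msg")
  }
}

// The supervisor belongs to the error kernel: it does no risky work itself,
// it only delegates and decides how to respond when a child fails.
class Supervisor extends Actor {
  override val supervisorStrategy: SupervisorStrategy =
    OneForOneStrategy(maxNrOfRetries = 3, withinTimeRange = 1.minute) {
      case _: IllegalArgumentException => Restart  // replace the failed child and carry on
      case _: Throwable                => Escalate // anything unexpected moves up the hierarchy
    }

  private val worker = context.actorOf(Props[Worker], "worker")

  def receive = { case msg => worker ! msg }
}

object ErrorKernelDemo extends App {
  val system = ActorSystem("error-kernel-demo")
  val supervisor = system.actorOf(Props[Supervisor], "supervisor")
  supervisor ! "good input"
  supervisor ! "bad input"   // the worker crashes and is restarted; the kernel survives
  supervisor ! "more input"
  Thread.sleep(1000)         // give the actors time to run, then shut down
  system.terminate()
}
```

The point is not the specific API but the shape: decisions about failure live in the structure of the hierarchy rather than being scattered throughout the business logic.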

Once errors are well structured and organised in a hierarchy, it also becomes easy to decide what to do when errors occur. The hierarchy structure clearly indicates which parts of a system have become defunct and need to be replaced or bypassed. Recoverability – being able to crash safely – at every level takes the software system a little bit closer, it seems, to biological systems.

Biological systems, Bonér pointed out, usually operate with some degree of inherent failure, be it disease, weakness, mutations or environmental stress. Perfect functioning is not typical, and it seems to me that for most organisms such a state may not even exist.

Recoverability at every level, resilience, and error hierarchies – “let it crash” – add up to a significant and very humble way of thinking about software. It means that as the developer, I acknowledge that the software I am writing does not control the universe (although as a developer I often fall prey to that illusion). The active principle, the “prime mover”, is somewhere outside the scope that I control. When it produces some unforeseen circumstance, we must respond properly. Reactive software, to me, seems to quietly acknowledge this order of things.

I have only had a very brief opportunity to try out Akka, Typesafe’s actor framework, in my projects so far, but I felt inspired by Bonér’s talk and hope to use it more extensively in the future.

Comment » | Bioinformatics, Computer science, Philosophy, Software development

Mysteries of the scientific method

November 7th, 2015 — 10:48am

The scientific method can be understood as the following steps: formulating a hypothesis, designing an experiment, carrying out the experiments, and drawing conclusions. Conclusions can feed back into hypothesis formulation, so that a different (related or unrelated) hypothesis can be tested, and we have a cycle. This feedback can also take place via a general theory that conclusions contribute to and that hypotheses draw from. The theory comes to represent everything we have learned about the domain so far. Some of the steps may be expanded into sub-steps, but in principle this cycle is how we generally think of science.

This looks quite simple, but is it really? Let’s think about hypothesis formulation and drawing conclusions. In both of these steps, the results are bounded by our imagination and intuition. Thus, something that doesn’t ever enter anybody’s imagination will not be established as scientific fact. In view of this, we should hope that scientists do have vivid imaginations. It is easy to imagine that there might be very powerful findings out there, on the other side of our current scientific horizon, that nobody has yet been creative enough to speculate about. It is not at all obvious that we can see the low hanging fruit or even survey this mountainous landscape well – particularly in an age of hyper-specialisation.

But scientists’ imaginations are probably quite vivid in many cases – thankfully. Ideas come to scientists from somewhere, and some ideas persist more strongly than others. Some ideas seduce scientists into years of hard labour, even when the results are meagre at first. Clearly, this intuition – the sense that something is worth investigating – is absolutely crucial to high-quality results.

A hypothesis might be: there is a force that makes bodies with mass attract each other, with a strength inversely proportional to the square of the distance between them. To formulate this hypothesis we need concepts such as force, body, mass, distance and attraction. Even though the hypothesis might be formulated in mere words, these words all depend on experience and practices – and thus on equipment (even if the equipment used in some cases is simply our own bodies). If this hypothesis is successfully confirmed, then a new concept becomes available: the law of gravity. This concept in turn may be incorporated into new hypotheses and experiments, paving the way for ever higher and more complex levels of science and scientific phenomena.
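
For concreteness, the law that this hypothesis eventually crystallised into can be written as

\[ F = G \frac{m_1 m_2}{r^2} \]

where F is the attractive force, m_1 and m_2 are the two masses, r is the distance between them and G is the gravitational constant. Every symbol in the formula presupposes concepts – and, ultimately, measuring equipment – of its own.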

Our ability to form hypotheses, to construct equipment and to draw conclusions seems to be a human capacity that is not easy to automate.

Entities such as matter, energy, atoms and electrons become accessible – I submit – primarily through the concepts and equipment that give access to them. In a world whose history differed from ours, it is conceivable that entirely different concepts and ideas would explain the same phenomena that our physics explains. For science to advance, new equipment and new concepts need to be constructed continually. This process is itself almost an organic growth.

Can we have automated science? Do we no longer need scientific theory? (!?) Can computers one day carry out our science for us? Only if either: a) science is not an essentially human activity, or b) computers become able to take on this human essence, including the responsibility for growing the conceptual-equipmental boundary. Data mining in the age of “big data” is not enough, since this (as far as I know) operates with a fixed equipmental boundary. As such, it would only be a scientific aid and not a substitute for the whole process. Can findings that do not result in concepts and theories ever be called scientific?

If computer systems ever start designing and building new I/O-devices for themselves, maybe something in the way of “artificial science” could be achieved. But it is not clear that the intuition guiding such a system could be equivalent to the human intuition that guides science. It might proceed on a different path altogether.

1 comment » | Bioinformatics, Computer science, Philosophy
