Tag: academic

Interactive toxicogenomics

May 4th, 2017 — 10:14am

If you work in toxicology or drug discovery, you might be familiar with Open TG-GATEs, a large transcriptomics database that catalogues gene expression responses to well-known drugs and toxins. This database was developed over many years by Japan’s Toxicogenomics Project, a public-private partnership, and remains a very valuable resource. As with many large datasets, despite its openness, accessing and working with this data can require considerable work. Data must always be placed in a context, and these contexts must be continually renewed. One user-friendly interface that simplifies access to this data is Toxygates, which I began developing in 2012 as a postdoc in the Mizuguchi Lab at NIBIOHN, and of which I am still the lead developer. As a web application, Toxygates lets you look at data of interest in context, together with annotations such as gene ontology terms and metabolic pathways, and it also provides visualisation tools.

We are now releasing a new major version of Toxygates, which, among many other new features, allows you to perform and visualise gene set clustering analyses directly in the web browser. Gene sets can also be easily characterised through an enrichment function, which is supported by the TargetMine data warehouse. Last but not least, users can now upload their own data and cluster and analyse it in context, together with the Open TG-GATEs data.
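For readers unfamiliar with what an enrichment function computes, here is a minimal sketch of one common approach: a one-sided hypergeometric test asking whether a gene set contains more genes with a given annotation (for example, one GO term) than would be expected by chance. This is an assumption-laden illustration, not how Toxygates or TargetMine actually implement enrichment; the function name `enrichment_p_value` and the gene identifiers are hypothetical.

```python
from math import comb


def enrichment_p_value(gene_set, annotated, universe):
    """One-sided hypergeometric p-value for over-representation.

    Returns the probability of drawing at least the observed number of
    annotated genes when picking len(gene_set) genes at random, without
    replacement, from the universe.
    """
    universe = set(universe)
    gene_set = set(gene_set) & universe
    annotated = set(annotated) & universe
    N = len(universe)              # all genes considered (e.g. all genes on the array)
    K = len(annotated)             # genes carrying the annotation (e.g. one GO term)
    n = len(gene_set)              # size of the gene set being characterised
    k = len(gene_set & annotated)  # observed overlap
    # Sum the hypergeometric probabilities P(X = i) for i = k .. min(K, n)
    tail = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, min(K, n) + 1))
    return tail / comb(N, n)


universe = [f"gene{i}" for i in range(100)]
annotated = universe[:10]                  # 10 genes share a (hypothetical) annotation
cluster = universe[:8] + universe[50:52]   # 8 of the 10 cluster genes are annotated
print(f"{enrichment_p_value(cluster, annotated, universe):.2e}")
```

In practice, many annotations are tested at once, so p-values like this one would normally be corrected for multiple testing before a gene set is considered enriched.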

Our new paper in Scientific Reports documents the new version of Toxygates and illustrates the use of the new functions through a case study performed on the hepatotoxic drug WY-14643. If you are curious, give it a try.

When I began development with a quick prototype, I had no idea that the project would still be evolving many years later. Toxygates represents considerable work and many learning experiences for me as a researcher and software engineer, and I’m very grateful to everybody who has collaborated with us, supported the project, and made our journey possible.


Comment » | Bioinformatics

The minimal genome of Craig Venter’s Syn3.0

March 28th, 2016 — 6:21pm

The J Craig Venter Institute has published a paper detailing the genome of their new Syn3.0 synthetic organism. The major accomplishment was to construct a viable cell with a synthetic, extremely small genome: only 473 genes and about 500 kbp.

Even though it is considered to be fully “synthetic”, this genome was not built from scratch. Instead, the starting point was the bacterium Mycoplasma mycoides, from which genes and regions were deleted to produce something much smaller but still viable. This means that even this fully synthetic genome still contains regions and functionalities that are not fully understood. M. mycoides was also the basis for JCVI’s Syn1.0, which was produced in 2010, but the genome of Syn3.0 is the smallest so far: “smaller than that of any autonomously replicating cell found in nature”. Syn3.0 should be a very valuable starting point for developing an explicit understanding of the basic gene frameworks that any cell needs to survive, the “operating system of the cell” in the words of the authors.

Since so many genes are still poorly understood, the authors could not rely entirely on logic and common sense when choosing which genes to remove. Instead, they introduced random mutations into the starting organism and then checked which mutations were viable and which were not. This allowed them to classify genes as essential, inessential or quasi-essential (!). Deleting an essential gene would simply kill the cell. Deleting a quasi-essential gene would not kill it, but would dramatically slow its replication, severely crippling it. The final Syn3.0 organism has a doubling time of about 3 hours.
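The three-way classification can be expressed as a simple decision rule. The sketch below assumes that each gene disruption yields either no viable mutant or a measured doubling time; the `slowdown_factor` cut-off of 2 is a hypothetical value chosen for illustration, not the criterion used by the JCVI authors, and the gene names are made up.

```python
from typing import Optional


def classify_gene(mutant_doubling_h: Optional[float],
                  baseline_doubling_h: float,
                  slowdown_factor: float = 2.0) -> str:
    """Assign one of the three categories from a gene disruption experiment.

    mutant_doubling_h is None when no viable mutant was recovered.
    slowdown_factor is a hypothetical cut-off, not a published value.
    """
    if mutant_doubling_h is None:
        return "essential"        # disruption kills the cell
    if mutant_doubling_h >= slowdown_factor * baseline_doubling_h:
        return "quasi-essential"  # viable, but replication is badly slowed
    return "inessential"          # growth essentially unaffected


# Hypothetical observations: gene -> doubling time (hours) of the disrupted mutant
observations = {"geneA": None, "geneB": 7.5, "geneC": 2.1}
for gene, doubling in observations.items():
    print(gene, classify_gene(doubling, baseline_doubling_h=2.0))
```

The quasi-essential category is what makes the rule more than a yes/no viability check: it depends on a growth threshold, so the final gene set depends partly on how much slowdown one is willing to accept.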

Some of the points I took away from this readable and interesting paper were:

Synthetic biology methods are starting to resemble software development methods. The authors describe a design-build-test (DBT) cycle involving several nontrivial steps, such as in silico design, oligonucleotide synthesis, cloning in yeast, insertion into the bacterial cell, testing, and then (perhaps) sequencing to go back to the computer and figure out what went wrong or what went well. Thus, a feedback loop is set up between the cells and the in silico design space.

A very small genome needs a very tightly controlled environment to survive. The medium (nutrient solution) that Syn3.0 lives in apparently contains almost all the nutrients and raw materials it could possibly need from its environment. This means that many genes that would normally be useful for overcoming adverse conditions, perhaps for synthesising nutrients that are not available from the environment, are now redundant and can be removed. So when thinking about genome design, it seems we really have to think about how everything relates to a specific environment.

The mechanics of getting a synthetic genome into a living cell are still complex. A great deal of wet-lab (and, presumably, dry-lab) work is still needed to get the genome from the computer into something viable in a cell culture. However, things are moving much faster than in 2008, and it’s interesting to think about where this field might be in 2021.


Comment » | Bioinformatics

Equipmental visibility and barriers to understanding

July 12th, 2013 — 9:28pm

The following is an excerpt from a text I am currently writing, which may or may not be published in this form. The text is concerned with the role of software in the scientific research process: what happens when researchers must interact with software instead of hardware equipment, and the constraints that this places on the software development process.

Technological development since the industrial revolution has made equipment more intricate. Where we originally had gears, levers and pistons, we progressed via tape, vacuum tubes and punch cards to solid state memory, CPUs and wireless networks. The process of the elaboration of technology has also been the process of its hiding from public view. An increasing amount of complexity is packed into compact volumes and literally sealed into “black boxes”. This does not render the equipment inaccessible, but it does make it harder to understand and manipulate as soon as one wants to go outside of the operating constraints that the designers foresaw. As we have already noted, this poses problems to the scientific method. Scientists are human, and they engage with their equipment through the use of their five senses. Let us suggest a simple rule of thumb: the more difficult equipment is to see, touch, hear etc., the more difficult it becomes to understand it and modify its function. The evolution of technology has happened at the expense of its visibility. The user-friendly interface that provides a simple means of interacting with a complex piece of machinery, which initially is very valuable, can often become a local maximum that is difficult to escape if one wants to put the equipment to new and unforeseen uses. We may note two distinct kinds of user-friendly interfaces: interfaces where the simplified view closely approximates the genuine internals of the machinery, and interfaces where the simplified view uses concepts and metaphors that have no similarity to those internals. The former kind of interface we will call an authentic simplification, the latter an inauthentic simplification.

Of course, software represents a very late stage in the progression from simple and visible to complex and hidden machinery. Again we see how software can both accelerate and retard scientific studies. Software can perform complex information processing, but it is much harder to interrogate than physical equipment: the workings are hidden, unseen. The inner workings of software, which reside in source code, are notoriously hard to communicate. A programmer watching another programmer at work for hours may not be able to fully understand what kind of work is being done, even if both are highly skilled, unless a disciplined coding style and development methodology are being used. Software is by its very nature something hidden away from human eyes: from the very beginning it is written in artificial languages, which are then gradually compiled into even more artificial languages for the benefit of the processor that is to interpret them. Irreversible, one-way transformations are essential to the process of developing and executing software. This leads to what might be called a nonlinearity when software equipment is used as part of an experimental setup. Whereas visible, tangible equipment generally yields more information about itself when inspected, and whereas investigators generally have a clear idea of how hard it is to inspect or modify such equipment, software equipment often requires an unknown expenditure of effort to inspect or modify. The effort is unknown to all except those programmers who have experience working with the relevant source code, and even they will sometimes have a limited ability to judge how hard a certain change would be (software projects often finish over time and over budget, but almost never under time or under budget). This becomes a severe handicap for investigators. A linear amount of time, effort and resources spent understanding or modifying ordinary equipment will generally have clear payoffs, but the inspection and modification of software equipment is a dark area that investigators, unless they are able to collaborate well with programmers, will instinctively avoid.

To some degree these problems are inescapable, but we suggest the maximal use of authentic simplification in interfaces as a remedy. In addition, it is desirable to have access to multiple levels of detail in the interface, so that each level is an authentic simplification of the level below. In such interface strata, layers have the same structure and only differ in the level of detail. Thus, investigators are given, as far as possible, the possibility of smooth progression from minimal understanding to full understanding of the software. The bottom level interface should in its conceptual structure be very close to the source code itself.
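One way to picture such interface strata is as views of a single tree of computational steps, where a coarser view is obtained only by truncating the real structure, never by substituting a different metaphor. The Python sketch below is a hypothetical illustration of that idea; the `Step` type, the `view` and `render` functions and the example analysis are all assumptions, not taken from any existing tool.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Step:
    """One step of a computation, possibly made of finer-grained substeps."""
    name: str
    substeps: List["Step"] = field(default_factory=list)


def view(step: Step, depth: int) -> Step:
    """Coarser view of `step`: the same tree, cut off after `depth` levels.

    Every view is built only by truncating the real structure, so each one
    is an authentic simplification of the views below it.
    """
    if depth == 0:
        return Step(step.name)
    return Step(step.name, [view(s, depth - 1) for s in step.substeps])


def render(step: Step, indent: int = 0) -> List[str]:
    """Indented text rendering of a step tree, one line per step."""
    lines = ["  " * indent + step.name]
    for s in step.substeps:
        lines.extend(render(s, indent + 1))
    return lines


# Hypothetical analysis; in a real tool the deepest level would map onto functions in the source code
analysis = Step("analyse samples", [
    Step("normalise", [Step("log-transform"), Step("scale per sample")]),
    Step("cluster", [Step("compute distances"), Step("merge closest pair")]),
])
for depth in range(3):
    print("\n".join(render(view(analysis, depth))), end="\n\n")
```

Because every layer has the same shape (named steps containing named steps), moving from one level of detail to the next never requires learning a new vocabulary, which is the smooth progression argued for above.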

Comment » | Bioinformatics, Computer science, Philosophy, Software development

The “Friedrich principles” for bioinformatics software

September 13th, 2012 — 12:51am

I’ve just come back from Biohackathon 2012 in Toyama, an annual event, traditionally hosted in Japan, where users of semantic web technologies (such as RDF and SPARQL) in biology and bioinformatics come together to work on projects. This was a nice event with an open and productive atmosphere, and I got a lot out of attending. I participated in a little project that is not quite ready to be released to the wider public yet. More on that in the future.

Recently I’ve also had a paper accepted at the PRIB (Pattern Recognition in Bioinformatics) conference, jointly with Gabriel Keeble-Gagnère. The paper is a slight mismatch for the conference, as it really focuses on software engineering more than on pattern recognition as such. In this paper, titled “An Open Framework for Extensible Multi-Stage Bioinformatics Software” (arxiv), we make a case for a new set of software development principles for experimental software in bioinformatics, and for big data sciences in general. We provide a software framework that supports application development with these principles – Friedrich – and illustrate its application by describing a de novo genome assembler we have developed.

The actual gestation of this paper in fact occurred in the reverse order from the above. In 2010, we began development on the genome assembler, at the time a toy project. As it grew, it became a software framework, and eventually something of a design philosophy. We hope to keep building on these ideas and demonstrate their potential more thoroughly in the near future.

For the time being, these are the “Friedrich principles”, in no particular order:

  • Expose internal structure.
  • Conserve dimensionality maximally. (“Preserve intermediate data”)
  • Multi-stage applications. (Experimental and “production”, and moving between the two)
  • Flexibility with performance.
  • Minimal finality.
  • Ease of use.

Particularly striking here is (I think) the idea that internal structure should be exposed. This is the opposite of encapsulation, an important principle in software engineering. We believe that when the users are researchers, they are better served by transparent software, since the workflows are almost never final but subject to constant revision. But of course, the real trick is knowing what to make transparent and what to hide – an economy is still needed.
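To make "expose internal structure" and "preserve intermediate data" more concrete, here is a minimal Python sketch of a multi-stage pipeline whose stage list and per-stage results are public, so that a researcher can inspect any intermediate output, swap out a single stage, and recompute only what follows. This is a hypothetical illustration under those assumptions, not the Friedrich API; the `Pipeline` class and the toy k-mer stages are invented for the example.

```python
from collections import Counter
from typing import Any, Callable, Dict, List, Tuple

Stage = Tuple[str, Callable[[Any], Any]]


class Pipeline:
    """Multi-stage pipeline that keeps every intermediate result.

    The stage list and the stored results are ordinary public attributes,
    so users can inspect or replace any stage (exposed internal structure).
    """

    def __init__(self, stages: List[Stage]):
        self.stages = list(stages)
        self.results: Dict[str, Any] = {}  # stage name -> output of that stage

    def _run_from(self, index: int, data: Any) -> Any:
        for name, fn in self.stages[index:]:
            data = fn(data)
            self.results[name] = data      # preserve intermediate data
        return data

    def run(self, data: Any) -> Any:
        return self._run_from(0, data)

    def rerun_from(self, index: int) -> Any:
        """Re-run stages from `index` onward, reusing the stored output of stage index - 1."""
        if index < 1:
            raise ValueError("use run() to start from the first stage")
        previous_name = self.stages[index - 1][0]
        return self._run_from(index, self.results[previous_name])


def kmers(reads, k=3):
    """All overlapping substrings of length k from each read."""
    return [r[i:i + k] for r in reads for i in range(len(r) - k + 1)]


# Toy, assembler-like stages: k-mer extraction, counting, filtering
pipeline = Pipeline([
    ("kmers", kmers),
    ("counts", Counter),
    ("solid", lambda counts: {km for km, c in counts.items() if c >= 2}),
])
pipeline.run(["ACGTAC", "CGTACG", "GGCAT"])
print(pipeline.results["counts"])  # inspect an intermediate stage

# Revise only the last stage and recompute from the stored counts
pipeline.stages[2] = ("solid", lambda counts: {km for km, c in counts.items() if c >= 3})
print(pipeline.rerun_from(2))
```

Keeping every stage's output costs memory, which is exactly the economy mentioned above: a real framework would have to decide which intermediate data is worth preserving and which can be recomputed.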

Comment » | Bioinformatics, Computer science, Software development

Identity games

May 14th, 2012 — 10:53pm

I’ve recently seen the film Tinker, Tailor, Soldier, Spy, based on John le Carré’s novel of the same name. In the 1970s, a TV series based on the same novel, with Alec Guinness as George Smiley, was very popular in Britain. This film, with Gary Oldman as the protagonist, is supposed to be something like an update for a new generation.

It is a very good film indeed. (I cannot remember the last time I was so gripped by a film shortly after its release.) I was also inspired to read several of le Carré’s novels, including but not limited to Tinker, Tailor, Soldier, Spy. What they have in common is a subtle, rich portrayal of the spy trade from the viewpoint of Britain during the Cold War; a world that seems to be, increasingly, a thing of the past. Voice recognition, social profiling and data mining seem to be taking the place of a good chunk of what le Carré calls tradecraft – the concrete skills that spies with 1970s technology needed in order to perform their work on the ground in enemy territory – and computer scientists like myself are to blame.

While hailed as the anti-Ian Fleming for his relatively gritty realism, le Carré is not without his own spy romanticism. But the bleakness inherent in the work comes through on every page.

In his commentary on the film, le Carré states that

[The world of spies is] not so far from corporate life, from the ordinary world. At the time of writing the novel, I thought that there was a universality that I could exploit. The book definitely resonated with the public; people wanted to reference their lives in terms of conspiracy, and that remains central to the relationship between man and the institutions he creates.

There is something profound in this. Spies are merely concentrated versions of something that we all are ourselves, something that we must be every day. Spies project false personalities in order to gain access and information, either about enemy assets or about other spies. They hide to survive, and they hide so that they may uncover a kind of truth. With a view to the spy as the most concentrated form of a certain kind of existence, let us take a look at some other forms that this existence may take.

The modern professional. To be professional means to effectively project a professional identity in the workplace. To be unprofessional almost always means that too much of another, possibly more genuine personality shines through – one has become too unrestrained. The professional needs to always be projecting, to a degree, in order to remain compatible with the workplace and retain his income and career prospects. Young people are socialised into this condition very early – at career workshops, students learn how to polish their CVs, how to embellish their record, and to hide their flaws. This is essentially a partial course in spycraft. But all this is only at the entry level. When any kind of sophisticated politics enters the organisation – as it does – the professional may be pushed ever closer to the spy. A recruiter: “Too bad that we couldn’t hire him, he seemed genuine.”

The academic. The academic can be thought of as a special version of the professional with some essential differences. First, professionals do not yet have universal records that follow them around for their entire lifetime – much of the “record” that they create, which is associated with the persona they are supposed to project, exists only in the memory of people and of one organisation. Academics build their records with units such as publications and conference attendance. Publications in particular form an atomic record that does not go away. On the other hand, the everyday life of the academic may – possibly – be less artificial than that of the professional, since focus is on the production of publishable units, not on pleasing people in one’s surroundings as much as possible.

The philosopher. Philosophers seek to uncover some hidden truth about the world. In this sense, they are spies without enemies. The philosopher lives among people with a view to analysing them and understanding their behaviour, so that he can explain it to them. But most of the time the philosopher is likely to be a flaneur or a quiet observer, like the spy often is: someone who seeks to learn something hidden from situations that other participants may regard as being routine and their everyday existence. In this sense spies may have something in common with philosophers.

Here I have highlighted a phenomenon but not made any recommendations. Maybe it’s for the better that we are all a little bit like spies. Masks of some kind are worn in most social interactions, not just the ones above, and they are not a recent phenomenon. Exposing something like a true inner self requires that the inner self remain static long enough for it to be possible to expose. But the difference between most social relationships and the relationships we have with institutions today is that the former can change or dissolve naturally to fit spontaneous changes in people’s characters or needs. Relationships between people and modern institutions do not yet seem to be capable of this dynamic.

1 comment » | Life, Philosophy
