Tag: semantic web


The “Friedrich principles” for bioinformatics software

September 13th, 2012 — 12:51am

I’ve just come back from Biohackathon 2012 in Toyama, an annual event, traditionally hosted in Japan, where users of semantic web technologies (such as RDF and SPARQL) in biology and bioinformatics come together to work on projects. This was a nice event with an open and productive atmosphere, and I got a lot out of attending. I participated in a little project that is not quite ready to be released to the wider public yet. More on that in the future.

Recently I’ve also had a paper accepted at the PRIB (Pattern Recognition in Bioinformatics) conference, jointly with Gabriel Keeble-Gagnère. The paper is a slight mismatch for the conference, as it focuses on software engineering more than on pattern recognition as such. In this paper, titled “An Open Framework for Extensible Multi-Stage Bioinformatics Software” (arXiv), we make a case for a new set of software development principles for experimental software in bioinformatics, and for big data sciences in general. We provide a software framework, Friedrich, that supports application development with these principles, and illustrate its application by describing a de novo genome assembler we have developed.

The actual gestation of this paper in fact occurred in the reverse order from the above. In 2010, we began development on the genome assembler, at the time a toy project. As it grew, it became a software framework, and eventually something of a design philosophy. We hope to keep building on these ideas and demonstrate their potential more thoroughly in the near future.

For the time being, these are the “Friedrich principles” in no particular order.

  • Expose internal structure.
  • Conserve dimensionality maximally. (“Preserve intermediate data”)
  • Multi-stage applications. (Experimental and “production”, and moving between the two)
  • Flexibility with performance.
  • Minimal finality.
  • Ease of use.

Particularly striking here is (I think) the idea that internal structure should be exposed. This is the opposite of encapsulation, an important principle in software engineering. We believe that when the users are researchers, they are better served by transparent software, since the workflows are almost never final but subject to constant revision. But of course, the real trick is knowing what to make transparent and what to hide – an economy is still needed.
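To make “expose internal structure”, “preserve intermediate data” and “multi-stage applications” a little more concrete, here is a minimal Python sketch. It is not Friedrich’s API; the `Pipeline` class, its `run(start=...)` parameter and the toy k-mer stages are hypothetical illustrations. Every stage is a named, replaceable entry, every intermediate result is kept, and an experiment can rerun just the tail of the pipeline on stored data.

```python
from collections import Counter
from dataclasses import dataclass, field
from typing import Any, Callable, Dict, List, Tuple


@dataclass
class Pipeline:
    """Hypothetical multi-stage pipeline: stages and intermediate results stay visible."""
    stages: List[Tuple[str, Callable[[Any], Any]]] = field(default_factory=list)
    results: Dict[str, Any] = field(default_factory=dict)

    def add_stage(self, name: str, fn: Callable[[Any], Any]) -> "Pipeline":
        self.stages.append((name, fn))
        return self

    def run(self, data: Any = None, start: int = 0) -> Any:
        # When resuming mid-pipeline, ignore `data` and reuse the stored
        # output of the preceding stage instead of recomputing it.
        if start > 0:
            data = self.results[self.stages[start - 1][0]]
        for name, fn in self.stages[start:]:
            data = fn(data)
            self.results[name] = data  # preserve every intermediate result
        return data


def kmers(reads: List[str], k: int = 3) -> List[str]:
    """Toy stage: split each read into overlapping k-mers."""
    return [r[i:i + k] for r in reads for i in range(len(r) - k + 1)]


p = (Pipeline()
     .add_stage("kmers", kmers)
     .add_stage("counts", Counter)
     .add_stage("solid", lambda c: {km: n for km, n in c.items() if n >= 2}))
final = p.run(["ACGTAC", "CGTACG"])
print(final)  # {'ACG': 2, 'CGT': 2, 'GTA': 2, 'TAC': 2}

# Experiment: swap in a stricter filter and rerun only the last stage,
# reusing the preserved k-mer counts.
p.stages[2] = ("solid", lambda c: {km: n for km, n in c.items() if n >= 3})
rerun = p.run(start=2)
print(rerun)  # {}
```

The deliberate lack of encapsulation is the point here: `p.stages` and `p.results` are plain public fields, so a researcher can inspect or replace anything. A production version would still need to decide which of these to hide.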

Comment » | Bioinformatics, Computer science, Software development

“True Knowledge”: Another search engine

June 16th, 2009 — 12:16am

I previously commented on Wolfram Alpha and PowerSet. Fisheye Perspective has now brought another “answer engine”, as they are called these days, to my attention: True Knowledge. You have to sign up for an account in order to test it, which I have yet to do, but one feature that’s immediately appealing is that users can add and edit content. This was apparently one of the main design principles. But is this then just an alternative to Wikipedia? Not necessarily, as it also has an inference system (it can deduce facts from other facts). And it has an API for programmatic access. I can think of many interesting uses for an online user-edited inference-enabled knowledge base, if they can get the details right. These things are still in their infancy (I hope, since I want them to be better).
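As a rough illustration of what “deducing facts from other facts” can mean, here is a naive forward-chaining sketch over subject–predicate–object triples. It is not how True Knowledge works internally (I haven’t seen that); the two rules, the predicate names `located_in` and `capital_of`, and the facts are all made up for the example. The rules are applied repeatedly until no new facts appear.

```python
from typing import Set, Tuple

Triple = Tuple[str, str, str]


def infer(facts: Set[Triple]) -> Set[Triple]:
    """Naive forward chaining with two toy rules, repeated until nothing new is derived:
    1. (x, capital_of, y)                      => (x, located_in, y)
    2. (a, located_in, b), (b, located_in, c)  => (a, located_in, c)
    """
    known = set(facts)
    while True:
        derived = set()
        for s, p, o in known:
            if p == "capital_of":
                derived.add((s, "located_in", o))
            elif p == "located_in":
                for s2, p2, o2 in known:
                    if p2 == "located_in" and s2 == o:
                        derived.add((s, "located_in", o2))
        if derived <= known:  # fixpoint reached: nothing new was derived
            return known
        known |= derived


facts = {
    ("Toyama", "located_in", "Toyama Prefecture"),
    ("Toyama Prefecture", "located_in", "Japan"),
    ("Japan", "located_in", "Asia"),
    ("Tokyo", "capital_of", "Japan"),
}
closure = infer(facts)
print(("Toyama", "located_in", "Asia") in closure)  # True
print(len(closure))  # 9
```

With user-edited content, the interesting part is that one user’s edit (say, a new `located_in` fact) can change many derived answers at once, which is presumably where “getting the details right” gets hard.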

2 comments » | Uncategorized

Two new-ish search engines

May 26th, 2009 — 7:59am

Recently, while reading about methods for manipulating RDF, I discovered the search engine PowerSet. More recently, Wolfram Research’s Wolfram Alpha launched. There’s been no shortage of new search engines in the past year or so – Cuil is one that was much publicized but ended up remarkably useless – but these two still impress me.

PowerSet impresses me because of its interface – I can easily see what a particular match is about without leaving the list of search results. Speeding up the typical use cases like this is very important for usability.

Wolfram Alpha impresses me because of the quality of the results. Maybe I’m in the minority in thinking this; the press seems to have been giving it mostly negative reviews. Clearly WA is not intended as a Google replacement, but perhaps it was described as one at some point. Today, now that it is available to the public, it’s something different. It lets me look at data, mostly of the quantitative sort, and make all sorts of semi-interactive charts and comparisons. Here are some searches I liked: earthquakes in Japan, 1 cup of coffee, Tokyo to Osaka. I especially like the interactive earthquake graph.

WA is not without its problems, though. Sometimes it’s hard to figure out what kind of queries you can make. I found the above mostly by experimentation. If they exposed more details about their data model and what they knew about each kind of object, maybe this would be easier. Right now I’m wondering why I can do a query like “largest cities” but not “largest cities in mexico”, for instance. I suppose this is mainly a question of maturity on the part of both the system and its users, though.

Search engines like PowerSet and WA are indicative of a broader trend towards semantics in computing and internet usage. While the semantic web isn’t here yet in the sense that we don’t have a semantic web browser or a unified way of querying the internet, clearly services that are based very heavily on semantic models are becoming mainstream. More on the impact of this in a future post.

1 comment » | Uncategorized

Research idea: a snapshot

May 19th, 2009 — 3:08am

As part of an application form I filled out recently, I had to write a summary of my research ideas. Of course these change all the time, since I’m still searching for a precise topic (and probably will be for a long time). But this is what a snapshot of those thoughts, taken now, looks like:

One of the most important problems in software engineering is reducing the impact of change. To this end, methods such as inversion of control (dependency injection) have recently become popular as a way to reduce coupling to concrete implementations. However, even with these schemes, there is still a dependency on specific names and abstract interfaces. My project aims to investigate the possible use of semantic methods to address this problem. In essence, I want to allow developers to use semantic interfaces rather than syntactic ones to describe and access their components.
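The remaining dependency is easy to see in a small constructor-injection sketch. All class and method names below (`SequenceStore`, `InMemoryStore`, `GCCounter`, `get`) are hypothetical. The consumer never names the concrete store, but it is still bound to the abstract interface’s name and to the syntactic method name `get`.

```python
from abc import ABC, abstractmethod
from typing import Dict


class SequenceStore(ABC):
    """Abstract interface: consumers still depend on this name and on get()."""

    @abstractmethod
    def get(self, seq_id: str) -> str: ...


class InMemoryStore(SequenceStore):
    def __init__(self, data: Dict[str, str]) -> None:
        self._data = dict(data)

    def get(self, seq_id: str) -> str:
        return self._data[seq_id]


class GCCounter:
    # Constructor injection: the concrete store is supplied from outside,
    # so GCCounter never refers to InMemoryStore directly...
    def __init__(self, store: SequenceStore) -> None:
        self._store = store

    def gc_content(self, seq_id: str) -> float:
        seq = self._store.get(seq_id)  # ...but it is still tied to the name get()
        return (seq.count("G") + seq.count("C")) / len(seq)


counter = GCCounter(InMemoryStore({"r1": "ACGT"}))
print(counter.gc_content("r1"))  # 0.5
```

A store written by someone else, exposing the same capability under a different method name, could not be injected here without an adapter written by hand.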

Specifically, I am investigating techniques commonly used in the context of Semantic Web Services, such as ontologies and semantic/syntactic mediation, and their applicability to this problem. 

We may regard services as somewhat large-scale components. However, I am interested in applying these methods not just to large-scale services distributed across the web, but also to small, numerous software components running in a single process. In such a setting, performance and scalability are important issues to investigate, in addition to the usual problems of reliability, correctness of composition, etc.
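A minimal in-process sketch of the direction described above might look as follows. Everything here is a hypothetical illustration rather than a worked-out design: a toy ontology (single inheritance, stored as a dict), a `SemanticRegistry` that finds components by the concept they provide via subclass subsumption, and a crude form of syntactic mediation in which each component maps an operation concept (`GetSequence`) onto whatever its method happens to be called.

```python
from typing import Callable, Dict, List, Optional, Tuple

# Hypothetical toy ontology: concept -> parent concept (single inheritance for brevity).
ONTOLOGY: Dict[str, Optional[str]] = {
    "SequenceSource": None,
    "FastaFileSource": "SequenceSource",
    "InMemorySource": "SequenceSource",
}


def subsumes(general: str, specific: str) -> bool:
    """True if `specific` equals `general` or is a (transitive) subclass of it."""
    c: Optional[str] = specific
    while c is not None:
        if c == general:
            return True
        c = ONTOLOGY.get(c)
    return False


class SemanticRegistry:
    def __init__(self) -> None:
        # Each entry: (concept the component provides, operation concept -> callable)
        self._entries: List[Tuple[str, Dict[str, Callable]]] = []

    def register(self, concept: str, operations: Dict[str, Callable]) -> None:
        self._entries.append((concept, operations))

    def lookup(self, concept: str, operation: str) -> Callable:
        """Find a component by meaning, not by class or method name (linear scan)."""
        for provided, ops in self._entries:
            if subsumes(concept, provided) and operation in ops:
                return ops[operation]
        raise LookupError(f"no component provides {operation} for {concept}")


class LegacyStore:
    def fetch_residues(self, key: str) -> str:  # syntactically unrelated method name
        return {"r1": "ACGT"}[key]


registry = SemanticRegistry()
# Mediation step: map the operation concept onto the component's own method.
registry.register("InMemorySource", {"GetSequence": LegacyStore().fetch_residues})
get_seq = registry.lookup("SequenceSource", "GetSequence")
print(get_seq("r1"))  # ACGT
```

The linear scan and repeated subsumption checks are exactly the kind of cost that matters once components are small and numerous; caching lookups per (concept, operation) pair would be an obvious first step.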

Comment » | Uncategorized
