Tag: scala

The year and decade in review. 2020s: orderly peace?

December 30th, 2019 — 8:49am

2019 comes to a close, and with it the 2010s. Below are a few thoughts on these periods of time.

The most significant book I’ve read in 2019 is probably Hannah Arendt’s The Origins of Totalitarianism. The German title, literally “Elements and Origins of Totalitarian Rule”, more closely reflects the contents of this monograph. Arendt explores antisemitism, imperialism and totalitarianism to form a grand analysis of totalitarian forms of government, which she considers to be genuinely new and unprecedented. Those who make it through the somewhat slow early chapters will be richly rewarded. It’s a very timely book – although written in the 1950s, most of the ideas feel like they could be from last week. Elements of totalitarian rule are absolutely something we should worry about.

Another notable book from this year has been Edward Snowden’s Permanent Record. Aside from the obvious political dynamite, I found myself relating to a lot of the experiences he had growing up. Perhaps this is a generational story. In the late 90s, the Internet suddenly became relatively mainstream and for a short while, it was a very special place, seemingly full of utopian promise and all kinds of possibilities and exploration. For many born in the mid-80s this coincided with our teenage years.

I’ve lived in Japan throughout the 2010s, the final part of the Heisei (平成) era. In 2019 this era came to a close and we are now officially in Reiwa (令和). I can’t easily summarise the 2010s. Both my personal life and Japan seem to have undergone great change during this time, and sometimes it’s hard to separate one from the other. The Fukushima incident in 2011 was perhaps a watershed moment that Japan is still grappling with. Although the future of nuclear power has not yet been resolved, the country’s response to such a tense incident has in many ways been admirable, and the famous Japanese virtue (sometimes a double-edged sword) of stability certainly came through. The surrounding world is also changing, and Japan, though still a relatively separate culture, is becoming considerably more open and mixed as a society, perhaps out of necessity. Both tourism and the import of foreign labour have increased significantly. This raises interesting questions about what kind of society Japan might be in 10 – 20 years.

During the decade I have had diverse personal and professional experiences. I lived in Tokyo, Osaka, then Tokyo again. I was able to complete a PhD thesis. I visited many countries for the first time, and became interested in bioinformatics (mainly as a field in which to apply fundamental computer science and software engineering). I took up several new hobbies, obtained permanent residency in Japan, and was able to improve my Japanese to the point of reading novels, although I’m still not quite where I’d like to be with the language. I’ve been reading a lot of philosophy and general literature and tried to systematically develop a worldview (fragments of which sometimes appear on this blog). Not everything I tried to do worked out the way I expected, but the learning has felt very valuable, and I do feel much wiser and more capable in my approach to many things. I expect to be sincerely expressing the same sentiment in the year 2029, though.

One technical focus this year was improving my Spark (and Scala) skills and developing an algorithm for De Bruijn graph compaction (similar to what Bcalm does). I was pleased with the efficient research process I was able to achieve, probably my best ever on this kind of project. In terms of my professional path, the overall trend for me seems to be towards smaller firms and greater independence. (Although I remain with Lifematics, I will now also be available for consulting and contracting opportunities in bioinformatics as well as general software development. If you are reading this and think you would like to work with me, do get in touch.)
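To give a flavour of what compaction means here, the following is a toy single-machine sketch (hypothetical code, not my actual algorithm or a Spark implementation, and unlike Bcalm it ignores reverse complements and very large inputs): runs of k-mers whose overlaps are unique on both sides are merged into unitigs.

```scala
object Unitigs {
  // Toy De Bruijn graph compaction: k-mers are nodes, with an edge
  // a -> b when a's (k-1)-suffix equals b's (k-1)-prefix. Maximal
  // non-branching chains are merged into unitigs. Cycles and reverse
  // complements are ignored for simplicity.
  def compact(kmers: Set[String]): Set[String] = {
    val k = kmers.head.length
    val byPrefix = kmers.groupBy(_.take(k - 1)).withDefaultValue(Set.empty[String])
    val bySuffix = kmers.groupBy(_.drop(1)).withDefaultValue(Set.empty[String])

    // b uniquely follows a when b is a's only successor and a is b's
    // only predecessor; only such links may be compacted.
    def next(a: String): Option[String] = {
      val succs = byPrefix(a.drop(1))
      if (succs.size == 1 && bySuffix(succs.head.take(k - 1)) == Set(a)) Some(succs.head)
      else None
    }
    def hasUniquePred(b: String): Boolean = {
      val preds = bySuffix(b.take(k - 1))
      preds.size == 1 && byPrefix(preds.head.drop(1)) == Set(b)
    }

    // Each unitig starts at a k-mer with no unique predecessor and is
    // extended one base at a time along unique links.
    kmers.filterNot(hasUniquePred).map { start =>
      val sb = new StringBuilder(start)
      var cur = start
      var n = next(cur)
      while (n.isDefined) { cur = n.get; sb.append(cur.last); n = next(cur) }
      sb.toString
    }
  }
}
```

In a distributed setting the interesting part is doing this merge without a global view of the graph, which is what makes it a good fit for studying Spark.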

Thus ends a politically very strange decade, from a global perspective, and we enter the brave new world of the 2020s. Will it be a time of “orderly peace”, as the name 令和 suggests?

1 comment » | Bioinformatics, Computer science, Life, Philosophy

The “Friedrich principles” for bioinformatics software

September 13th, 2012 — 12:51am

I’ve just come back from Biohackathon 2012 in Toyama, an annual event, traditionally hosted in Japan, where users of semantic web technologies (such as RDF and SPARQL) in biology and bioinformatics come together to work on projects. This was a nice event with an open and productive atmosphere, and I got a lot out of attending. I participated in a little project that is not quite ready to be released to the wider public yet. More on that in the future.

Recently I’ve also had a paper accepted at the PRIB (Pattern Recognition in Bioinformatics) conference, jointly with Gabriel Keeble-Gagnère. The paper is a slight mismatch for the conference, as it is really focussing on software engineering more than pattern recognition as such. In this paper, titled “An Open Framework for Extensible Multi-Stage Bioinformatics Software” (arxiv) we make a case for a new set of software development principles for experimental software in bioinformatics, and for big data sciences in general. We provide a software framework that supports application development with these principles – Friedrich – and illustrate its application by describing a de novo genome assembler we have developed.

The gestation of this paper actually occurred in the reverse order from the above. In 2010, we began development on the genome assembler, at the time a toy project. As it grew, it became a software framework, and eventually something of a design philosophy. We hope to keep building on these ideas and demonstrate their potential more thoroughly in the near future.

For the time being, these are the “Friedrich principles” in no particular order.

  • Expose internal structure.
  • Conserve dimensionality maximally. (“Preserve intermediate data”)
  • Multi-stage applications. (Experimental and “production”, and moving between the two)
  • Flexibility with performance.
  • Minimal finality.
  • Ease of use.

Particularly striking here is (I think) the idea that internal structure should be exposed. This is the opposite of encapsulation, an important principle in software engineering. We believe that when the users are researchers, they are better served by transparent software, since the workflows are almost never final but subject to constant revision. But of course, the real trick is knowing what to make transparent and what to hide – an economy is still needed.
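A toy Scala illustration of this principle (hypothetical code, not the Friedrich framework’s actual API): a workflow can return every intermediate stage alongside the final result, so a researcher can inspect, re-run or branch from any point, rather than receiving an opaque answer.

```scala
// Hypothetical sketch of "expose internal structure" and "preserve
// intermediate data": each named step's output is recorded, not hidden.
final case class Step[A](name: String, f: A => A)

def runTraced[A](input: A, steps: List[Step[A]]): (A, List[(String, A)]) = {
  val (result, trace) =
    steps.foldLeft((input, List.empty[(String, A)])) { case ((acc, tr), step) =>
      val out = step.f(acc)
      (out, (step.name, out) :: tr)
    }
  (result, trace.reverse)
}
```

A caller still gets the final value, but the full trace is available for inspection – the opposite of an encapsulated black box.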

Comment » | Bioinformatics, Computer science, Software development

What makes a good programming language?

September 2nd, 2011 — 6:00pm

New programming languages are released all the time. History is littered with dead ones. There are also many long-time survivors in good shape, as well as geriatric languages on life support.

What makes a programming language attractive and competitive? How can we evaluate its quality? There are many different aspects of this problem.

Ease of reading and writing, or: directness of the mapping between the problem in your head and the model you are creating on the computer. This can be highly domain-dependent; for instance, languages such as LaTeX, Matlab and R are designed with specific problems in mind and cater to users from that domain. Their limits show quickly when you try to stretch them beyond their envisioned purpose. Speaking of general programming languages, I think Python deserves to be mentioned as a language that is extremely readable and writable. It has other shortcomings though – see below. Prolog is also highly read- and writable if it suits your problem.

Runtime performance. Arguably this is one of the few reasons to bother with using C++. For the majority of programming projects though, performance is much less of a problem than one might think, especially if one considers how close the performance of many JVM languages gets to C++. When programmers think about their overall productivity and effectiveness in developing and maintaining a system, C++ is often not the best choice.

Scalability to large teams. The key property here is: does the language do anything to help me, as a developer, work with code that other people wrote? Ease of maintenance may be strongly correlated with usability in large teams. An anti-pattern here is languages that allow for solving the same problem in a huge number of ways with widely varying syntax. For instance, Perl and C++ can lead to notoriously unmaintainable code if used carelessly. Some say that Scala also suffers from this problem. Basically, the language helps here if it prevents me from doing things that other developers might not expect, and that I might forget to document or communicate. This is why Gosling famously called Java a blue collar language; it restricts you enough to make teamwork quite practical. It even restricts the layout of your source file hierarchy. (Now we begin to see that some goals are in conflict with each other.)

Scalability to large systems. This is related to the preceding property, but whereas team scalability seems to be mainly about avoiding the creation of code fragments that surprise people other than their creators, system size scalability seems to be about avoiding the creation of code fragments that surprise other code fragments. Here one needs invariants, good type checking, static constraints of all kinds. Scripting languages like Perl and Python, lacking static typing completely, are some of the worst in this regard, since we cannot even be sure at startup time that methods we try to invoke on objects exist at all (Python).
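A small Scala example of the kind of static constraint meant here (the domain names are made up for illustration): giving distinct identifiers distinct types means that a mix-up surprises the compiler at build time, rather than surprising another code fragment at runtime.

```scala
// Wrapper types as a cheap static invariant: both are just a Long at
// runtime (value classes avoid boxing), but the type checker keeps
// them apart.
final case class UserId(value: Long) extends AnyVal
final case class OrderId(value: Long) extends AnyVal

def lookupUser(id: UserId): String = s"user-${id.value}"

// lookupUser(OrderId(42))  // rejected at compile time: type mismatch
```

In a dynamically typed language the equivalent mistake would only surface when (and if) the faulty call is actually executed.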

Scalability over time (maintainability). If there is both system size scalability and team scalability, then the system is also likely to be able to live for a long time without great troubles.

Developer efficiency and rapid prototyping. Depending on the nature of the system being developed, this may depend on several different properties listed above.

Availability of quality tools. Mature runtime environments, such as the JVM, have many more high quality tools and IDEs available than a younger ecosystem such as Ruby’s. Mature languages also have more compilers for more different architectures available.

These points begin to give us an idea of how we can evaluate programming languages. However, I also believe that making a good language and making people use it is largely about luck and factors outside the design itself. Just like there’s a big step between imagining and specifying a utopian society and making that social order an actuality, there’s a big step between designing an ideal programming language and achieving widespread adoption for it. We have seen a way forward though: with generalised runtime environments such as the JVM and the CLR, we may develop and deploy languages that take advantage of a lot of existing infrastructure much more easily than before. And what I hope for is in fact that it becomes even easier to deploy new languages, and that new languages are as interoperable as possible (insofar as it doesn’t constrain their design), so that we could see more competition, more evolution and more risk taking in the PL space.

Comment » | Computer science, Software development

Pointers in programming languages

August 26th, 2011 — 12:21am

It is likely that few features cause as many problems as pointers and references in statement-oriented languages, such as C, C++ and Java. They are powerful, yes, and they allow us to control quite precisely how a program is to represent something. We can use them to conveniently compose objects and data without the redundancy of replicating information massively. In languages like C they are even more powerful than in Java, since just about any part of memory can be viewed as if it were just about anything through the use of pointer arithmetic, which is indeed frightening.

But they also complicate reasoning about programs enormously. Both human reasoning and automated reasoning. Pointers allow any part of the program to have side effects in any other part of the program (if we have a reference to an object that originated there), and they make it very hard to reason about the properties that an object might have at a given point in time (since we generally have no idea who might hold a reference to it – it is amazing that programmers are forced to track this in their heads, more or less). In my effort to design my own language, multiple pointers to the same objects – aliases – have come back from time to time to bite me and block elegant, attractive designs. I believe that this is a very hard problem to design around. Aliased pointers set up communication channels between arbitrary parts of a program.
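The effect is easy to reproduce in any language with mutable objects. In Scala, for instance, two aliases to one mutable buffer form exactly such a channel, while an immutable value does not:

```scala
import scala.collection.mutable.ArrayBuffer

// Two names, one mutable object: a mutation through either alias is
// visible through the other.
val cache = ArrayBuffer(1, 2, 3)
val alias = cache        // no copy is made
alias += 4
assert(cache.sum == 10)  // the change "travelled" through the alias

// Immutable data closes the channel: an "update" yields a new value
// and the original is untouched.
val xs = Vector(1, 2, 3)
val ys = xs :+ 4
assert(xs == Vector(1, 2, 3) && ys.sum == 10)
```

The mutable version is harmless in ten lines; it becomes dangerous when `cache` and `alias` live in distant parts of a large program.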

Nevertheless attempts have been made, in academia and in research labs, to solve this problem. Fraction-based permissions track how many aliases exist and endow each alias with specific permissions to access the object that is referred to. Ownership analysis forces access to certain objects to go through special, “owning” objects. Unique or “unshared” pointers in some language extensions restrict whether aliases may be created or not. But so far no solution has been extremely attractive and convenient, and none has made it into mainstream languages. (I know that Philipp Haller made a uniqueness plugin for the Scala compiler, but it is not in wide use, I believe.)

If we are to attempt further incremental evolution of the C-family languages, aliased pointers are one of the most important issues we can attack in my opinion.

2 comments » | Computer science, Software development

Reviewing the second year of Monomorphic

June 30th, 2011 — 8:19pm

In May 2010 I reviewed the state of Monomorphic as a blog. Since it’s now been almost 13 months since that time, let’s evaluate what’s happened in the meantime. Where am I, how did I get here, and where do I go next?

The rate of publication has decreased. Prior to the last evaluation, 55 posts had been published – about one per week. Since then, only 22 new posts have been added. This is partly because I’ve had more academic tasks to carry out, a condition that is set to intensify gradually from here on, and partly because I tried to change my standards for what I wanted to blog about (in some vague, as yet unspecified way).

Scala is still a very popular topic to blog about, and rightly so, but I no longer feel that I should write about it for the sake of doing so. Others do a much better job of writing about Scala than I could do, because they spend much more time with that language. Incidentally, I’m delighted to see that companies are still switching to Scala quite eagerly, and that Martin Odersky and others launched the company Typesafe to help others with the transition. Learning Scala has honestly been one of the most empowering experiences I’ve had as a programmer, and I believe that there is a vast space of possibilities that has yet to be explored in the language. Maybe it’s not a language for everybody (I postpone my judgment on this for now), but if it were in the hands of the right teams with the right discipline, the world would be in a better state. Also, the Scala IDE for Eclipse has been vastly, vastly improved since 13 months ago, at which time it could barely be used.

I’ve become more and more interested in philosophy over the past 18 months or so, and this started to show up in the blog during this interval, with more and more entries tentatively trying to delineate philosophical questions or positions. Initially I was focussing almost only on Nietzsche, but recently I’ve also been reading a lot of Foucault, as well as some others. I’ve probably not been very pedagogical in writing down my thoughts on these topics, but I fear I will never be a pedagogical writer unless I go through some initial struggling attempts. The ideas I’m most interested in currently are causality (I believe that we don’t understand it at all) and free will (I believe that its existence is highly questionable, but very fruitful to criticise and reason about).

Popularity. By far my most popular post has been this little note on Nomura’s Jellyfish. If I put Google AdWords on just that post, I would probably make a lot of money without annoying any other readers. For some reason Google directs a lot of people googling jellyfish to this site. As if programming and philosophy are not more interesting things to Google. Other than that, the Scala posts have been very popular, and following them, Continuous computing, Type theory and Politicization of mathematics… were able to attract some attention.

From now on, until early next year, I have to focus more and more on finishing my Ph.D. studies; it remains to be seen how this will affect my blogging.

Comment » | Computer science, Life, Philosophy
