Posts tagged “software engineering”.

Gregorian misery

The Gregorian calendar has been in use since 1582. Among its features is a moderately complicated rule for leap years: if n mod 4 is 0, then n is a leap year. However, if n mod 100 is 0, then n is not a leap year, unless n is a multiple of 400.
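The rule above fits in a single boolean expression. Here is a minimal sketch in Scala; the names LeapYear and isLeapYear are my own, chosen for illustration, and the sketch ignores the fact that the rule only applies from 1582 onwards.

```scala
object LeapYear {
  // Gregorian rule: divisible by 4, except century years,
  // unless the century year is also divisible by 400.
  def isLeapYear(n: Int): Boolean =
    (n % 4 == 0 && n % 100 != 0) || n % 400 == 0

  def main(args: Array[String]): Unit = {
    // 1900 and 2100 are century years not divisible by 400; 2000 is.
    for (year <- Seq(1900, 2000, 2008, 2100))
      println(year.toString + ": " + isLeapYear(year))
  }
}
```

The century years are where the naive "n mod 4" check goes wrong: it treats 1900 and 2100 as leap years.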

In addition, we live in a world with time zones and regional differences in when countries go on and off daylight saving time, if they observe it at all. As yet another example of Japanese rationality, Japan does not have a DST system.

Implementing date and time computations correctly can be very hard for computer programmers and is invariably a source of hidden bugs that may take a long time to discover. Yesterday, a large number of Sony’s PlayStation 3 game consoles stopped working normally. This was later fixed. There was speculation that the error was due to incorrect leap year handling. If that was indeed the reason, it wouldn’t be the first time this has happened.

In a software company where I used to work, there would usually be massive trouble every time some country went on or off daylight saving time, or whenever some other time calculation hit a sensitive spot. I’m fairly sure that the world’s software systems, including those in government, finance, insurance and health care, suffer untold billions in damage every year due to the complexity of the system. Maybe we should simplify it.

For starters, I suggest having “years” with 365 x 4 + 1 = 1461 days instead of the usual year. This would postpone the leap year problem until the year 2100, when the next special rule comes in. By that time, software engineering technology should have improved enough that this is no longer an issue, I hope. If not, we can invent another system by then. Let’s also scrap all daylight saving time everywhere. It’s easy to do and the savings would be huge.

Tips for academics who develop software

Academics and practitioners, having rather different goals in life, tend to approach software development in quite different ways. No doubt there are many things each side of the fence can learn from the other, but I think academics in particular could often benefit quite a lot by adopting some of the practices used in industrial development. And not just computer science academics!

A common misconception is that these techniques are only useful with large projects and large teams. I find, though, that they can reduce many of the growing pains even in small projects, helping them reach maturity much faster.

Use version control. Classical, but invalid, counterarguments include “it’s a hassle and too much work to set up”, or “there’s only one person working on this project anyway”. Even if it’s only you, you will benefit massively from being able to undo your changes far back in time. It will let you experiment safely. Plus, setup is no longer an issue with free and easy-to-use services like GitHub and Bitbucket. My tool of choice is now Mercurial, and I used to use SVN. There are many other good choices.

Use a debugger. If there is a debugger available for your language, and there most certainly is, then you should use it to find nontrivial errors, rather than relying on extensive printf-style testing.

Don’t optimise prematurely, but when you need to, use a profiler. Profilers tell you where a program’s performance bottlenecks are. You can profile things like heap usage (which classes use the most space in Java, for instance) and CPU usage (which functions use the most CPU time). For Java, I’ve discovered that the NetBeans IDE has a very good built-in profiler. Eclipse also has one, but it didn’t work on Mac last time I checked. For C/C++, gprof used to be good and probably still is.

Use unit testing wisely. All of the above apply even to very small projects, but I think some projects are too small to need unit tests, at least initially. You be the judge. I find that unit tests can have a lot of benefit when applied to the fragile, complicated parts of a system, where many different things interlock. If you are ambitious you can also write tests first and code later: test-driven development.

Use a good IDE if you can. For a language like Java, where you have to type a lot of code to get something done and spread your code out across lots of files, a good IDE that can generate boilerplate code and navigate quickly can really speed up your work. It’s beneficial for other languages too. But I have no problem with people who use pure vim or emacs; after all, these are practically IDEs.

I believe that honing your software development skills as an academic can pay off. Also see: Daniel Lemire on why you should open source your projects. (I will get around to doing this eventually, I promise ;-) )

Nietzsche on software (?)

In his first supplement to Human, All Too Human (1886), entitled Miscellaneous Maxims and Opinions, Friedrich Nietzsche states that

300. HOW FAR EVEN IN THE GOOD THE HALF MAY BE MORE THAN THE WHOLE. — In all things that are constructed to last and demand the service of many hands, much that is less good must be made the rule, although the organiser knows what is better and harder very well. He will calculate that there will never be a lack of persons who can correspond to the rule, and he knows that the middling good is the rule. — The youth seldom sees this point, and as an innovator thinks how marvelously he is in the right and how strange is the blindness of others. (Helen Zimmern transl.)

Friedrich Nietzsche did not describe software making – I can only assume that he was describing authors and ideologists – but this seems to capture the difficulties of software development only too well. And it seems to give a recipe for how to overcome the communication difficulties (abandon exotic, over-refined solutions and focus on an easily understood middle ground, so that everybody can get together and comprehend the architecture). This was originally published in 1886.

With that, Merry Christmas!

A wikipedia of algorithms

Here’s something I’ve wanted to see for some time, but probably don’t have time to work on myself.

It would be nice if there were a Wikipedia-like web site for code and algorithms. Just the common ones to start with, but perhaps more specialised ones over time. Of course the algorithms should be available in lots of different languages. This would in fact be one of the main points, so that people could compare good style and see how things should be done in different languages. In addition, there should be an in-browser editor, just like on Wikipedia (but perhaps with syntax highlighting) so people can make changes easily.

Furthermore, there should be unit tests for every algorithm, and these should be user-editable in the same way as the main code. In an ideal world, the web site would automatically run the unit tests every time there’s a change to some algorithm and check in a new version of the code to a versioned repository. People could then trust with reasonable confidence that the code is valid and safe. However, if the system were to be as open as Wikipedia is, such a system wouldn’t work, since users could write unit tests with malicious code. So I suspect volunteers would have to download, inspect, and run the unit tests regularly, and perhaps there would be a meta-moderation system of some kind, allowing senior members to promote changes to the official repository. In the meantime, everybody should be allowed to see and edit changes on the wiki immediately, but they would be marked as “untested” or “unsafe”.

User interface would be very important since this kind of site needs to be fun and easy to use regularly.

Has this kind of project already been carried out by someone? I can find some things by googling. The Code Wiki appears to have once been a Wikipedia of code, but it seems defunct, C# only, and now they’re selling a book with the contents of the site! Algorithm Wiki has many algorithms in different languages, but the user interface is awkward and littered with obstructive advertising, the code is hard to browse, and it doesn’t make for a usable quick reference. They seem to have gotten off to a good start though. Any others?

Edit: Rosetta Code seems to be the most mature and useful such site out there today.

Scala and actors

Programming with actors was a new concept to me until I tried it out in Scala. It appears to be one of Scala’s most celebrated features, judging by the official blurb. “Actors” was a daunting word at first, but it really ends up being a very simple concept.

Actors are a programming model for concurrent programming. With conventional mutex/monitor based programming in Java, say, programmers hold and release locks (the synchronized keyword) to achieve safe concurrency. Condition variables are used for thread communication (the wait and notify family of functions on java.lang.Object). Communication is synchronous: a typical case would be that you change some condition, invoke notifyAll to wake up threads waiting on that condition, and then they can take over the relevant lock and proceed to do some processing.
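For comparison, here is what the lock-and-condition style looks like when written in Scala on top of the same JVM primitives (synchronized, wait and notifyAll). This is a minimal sketch of a one-slot buffer; the names Slot and SlotDemo are invented for the illustration.

```scala
// A one-slot buffer guarded by the object's own monitor.
class Slot[A] {
  private var value: Option[A] = None

  def put(a: A): Unit = synchronized {
    while (value.isDefined) wait()  // block until the slot is empty
    value = Some(a)
    notifyAll()                     // wake any thread blocked in take()
  }

  def take(): A = synchronized {
    while (value.isEmpty) wait()    // block until a value has been put
    val a = value.get
    value = None
    notifyAll()                     // wake any thread blocked in put()
    a
  }
}

object SlotDemo {
  def main(args: Array[String]): Unit = {
    val slot = new Slot[String]
    // The producer thread hands over three values, one at a time.
    val producer = new Thread(new Runnable {
      def run(): Unit = Seq("a", "b", "c").foreach(slot.put)
    })
    producer.start()
    val received = (1 to 3).map(_ => slot.take())
    producer.join()
    println(received.mkString(","))
  }
}
```

Note that both methods re-check their condition in a while loop after waking up, since notifyAll wakes every waiting thread and only one of them may find the condition still true.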

An actor is a unit of execution with an asynchronous message queue. Actors can receive messages from other actors or send messages to other actors at any time. However, the messages wait in the receiving actor’s “mailbox” until the actor has time to receive them.
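The mailbox idea can be imitated with nothing more than a thread and a blocking queue. The following is a toy sketch and not the scala.actors API: ToyActor and its handler parameter are names made up for the illustration, and it leaves out things like sender tracking and pattern matching.

```scala
import java.util.concurrent.{CountDownLatch, LinkedBlockingQueue}

// A toy actor: one thread draining one mailbox.
class ToyActor(handler: Any => Unit) {
  private val mailbox = new LinkedBlockingQueue[Any]()

  // The worker blocks on take() until a message is available,
  // then handles messages one at a time in arrival order.
  private val worker = new Thread(new Runnable {
    def run(): Unit = while (true) handler(mailbox.take())
  })
  worker.setDaemon(true)
  worker.start()

  // Asynchronous send: enqueue the message and return immediately.
  def !(msg: Any): Unit = mailbox.put(msg)
}

object ToyActorDemo {
  def main(args: Array[String]): Unit = {
    val done = new CountDownLatch(3)
    val printer = new ToyActor({ msg =>
      println("received: " + msg)
      done.countDown()
    })
    printer ! "one"
    printer ! "two"
    printer ! 3
    done.await() // wait until all three messages have been handled
  }
}
```

The sender never waits for the receiver; the only shared state is the queue itself, which is the point of the model.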

As a simple example, let’s develop a program that converts text files to upper case using actors. The program will have an “Input” actor, an “Output” actor, and a number of “UpperCase” actors that do the processing. First the Input actor:

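import scala.actors._
import java.io._

// Replies to every Next request with the next line from the reader.
class Input(in: BufferedReader) extends Actor {
  def act(): Unit = {
    while (true) {
      receive {
        // readLine() returns null at end of input; that case is not handled here.
        case Next => sender ! Line(in.readLine())
      }
    }
  }
}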

It’s worth noting that the Actor system is implemented completely in the libraries, outside of the core language. Actors are not first-class constructs, but sometimes look as if they were. The act method is where actors begin their execution. The receive method causes them to block and wait for a message, which we may pattern match on. The sender variable corresponds to whoever sent the last message received, and the ‘!’ operator sends a message. So whenever this actor receives the Next message, it will respond with the next line from a buffered reader.

Then, the UpperCase actor:

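import scala.actors._

// Messages exchanged between the actors.
case object Next
case class Line(x: String)

// Repeatedly asks input for a line, upper-cases it, and forwards it to out.
class UpperCase(input: Actor, out: Actor) extends Actor {
  def act(): Unit = {
    while (true) {
      input ! Next
      receive {
        case Line(x) => out ! x.toUpperCase()
      }
    }
  }
}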

This actor is created with input and output actors as its constructor parameters. It continually asks the input actor for a new line, converts it to upper case, and sends it to the output actor. Also note the case classes here, which are for pattern matching only. They are a bit like algebraic data types in Haskell.

Finally, the Output actor:

import scala.actors._

// Prints every string it receives.
class Output extends Actor {
  def act(): Unit = {
    while (true) {
      receive {
        case x: String => println(x)
      }
    }
  }
}

And then we have to tie it all together:

import java.io._

object Demonstration {

  val reader = new BufferedReader(new InputStreamReader(System.in))

  def main(args: Array[String]): Unit = {

    val in = new Input(reader)
    in.start()

    val out = new Output()
    out.start()

    // Start five UpperCase workers that share the same input and output actors.
    1.to(5).foreach(x => {
      val tr = new UpperCase(in, out)
      tr.start()
    })
  }
}

Here I abuse the foreach notation slightly to create 5 parallel text processors. Each actor runs on its own thread (though there are ways to prevent this if one wants very large numbers of actors). Now of course, the lines will probably be output in the wrong order. Another obvious shortcoming is that there is no clean shutdown protocol that terminates all the actors when the input stream is fully read. Solving these problems is outside of the scope of this article.

Some other interesting resources on actors: the official tutorial, the papers (slightly more academic but accessible to the monomorphic reader, I imagine). Debasish highlights how actors can be used to get threadless concurrency, Erlang-style.

Searching and creating

We distinguish between inventions and discoveries. You can own the intellectual property rights to an invention, but not to a discovery (you can’t patent the discovery of mercury or selenium, for instance). Inventions are meant to be created, and discoveries are meant to be sought for. But sometimes, the line between invention and discovery is blurry.

We cannot own the rights to mathematical structures or theorems, since they follow directly from axioms. Anyone with a mathematical education would come to the same results within the same axiomatic system. The creation of a mathematical theorem can be said to be a search process, hence the term “discovery” and not “invention”.

We can own the rights to music and paintings, since these are considered to be inventions. But isn’t the process that leads to a painting or work of music being created also a search process? Doesn’t the artist search for combinations that work together in a search space, albeit a very large and continuous one? But this is considered to be creation/synthesis rather than search.

The software developer is, at least sometimes, somewhere in between. A vision of a user interface that interacts with end users in a certain way can perhaps be said to come from the same large, continuous space as music and paintings come from. But given the constraints imposed by such a vision, and by the platform on which the system is to be built, the available libraries, the languages, etc, I would say that the construction of much of desktop/consumer software is a search problem. We look for combinations of components that fit the constraints, and when we have decided on this combination, we must connect the pieces together correctly. The space of possible solutions here, at least for someone who follows good design principles, is in essence much smaller than the music/painting search space. Of course there are considerations of taste and style, but they are completely irrelevant to the compiled product. They are a programmer aid.

Artificial intelligence problems are defined as search problems. But what are search problems, and what are “creational” problems, precisely? Is it merely a question of the size of the search/design space?

The ego fallacy

A senior manager at a company I used to work at once said that (making) software is a very social activity. I didn’t have much experience, and was very surprised at the time, since I had never thought about the human aspect of software development. But of course this aspect is extremely important. For example, in any setting with more than one programmer working on a project, the need for well functioning communication is huge, as much as in any other job I suspect. Projects often fail due to a lack of communication.

Another human side to software development is that some developers, this author included, easily start seeing the code they write as their own intellectual turf. If somebody challenges the developer’s practices or code, offering a better solution, it will be met with massive resistance. Partly out of laziness, but partly, I think, out of a desire to protect their territory and their legacy.

I do this myself more often than I would like. And it leads to bad results because it creates obstacles to communication and means that team members pull in different directions. Thus, somehow the incentives are wrong. If everybody’s goal were to allow the team to deliver a good product quickly, this would not happen. Why is it that your goal after some time with a project sometimes becomes to defend what you have created? Why do we identify with the code we wrote, and not with the bigger project?

This doesn’t mean that looking to your own interests or to your ego is a bad thing – rather that it’s easy to be shortsighted about what is in your best interests.

Software roundup

I enjoy experimenting with new software just to see what people come up with. There’s just so much unknown software to discover. I suspect most people find something they like and then stick with it until it doesn’t work anymore, but there’s something to be said for proactively replacing your software and searching for better things. Here are some things I’ve taken a liking to recently.

New iPhone OS: Apple released version 3.0 a few days ago. It’s been all positive so far. I get Spotlight search for the phone, the keyboard feels more responsive, it finally has copy and paste, YouTube appears to have higher quality, and there is a slew of other features. (Also MMS, which I don’t really need in Japan.)

GWT: I started playing with the Google Web Toolkit just for fun. For those who don’t know, it’s an API for developing AJAX-based web applications using pure Java, which is then compiled to client-side JavaScript and a Java servlet. It turns out I can be extremely productive with it – I found that it lets me develop fairly advanced web applications using my existing skills. There’s a very high reward/effort ratio that makes me excited. It feels like I don’t need to learn Ruby on Rails properly when I have GWT, given that I’m very comfortable developing in Java, but we’ll see.

Fluid.app: A web browser enhancement for the Mac that allows you to create separate “applications” from web sites you visit often. This means they will show up in the task bar and application switcher, have their own icon, and occupy less screen space. It sounds simple, but it’s a revelation. (And if you’re like me, you tend to have 15+ tabs open in your web browsers constantly, which is a poor way of managing windows).

Chandler: Like many others, I found out about this little calendar and note manager by reading Scott Rosenberg’s Dreaming in Code, which chronicled the misfortunes of an open source startup project. It went 1.0 last year after many years of development. The impression you get from the book is that the developers had a lot of bad luck despite setting out with the right ambitions. This is now an old debate, but the tool is actually usable today – I’ve been using it every day to manage myself for 3 months. Aside from slight bugs, it feels very smart sometimes, thanks to its unique user interface and features.

The problem with standards

Standardised formats are essential to connections in the digital society. On the hardware side, USB is so ubiquitous and well defined that I can connect essentially any peripheral to any PC. In many cases I can even expect them to work without drivers. For sound, the 3.5 mm headphone plug has been ubiquitous as long as I can remember. For video, there is VGA, DVI, and so on. Even if some of these change every now and then, we can always buy converters to convert from one format to another. The changes happen infrequently enough that this is a bearable burden.

On the software side, from a user perspective, it gets tougher. In terms of document formats, everybody is expected to have a web browser, to be able to read PDF documents, and to play MP3 files. There’s also a certain expectation that people can read and edit Microsoft Office documents, but in general, unless people have Microsoft Office itself, they are likely to use some program that does an imperfect job of converting to and from the office formats. More exotic formats people do not expect to share unless they know that the receiver has the same software.

One of the worst situations (and maybe one of the sources of these troubles) is software architecture. To look at just one subset of this universe, let’s consider some popular Java frameworks. There are many of them. I can think of Struts, Spring, Enterprise JavaBeans, Tomcat, JBoss, WebSphere, OSGi. Each one is trying to be a universal, or near universal, component-based framework for large scale Java applications. (Some of these are intended specifically for web applications, but the distinction is a bit arbitrary these days.) Each one defines a particular behaviour and API that its users need to fit into.

One problem is that once you’ve developed your application in one of these frameworks, it’s tough to move to a different one. At best it’s “just” a matter of using a different API. At worst, the new framework has different expectations of component behaviour and forces you to either refactor your old application majorly or write lots of adapters.

Software has this particular kind of stickiness: objects designed in one context tend to be difficult or impossible to move to other contexts. Here are some problems brought about by this fact:

  • Across several different devices and services, I have many different lists of my various contacts. For instance, I have an address book on my computer, which fortunately is shared with my iPhone. However, there’s also a list that Gmail maintains, one list on MSN messenger, one on ICQ, one on Facebook, one on the Playstation Network… etc. In a world with perfect software, I would have a single contact list that could be updated and accessed from any of these services (and I would still be able to selectively hide things from any one service, if I wanted to).
  • Most social networks support some kind of status update feature (Twitter, Facebook, …) but they generally can’t share this information with each other. There’s a tool called ping.fm which is able to update lots of networks at once, but this is just for a single feature. And it took a third party tool to do it. And I have to give it all of my logins and passwords.
  • The health care systems in many countries are a mess (UK’s NHS and probably most US systems fall in this category). Different medical offices, using different medical IT systems, keep separate patient journals, but they have no way of sharing a journal if the patient wants to use a different care provider. As experiences in the UK showed, achieving the ability to share these records can be very difficult.

So how can we design really future proof standards? Here are my intuitive, unqualified answers:

Favor openness over closedness. Allow unknown entities to enter your protocol at many different stages and in many different roles. Even if you don’t design for this, it will tend to happen. Better to design for it.

Allow extensibility in many directions. Frameworks often allow you to make more specific versions of their components, but it’s rare that you can generalize or inject behaviour at a middle level of a concept hierarchy.

Use semantic metadata. Ontologies are under heavy research in computer science. They are formal classifications of concepts and their relations. One popular format for developing ontologies is RDF(S); for an example of what the concepts can look like, check this RDF(S) directory. I will have reason to come back to this topic in the future.

One example of a brilliant long-lasting standard: TCP/IP. Somehow in this case, the designers got something miraculously right on the first try. I’m amazed at how we keep building on this standard every day.

Research idea: a snapshot

As part of an application form I had to fill out recently, I had to write a summary of my research ideas. Of course this changes all the time, since I’m still searching for a precise topic (and probably will be for a long time). But this is what a snapshot of those thoughts, taken now, looks like:

One of the most important problems in software engineering is reducing the impact of change. To this end, methods such as inversion of control (dependency injection) have recently become popular as a way to reduce the coupling to concrete interfaces. However, even with these schemes, there is still a dependency on specific names and abstract interfaces. My project aims to investigate the possible use of semantic methods to address this problem. In essence, I want to allow developers to use semantic interfaces rather than syntactic ones to describe and access their components.

Specifically, I am investigating techniques commonly used in the context of Semantic Web Services, such as ontologies and semantic/syntactic mediation, and their applicability to this problem. 

We may regard services as being somewhat large scale components. However, I am interested in applying these methods not just for large scale services distributed across the web, but also for small and numerous software components running in a single process. In such a setting, performance and scalability are important issues to investigate, in addition to the usual problems of reliability, correctness of composition, etc.
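To illustrate the dependency that remains under inversion of control, here is a minimal sketch of constructor injection in Scala. All names (Greeter, EnglishGreeter, JapaneseGreeter, Welcome) are hypothetical and exist only for this example.

```scala
// The abstract interface that clients depend on.
trait Greeter {
  def greet(name: String): String
}

class EnglishGreeter extends Greeter {
  def greet(name: String): String = "Hello, " + name
}

class JapaneseGreeter extends Greeter {
  def greet(name: String): String = "Konnichiwa, " + name
}

// Welcome receives its Greeter from outside (constructor injection),
// so it is decoupled from any concrete implementation. It is still
// coupled to the name Greeter and to the exact signature of greet.
class Welcome(greeter: Greeter) {
  def message(user: String): String = greeter.greet(user) + "!"
}

object InjectionDemo {
  def main(args: Array[String]): Unit = {
    println(new Welcome(new EnglishGreeter).message("Ada"))
    println(new Welcome(new JapaneseGreeter).message("Ada"))
  }
}
```

Swapping the implementation requires no change to Welcome, but renaming greet or changing its parameter list breaks every client. That residual syntactic dependency is what a semantic interface would aim to remove.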