Posts categorized “Software development”.

Gregorian misery

The Gregorian calendar has been in use since 1582. Among its features is a moderately complicated rule for leap years: if n mod 4 is 0, then n is a leap year. However, if n mod 100 is 0, then n is not a leap year, unless n is a multiple of 400.
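As a rough sketch, the whole rule fits in one boolean expression. The names LeapYear and isLeapYear are mine, chosen only for illustration:

```scala
object LeapYear {
  // Gregorian rule: divisible by 4, except century years,
  // unless the century year is also divisible by 400.
  def isLeapYear(n: Int): Boolean =
    (n % 4 == 0 && n % 100 != 0) || n % 400 == 0

  def main(args: Array[String]): Unit = {
    for (year <- List(1900, 2000, 2010, 2012, 2100))
      println(year + ": " + isLeapYear(year))
  }
}
```

The rule itself is short. The trouble usually comes from code that implements only the first clause and forgets the century exceptions.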

In addition, we live in a world with time zones and regional differences in when countries go on and off daylight saving time, if they have such a system. As yet another example of Japanese rationality, Japan does not have a DST system.

Implementing date and time computations correctly can be very hard for programmers and is a frequent source of hidden bugs that may take a long time to discover. Yesterday, a large number of Sony’s PlayStation 3 game consoles stopped working normally. This was later fixed. There was speculation that the error was due to incorrect leap year handling. If that was indeed the reason, it wouldn’t be the first time this has happened.

In a software company where I used to work, there would usually be massive trouble every time some country went on or off daylight saving time, or any other time calculation hit a sensitive spot. I’m fairly sure that the world’s software systems, including those in government, finance, insurance and health care, suffer untold billions in damage every year due to the complexity of the system. Maybe we should simplify it.

For starters, I suggest having “years” with 365 x 4 + 1 = 1461 days instead of the usual year. This would postpone the leap year problem until the year 2100, when the next special rule comes in. By that time, software engineering technology should have improved enough that this should no longer be an issue, I hope. If not, we can invent another system by then. Let’s also scrap daylight saving time everywhere. It’s easy to do and the savings would be huge.

Tips for academics who develop software

Academics and practitioners, having rather different goals in life, tend to approach software development in quite different ways. No doubt there are many things each side of the fence can learn from the other, but I think academics in particular could often benefit quite a lot by adopting some of the practices used in industrial development. And not just computer science academics!

A common misconception is that these techniques are only useful with large projects and large teams. I find, though, that they can ease many of the growing pains even in small projects, helping them reach maturity much faster.

Use version control. Classical, but invalid, counterarguments include “it’s a hassle and too much work to set up”, or “there’s only one person working on this project anyway”. Even if it’s only you, you will benefit massively from being able to undo your changes far back in time. It will let you experiment safely. Plus, setup is no longer an issue with free and easy-to-use services like GitHub and Bitbucket. My tool of choice is now Mercurial, and I used to use SVN. There are many other good choices.

Use a debugger. If there is a debugger available for your language, and there most certainly is, then you should use it to find nontrivial errors, rather than extensive printf style testing.

Don’t optimise prematurely, but when you need to, use a profiler. Profilers tell you where a program’s performance bottlenecks are. You can profile things like heap usage (which classes use the most space in Java, for instance) and CPU usage (which functions use the most CPU time). For Java, I’ve discovered that the NetBeans IDE has a very good built-in profiler. Eclipse also has one, but it didn’t work on Mac last time I checked. For C/C++, gprof used to be good and probably still is.

Use unit testing wisely. All of the above apply even to very small projects, but I think some projects are too small to need unit tests, at least initially. You be the judge. I find that unit tests can have a lot of benefit when applied to the fragile, complicated parts of a system, where many different things interlock. If you are ambitious you can also write tests first and code later, which is known as test-driven development.

Use a good IDE if you can. For a language like Java, where you have to type a lot of code to get something done and spread out your code across lots of files, a good IDE that can generate boilerplate code and navigate quickly can really speed up your work. It’s beneficial for other languages too. But I have no problem with people who use pure vim or emacs. After all, these are practically IDEs.

I believe that honing your software development skills as an academic can pay off. Also see: Daniel Lemire on why you should open source your projects. (I will get around to doing this eventually, I promise ;-) )

Why Scala? The mixing of imperative and functional style

Scala is a little wonderland sprinkled with useful things you can mix and match as you like to improve your coding experience while staying on the Java platform. The Option classes, the structural case matching, the compact declarations, lazy evaluation… the list goes on. But at the heart of it is the decision to mix freely the functional and imperative programming styles.

How does this work in practice?

  • Statements can have side effects, like in Java
  • The final statement evaluated in a function is its return value by default
  • Every statement evaluates to a value, even control flow statements like if… else, unlike in Java
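A small sketch of the three points above. The object and method names are made up for illustration:

```scala
object Expressions {
  // if/else is an expression, so its result can be bound directly to a val.
  def sign(n: Int): String = {
    val label = if (n < 0) "negative" else if (n == 0) "zero" else "positive"
    label // the last expression evaluated is the return value
  }

  // Side effects are still allowed, as in Java.
  def sumWithLogging(xs: List[Int]): Int = {
    var total = 0
    for (x <- xs) {
      println("adding " + x) // side effect
      total += x
    }
    total
  }
}
```

In Java, the first function would need either the ternary operator or a variable assigned in each branch.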

The bottom line is that some problems call for a functional programming style, and others for an imperative one. Scala doesn’t force you into a mold, it just gives you what you need to express what you’d like to express. This can lead to very compact code. Here’s a function that recursively finds all files ending in .java starting in a given directory. The File class here is the standard Java java.io.File!

Remember, the last expression evaluated is the return value.

  import java.io.File

  def findJavaFiles(dir: File): List[File] = {
    val files = dir.listFiles()
    val javaFiles = files.filter(_.getName.endsWith(".java"))
    val dirs = files.filter(_.isDirectory)
    // The .java files found here, plus those found recursively in each subdirectory
    javaFiles.toList ++ dirs.flatMap(findJavaFiles(_))
  }

But we can write it even more compactly at the expense of some clarity:

  // A recursive method needs an explicit result type, so it stays in the signature
  def findJavaFiles(dir: File): List[File] = {
    val files = dir.listFiles()
    files.filter(_.getName.endsWith(".java")).toList ++
      files.filter(_.isDirectory).flatMap(findJavaFiles(_))
  }

Now write this function in Java and see how many lines you end up with.

Nietzsche on software (?)

In his first amendment to Human, All Too Human (1886), entitled Miscellaneous Maxims and Opinions, Friedrich Nietzsche states that

300. HOW FAR EVEN IN THE GOOD THE HALF MAY BE MORE THAN THE WHOLE. — In all things that are constructed to last and demand the service of many hands, much that is less good must be made the rule, although the organiser knows what is better and harder very well. He will calculate that there will never be a lack of persons who can correspond to the rule, and he knows that the middling good is the rule. — The youth seldom sees this point, and as an innovator thinks how marvelously he is in the right and how strange is the blindness of others. (Helen Zimmern transl.)

Friedrich Nietzsche did not describe software making – I can only assume that he was describing authors and ideologists – but this seems to capture the difficulties of software development only too well. And it seems to give a recipe for how to overcome the communication difficulties (abandon exotic, over-refined solutions and focus on an easily understood middle ground, so that everybody can get together and comprehend the architecture). This was originally published in 1886.

With that, Merry Christmas!

An immutable MultiMap for Scala

The Scala collections library (in version 2.7.7) has a MultiMap trait for mutable collections, but none for immutable ones. I hacked something up to use while waiting for an official version. I’m finding this to work well, but I don’t have much experience with collections design, so it’s likely to have some flaws. Also, this is a class and not a trait, so you can’t use it with any map you like. And from a concurrency perspective, maybe it’s sometimes better to use backing collections other than the HashSet and the HashMap.

 
import scala.collection.immutable._
 
/**
A multimap for immutable member sets (the Scala libraries 
only have one for mutable sets). 
*/
class MultiMap[A, B](val myMap: Map[A, Set[B]]) {

  def this() = this(new HashMap[A, Set[B]])

  def +(kv: Tuple2[A, B]): MultiMap[A, B] = {
    val set = if (myMap.contains(kv._1)) {
      myMap(kv._1) + kv._2
    } else {
      new HashSet() + kv._2
    }

    new MultiMap[A, B](myMap + ((kv._1, set)))
  }

  def -(kv: Tuple2[A, B]): MultiMap[A, B] = {
    if (!myMap.contains(kv._1)) {
      throw new Exception("No such key")
    }
    val set = myMap(kv._1) - kv._2
    if (set.isEmpty) {
      new MultiMap[A, B](myMap - kv._1)
    } else {
      new MultiMap[A, B](myMap + ((kv._1, set)))
    }
  }

  def entryExists(kv: Tuple2[A, B]): Boolean = {
    if (!myMap.contains(kv._1)) {
      false
    } else {
      myMap(kv._1).contains(kv._2)
    }
  }

  def keys = myMap.keys

  def values: Iterator[Set[B]] = myMap.values

  def getOrElse(key: A, elval: Collection[B]): Collection[B] = {
    myMap.getOrElse(key, elval)
  }

  def apply(key: A) = myMap(key)

}

Usage:

 
   var theMultiMap = new MultiMap[String, Int]()
 
   theMultiMap += (("george", 1))
   theMultiMap += (("george", 3))
   theMultiMap += (("bob", 2))
   theMultiMap -= (("george", 1))
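The class above targets the 2.7.7 collections. For later Scala versions, the same idea can be sketched as plain functions over an immutable Map[A, Set[B]]. This is an alternative sketch, not a port of the class: the names MultiMapSketch, add and remove are my own, and remove quietly returns the map unchanged when the key is missing instead of throwing.

```scala
object MultiMapSketch {
  type MultiMap[A, B] = Map[A, Set[B]]

  // Add a value under a key, creating the member set if needed.
  def add[A, B](m: MultiMap[A, B], key: A, value: B): MultiMap[A, B] =
    m + (key -> (m.getOrElse(key, Set.empty[B]) + value))

  // Remove a value; drop the key entirely once its set becomes empty.
  def remove[A, B](m: MultiMap[A, B], key: A, value: B): MultiMap[A, B] =
    m.get(key) match {
      case None => m
      case Some(set) =>
        val rest = set - value
        if (rest.isEmpty) m - key else m + (key -> rest)
    }

  def entryExists[A, B](m: MultiMap[A, B], key: A, value: B): Boolean =
    m.get(key).exists(_.contains(value))
}
```

Because every operation returns a new map, older versions stay valid, which is the main attraction of the immutable variant.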

A Wikipedia of algorithms

Here’s something I’ve wanted to see for some time, but probably don’t have time to work on myself.

It would be nice if there were a Wikipedia-like web site for code and algorithms. Just the common ones to start with, but perhaps more specialised ones over time. Of course the algorithms should be available in lots of different languages. This would in fact be one of the main points, so that people could compare good style and see how things should be done in different languages. In addition, there should be an in-browser editor, just like on Wikipedia (but perhaps with syntax highlighting) so people can make changes easily.

Furthermore, there should be unit tests for every algorithm, and these should be user-editable in the same way as the main code. In an ideal world, the web site would automatically run the unit tests every time there’s a change to some algorithm and check in a new version of the code to a versioned repository. People could then trust with reasonable confidence that the code is valid and safe. However, if the system were to be as open as Wikipedia is, such a system wouldn’t work, since users could write unit tests with malicious code. So I suspect volunteers would have to download, inspect, and run the unit tests regularly, and perhaps there would be a meta-moderation system of some kind, allowing senior members to promote changes to the official repository. In the meantime, everybody should be allowed to see and edit changes on the wiki immediately, but they would be marked as “untested” or “unsafe”.

User interface would be very important since this kind of site needs to be fun and easy to use regularly.

Has this kind of project already been carried out by someone? I can find some things by googling. The Code Wiki appears to have once been a Wikipedia of code, but it seems defunct, C# only, and now they’re selling a book with the contents of the site! Algorithm Wiki has many algorithms in different languages, but the user interface is awkward and littered with obstructive advertising, the code is hard to browse, and it doesn’t make for a usable quick reference. They seem to have gotten off to a good start though. Any others?

Edit: Rosetta Code seems to be the most mature and useful such site out there today.

Where is Java going?

Today, Java is one of the most popular programming languages. Introduced in 1995, it rests on a tripod of the language itself, its libraries, and the JVM. In the TIOBE programming language league charts, it has been at the top for as long as the measurements have been made (since 2002), overtaken by C only for a brief period due to measurement irregularities.

Yet not all is Sun-shine in Java world. Sun Microsystems is about to be taken over by Oracle, pending EU approval. (The EU is really dragging its feet in this matter, but it seems unlikely they would really reject the merger.) Larry Ellison has voiced strong support for Java and for Sun’s way of developing software, so maybe this is really not a threat by itself. But how far can the language itself go?

The Java language was carefully designed to be relatively easy to understand and work with. James Gosling, its creator, has called it a blue collar language, meaning it was designed for industrial, real world use. In a world where C++ was the de facto standard for OO programming, Java was a big step forward in terms of ease of development, with its lack of pointers and strong type system – to say nothing of its garbage collection. Many classes of common programming errors were removed altogether. However, in the interests of simplicity and clarity, some tradeoffs were made. The language’s detractors today point to problems such as excessive verbosity, the lack of closures, the limited generics, and the checked exceptions.

For some time there have been a lot of exciting alternative languages available on the JVM. Clojure is a Lisp dialect. Scala, the only non-Java JVM language I have used extensively, mixes the functional and object oriented paradigms. Languages like Jython and JRuby basically exist to allow scripting and interoperability with popular scripting languages on the JVM.

Today it seems as if the JVM and the standardized libraries will be Java’s most prominent legacy. The language itself will not go away for a long time either – considering that many companies still maintain or develop in languages like Cobol and Fortran, we will probably be maintaining Java code 30 years from now (what a sad thought!), but newer and more modern JVM languages will probably take turns being number one. The JVM and the libraries guarantee that we will be able to mix them relatively easily anyway, unless they stray too far from the standard with their custom features.

So in hindsight, developing this intermediate layer, this virtual machine – and disseminating it so widely –  was a stroke of genius. Will it be that in future programming models we have even more standardized middle layers, and not just one?

Meanwhile, there’s a lot of debate about the process being used to shape and define Java. For a long time, Sun employed something called the Java Community Process, JCP, which was supposed to ensure openness. Some people proclaim that the openness has ended. To take one example, very recently, Sun announced that there will be support for closures in Java 7, after first announcing that there would be no support for closures in Java 7. The process by which this decision has been managed has been described as not being a community effort. Some aspects of Java are definitely up in the air these days.

Programming languages are about people

Programming languages are more about people and less about machines.

Programming languages are about staying inside the limitations of people’s minds and their ability to keep track of and work with abstractions. If people had no such limitations, they could code in assembly language all the time.

Programming languages and supporting tools and environments are the interface between people and the raw instruction set of a computer (or a bigger entity, like a network of computers). When we design programming languages, we must take into account not only what the machine environment will do and how it changes, but also how people create the software, how they modify it, how they think about it. Maybe programming languages should even be designed with business processes in mind, in some cases.

But this is also a question of what kind of programmer we want to cultivate. The language shapes the programmer, too.

Scala and actors

Programming with actors was a new concept to me until I tried it out in Scala. It appears to be one of Scala’s most celebrated features, judging by the official blurb. “Actors” was a daunting word at first, but it really ends up being a very simple concept.

Actors are a programming model for concurrent programming. With conventional mutex/monitor based programming in Java, say, programmers hold and release locks (the synchronized keyword) to achieve safe concurrency. Condition variables are used for thread communication (the wait and notify family of functions on java.lang.Object). Communication is synchronous: a typical case would be that you change some condition, invoke notifyAll to wake up threads waiting on that condition, and then they can take over the relevant lock and proceed to do some processing.

An actor is a unit of execution with an asynchronous message queue. Actors can receive messages from other actors or send messages to other actors at any time. However, the messages wait in the receiving actor’s “mailbox” until the actor has time to receive them.
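To make the mailbox idea concrete, here is a toy sketch built on a plain thread and a java.util.concurrent.LinkedBlockingQueue. It is not the scala.actors API, and ToyActor and its methods are names I made up. It only illustrates that sending returns immediately while the receiver works through its queue at its own pace.

```scala
import java.util.concurrent.LinkedBlockingQueue

// A toy "actor": one thread draining its own mailbox.
class ToyActor(handler: String => Unit) {
  // Some(msg) is a normal message; None is our private stop signal.
  private val mailbox = new LinkedBlockingQueue[Option[String]]()

  private val worker = new Thread(new Runnable {
    def run(): Unit = {
      var running = true
      while (running) {
        mailbox.take() match { // blocks until a message arrives
          case Some(msg) => handler(msg)
          case None      => running = false
        }
      }
    }
  })

  def start(): Unit = worker.start()

  // Asynchronous send: the message is queued and the caller carries on.
  def !(msg: String): Unit = mailbox.put(Some(msg))

  // Queue the stop signal behind any pending messages, then wait for the thread.
  def stop(): Unit = { mailbox.put(None); worker.join() }
}
```

A caller would create it with a handler function, call start, send messages with !, and call stop when done. Messages from a single sender are handled in the order they were sent.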

As a simple example, let’s develop a program that converts text files to upper case using actors. The program will have an “Input” actor, an “Output” actor, and a number of “UpperCase” actors that do the processing. First the Input actor:

import scala.actors._
import java.io._
 
class Input(in: BufferedReader) extends Actor {
	def act() {
	  while(true) {
	    receive {
	      case Next => { sender ! Line(in.readLine()) }
	    }
	  }
	}
}

It’s worth noting that the Actor system is implemented completely in the libraries, outside of the core language. Actors are not first class constructs, but sometimes look as if they were. The act method is where actors begin their execution. The receive method causes them to block and wait for a message, which we may pattern match on. The sender variable corresponds to whoever sent the last message received, and the ‘!’ operator sends a message. So whenever this actor receives the Next message, it will respond with the next line from a buffered reader.

Then, the UpperCase actor:

import scala.actors._
 
case object Next // a message with no fields, so a case object rather than a parameterless case class
case class Line(x: String)
 
class UpperCase(input: Actor, out: Actor) extends Actor {
	def act() {
		while(true)
		{
			input ! Next
			receive {
			case Line(x:String) => { out ! x.toUpperCase() }
			}
		}
	}
}

This actor is created with the input and output actors as its constructor parameters. It continually asks the input actor for a new line, converts it to upper case, and sends it to the output actor. Also note the Next and Line message types declared with the case keyword, which exist only to be pattern matched on. They are a bit like algebraic data types in Haskell.

Finally, the Output actor:

import scala.actors._
 
class Output extends Actor {
	def act() {
		while(true)
		{
			receive {
			case x:String => { println(x) }
			}
		}
	}
}

And then we have to tie it all together:

import java.io._
 
object Demonstration {
 
  val reader = new BufferedReader(new InputStreamReader(System.in))
 
  def main(args: Array[String]) {
 
    val in = new Input(reader)
    in.start
 
    val out = new Output()
    out.start
 
    1.to(5).foreach(x => {
      val tr = new UpperCase(in, out)
      tr.start
    })
  }
}

Here I abuse the foreach notation slightly to create 5 parallel text processors. Each actor runs on its own thread (though there are ways to prevent this if one wants very large numbers of actors). Now of course, the lines will probably be output in the wrong order. Another obvious shortcoming is that there is no clean shutdown protocol that terminates all the actors when the input stream is fully read. Solving these problems is outside of the scope of this article.

Some other interesting resources on actors: the official tutorial, the papers (slightly more academic but accessible to the monomorphic reader, I imagine). Debasish highlights how actors can be used to get threadless concurrency, Erlang-style.

First steps with Scala: XML pull parsing

I’m now going to share some of the results of my recent experiments with the Scala programming language. In May I wrote that I had started looking at it. I’ve been using it to make some support tools that I needed for research work since.

First a disclaimer: It’s been 4+ years since I did serious work with a functional programming language (Haskell, in first year of university), so my style is imperative-sprinkled-with-functional rather than the opposite. Also, since I haven’t spent that much time with this language yet, I’m bound to be making obvious mistakes. That said, I’m happy to be able to recommend Scala to pretty much anyone at this point. The learning curve is not steep if you know Java, and it allows for a variety of approaches depending on who you are.

For this particular tool, I needed to parse XML files, edit the contents of certain tags, and spit the data back out again. I’d like to show what I ended up with and point out some of Scala’s powerful features. Let’s first look at some interesting parts, and then the entirety.

object XMLTool {
  val interLink = """\[\[(.*)\]\]""".r
}

Scala lets you define objects as well as classes. Objects are singletons and can be referred to by name. Otherwise they are like classes; they participate in the type hierarchy.
Scala has three kinds of declarations: val, var and def. Vals are evaluated once and cannot be reassigned. Vars are variables which can be reassigned. Defs are definitions and can as such be functions or values; their body is evaluated again each time they are used (a lazy val, by contrast, is evaluated once, on first use). The type of this val declaration is inferred automatically by the type system. Scala uses local type inference rather than full Hindley-Milner inference, which is why method parameters still need type annotations. One of my biggest surprises with Scala is how little type information the programmer has to provide, yet how powerful the static checking is. Incidentally, the .r at the end is a shortcut for turning the string into a regular expression object.
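The difference between the declaration kinds is easiest to see by counting evaluations. This is a minimal sketch, and the names Declarations, once, everyTime and onDemand are made up for illustration:

```scala
object Declarations {
  var evaluations = 0 // var: can be reassigned

  // val: the right-hand side runs once, when the object is initialised
  val once: Int = { evaluations += 1; 42 }

  // def: the body runs again on every use
  def everyTime: Int = { evaluations += 1; 42 }

  // lazy val: the body runs once, on first use
  lazy val onDemand: Int = { evaluations += 1; 42 }
}
```

Reading once any number of times leaves the counter alone, each use of everyTime bumps it, and onDemand bumps it only the first time it is read.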

def main(args : Array[String]) : Unit = {
 
    val p = new XMLEventReader().initialize(Source.fromFile(args(0)))
    p.foreach(matchEvent)
 
  }

This function is the analogue of Java’s public static void main() and actually compiles to the same bytecode. Unlike in Java, types come after the variable name, separated from it by a colon. We can tell that we’re dealing with a functional language when we see foreach being applied to a function which I’ll declare next:

def matchEvent(ev: XMLEvent) = {
    ev match {
      case EvElemStart(_, "text", _, _) => { 
        readingText = true
        print(backToXml(ev))
      }
      case EvElemStart(_, _, _, _) => { print(backToXml(ev)) }
      case EvText(text) => {
        if (readingText) print(filterText(text)) else print(text) 
      } 
      case EvElemEnd(_, "text") => {
        readingText = false
        print(backToXml(ev))
      }
      case EvElemEnd(_, _) => { print(backToXml(ev)) }
      case _ => {}
    }
  }

Here we see pattern matching in action. We can match on lots of things, including types, partially instantiated types, strings and regular expressions. This style of programming is encouraged in FP languages, unlike in imperative ones. By matching on something like EvElemStart(_, "text", _, _) I’m looking for XML tags whose name is “text”, and I don’t care about their namespace or attributes. _ is a wildcard character.

Incidentally, it’s perfectly fine for me to leave out the return type of this function. Scala will infer that the return type is Unit (which vaguely corresponds to void in Java).
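The same matching style can be tried without any XML at all. In the sketch below, the Event hierarchy is a hypothetical stand-in for the scala.xml.pull event classes, with fewer fields, so that the example is self-contained:

```scala
object Matching {
  sealed trait Event
  case class Start(prefix: String, label: String) extends Event
  case class Text(text: String) extends Event
  case class End(prefix: String, label: String) extends Event

  def describe(ev: Event): String = ev match {
    case Start(_, "text") => "start of a text element" // literal inside a pattern
    case Start(_, label)  => "start of " + label       // binds the label to a name
    case Text(t)          => "text of length " + t.length
    case End(_, label)    => "end of " + label
  }
}
```

Cases are tried from top to bottom, so the more specific Start(_, "text") pattern has to come before the general one. Here the result type is inferred as String, since every branch yields a string.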

Here’s the whole thing:

import scala.xml._
import scala.xml.pull._
import scala.io.Source
 
object XMLTool {
  val interLink = """\[\[(.*)\]\]""".r
  var readingText = false
 
  def main(args : Array[String]) : Unit = {
 
    val p = new XMLEventReader().initialize(Source.fromFile(args(0)))
    p.foreach(matchEvent)
 
  }
 
  def matchEvent(ev: XMLEvent) = {
    ev match {
      case EvElemStart(_, "text", _, _) => { 
        readingText = true
        print(backToXml(ev))
      }
      case EvElemStart(_, _, _, _) => { print(backToXml(ev)) }
      case EvText(text) => {
        if (readingText) print(filterText(text)) else print(text) 
      } 
      case EvElemEnd(_, "text") => {
        readingText = false
        print(backToXml(ev))
      }
      case EvElemEnd(_, _) => { print(backToXml(ev)) }
      case _ => {}
    }
  }
 
  def backToXml(ev: XMLEvent) = {
    ev match {
      case EvElemStart(pre, label, attrs, scope) => {
        "<" + label + attrsToString(attrs) + ">"
      }
      case EvElemEnd(pre, label) => {
        "</" + label + ">"
      }
      case _ => ""
    }
  }
 
  def attrsToString(attrs:MetaData) = {
    attrs.length match {
      case 0 => ""
      case _ => attrs.map( (m:MetaData) => " " + m.key + "='" + m.value +"'" ).reduceLeft(_+_)
    }
  }
 
  def filterText(text: String) = {
    val matches = interLink.findAllIn(text)
    if (matches.hasNext) matches.reduceLeft(_+_) else ""
  }
}

So the purpose of this program is to read the XML, remove everything inside <text> tags that doesn’t match the interLink regular expression, and output the XML again. Towards the end, note how pleasant map and reduceLeft are for string processing – in Java I can’t really think of a succinct way of expressing the same notion.
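To isolate the map and reduceLeft idiom, here is a sketch that works on plain string pairs instead of MetaData. The names Attrs, render and renderWithMkString are mine:

```scala
object Attrs {
  // Turn each pair into " key='value'" and then glue the pieces together.
  // reduceLeft fails on an empty collection, hence the explicit check.
  def render(attrs: List[(String, String)]): String =
    if (attrs.isEmpty) ""
    else attrs.map { case (k, v) => " " + k + "='" + v + "'" }.reduceLeft(_ + _)

  // mkString does the same concatenation and copes with the empty case itself.
  def renderWithMkString(attrs: List[(String, String)]): String =
    attrs.map { case (k, v) => " " + k + "='" + v + "'" }.mkString
}
```

The mkString variant is probably what I would reach for today, since it removes the need for the special case.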

Another couple of disclaimers: someone brought to my attention that there’s a very compact way of doing XPath queries in Scala, which probably makes my pattern matching on EvElemStart unnecessarily verbose. (Here’s a blog post on the XPath technique.) Also, there was no particular reason for me to use pull parsing. Push parsing might have been more natural, but I started down that path and this is what I ended up with. It works.

You can tell that I still have an imperative style from the way I use the readingText state variable to keep track of what the program is doing. A much more functional style program is probably hiding behind this one. Fortunately Scala is very forgiving towards people who mix styles like this.

My experience has been that it’s quite easy to get started and do useful things with Scala, once you get past the initial ideas (such as the difference between objects and classes, traits, val/def/var, declaration syntax). I would recommend it to anyone doing things with the JVM.