Posts categorized “Software development”.

Where is Java going?

creative

Today, Java is one of the most popular programming languages. Introduced in 1995, it rests on a tripod of the language itself, its libraries, and the JVM. In the TIOBE programming language league charts, it has been at the top for as long as the measurements have been made (since 2002), overtaken by C only for a brief period due to measurement irregularities.

Yet not all is Sun-shine in Java world. Sun Microsystems is about to be taken over by Oracle, pending EU approval. (EU is really dragging its feet in this matter but it seems unlikely they would really reject the merger). Larry Ellison has voiced strong support for Java and for Sun’s way of developing software, so maybe this is really not a threat by itself. But how far can the language itself go?

The Java language was carefully designed to be relatively easy to understand and work with. James Gosling, its creator, has called it a blue collar language, meaning it was designed for industrial, real world use. In a world where C++ was the de facto standard for OO programming, Java was a big step forward in terms of ease of development, with its lack of pointers and strong type system – to say nothing of its garbage collection. Many classes of common programming errors were removed altogether. However, in the interests of simplicity and clarity, some tradeoffs were made. The language’s detractors today point to problems such as excessive verbosity, the lack of closures, the limited generics, and the checked exceptions.

For some time there has been a lot of exciting alternative languages available on the JVM. Clojure is a Lisp dialect. Scala, the only non-Java JVM language I have used extensively, mixes the functional and object oriented paradigms. Languages like JPython and JRuby basically exist to allow scripting and interoperability with popular scripting languages on the JVM.

Today it seems as if the JVM and the standardized libraries will be Java’s most prominent legacy. The language itself will not go away for a long time either – considering that many companies still maintain or develop in languages like Cobol and Fortran, we will probably be maintaining Java code 30 years from now (what a sad thought!), but newer and more modern JVM languages will probably take turns being number one. The JVM and the libraries guarantee that we will be able to mix them relatively easily anyway, unless they stray too far from the standard with their custom features.

So in hindsight, developing this intermediate layer, this virtual machine – and disseminating it so widely –  was a stroke of genius. Will it be that in future programming models we have even more standardized middle layers, and not just one?

Meanwhile, there’s a lot of debate about the process being used to shape and define Java. For a long time, Sun employed something called the Java Community Process, JCP, which was supposed to ensure openness. Some people proclaim that the openness has ended. To take one example, very recently, Sun announced that there will be support for closures in Java 7, after first announcing that there would be no support for closures in Java 7. The process by which this decision has been managed has been described as not being a community effort. Some aspects of Java are definitely up in the air these days.

Programming languages are about people

Programming languages are more about people and less about machines.

Programming languages are about staying inside the limitations of people’s minds and their ability to keep track of and work with abstractions. If people had no such limitations, they could code in assembly language all the time.

Programming languages and supporting tools and environments are the interface between people and the raw instruction set of a computer (or a bigger entity, like a network of computers). When we design programming languages, we must take into account not only what the machine environment will do and how it changes, but also how people create the software, how they modify it, how they think about it. Maybe programming languages should even be designed with business processes in mind, in some cases.

But this is also a question of what kind of programmer we want to cultivate. The language shapes the programmer, too.

Scala and actors

Programming with actors was a new concept to me until I tried it out in Scala. It’s appears to be one of Scala’s most celebrated features, judging by the official blurb. Actors was a daunting word at first but it really ends up being a very simple concept.

Actors are a programming model for concurrent programming. With conventional mutex/monitor based programming in Java, say, programmers hold and release locks (the synchronized keyword) to achieve safe concurrency. Condition variables are used for thread communication (the wait and notify family of functions on java.lang.Object). Communication is synchronous: a typical case would be that you change some condition, invoke notifyAll to wake up threads waiting on that condition, and then they can take over the relevant lock and proceed to do some processing.

An actor is a unit of execution with an asynchronous message queue. Actors can receive messages from other actors or send messages to other actors at any time, however, the messages wait in the receiving actor’s “mailbox” until the actor has time to receive it.

As a simple example, let’s develop a program that converts text files to upper case using actors. The program will have an “Input” actor, an “Output” actor, and a number of “UpperCase” actors that do the processing. First the Input actor:

import scala.actors._
import java.io._
 
class Input(in: BufferedReader) extends Actor {
	def act() {
	  while(true) {
	    receive {
	      case Next => { sender ! Line(in.readLine()) }
	    }
	  }
	}
}

It’s worth noting that the Actor system is implemented completely in the libraries, outside of the core language. Actors are not first class constructs, but sometimes look as if they were. The act method is where actors begin their execution. The receive method causes them to block and wait for a message, which we may pattern match on. The sender variable corresponds to whoever sent the last message received, and the ‘!’ operator sends a message. So whenever this actor receives the Next message, it will respond with the next line from a buffered reader.

Then, the UpperCase actor:

import scala.actors._
 
case class Next
case class Line(x: String)
 
class UpperCase(input: Actor, out: Actor) extends Actor {
	def act() {
		while(true)
		{
			input ! Next
			receive {
			case Line(x:String) => { out ! x.toUpperCase() }
			}
		}
	}
}

This actor is created with in- and output actors as its constructor parameters. It continually asks the input actor for a new line, converts it to upper case, and sends it to the output actor. Also note the case classes here, which are for pattern matching only. They are a bit like algebraic data types in Haskell.

Finally, the Output actor:

import scala.actors._
 
class Output extends Actor {
	def act() {
		while(true)
		{
			receive {
			case x:String => { println(x) }
			}
		}
	}
}

And then we have to tie it all together:

import java.io._
 
object Demonstration {
 
  val reader = new BufferedReader(new InputStreamReader(System.in))
 
  def main(args: Array[String]) {
 
    val in = new Input(reader)
    in.start
 
    val out = new Output()
    out.start
 
    1.to(5).foreach(x => {
      val tr = new UpperCase(in, out)
      tr.start
    })
  }
}

Here I abuse the foreach notation slightly to create 5 parallel text processors. Each actor runs on its own thread (though there are ways to prevent this if one wants very large numbers of actors). Now of course, the lines will probably be output in the wrong order. Another obvious shortcoming is that there is no clean shutdown protocol that terminates all the actors when the input stream is fully read. Solving these problems is outside of the scope of this article.

Some other interesting resources on actors: the official tutorial, the papers (slightly more academic but accessible to the monomorphic reader, I imagine). Debasish highlights how actors can be used to get threadless concurrency, Erlang-style.

First steps with Scala: XML pull parsing

I’m now going to share some of the results of my recent experiments with the Scala programming language. In May I wrote that I had started looking at it. I’ve been using it to make some support tools that I needed for research work since.

First a disclaimer: It’s been 4+ years since I did serious work with a functional programming language (Haskell, in first year of university), so my style is imperative-sprinkled-with-functional rather than the opposite. Also, since I haven’t spent that much time with this language yet, I’m bound to be making obvious mistakes. That said, I’m happy to be able to recommend Scala to pretty much anyone at this point. The learning curve is not steep if you know Java, and it allows for a variety of approaches depending on who you are.

For this particular tool, I needed to parse XML files, edit the contents of certain tags, and spit the data back out again. I’d like to show what I ended up with and point out some of Scala’s powerful features. Let’s first look at some interesting parts, and then the entirety.

1
2
3
object XMLTool {
  val interLink = """\[\[(.*)\]\]""".r
}

Scala lets you define objects as well as classes. Objects are singletons and can be referred to by name. Otherwise they are like classes; they participate in the type hierarchy.
Scala has three kinds of declarations: val, var and def. Values are evaluates once and cannot be reassigned. Vars are variables which can be reassigned. Defs are definitions and can as such be functions or values. My understanding is that they are lazily evaluated. The type of this val declaration is inferred by the highly powerful type system automatically using Hindley Milner type inference. One of my biggest surprises with Scala is how little type information the programmer has to provide, yet how powerful the static checking is. Incidentally, the .r at the end is a shortcut for turning the string into a regular expression object.

1
2
3
4
5
6
def main(args : Array[String]) : Unit = {
 
    val p = new XMLEventReader().initialize(Source.fromFile(args(0)))
    p.foreach(matchEvent)
 
  }

This function is the analogue of Java’s public static void main() and actually compiles to the same bytecode. Unlike in Java, types come after the variable name, separated from it by a colon. We can tell that we’re dealing with a functional language when we see foreach being applied to a function which I’ll declare next:

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
def matchEvent(ev: XMLEvent) = {
    ev match {
      case EvElemStart(_, "text", _, _) => { 
        readingText = true
        print(backToXml(ev))
      }
      case EvElemStart(_, _, _, _) => { print(backToXml(ev)) }
      case EvText(text) => {
        if (readingText) print(filterText(text)) else print(text) 
      } 
      case EvElemEnd(_, "text") => {
        readingText = false
        print(backToXml(ev))
      }
      case EvElemEnd(_, _) => { print(backToXml(ev)) }
      case _ => {}
    }
  }

Here we see pattern matching in action. We can match on lots of things, including types, partially instantiated types, strings and regular expressions. This style of programming is encouraged in FP languages, unlike in imperative ones. By matching on something like EvElemStart(_, "text", _, _) I’m looking for XML tags whose name is “text”, and I don’t care about their namespace or attributes. _ is a wildcard character.

Incidentally, it’s perfectly fine for me to leave out the return type of this function. Scala will infer that the return type is Unit (which vaguely corresponds to void in Java).

Here’s the whole thing:

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
import scala.xml._
import scala.xml.pull._
import scala.io.Source
 
object XMLTool {
  val interLink = """\[\[(.*)\]\]""".r
  var readingText = false
 
  def main(args : Array[String]) : Unit = {
 
    val p = new XMLEventReader().initialize(Source.fromFile(args(0)))
    p.foreach(matchEvent)
 
  }
 
  def matchEvent(ev: XMLEvent) = {
    ev match {
      case EvElemStart(_, "text", _, _) => { 
        readingText = true
        print(backToXml(ev))
      }
      case EvElemStart(_, _, _, _) => { print(backToXml(ev)) }
      case EvText(text) => {
        if (readingText) print(filterText(text)) else print(text) 
      } 
      case EvElemEnd(_, "text") => {
        readingText = false
        print(backToXml(ev))
      }
      case EvElemEnd(_, _) => { print(backToXml(ev)) }
      case _ => {}
    }
  }
 
  def backToXml(ev: XMLEvent) = {
    ev match {
      case EvElemStart(pre, label, attrs, scope) => {
        "<" + label + attrsToString(attrs) + ">"
      }
      case EvElemEnd(pre, label) => {
        "</" + label + ">"
      }
      case _ => ""
    }
  }
 
  def attrsToString(attrs:MetaData) = {
    attrs.length match {
      case 0 => ""
      case _ => attrs.map( (m:MetaData) => " " + m.key + "='" + m.value +"'" ).reduceLeft(_+_)
    }
  }
 
  def filterText(text: String) = {
    val matches = interLink.findAllIn(text)
    if (matches.hasNext) matches.reduceLeft(_+_) else ""
  }
}

So the purpose of this program is to read the XML, remove everything inside <text> tags that doesn’t match the interLink regular expression, and output the XML again. Towards the end, note how pleasant map and reduceLeft are for string processing – in Java I can’t really think of a succinct way of expressing the same notion.

Another couple of disclaimers: someone brought to my attention that there’s a very compact way of doing XPath queries in Scala, which probably makes my pattern matching on EvElemStart unnecessarily verbose. (Here’s a blog post on the xpath technique) Also, there was no particular reason for me to use pull parsing – push parsing might have been more natural, but I started down that path and this is what I ended up with. It works.

You can tell that I still have an imperative style from the way I use the readingText state variable to keep track of what the program is doing. A much more functional style program is probably hiding behind this one. Fortunately Scala is very forgiving towards people who mix styles like this.

My experience has been that it’s quite easy to get started and do useful things with Scala, once you get past the initial ideas (such as the difference between objects and classes, traits, val/def/var, declaration syntax). I would recommend it to anyone doing things with the JVM.

An unusual Java construct

I now break the longstanding tradition of not posting code on this blog. I just wanted to share what I believe to be a somewhat unusual pattern in Java:

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
public IMethod findMethod(String name, String[] types) {
		outer:
		for (IMethod m: m_methods)
		{
			if (m.getName().equals(name))
			{
				int i = 0;
				for (IType t: m.getArgList().getTypes())
				{
					if (! t.getName().equals(types[i]))
					{
						continue outer;
					}
					i++;
				}
				return m;
			}
		}
		return null;
	}

On line 2, there’s a label outer: which identifies a location in the code. Normally this feature is used with keywords like the hotly debated goto in C. Java has goto as a keyword, but doesn’t support the feature. However, you can still use the labels with statements like continue above (line 12), which in this case starts a new iteration of the outer loop rather than the inner one.

I can’t remember ever having had to use this feature of any C-like language before (perhaps once) so it was intriguing when it popped up. It’s possible that a neater implementation of this would put the inner loop in a matchesSignature method in the IMethod interface instead.

The ego fallacy

A senior manager at a company I used to work at once said that (making) software is a very social activity. I didn’t have much experience, and was very surprised at the time, since I had never thought about the human aspect of software development. But of course this aspect is extremely important. For example, in any setting with more than one programmer working on a project, the need for well functioning communication is huge, as much as in any other job I suspect. Projects often fail due to a lack of communication.

Another human side to software development is that some developers, this author included, easily start seeing the code they write as their own intellectual turf. If somebody challenges the developer’s practices or code, offering a better solution, it will be met with massive resistance. Partly out of laziness, but partly, I think, out of a desire to protect their territory and their legacy.

I do this myself more often than I would like. And it leads to bad results because it creates obstacles to communication and means that team members pull in different directions. Thus, somehow the incentives are wrong. If everybody’s goal were to allow the team to deliver a good product quickly, this would not happen. Why is it that your goal after some time with a project sometimes becomes to defend what you have created? Why do we identify with the code we wrote, and not with the bigger project?

This doesn’t mean that looking to your own interests or to your ego is a bad thing – rather that it’s easy to be shortsighted about what is in your best interests.

The problem with standards

Standardised formats are essential to connections in the digital society. On the hardware side, USB is so ubiquitous and well defined that I can connect essentially any peripheral to any PC. In many cases I can even expect them to work without drivers. For sound, the 3.5 mm headphone plug has been ubiquitous as long as I can remember. For video, there is VGA, DVI, and so on. Even if some of these change every now and then, we can always buy converters to convert from one format to another. The changes happen infrequently enough that this is a bearable burden.

On the software side, from a user perspective, it gets tougher. In terms of document formats, everybody is expected to have a web browser, to be able to read PDF documents, and to play MP3 files. There’s also a certain expectation that people can read and edit Microsoft Office documents, but in general, unless people have Microsoft Office itself, they are likely to use some program that does an imperfect job of converting to and from the office formats. More exotic formats people do not expect to share unless they know that the receiver has the same software.

One of the worst situations (and maybe one of the sources of these troubles) is software architecture. To look at just one subset of this universe, let’s consider some popular Java frameworks. There are many Java frameworks. I can think of Struts, Spring, Enterprise JavaBeans, Tomcat, JBoss, Websphere, OSGI. Each one is trying to be a universal, or near universal, component based framework for large scale Java applications. (Some of these are intended specifically for web applications, but the distinction is a bit arbitrary these days). Each one defines a particular behaviour and API that its users need to fit into.

One problem is that once you’ve developed your application in one of these frameworks, it’s tough to move to a different one. At best it’s “just” a matter of using a different API. At worst, the new framework has different expectations of component behaviour and forces you to either refactor your old application majorly or write lots of adapters.

Software has this particular kind of stickiness: objects designed in one context tend to be difficult or impossible to move to other contexts. Here are some problems brought about by this fact:

  • Across several different devices and services, I have many different lists of my various contacts. For instance, I have an address book on my computer, which fortunately is shared with my iPhone. However, there’s also a list that Gmail maintains, one list on MSN messenger, one on ICQ, one on Facebook, one on the Playstation Network… etc. In a world with perfect software, I would have a single contact list that could be updated and accessed from any of these services (and I would still be able to selectively hide things from any one service, if I wanted to).
  • Most social networks support some kind of status update feature (Twitter, Facebook, …) but they generally can’t share this information with each other. There’s a tool called ping.fm which is able to update lots of networks at once, but this is just for a single feature. And it took a third party tool to do it. And I have to give it all of my logins and passwords.
  • The health care systems in many countries are a mess (UK’s NHS and probably most US systems fall in this category). Different medical offices in different medical IT systems establish different journals, but they have no way of sharing the journal if the patient wants to use a different care provider. As experiences in the UK showed, achieving the ability to share these records can be very difficult sometimes.

So how can we design really future proof standards? Here are my intuitive, unqualified answers:

Favor openness over closedness. Allow unknown entities to enter your protocol at many different stages and in many different roles. Even if you don’t design for this, it will tend to happen. Better to design for it.

Allow extensibility in many directions. Frameworks often allow you to make more specific versions of their components, but it’s rare that you can generalize or inject behaviour at a middle level of a concept hierarchy.

Use semantic metadata. Ontologies are under heavy research in computer science. They are formal classifications of concepts and their relations. One popular format for developing ontologies is RDF(S); for an example of what the concepts can look like, check this RDF(s) directory. I will have reason to come back to this topic in the future.

One example of a brilliant long-lasting standard: TCP/IP. Somehow in this case, the designers got something miraculously right at the first try. I’m amazed at how we keep building on this standard every day.

Exploring Scala

I’ve started experimenting with the programming language Scala. I’ve been wanting to get back into functional programming for some time, but I’ve found it impractical for the time being to dive right into something like ML, Haskell or Scheme. Scala has gained notoriety since Twitter announced that they’ve rewritten their engine in it. Some of its benefits are:

  • Mixes multiple paradigms, including imperative, functional and actor programming
  • Runs on the Java VM – interop with Java libraries and frameworks is trivial
  • Lightweight syntax

There is much to like in this. I’m hoping that it will turn out to be useful as a rapid prototyping language for trying things out, especially since so many third party tools are available in Java world.

For now I am using exercises from Project Euler as a means to experiment with and learn it. This works surprisingly well.

Also see this list of the most popular programming languages - Scala is now number 27, ahead of Prolog, Erlang, Haskell and ML.

The textual paradigm

Following up on yesterday’s post on code reuse, I have a more specific reason to be skeptical of literate programming.

Programming and software development is stuck in a textual paradigm – the idea that programming is something you do by writing text in a formal language. I think this idea constrains us somewhat in the development of languages and tools. IDE’s like Eclipse allow you to perform operations on the abstract syntax tree to some extent (e.g. automated refactorings), but in general, those too are text oriented.

I don’t have a replacement for the textual paradigm today. Because of the strong link between languages and automata, clearly computer programming is very strongly related to formal languages. But sentences in formal languages don’t need to be represented as text. 

However, I don’t think I want to advocate leaving text behind completely. It is one of the most powerful devices for input and output of precise information, to and from people and computers. But I think we need to take steps towards being less about text editing, and literate programming  does not seem to permit that.

Code reuse according to Knuth

I stumbled upon this interview with Donald Knuth today. In addition to the celebrated The Art of Computer Programming, Donald Knuth pioneered the concept of literate programming, which emphasises writing code for human beings to understand, as opposed to writing it for machines to understand.

When being asked about current trends and fashionable practices that have undeserved popularity, Knuth says:

I must also confess to a strong bias against the fashion for reusable code.

His take, then, is that with highly readable, modifiable and accessible source code, reuse is unnecessary and dangerous. He would prefer rewriting the code for a new context.

I am one of those who would preach the “fashion” of reusable code. It does appear to me that reusability is very important, and in particular, I think there are tremendous benefits to be realised when the language implementation is able to reason about code. The success of languages like Java shows that the more managed and controlled the language is, the more human creativity and mental capacity is free to focus on the higher levels of abstraction. If programmers must adapt code to each context manually, surely they will be distracted from the bigger task. So my take on the issue is very different to Knuth’s.

But I confess to not knowing much about literate programming and to not having explored the tools (in particular Knuth’s CWEB). Perhaps I should learn more.