Posts tagged “reuse”.

A wikipedia of algorithms

Here’s something I’ve wanted to see for some time, but probably don’t have time to work on myself.

It would be nice if there was a wikipedia-like web site for code and algorithms. Just the common ones to start with, but perhaps more specialised ones over time. Of course the algorithms should be available in lots of different languages. This would in fact be one of the main points, so that people could compare good style and see how things should be done for different languages. In addition, there should be an in-browser editor, just like on Wikipedia (but perhaps with syntax highlighting) so people can make changes easily.

Furthermore, there should be unit tests for every algorithm, and these should be user-editable in the same way as the main code. In an ideal world, the web site would automatically run the unit tests every time there’s a change to some algorithm and check in a new version of the code to a versioned repository. People could then trust with reasonable confidence that the code is valid and safe. However, if the system were to be as open as Wikipedia is, such a system wouldn’t work, since users could write unit tests with malicious code. So I suspect volunteers would have to download, inspect, and run the unit tests regularly, and perhaps there would be a meta-moderation system of some kind, allowing senior members to promote changes to the official repository. In the meantime, everybody should be allowed to see and edit changes on the wiki immediately, but they would be marked as “untested” or “unsafe”.

User interface would be very important since this kind of site needs to be fun and easy to use regularly.

Has this kind of project already been carried out by someone? I can find some things by googling. The Code Wiki appears to once have been a wikipedia of code, but it seems defunct, C# only, and now they’re selling a book with the contents of the site! Algorithm Wiki has many algorithms in different languages, but the user interface is awkward and littered with obstructive advertising, the code is hard to browse, and it doesn’t make for a usable quick reference. They seem to have gotten off to a good start though. Any others?

Edit: Rosetta Code seems to be the most mature and useful such site out there today.

The problem with standards

Standardised formats are essential to connections in the digital society. On the hardware side, USB is so ubiquitous and well defined that I can connect essentially any peripheral to any PC. In many cases I can even expect them to work without drivers. For sound, the 3.5 mm headphone plug has been ubiquitous as long as I can remember. For video, there is VGA, DVI, and so on. Even if some of these change every now and then, we can always buy converters to convert from one format to another. The changes happen infrequently enough that this is a bearable burden.

On the software side, from a user perspective, it gets tougher. In terms of document formats, everybody is expected to have a web browser, to be able to read PDF documents, and to play MP3 files. There’s also a certain expectation that people can read and edit Microsoft Office documents, but in general, unless people have Microsoft Office itself, they are likely to use some program that does an imperfect job of converting to and from the office formats. More exotic formats people do not expect to share unless they know that the receiver has the same software.

One of the worst situations (and maybe one of the sources of these troubles) is software architecture. To look at just one subset of this universe, let’s consider some popular Java frameworks. There are many Java frameworks. I can think of Struts, Spring, Enterprise JavaBeans, Tomcat, JBoss, Websphere, OSGI. Each one is trying to be a universal, or near universal, component based framework for large scale Java applications. (Some of these are intended specifically for web applications, but the distinction is a bit arbitrary these days). Each one defines a particular behaviour and API that its users need to fit into.

One problem is that once you’ve developed your application in one of these frameworks, it’s tough to move to a different one. At best it’s “just” a matter of using a different API. At worst, the new framework has different expectations of component behaviour and forces you to either refactor your old application majorly or write lots of adapters.

Software has this particular kind of stickiness: objects designed in one context tend to be difficult or impossible to move to other contexts. Here are some problems brought about by this fact:

  • Across several different devices and services, I have many different lists of my various contacts. For instance, I have an address book on my computer, which fortunately is shared with my iPhone. However, there’s also a list that Gmail maintains, one list on MSN messenger, one on ICQ, one on Facebook, one on the Playstation Network… etc. In a world with perfect software, I would have a single contact list that could be updated and accessed from any of these services (and I would still be able to selectively hide things from any one service, if I wanted to).
  • Most social networks support some kind of status update feature (Twitter, Facebook, …) but they generally can’t share this information with each other. There’s a tool called ping.fm which is able to update lots of networks at once, but this is just for a single feature. And it took a third party tool to do it. And I have to give it all of my logins and passwords.
  • The health care systems in many countries are a mess (UK’s NHS and probably most US systems fall in this category). Different medical offices in different medical IT systems establish different journals, but they have no way of sharing the journal if the patient wants to use a different care provider. As experiences in the UK showed, achieving the ability to share these records can be very difficult sometimes.

So how can we design really future proof standards? Here are my intuitive, unqualified answers:

Favor openness over closedness. Allow unknown entities to enter your protocol at many different stages and in many different roles. Even if you don’t design for this, it will tend to happen. Better to design for it.

Allow extensibility in many directions. Frameworks often allow you to make more specific versions of their components, but it’s rare that you can generalize or inject behaviour at a middle level of a concept hierarchy.

Use semantic metadata. Ontologies are under heavy research in computer science. They are formal classifications of concepts and their relations. One popular format for developing ontologies is RDF(S); for an example of what the concepts can look like, check this RDF(s) directory. I will have reason to come back to this topic in the future.

One example of a brilliant long-lasting standard: TCP/IP. Somehow in this case, the designers got something miraculously right at the first try. I’m amazed at how we keep building on this standard every day.

Research idea: a snapshot

As part of an application form I had to fill out recently, I had to write a summary of my research ideas. Of course this changes all the time, since I’m still searching for a precise topic (and probably will be for a long time). But this is what a snapshot of those thoughts, taken now, looks like:

One of the most important problems in software engineering is reducing the impact of change. To this end, recently methods such as inversion of control (dependency injection) have become popular, in order to reduce the coupling to concrete interfaces. However, even with these schemes, there is still a dependency on specific names and abstract interfaces.  My project aims to investigate the possible use of semantic methods to address this problem. In essence, I want to allow developers to use semantic interfaces rather than syntactic ones to describe and access their components. 

Specifically, I am investigating techniques commonly used in the context of Semantic Web Services, such as ontologies and semantic/syntactic mediation, and their applicability to this problem. 

We may regard services as being somewhat large scale components. However, I am interested in applying these methods not just for large scale services distributed across the web, but also for small and numerous software components running in a single process. In such a setting, performance and scalability are important issues to investigate, in addition to the usual problems of reliability, correctness of composition, etc.

Code reuse according to Knuth

I stumbled upon this interview with Donald Knuth today. In addition to the celebrated The Art of Computer Programming, Donald Knuth pioneered the concept of literate programming, which emphasises writing code for human beings to understand, as opposed to writing it for machines to understand.

When being asked about current trends and fashionable practices that have undeserved popularity, Knuth says:

I must also confess to a strong bias against the fashion for reusable code.

His take, then, is that with highly readable, modifiable and accessible source code, reuse is unnecessary and dangerous. He would prefer rewriting the code for a new context.

I am one of those who would preach the “fashion” of reusable code. It does appear to me that reusability is very important, and in particular, I think there are tremendous benefits to be realised when the language implementation is able to reason about code. The success of languages like Java shows that the more managed and controlled the language is, the more human creativity and mental capacity is free to focus on the higher levels of abstraction. If programmers must adapt code to each context manually, surely they will be distracted from the bigger task. So my take on the issue is very different to Knuth’s.

But I confess to not knowing much about literate programming and to not having explored the tools (in particular Knuth’s CWEB). Perhaps I should learn more.