Tag: tools

Tips for academics who develop software

February 23rd, 2010 — 2:44am

Academics and practitioners, having rather different goals in life, tend to approach software development in quite different ways. No doubt there are many things each side of the fence can learn from the other, but I think academics in particular could often benefit quite a lot by adopting some of the practices used in industrial development. And not just computer science academics!

A common misconception is that these techniques are only useful for large projects and large teams. I find, though, that they can relieve many of the growing pains even in small projects, helping them reach maturity much faster.

Use version control. Classic but invalid counterarguments include “it’s a hassle and too much work to set up” and “there’s only one person working on this project anyway”. Even if it’s only you, you will benefit massively from being able to roll back your changes to any earlier point in time. It also lets you experiment safely. Plus, setup is no longer an issue with free and easy-to-use services like GitHub and Bitbucket. My tool of choice is now Mercurial; I used to use SVN. There are many other good choices, too.

Use a debugger. If there is a debugger available for your language (and there almost certainly is), use it to track down nontrivial errors rather than relying on extensive printf-style debugging.

Don’t optimise prematurely, but when you need to, use a profiler. Profilers tell you where a program’s performance bottlenecks are. You can profile things like heap usage (which classes use the most space in Java, for instance) and CPU usage (which functions use the most CPU time). For Java, I’ve found that the NetBeans IDE has a very good built-in profiler. Eclipse also has one, but it didn’t work on the Mac the last time I checked. For C/C++, gprof used to be good and probably still is.
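The tools above are for Java and C/C++. As a small, language-neutral illustration of what a CPU profiler reports, here is a minimal sketch using Python’s standard-library `cProfile` and `pstats` modules. The workload functions are hypothetical and exist only to give the profiler something to find.

```python
import cProfile
import io
import pstats


def slow_sum_of_squares(n):
    """Deliberately naive: builds a full list before summing it."""
    return sum([i * i for i in range(n)])


def fast_sum_of_squares(n):
    """Closed form for 0^2 + 1^2 + ... + (n-1)^2."""
    return (n - 1) * n * (2 * n - 1) // 6


def workload():
    """Hypothetical workload mixing a slow and a fast implementation."""
    total = 0
    for _ in range(20):
        total += slow_sum_of_squares(100_000)
        total += fast_sum_of_squares(100_000)
    return total


def profile_report(func, limit=5):
    """Run func under cProfile; return (func's result, text report by cumulative time)."""
    profiler = cProfile.Profile()
    result = profiler.runcall(func)
    stream = io.StringIO()
    stats = pstats.Stats(profiler, stream=stream)
    # Show only the `limit` most expensive entries, ranked by cumulative time.
    stats.sort_stats(pstats.SortKey.CUMULATIVE).print_stats(limit)
    return result, stream.getvalue()


if __name__ == "__main__":
    _, report = profile_report(workload)
    print(report)
```

Sorting by cumulative time should put the naive `slow_sum_of_squares` near the top of the report, which is the kind of hint you want before you start optimising anything.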

Use unit testing wisely. All of the above applies even to very small projects, but I think some projects are too small to need unit tests, at least initially. You be the judge. I find that unit tests are most beneficial when applied to the fragile, complicated parts of a system, where many different things interlock. If you are ambitious, you can also write the tests first and the code later: test-driven development.
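As a sketch of the kind of fiddly logic that is worth covering, here is a hypothetical interval-merging function with a few Python `unittest` cases aimed at its edge cases (empty input, overlaps, touching endpoints). The function and test names are illustrative only, not taken from any real project.

```python
import unittest


def merge_intervals(intervals):
    """Merge overlapping or touching closed intervals given as (start, end) pairs.

    Hypothetical example of a small routine that is easy to get subtly wrong.
    """
    merged = []
    for start, end in sorted(intervals):
        if merged and start <= merged[-1][1]:
            # Overlaps or touches the previous interval: extend that one instead.
            merged[-1] = (merged[-1][0], max(merged[-1][1], end))
        else:
            merged.append((start, end))
    return merged


class MergeIntervalsTest(unittest.TestCase):
    def test_empty_input(self):
        self.assertEqual(merge_intervals([]), [])

    def test_disjoint_intervals_are_kept_and_sorted(self):
        self.assertEqual(merge_intervals([(5, 6), (1, 2)]), [(1, 2), (5, 6)])

    def test_overlapping_intervals_are_merged(self):
        self.assertEqual(merge_intervals([(1, 4), (2, 3), (3, 8)]), [(1, 8)])

    def test_touching_intervals_are_merged(self):
        self.assertEqual(merge_intervals([(1, 2), (2, 3)]), [(1, 3)])


if __name__ == "__main__":
    unittest.main()
```

In a test-first workflow, the four test methods would be written before `merge_intervals` itself, and the function would be considered done once they all pass.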

Use a good IDE if you can. For a language like Java, where you have to type a lot of code to get something done and your code is spread across lots of files, a good IDE that can generate boilerplate and navigate the code quickly can really speed up your work. It’s beneficial for other languages too. That said, I have no problem with people who use plain vim or emacs; after all, these are practically IDEs.

I believe that honing your software development skills as an academic can pay off. Also see: Daniel Lemire on why you should open source your projects. (I will get around to doing this eventually, I promise ;-))

Comment » | Software development

A wikipedia of algorithms

December 7th, 2009 — 10:21am

Here’s something I’ve wanted to see for some time, but probably don’t have time to work on myself.

It would be nice if there were a Wikipedia-like website for code and algorithms. Just the common ones to start with, but perhaps more specialised ones over time. Of course the algorithms should be available in lots of different languages. This would in fact be one of the main points, so that people could compare idiomatic style and see how things should be done in each language. In addition, there should be an in-browser editor, just like on Wikipedia (but perhaps with syntax highlighting), so people can make changes easily.

Furthermore, there should be unit tests for every algorithm, and these should be user-editable in the same way as the main code. In an ideal world, the website would automatically run the unit tests every time an algorithm changes and, if they pass, check the new version of the code into a versioned repository. People could then trust with reasonable confidence that the code is correct and safe. However, if the site were as open as Wikipedia, such automation wouldn’t work, since users could write unit tests containing malicious code. So I suspect volunteers would have to download, inspect, and run the unit tests regularly, and perhaps there would be a meta-moderation system of some kind, allowing senior members to promote changes to the official repository. In the meantime, everybody should be allowed to see and edit changes on the wiki immediately, but they would be marked as “untested” or “unsafe”.
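To make the proposed workflow a little more concrete, here is a minimal in-memory sketch in Python. Everything in it is hypothetical (`AlgorithmPage`, `Revision`, the `senior_members` set, and the extra `"failed"` status): it only models the idea that edits are visible immediately as “untested” and that a senior member promotes a revision once its tests pass. It deliberately does no sandboxing, which is exactly the problem with malicious tests described above.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Set


@dataclass
class Revision:
    """One user-submitted version of an algorithm plus its user-editable tests."""
    author: str
    code: str
    tests: str
    # New edits are visible at once, but flagged until a trusted member checks them.
    status: str = "untested"


@dataclass
class AlgorithmPage:
    """Hypothetical wiki page for a single algorithm."""
    name: str
    senior_members: Set[str]
    revisions: List[Revision] = field(default_factory=list)
    official: Optional[Revision] = None  # stands in for the "official repository"

    def submit(self, author: str, code: str, tests: str) -> Revision:
        """Anyone may edit; the new revision is stored and marked 'untested'."""
        rev = Revision(author, code, tests)
        self.revisions.append(rev)
        return rev

    def review(self, rev: Revision, reviewer: str) -> bool:
        """A senior member runs the tests after reading them by hand.

        WARNING: exec() runs arbitrary code with no sandboxing at all, which is
        why a human must inspect the code and tests before calling this.
        """
        if reviewer not in self.senior_members:
            raise PermissionError(f"{reviewer} may not promote revisions")
        namespace = {}
        try:
            exec(rev.code, namespace)   # defines the algorithm
            exec(rev.tests, namespace)  # tests are plain assert statements
        except Exception:
            rev.status = "failed"
            return False
        rev.status = "tested"
        self.official = rev  # promote to the official version
        return True


if __name__ == "__main__":
    page = AlgorithmPage("gcd", senior_members={"alice"})
    rev = page.submit(
        "bob",
        "def gcd(a, b):\n    while b:\n        a, b = b, a % b\n    return a\n",
        "assert gcd(12, 18) == 6\n",
    )
    print(rev.status)
    page.review(rev, "alice")
    print(rev.status)
```

A failing revision stays visible on the page but never replaces the official one, which mirrors the “untested” marking suggested in the post.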

The user interface would be very important, since this kind of site needs to be fun and easy to use on a regular basis.

Has this kind of project already been carried out by someone? I can find a few things by googling. The Code Wiki appears to have once been a Wikipedia of code, but it seems defunct and C#-only, and now they’re selling a book with the contents of the site! Algorithm Wiki has many algorithms in different languages, but the user interface is awkward and littered with obstructive advertising, the code is hard to browse, and it doesn’t make for a usable quick reference. They seem to have gotten off to a good start, though. Any others?

Edit: Rosetta Code seems to be the most mature and useful such site out there today.

Comment » | Software development

Bibliography tools (2) – Mendeley

September 13th, 2009 — 7:27pm

Following a comment on my previous foray into bibliography management systems, I had a look at the product known as Mendeley.


To evaluate Mendeley, let’s ask ourselves what we want from a bibliography management system in the modern research environment. At a bare minimum, we want an easy way to catalogue and search PDF documents and, of course, to compile the all-important reference list at the end of the laborious writing process. Mendeley does this, and it also brings a social networking aspect into the picture: it tries to recommend papers that are relevant to your work, and it gives you an easy way of sharing interesting papers with colleagues.

In contrast to Aigaion, which I wrote about previously, Mendeley is not a web-based system but a desktop application. This has clear benefits, as the interface is quite slick. I can set the application to watch my “papers folder”, and any PDFs I save to that folder, or its subfolders, are automatically scanned and entered into Mendeley. Metadata, such as author, title and references, is automatically extracted from the document in most cases, though I found I sometimes had to revise it manually. There’s a built-in command that looks up the metadata on Google Scholar by paper title, which comes in very handy in such cases.

Mendeley is built around an internal PDF viewer where the user can highlight text, add little sticky notes, and so on. This works quite smoothly, but on the Mac it’s definitely not as polished as the Mac’s built-in Preview PDF viewer. Mendeley uses its own PDF rendering layer, and it shows in the slower loading times when you scroll through documents. Some additional work could be done here. This is my only major complaint so far, though.

Much like the Evernote application, Mendeley has the option of storing all the papers on a central server, so that I can easily access them (and any annotations I might have made) from a different computer by signing in with my user name and password and syncing the files. So choosing a desktop application doesn’t mean giving up the benefits of a centralized server. It might be nice, however, to have the option of running my own Mendeley server, so that I’m not dependent on the Mendeley company’s server somewhere; but then, of course, I would forgo the social networking benefits.

This application has similarities to how last.fm is used for music, in that people build a profile based on what they consume. Indeed, Mendeley describes itself as a last.fm for research (video presentation). Let’s compare research and music as forms of media.

  • Most music listeners probably don’t make their own music, whereas most people who read research papers probably write their own papers.
  • Songs sample other songs (remix culture), but the practice is relatively recent; researchers have always built on and cited other people’s work in order to establish basic credibility.
  • The atomic unit of music is the song. The atomic unit of research is the research paper (the PDF in today’s internet-based world, at least in my discipline), but could this change in the future? Do we have to constrain ourselves to the article format?

In summary, Mendeley is probably the most useful, workflow-friendly bibliography system I’ve tried so far. If you’re in research, I’d recommend giving it a try. If I find the time, I also plan to investigate a more Mac-centric tool called Sente.

The Savage Minds blog recommends that you don’t use Mendeley as your main tool yet due to its relative immaturity, but I have seen no showstopper bugs so far.

3 comments » | Uncategorized

Paper documents made searchable

July 15th, 2009 — 1:41pm

I use Evernote on my iPhone and my desktop computers. It’s pretty nice. You can upload “notes” such as PDFs or images from your desktop or from the phone, and the software makes them all searchable and syncs all data between all the places where you use it. It OCRs photos, so if you take a photo of text, that text becomes searchable. It can also show on a map where notes added from the phone were created.

But this seems to be the icing on the cake: Pixily can scan your paper documents for you, supposedly even handwritten notes. (If they can do my handwriting, and I’m not sure they can, then they can surely decipher absolutely anything, even lost alphabets.) Apparently you send them all your stuff in boxes or envelopes, and they will OCR it into your Evernote account so it all becomes searchable. I would definitely do this if it were cheap, but I suspect shipping to and from Japan is too expensive for me to do this in bulk.

1 comment » | Uncategorized

The textual paradigm

May 1st, 2009 — 5:39am

Following up on yesterday’s post on code reuse, I have a more specific reason to be skeptical of literate programming.

Programming and software development are stuck in a textual paradigm: the idea that programming is something you do by writing text in a formal language. I think this idea constrains us somewhat in the development of languages and tools. IDEs like Eclipse allow you to perform operations on the abstract syntax tree to some extent (e.g. automated refactorings), but in general, those too are text-oriented.

I don’t have a replacement for the textual paradigm today. Because of the strong link between languages and automata, computer programming is clearly very closely related to formal languages. But sentences in formal languages don’t need to be represented as text.
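As a small illustration of treating a program as a tree rather than as text, the sketch below uses Python’s standard `ast` module (`ast.unparse` needs Python 3.9 or later) to rename a variable by editing the syntax tree, turning it back into text only at the end. It is a toy: a real rename refactoring would also have to respect scopes, which this version ignores.

```python
import ast


class RenameVariable(ast.NodeTransformer):
    """Rename every occurrence of one identifier by editing the syntax tree."""

    def __init__(self, old: str, new: str):
        self.old = old
        self.new = new

    def visit_Name(self, node: ast.Name) -> ast.Name:
        # Variable uses and assignments are Name nodes.
        if node.id == self.old:
            node.id = self.new
        return node

    def visit_arg(self, node: ast.arg) -> ast.arg:
        # Function parameters are 'arg' nodes, not Name nodes.
        if node.arg == self.old:
            node.arg = self.new
        self.generic_visit(node)  # also visit any type annotation
        return node


def rename_variable(source: str, old: str, new: str) -> str:
    """Parse source into a tree, rename on the tree, then render it as text again."""
    tree = ast.parse(source)
    tree = RenameVariable(old, new).visit(tree)
    return ast.unparse(tree)


if __name__ == "__main__":
    src = "def area(r):\n    return 3.14159 * r * r\n"
    print(rename_variable(src, "r", "radius"))
```

Here the text is only an input and output format; the edit itself happens on the tree, much as IDE refactorings do.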

However, I don’t want to advocate leaving text behind completely. It is one of the most powerful devices for input and output of precise information, to and from people and computers. But I think we need to take steps towards a kind of programming that is less about text editing, and literate programming does not seem to permit that.

Comment » | Software development
