
Paper documents made searchable

I use the tool Evernote on my iPhone and my desktop computers. It’s pretty nice. You can upload “notes” such as PDFs or images from your desk or from the phone, and the software makes them all searchable and syncs all data between all the different places where you use it. It OCRs photos, so if you take a photo of text, that text will be searchable. It can also show on a map where notes added from the phone were taken.

But this seems to be the icing on the cake: Pixily can scan your paper documents for you, supposedly even handwritten notes. (If they can do my handwriting, and I’m not sure they can, then they can surely decipher absolutely anything, even lost alphabets.) Apparently you send them all your stuff in boxes or envelopes, and they will OCR it into your Evernote account so it all becomes searchable. I would definitely do this if it were cheap, but I suspect shipping to and from Japan is too expensive for me to do this in bulk.

Iran, Twitter and information control

Ahmadinejad protesters in Ebisu, Tokyo

We’ve now had just over a decade of truly mainstream access to and use of the internet. I think I personally took my first stumbling steps on the web around 1995-1996. At the time, it was a limited phenomenon, rife with poor design. It was hard to see what was eventually going to come out of that. And even today, it’s hard to see what today’s internet will eventually evolve into.

If it wasn’t clear before, the events of the past week have made it clear that the internet is a valuable tool for democracy. When everybody can broadcast to everybody else, as opposed to just a select few broadcasting, it’s difficult to control the information flow. Repressing select bits of information becomes hard – the repression just results in the information getting more attention. In the aftermath of Iran’s elections, it seems one of the most important communication channels for protesters was Twitter. The situation is being likened to Tiananmen Square. Together with everybody else, I could follow #IranElection as the events unfolded. It went to the point where the US State Department asked Twitter to delay upgrades in order to keep the service operational, supposedly because of Twitter’s importance in Iran.

I don’t know enough about the candidates to take sides in Iran, but I think one of our fundamental principles should be that nobody should seek to rule by repressing communication. Today, the Internet is a free communications device that anyone can use. How long will it stay this way? When legislators seek to clamp down on the Internet’s uncontrolled nature and regulate it for one reason or another, we should protest. Unrestricted mass communication for everyone is too important an invention to give up.

For those who read Swedish, Rasmus Fleischer has written a brilliant post on the events from a philosophical-historical perspective.

Software roundup

I enjoy experimenting with new software just to see what people come up with. There’s just so much unknown software to discover. I suspect most people find something they like and then stick with it until it doesn’t work anymore, but there’s something to be said for proactively replacing your software and searching for better things. Here are some things I’ve taken a liking to recently.

New iPhone OS: Apple released version 3.0 a few days ago. It’s been all positive so far. I get Spotlight search for the phone, the keyboard feels more responsive, it finally has copy and paste, YouTube appears to have higher quality, and there is a slew of other features. (Also MMS, which I don’t really need in Japan.)

GWT: I started playing with the Google Web Toolkit just for fun. For those who don’t know, it’s a toolkit for developing AJAX-based web applications in pure Java: the client code is compiled to client-side JavaScript, and the server side runs as a Java servlet. It turns out I can be extremely productive with it – I found that it lets me develop fairly advanced web applications using my existing skills. There’s a very high reward/effort ratio that makes me excited. It feels like I don’t need to learn Ruby on Rails properly when I have GWT, given that I’m very comfortable developing in Java, but we’ll see.

Fluid.app: A web browser enhancement for the Mac that allows you to create separate “applications” from web sites you visit often. This means they will show up in the Dock and application switcher, have their own icon, and occupy less screen space. It sounds simple, but it’s a revelation. (And if you’re like me, you tend to have 15+ tabs open in your web browsers constantly, which is a poor way of managing windows.)

Chandler: Like many others, I found out about this little calendar and note manager by reading Scott Rosenberg’s Dreaming in Code, which chronicled the misfortunes of an open source startup project. It went 1.0 last year after many years of development. The impression you get from the book is that the developers had a lot of bad luck despite setting out with the right ambitions. This is now an old debate, but the tool is actually usable today – I’ve been using it every day to manage myself for 3 months. Aside from slight bugs, it feels very smart sometimes, thanks to its unique user interface and features.

Why are micropayments not taking off?


Micropayments are an old idea.

Examples of services using something that might be called micropayments today are Apple’s App Store (for the iPhone), Sony’s Playstation Network, various MMORPGs, etc. However, the typical payment sizes are still quite large: the smallest possible payment on the App Store is 100 yen (1 dollar). With even smaller payments, say around the value of 0.001 dollars or less, a new range of possibilities is opened up. For those who worry about payment costs, it will probably soon be economically feasible to make payments on the order of 1 millionth of a dollar, given that network costs, processing costs, and storage costs go down all the time – the economics of electronic payment are really changing. Fraud is probably a much bigger hurdle to overcome.

My case for micropayments is about derivative works. I’m not sure what copyright laws will look like in the future, but it is likely that payments and some kind of monetary system will remain in the picture. With the rise of the internet and various kinds of legal and quasilegal file sharing (the American term “fair use” might apply here), a certain kind of derivative work has proliferated: songs are being remixed, music videos are being created by fans on YouTube (usually consisting of the song in its original form and fan-made footage), and memes are floating around. The available technology eases the process of creating derivative works massively.

The existing legal framework was clearly not designed for this. As an amateur musician, I sometimes make music. Once, several years ago, I wanted to sample a tape recording made by Andy Warhol and use it as part of one of my works. After having e-mailed the Andy Warhol Museum, I was told that a written agreement would have to be set up with the Andy Warhol estate. (I have never made a penny from my recordings; they are made strictly for my own amusement, so there was no benefit for me in going through with a cumbersome process.) Different countries have different sampling laws; in Sweden, for instance, sampling something like 10 seconds for use in your own music is allowed without prior agreement. However, the point here is that with a sufficiently advanced and widely adopted micropayment system, this process could be made much smoother.

Consider completely original works, their derivative works, derivatives of the derivatives, and so on for a certain number of steps. In mathematical terms, this forms a directed graph (a tree if every work derives from a single source, and more generally a DAG), branching out and connecting all the included and indirectly included items. With micropayments, it might be possible to pay the creators of each included work automatically by sending money down these connections, slightly reducing the payment amount on each depth level. (The hard thing here is determining the amount to pay and the reduction amount on each step – this would depend on how much the included work has been changed and how prominent it is, among other things.) All of this should be fully automatic.
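To make the idea of sending money down the derivation graph a little more concrete, here is a minimal Python sketch. Everything in it is an assumption made for illustration: the function name `distribute_payment`, the `sources` mapping, the fixed `decay` fraction passed on at each level, the equal split between sources, and the `min_amount` cutoff. The post itself leaves open how the amounts should really be determined, so this is only one possible rule, not a proposal for the right one.

```python
from collections import defaultdict


def distribute_payment(sources, work, amount, decay=0.5, min_amount=1e-6):
    """Split a payment for `work` among it and everything it derives from.

    `sources` maps each work to the list of works it directly includes.
    The graph is assumed to be acyclic (a DAG). Hypothetical rule: at each
    level the creator keeps (1 - decay) of what arrives and passes the rest
    on, split equally among the directly included works.
    """
    if not 0.0 <= decay <= 1.0:
        raise ValueError("decay must be between 0 and 1")

    payouts = defaultdict(float)

    def pay(node, received):
        parents = sources.get(node, [])
        passed_on = received * decay
        # Original works, and payments too small to split further,
        # stay entirely with the current creator.
        if not parents or passed_on < min_amount:
            payouts[node] += received
            return
        payouts[node] += received - passed_on
        share = passed_on / len(parents)
        for parent in parents:
            pay(parent, share)

    pay(work, amount)
    return dict(payouts)


# A fan video built from a remix (itself derived from a song) plus new footage.
example_sources = {
    "video": ["remix", "footage"],
    "remix": ["song"],
}
example_payouts = distribute_payment(example_sources, "video", 1.0)
```

With a payment of 1 dollar for the video, this rule gives 0.5 to the video, 0.25 to the footage, and 0.125 each to the remix and the original song, so the whole amount is accounted for. A real system would presumably replace the equal split with weights reflecting how prominent each included work is.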

With such a system in place, anyone could sample anything at any time without worrying about legal agreements. Creators might receive a very large number of possibly very small payments. It’s unclear if the final payment distribution would be different from today, but I’m convinced that more derived works would be created.

However, it’s an open question whether these payments always have to be monetary. Can we envision other compensation systems for the digital world (which do not convert to cash)?

“True Knowledge”: Another search engine

I previously commented on Wolfram Alpha and PowerSet. Fisheye Perspective now brings my attention to another “answer engine” as they are called these days: True Knowledge. You have to sign up for an account in order to test it, which I have yet to do, but one feature that’s immediately appealing is that users can add and edit content. This was apparently one of the main design principles. But is this then just an alternative to Wikipedia? Not necessarily, as it also has an inference system (it can deduce facts from other facts). And it has an API for programmatic access. I can think of many interesting uses for an online user-edited inference-enabled knowledge base, if they can get the details right. These things are still in their infancy (I hope, since I want them to be better).

Two new-ish search engines

Recently, while reading about methods for manipulating RDF, I discovered the search engine PowerSet. More recently, Wolfram Research’s Wolfram Alpha launched. There’s been no shortage of new search engines in the past year or so – Cuil is one that was much publicized but ended up remarkably useless – but these two still impress me.

PowerSet impresses me because of its interface – I can easily see what a particular match is about without leaving the list of search results. Speeding up the typical use cases like this is very important for usability.

Wolfram Alpha impresses me because of the quality of the results. Maybe I’m in the minority thinking this – the press seems to have been giving it mostly negative reviews. Clearly WA is not intended as a Google replacement, but perhaps it was described as being one at some point. Today, being available to the public, it’s something different. It lets me look at data, mostly of the quantitative sort, and make all sorts of semi-interactive charts and comparisons. Here are some searches I liked: earthquakes in Japan, 1 cup of coffee, Tokyo to Osaka. I especially like the interactive earthquake graph.

WA is not without its problems though. Sometimes it’s hard to figure out what kind of queries you can make. I found the above mostly by experimentation. If they exposed more details about their data model and what they knew about each kind of object, maybe this would be easier. Right now I’m wondering why I can do a query like “largest cities” but not “largest cities in mexico”, for instance. I suppose this is mainly a question of maturity, both on the part of the system and of its users, though.

Search engines like PowerSet and WA are indicative of a broader trend towards semantics in computing and internet usage. While the semantic web isn’t here yet in the sense that we don’t have a semantic web browser or a unified way of querying the internet, clearly services that are based very heavily on semantic models are becoming mainstream. More on the impact of this in a future post.

Research idea: a snapshot

As part of an application form I had to fill out recently, I had to write a summary of my research ideas. Of course this changes all the time, since I’m still searching for a precise topic (and probably will be for a long time). But this is what a snapshot of those thoughts, taken now, looks like:

One of the most important problems in software engineering is reducing the impact of change. To this end, methods such as inversion of control (dependency injection) have recently become popular as a way to reduce the coupling to concrete implementations. However, even with these schemes, there is still a dependency on specific names and abstract interfaces. My project aims to investigate the possible use of semantic methods to address this problem. In essence, I want to allow developers to use semantic interfaces rather than syntactic ones to describe and access their components.

Specifically, I am investigating techniques commonly used in the context of Semantic Web Services, such as ontologies and semantic/syntactic mediation, and their applicability to this problem. 

We may regard services as being somewhat large scale components. However, I am interested in applying these methods not just for large scale services distributed across the web, but also for small and numerous software components running in a single process. In such a setting, performance and scalability are important issues to investigate, in addition to the usual problems of reliability, correctness of composition, etc.
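As a rough illustration of what accessing components by meaning rather than by name could look like, here is a minimal Python sketch. It is not the design under investigation; the `SemanticRegistry` class, the concept names, and the single-parent ontology are all hypothetical and chosen only to keep the example small. A client asks for a concept, and the registry returns any component whose advertised concept is the same as, or narrower than, the requested one.

```python
class SemanticRegistry:
    """Toy component registry that resolves by concept instead of by name.

    `broader` is a hypothetical, very small ontology: it maps each concept
    to the single broader concept it specializes (if any).
    """

    def __init__(self, broader):
        self._broader = dict(broader)
        self._providers = []  # list of (concept, component) pairs

    def register(self, concept, component):
        """Advertise `component` as providing `concept`."""
        self._providers.append((concept, component))

    def _is_a(self, concept, wanted):
        # Walk up the ontology until we hit `wanted` or run out of ancestors.
        seen = set()
        while concept is not None and concept not in seen:
            if concept == wanted:
                return True
            seen.add(concept)
            concept = self._broader.get(concept)
        return False

    def resolve(self, wanted):
        """Return the first registered component that satisfies `wanted`."""
        for concept, component in self._providers:
            if self._is_a(concept, wanted):
                return component
        raise LookupError(f"no component provides concept {wanted!r}")


# Hypothetical ontology: AbsoluteDifference is a kind of DistanceMetric.
registry = SemanticRegistry({"AbsoluteDifference": "DistanceMetric"})
registry.register("AbsoluteDifference", lambda a, b: abs(a - b))

# The client names only the concept it needs, not a class or interface.
metric = registry.resolve("DistanceMetric")
distance = metric(3, 7)
```

The client code never mentions the name under which the component was registered. In a realistic setting the ontology lookup and any mediation between mismatched signatures would be far more involved, and with many small components in a single process the cost of each `resolve` call is exactly the kind of performance question raised above.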

Realtime disease tracking

I just found out about BioCaster, a tool made by people at my institute. It tracks news in real time and lets you view the spread of diseases geographically.

I’ve seen similar services before (related to swine flu, etc), but this one lets you break down the data by disease and even by symptoms.

Asahi Shimbun mentions it in an article (Japanese).

Blogging revisited

This marks the launch of my second (possibly third) attempt at serious blogging. Unlike previous blogs, I intend for this one to have some specific purposes.

Primarily it will be a vehicle for me to express ideas about software and software research that I can’t express elsewhere. This will probably range from random thoughts to essayish writings. The benefit for me is that expressing myself coherently aids my thinking and understanding. In this sense the blog will be a kind of public notepad, but hopefully better written than my notepads are.

Secondly, though surely this is too much to hope for, maybe it could attract comments from people who share my interests. Any kind of dialogue or exchange of ideas is always welcome.

This time around I use WordPress, one of the most popular blog packages. Times have changed since around 2004, when I made my own PHP/MySQL based system. At the time I had lots of free time and a kind of do-it-myself ethic. Today there is less time and no clear benefit in rolling my own. My focus should be more on the content and less on the form.

So let’s see how this instance of my blogging attempts turns out.
