Complex data: its origin and aesthetics

Kolmogorov complexityÂ is a measure of the complexity of data. It is simple to define but appears to have deep implications. The K. complexity of a string is defined as the size of the simplest possible program, with respect to a given underlying computer, that can generate the data. For example, the string “AAAAAA” has lower complexity than “ACTGTT”, since the former can be described as “Output ‘A’ 6 times”, but the latter has no obvious algorithmic generator. This point becomes very clear if the strings are very long. If no obvious algorithm is available, one has no option but to encode the whole string in the program.

In this case, when writing the “program” output ‘A’ 6 times, I assumed an underlying computer with the operations “repetition” and “output”. Of course a different computer could be assumed, but provided that a Turing-complete computer is used, the shortest possible algorithm is unlikely to have a very different length.

An essential observation to make here is that the output of a program can be much longer than the program itself. For example, consider the program “output ‘A’ 2000 times”.Â K. complexity has an inverse relation to compression. Data with low K. complexity is generally very easy to compress. Compression basically amounts to constructing a minimal program that, when run, reproduces the given data. Data with high K. complexity cannot, per definition, be compressed to a size smaller than the K. complexity itself.

Now that the concept is clear, where does data with high K. complexity come from? Can we generate it? What if we write a program that generates complex programs that generate data? Unfortunately this doesn’t work – it seems that, because we can embed an interpreter for a simple language within a program itself, a program-generating program doesn’t create data with higher K. complexity than the size of the initial, first-level program. A high complexity algorithm is necessary, and this algorithm must be produced by a generating process that cannot itself be reduced to a simple algorithm. So if a human being were to sit down and type in the algorithm, they might have to actively make sure that they are not inserting patterns into what they type.

But we can obtain vast amounts of high complexity data if we want it. We can do it by turning our cameras, microphones, thermometers, telescopes and seismic sensors toward nature. The data thus recorded comes from an immensely complex process that, as far as we know, is not easily reduced to a simple algorithm. Arguably, this also explains aesthetic appeal. We do not like sensory impressions that are easily explained or reduced to simple rules. On first glance at least, hundreds of blocks of identical high density houses are less attractive than low density houses that have grown spontaneously over a long period of time (although we may eventually change our minds). Objects made by artisans are more attractive than those produced in high volumes at low cost. Life is more enjoyable when we can’t predict (on some level, but not all) what the next day will contain.

The deep aesthetic appeal of nature may ultimately have as its reason the intense complexity of the process that generates it. Even a view of the sky, the desert or the sea is something complex, not a case of a single color repeated endlessly but a spectrum of hues that convey information.

Monomorphic