Wednesday, October 11, 2006

Data Compression

First, let's break down the term:

  1. Data: here, a sequence of data elements (say, bytes).
  2. Compression: to 'pack' something in such a way that the result takes less 'space' than the original.
Thus, Data Compression is a technique for reducing the size of given data (i.e. using fewer data elements) in such a way that the original data, or a close approximation of it, can be reproduced from the smaller representation.
There are two types of Data Compression:
  1. Lossless: the exact original data can be reconstructed from the compressed data.
  2. Lossy: only a slightly different version of the original data can be reconstructed.
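
To make the difference between the two types concrete, here is a minimal Python sketch. The lossless half uses the standard zlib module; the lossy half is only a toy illustration (it throws away the low 4 bits of each sample), not how any real audio or image codec works. The sample values and variable names are made up for the example.

```python
import zlib

original = b"AAAAABBBBBCCCCC" * 100

# Lossless: decompressing gives back exactly the bytes we started with.
packed = zlib.compress(original)
restored = zlib.decompress(packed)
print(len(original), "bytes packed into", len(packed), "bytes")
print("exact copy restored:", restored == original)

# Lossy (toy example): keep only the high 4 bits of each 8-bit sample.
samples = [12, 13, 200, 203, 97, 101]           # hypothetical 8-bit samples
quantized = [s >> 4 for s in samples]           # what would be stored
approx = [(q << 4) + 8 for q in quantized]      # best guess on reconstruction
max_error = max(abs(a - s) for a, s in zip(approx, samples))
print("reconstructed samples:", approx)
print("largest error:", max_error)
```

The lossless round trip returns the identical bytes, while the lossy one returns values that are close to, but not the same as, the originals.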


For example, using this technique one can pack the same data onto fewer discs, say 4, where it would otherwise have required, say, 14. At the receiving end, the compressed data then has to be decompressed (processed by a decompression algorithm) to reproduce the original data.
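
The arithmetic behind the disc example can be sketched in a few lines of Python. The disc capacity used here is an assumed figure for illustration only, and the helper names compression_ratio and discs_needed are hypothetical.

```python
DISC_CAPACITY = 1_440_000  # assumed capacity of one disc, in bytes


def compression_ratio(original_size, compressed_size):
    """How many times smaller the compressed data is."""
    return original_size / compressed_size


def discs_needed(size_bytes, capacity=DISC_CAPACITY):
    """Number of discs required, rounding up to a whole disc."""
    return -(-size_bytes // capacity)  # integer ceiling division


original_size = 14 * DISC_CAPACITY    # data that fills 14 discs
compressed_size = 4 * DISC_CAPACITY   # the same data after compression

print("ratio:", compression_ratio(original_size, compressed_size))
print("discs before:", discs_needed(original_size))
print("discs after:", discs_needed(compressed_size))
```

Going from 14 discs to 4 corresponds to a compression ratio of 3.5 to 1.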

Though floppies are now obsolete, we are all familiar with stressful, lengthy downloads. Unfortunately, even though this amazing tool has been around for decades, the internet community has not yet adopted it completely or efficiently, or at least has not kept up with advances in the technology.

Data compression has an extremely wide spectrum of applications, including on standalone platforms. For example, it is used to boost a 3D game's performance by compressing high-quality textures so that they take up as little as possible of what was then "very precious" memory space. Nearly all the audio, video and image standards (at least those pertaining to PCs) are basically about different Data compression techniques, mostly lossy (with the one exception of multiple "channels").

Data compression utilities are readily available, e.g. the popular WinZip*. Another utility, 7-Zip (7-zip.org), is available absolutely free. The most amazing part is that, owing to the newer LZMA algorithm, it typically achieves a better compression ratio than most other compression utilities on the market. It is also open source, distributed under the GNU LGPL. This means that the source code (in layman's terms: the formula) of the software is available to anyone and everyone to modify, and such modification tends to improve the software.

Moreover, as per the "L" that prefixes "GPL", the library version of 7-Zip (as a module) can be linked even into proprietary software. This means that it can be efficiently and seamlessly integrated into, say, a web browser. One more point that deserves mention is that the linked module can be "updated". With dynamic linking this can be accomplished completely independently. Even with static linking the software designer is obliged, under the terms of the LGPL, to facilitate upgrading the module by providing the complete source code or linkable object files of the software.
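
The point about LZMA can be tried out without installing anything, since Python's standard library ships lzma, bz2 and zlib modules. The sketch below is not 7-Zip itself, and the sample text is made up; results on real files will vary, so treat it as a way to experiment rather than as a benchmark.

```python
import bz2
import lzma
import zlib

# Hypothetical sample input: repetitive text, which compresses well.
data = ("Data compression packs the same information into fewer bytes. "
        * 2000).encode("utf-8")

# Compress the same input with three algorithms at their highest settings.
results = {
    "zlib (DEFLATE)": zlib.compress(data, 9),
    "bz2": bz2.compress(data, 9),
    "lzma": lzma.compress(data, preset=9),
}

print(f"{'original':15s} {len(data):8d} bytes")
for name, blob in results.items():
    print(f"{name:15s} {len(blob):8d} bytes")

# All three are lossless: the original comes back byte for byte.
roundtrip_ok = (
    zlib.decompress(results["zlib (DEFLATE)"]) == data
    and bz2.decompress(results["bz2"]) == data
    and lzma.decompress(results["lzma"]) == data
)
print("all round trips exact:", roundtrip_ok)
```

Swapping in the contents of a real file for data gives a quick feel for how the algorithms compare on the kind of material you actually archive.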
Even though a more efficient compression technique does require more processing resources, the advantage it offers outweighs this drawback in most, if not all, circumstances. Moreover, the computing power of modern CPUs keeps growing and shows no sign of slowing, with advances like multi-core designs, Hyper-Threading, enhanced microcode expansion and ultra-high-speed buses. On the other hand, the level of detail and data hunger of newer applications is also increasing rapidly, and so is the number of nodes attached to the internet at any given instant. These factors have elevated Data Compression to a high priority, giving it a very influential role in modern computing. Openness and a standard model are going to be crucial to progress in Data Compression technology, and hence in modern computing. What is needed now is for the whole computing community to adopt this solution and to raise awareness among those who still use inefficient software for archiving their uploads.
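
The trade-off between processing effort and compressed size mentioned above can be observed with a rough Python sketch. It uses zlib's compression levels as a stand-in for "more efficient technique"; the generated sample data and the measure helper are assumptions for illustration, and the timings will differ from machine to machine.

```python
import time
import zlib

# Hypothetical sample input: 50,000 short log-like records.
data = b"".join(b"record %d: temperature=%d\n" % (i, i % 37)
                for i in range(50000))


def measure(level):
    """Compress data at the given zlib level; return (size, seconds)."""
    start = time.perf_counter()
    packed = zlib.compress(data, level)
    elapsed = time.perf_counter() - start
    return len(packed), elapsed


# Level 1 is fastest, 9 tries hardest, 6 is zlib's usual default.
report = {level: measure(level) for level in (1, 6, 9)}

print(f"original: {len(data)} bytes")
for level, (size, seconds) in report.items():
    print(f"level {level}: {size} bytes in {seconds * 1000:.1f} ms")
```

Higher levels usually cost more CPU time for a smaller result, which is the cost-versus-benefit balance discussed in the paragraph above.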