Milestones:Lempel-Ziv Data Compression Algorithm, 1977
Lempel-Ziv Data Compression, 1977
IEEE Israel Section,
Dedication: 6 September 2004
The data compression algorithm developed at this site in 1977 by Abraham Lempel and Jacob Ziv became a basis for enabling data transmission via the internet in an efficient way. It contributed significantly in making the internet a global communications medium.
In order to quickly and efficiently transfer data, the ability to compress data is desirable. The Lempel-Ziv Algorithm allows for a simple compression of data. The algorithm was first published in the IEEE Transactions on Information Theory in May 1977.
Professors Lempel and Ziv teach and conduct research at the Technion - the Israel Institute of Technology, located in Haifa. Together they wrote the algorithm which was simple yet effective. According to Ziv, Lempel concentrated on the practical aspects, while he focused on the theoretical.
The algorithm works by replacing a strings of characters with single codes, called “tokens.” Each time the algorithm recognizes a new string, it outputs the string and then it adds it to a table, or dictionary. The next time it encounters that string, it outputs only the new code from the table. The output of a single code instead of a string of codes means that the data has been “compressed” or shortened, making it more efficient to transmit (as long as it can be decoded at the other end!). The first 256 codes are assigned to the standard 8-bit character set, so, for example, in a 12-bit code the dictionary can contain almost 4000 tokens.
A basic example would be:
“I am an engineer therefore I am an engineer, and only if I am an engineer”
“I am an engineer* there&fo& *, and only if *.
After the first time “I am an engineer" appeared, it was assigned the value *. Afterwards, that phrase is indicated only by the *. Likewise, & has been assigned the value “re”. Actually, the real algorithm would compress the space-letter combinations as well, but that would confuse this example. So you can see how much shorter the second message is than the first. On the other hand, the word “therefore” itself used the same number of characters after compression. Therefore (pardon the pun), the longer and more redundant the original string, the more compression that will be achieved.
The recognition of strings can be done very simply using a looping program to add each incoming character in a line and checking against the table to see whether or not it is an existing string. It further turns out, because the algorithm first sends out the string as well as the token, that a reverse decompression algorithm can reconstruct the original message without needing any additional information (such as the dictionary itself).
The simplicity and ease of the algorithm gives it a wide range of powerful applications. It has been used in various ways in the creation and growth of the Internet. One of its uses that is still important today is the storing and sending of .gif image files.