LZ77 and LZ78

LZ77 ( Lempel-Ziv 77 ) is a data compression method that was published by Abraham Lempel and Jacob Ziv 1977. The authors first took advantage of that whole words, or at least parts of it, more than once in a text. In contrast, only the differences in the frequency of individual characters in a text was in the previously almost exclusively used Huffman or Shannon -Fano coding exploited (see also entropy ).

Function

The LZ77 algorithm is referred to as a sliding window (English: sliding window ). Here, a window of arbitrary constant size in a preview buffer and a text field is divided. The window now moves over the text to be compressed, the contents of the preview buffer is to be compressed and in the text field already compressed strings are, that are now used as a dictionary.

Compressed data is output as triples. Here, the position of the lexicon - entry is first specified in the window ( whether to count from left or right is started, does not matter, the main thing it is done consistently ), then follows the length of the data to be copied and then finally the following characters. The latter is necessary in order to compress characters that are not in the lexicon. Here is namely indicated as position and length of 0, which only the third part of the tuple is inserted, namely, the next character.

Example, the compression of the word PINEAPPLE with an 8 character large lexicon window plus 4 characters preview (the position is specified here from left):

01234567 | ________ | ____ | | | ANAN | → (0,0, A) | A | NANA | → (0,0, N) | T | ANAS | → (6,2, A) | ANANA | S | → (0,0, S) | PINEAPPLE | | → (-, -, -) finished as preview empty Reconstruction of code words [ (0,0, A), ( 0,0, S) ( 6,2, A), ( 0,0, S) ] with 8 character dictionary window:

| 01234567 | Attach no compression → "A" | (0,0, A) → | ________ Attach no compression → " N" | (0,0, N) → | _______A Copy next 2 characters from point 6 and append "A" | (6,2, A) → | ______AN No compression → " S " Append | (0,0, S) → | ___ANANA (-, -, -) → | __ANANAS | ready because input tuples empty history

The compression quality is directly dependent from the lexicon. To obtain good compression ratios, the window must therefore attain a certain minimum size. However, as to be compared to compressing text with each entry in the lexicon, the time needed to compress strongly increases with the size of the window. The LZ77 algorithm in pure form therefore initially received little attention. James Storer and Thomas Szymanski in 1982 improved some problems in the now Lempel-Ziv - Storer - Szymanski ( LZSS ) mentioned algorithm that was utilized by Phil Katz for the widespread Deflate algorithm. In Reduced Offset Lempel Ziv ( ROLZ, Lempel Ziv also Ross Williams by Ross Williams, 1991) and the Lempel -Ziv - Markov algorithm ( LZMA, Igor Pavlov, 1998), he found more familiar, modern successor.

Use

Today the LZ77 compression is used, inter alia, still on the Game Boy Advance and other embedded systems. Combined with Huffman coding, LZ77 is used in the well-used Deflate algorithm, which in turn is used by well-known compression programs as well as the graphics format PNG. That LZ77 is occupied with no patents, is probably the reason that he is still preferred today which a year later published LZ78 successor, who was in some patented until 2004 in parts.

Lempel–Ziv–Storer–Szymanski Lempel–Ziv–Welch LZX (algorithm)

536786