Lempel–Ziv–Markov chain algorithm

The Lempel–Ziv–Markov chain algorithm (LZMA) is a free data compression algorithm that Igor Viktorovich Pavlov has been developing since 1998. It achieves comparatively good compression ratios and high decompression speed. It is named after Abraham Lempel and Jacob Ziv, who developed the LZ77 algorithm, and after Andrei Markov, after whom Markov chains are named.

The algorithm uses a dictionary method similar to LZ77 and can therefore be seen, in principle, as a further development of Deflate.

Features

  • Very good compression (on average better than bzip2)
  • Fast decompression (about twice as fast as bzip2)
  • Support for very large dictionaries (up to one gigabyte)
  • Pronounced asymmetry: decompression needs only a fraction of the memory used for compression (under Windows, about 2 MB plus the dictionary size), whereas the additional memory needed for compression is a multiple of the dictionary size.
  • Decompression usually requires only a fraction of the computational effort of the corresponding compression; it is about 10 to 20 times as fast as compression.
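
The claims in this list can be checked informally on one's own data. The following is a minimal sketch using Python's standard lzma and bz2 modules; the sample input and the helper name compare are assumptions made for illustration, and the measured sizes and timings will vary with the data, the preset and the machine.

```python
import bz2
import lzma
import time


def compare(data: bytes) -> dict:
    """Compress `data` with LZMA (xz container) and bzip2; report sizes and timings."""
    results = {}
    for name, mod in (("lzma", lzma), ("bz2", bz2)):
        t0 = time.perf_counter()
        packed = mod.compress(data)
        t1 = time.perf_counter()
        unpacked = mod.decompress(packed)
        t2 = time.perf_counter()
        assert unpacked == data  # both codecs are lossless
        results[name] = {
            "size": len(packed),
            "pack_s": t1 - t0,
            "unpack_s": t2 - t1,
        }
    return results


# Hypothetical sample input: repetitive, log-like text.
sample = b"".join(b"line %d: the quick brown fox\n" % i for i in range(20000))
stats = compare(sample)
for name, row in stats.items():
    print(name, row)
```

Comparing pack_s with unpack_s for the lzma row gives a rough impression of the asymmetry between compression and decompression described above.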

Technology

LZMA uses an improved variant of the LZ77 algorithm, Markov chains, and a range coder (an implementation of arithmetic coding) for entropy coding.
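
The dictionary stage can be illustrated with a toy LZ77 coder. This is a sketch of the general LZ77 idea only, not of LZMA itself: it searches by brute force, and it omits the Markov-chain probability modelling and the range coder that LZMA applies to the resulting tokens. The function names, token layout and parameters are assumptions made for illustration.

```python
def lz77_encode(data: bytes, window: int = 4096, min_len: int = 3, max_len: int = 255):
    """Greedy toy LZ77: emit ('lit', byte) or ('match', distance, length) tokens."""
    tokens = []
    i = 0
    while i < len(data):
        best_len, best_dist = 0, 0
        # Brute-force search of the sliding window for the longest match.
        for j in range(max(0, i - window), i):
            k = 0
            # A match may run past position i (overlapping copy).
            while k < max_len and i + k < len(data) and data[j + k] == data[i + k]:
                k += 1
            if k > best_len:
                best_len, best_dist = k, i - j
        if best_len >= min_len:
            tokens.append(("match", best_dist, best_len))
            i += best_len
        else:
            tokens.append(("lit", data[i]))
            i += 1
    return tokens


def lz77_decode(tokens) -> bytes:
    """Rebuild the data by replaying literals and back-references."""
    out = bytearray()
    for tok in tokens:
        if tok[0] == "lit":
            out.append(tok[1])
        else:
            _, dist, length = tok
            for _ in range(length):
                out.append(out[-dist])  # byte-wise copy handles overlaps
    return bytes(out)


text = b"abracadabra abracadabra abracadabra"
tokens = lz77_encode(text)
restored = lz77_decode(tokens)
```

A real implementation replaces the brute-force search with an indexed match finder and feeds the tokens into an entropy coder.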

The LZMA decompression code is usually only about 5 KB in compiled form. The amount of memory needed for decompression depends on the size of the dictionary used during compression. Because of the small decompressor size and the rather low memory requirements (especially with smaller dictionaries), the method is particularly well suited to embedded applications.
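
The dependence of decompression memory on the dictionary size can be observed with Python's standard lzma module (a binding to liblzma from XZ Utils, not the 7-Zip code). The sketch below assumes the xz container with an LZMA2 filter; the helper names, payload, dictionary sizes and the 1 MiB budget are hypothetical values chosen for illustration.

```python
import lzma


def pack_with_dict(data: bytes, dict_size: int) -> bytes:
    """Compress into an xz stream using LZMA2 with an explicit dictionary size."""
    filters = [{"id": lzma.FILTER_LZMA2, "dict_size": dict_size}]
    return lzma.compress(data, format=lzma.FORMAT_XZ, filters=filters)


def fits_in(packed: bytes, memlimit: int) -> bool:
    """Return True if the stream can be decoded within `memlimit` bytes of memory."""
    decoder = lzma.LZMADecompressor(format=lzma.FORMAT_XZ, memlimit=memlimit)
    try:
        decoder.decompress(packed)
        return True
    except lzma.LZMAError:
        return False


payload = bytes(range(256)) * 4096          # 1 MiB of hypothetical input
small = pack_with_dict(payload, 1 << 16)    # 64 KiB dictionary
large = pack_with_dict(payload, 1 << 22)    # 4 MiB dictionary
budget = 1 << 20                            # pretend the device has 1 MiB to spare

print("64 KiB dictionary fits:", fits_in(small, budget))
print("4 MiB dictionary fits:", fits_in(large, budget))
```

On a memory-constrained target, the compressing side would therefore choose a dictionary that the decompressor can afford.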

In the 7-Zip implementation, different variants of hash chains, binary trees and Patricia tries are used for dictionary lookup.

With LZMA2, the data to be compressed can be divided into sections. Each part can be handled by its own process, and the parts are joined together afterwards. All processes build up and use a common dictionary, but at the entropy coding stage the remaining data are necessarily handled as separate data streams. As a result, the suboptimality of range coding (less than one byte per part) comes into play correspondingly more often, and each block additionally receives its own minimal header data.
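
The per-part overhead can be made visible with a rough sketch that compresses sections independently and concatenates the results. Note the assumptions: this uses whole xz streams from Python's standard lzma module, each with its own dictionary and full container header, so it is only an analogy to LZMA2's internal chunking and overstates the cost. The helper name pack_in_parts and the part size are hypothetical.

```python
import lzma
from concurrent.futures import ThreadPoolExecutor


def pack_in_parts(data: bytes, part_size: int) -> bytes:
    """Compress fixed-size sections in parallel and join the resulting xz streams."""
    parts = [data[i:i + part_size] for i in range(0, len(data), part_size)] or [b""]
    with ThreadPoolExecutor() as pool:
        # map() keeps the parts in their original order.
        packed_parts = list(pool.map(lzma.compress, parts))
    return b"".join(packed_parts)


data = b"all work and no play makes jack a dull boy\n" * 25000
in_parts = pack_in_parts(data, 1 << 16)
in_one = lzma.compress(data)

# lzma.decompress() accepts several concatenated streams.
restored = lzma.decompress(in_parts)
print("one stream:", len(in_one), "bytes; in parts:", len(in_parts), "bytes")
```

The size difference printed at the end reflects both the repeated headers and the loss of shared history between parts.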

Use

Besides its use in dedicated file formats for compressed data, LZMA support has been integrated into many other systems. LZMA is available as an option for the transparent compression and decompression of executable files with UPX (since version 2.92 beta) or Upack, and in compressed file systems such as squashfs (or cramfs, with appropriate patches). A large number of Linux distributions can now use LZMA-compressed installation packages (Arch Linux since March 2010, source packages in Gentoo Linux, Slackware Linux since May 8, 2009, openSUSE since March 27, 2008, Pardus's package management, and the Debian package management system, among others). Software installation systems for Windows such as the Nullsoft Scriptable Install System and Inno Setup, which create a kind of extended self-extracting archive, can also compress with LZMA.

File Formats

Originally, LZMA could only be used with the new 7z format of 7-Zip. By now, several other formats support it. In the case of xz, a new format was created specifically for LZMA support and is designed for exclusive use with LZMA (the same applies to the lzip format). In the case of the new ALZip format (.egg files), the availability of more modern methods prompted a modernization of the format's capabilities: accepting a break in compatibility, a new, more modern (more flexible) file format was created, which is now used mainly with LZMA-compressed content. In the case of Zip (for example with WinZip since version 12.0 or 7-Zip since version 4.61 beta), LZMA support was added to an existing (extensible) format.

Software

The reference implementation of LZMA is free software. It first appeared as part of the 7-Zip program and is now also published separately as the LZMA SDK. The free reference library for LZMA compression is written in C and supports multithreading.

Currently there are three working ports to Unix-like platforms:

  • p7zip is a port of the current 7z command-line tool. It therefore fully supports the 7z archive format and often serves as the basis for the 7z functions of graphical tools with 7z support, such as Karchiver and WinRAR.
  • lzip was the first LZMA solution for Unix-like operating systems that fully adopted the familiar gzip concept.
  • XZ Utils are a port of the LZMA code of 7-Zip. They provide the LZMA compression method on Linux with the same kind of handling established by gzip and bzip2, and they do not support 7z archives. In contrast to p7zip, however, XZ Utils mostly use an older version of the LZMA code.

Furthermore, since July 2008 GRUB2 has used LZMA by default instead of the previously used LZO (for the time being only for i386-pc).

History

  • The reference implementation, 7-Zip, was published in 2000.
  • The source code of the LZMA SDK has been in the public domain since 23 November 2008 (version 4.61 beta).
  • With 7-Zip version 9.04 beta in May 2009, the LZMA2 algorithm was introduced. It is a slight modification of the original algorithm that supports multithreading better and improves the handling of incompressible content.

Unix platforms

Because the 7-Zip source code makes extensive use of Windows-specific features, some time passed after its first publication before a Unix-compatible version appeared, even though it is free software. LZMA first reached Unix platforms in 2004 through p7zip, a port of the command-line version of 7-Zip. In the same year the (much more portable) LZMA SDK became available, which included the command-line program "lzma_alone". Like gzip or bzip2, lzma_alone was used together with tar so that file metadata and permission information from Unix file systems and operating systems could be stored. Less than a year after the first publication of the LZMA SDK, Lasse Collin published LZMA Utils (initially consisting only of a set of wrapper scripts), which gave Unix users a familiar (gzip-like) user interface to lzma_alone.
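
The division of labour described here, with tar carrying names and permissions and the compressor handling only a byte stream, can be sketched with Python's standard tarfile module. One caveat: tarfile's "w:xz" mode writes the later xz container, not the raw lzma_alone format mentioned above, so this shows the pattern rather than the historical tool chain. The helper names, file name and mode bits are hypothetical.

```python
import io
import tarfile


def make_tar_xz(name: str, content: bytes, mode: int = 0o640) -> bytes:
    """Store one file with its Unix permission bits in an xz-compressed tar archive."""
    buf = io.BytesIO()
    with tarfile.open(fileobj=buf, mode="w:xz") as tar:
        info = tarfile.TarInfo(name=name)
        info.size = len(content)
        info.mode = mode  # metadata lives in the tar layer, not in the compressor
        tar.addfile(info, io.BytesIO(content))
    return buf.getvalue()


def read_tar_xz(blob: bytes):
    """Return (name, mode, content) of the first member of the archive."""
    with tarfile.open(fileobj=io.BytesIO(blob), mode="r:xz") as tar:
        member = tar.getmembers()[0]
        return member.name, member.mode, tar.extractfile(member).read()


archive = make_tar_xz("notes.txt", b"hello", 0o640)
name, mode, content = read_tar_xz(archive)
print(name, oct(mode), content)
```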

In 2008, Antonio Diaz published lzip, which offered a container format with checksums and magic numbers instead of the raw LZMA data stream. This provided a complete Unix-style solution for using LZMA, but it was able to establish itself only partially before LZMA Utils had evolved accordingly and offered something similar under the name "XZ Utils". XZ Utils now appear to be establishing themselves as the LZMA implementation for Unix-like platforms. Their xz file format is now also supported by the reference implementation.
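
What a magic number buys in practice can be shown with a small format sniffer. The sketch below checks the leading bytes of a blob against the signatures of the xz, lzip and 7z containers; the table and the function name sniff are assumptions made for illustration, and the list is not exhaustive. The legacy stream written by lzma_alone has no such signature, which is why it falls through to "unknown" here.

```python
import lzma

# Leading signature bytes of some LZMA-related containers.
MAGIC = {
    b"\xfd7zXZ\x00": "xz",
    b"LZIP": "lzip",
    b"7z\xbc\xaf\x27\x1c": "7z",
}


def sniff(blob: bytes) -> str:
    """Guess the container format from the first bytes of `blob`."""
    for magic, name in MAGIC.items():
        if blob.startswith(magic):
            return name
    return "unknown"


xz_blob = lzma.compress(b"example")
alone_blob = lzma.compress(b"example", format=lzma.FORMAT_ALONE)
print(sniff(xz_blob), sniff(alone_blob))
```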
