Solid compression

Solid compression, also called progressive compression or compact compression, is a method, or more precisely a preprocessing step, for compressing multiple files. The files are combined into one or several large blocks, and each block is then compressed as a whole, across file boundaries. This usually achieves a higher compression ratio than compressing each file individually. How large the advantage is depends on how similar the files are to one another.

Function

Solid compression can be used whenever multiple files are bundled into an archive. With solid compression, all files are concatenated before the actual compression and then compressed as a single continuous data stream. Without solid compression, each file is first compressed independently, and only afterwards are the compressed files combined into an archive file. Solid compression usually improves the compression ratio, especially with many small, similar files (such as log files). The reason is that solid compression can exploit redundancies between different files to reduce the data, whereas without it only redundancies within each individual file can be exploited.
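The difference can be illustrated with a minimal sketch using Python's zlib module (Deflate). The sample "log files" are made up for the illustration, and the sketch ignores the metadata (file names, sizes) that a real archive would also store.

```python
import zlib

# Hypothetical sample data: 50 small, similar "log files".
files = [
    f"2024-01-01 12:00:{i:02d} INFO request handled status=200 id={i}\n".encode() * 5
    for i in range(50)
]

# Non-solid: each file is compressed independently, then the results are bundled.
non_solid_size = sum(len(zlib.compress(data, 9)) for data in files)

# Solid: all files are concatenated first and compressed as one stream.
solid_size = len(zlib.compress(b"".join(files), 9))

print("per-file:", non_solid_size, "bytes")
print("solid:   ", solid_size, "bytes")
```

With data this repetitive, the solid variant comes out considerably smaller. With dissimilar files the gap would shrink.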

Some archiving programs (e.g. RAR) sort the files by file type beforehand in order to improve the compression ratio a little further.
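As a rough sketch of the idea, files can be ordered by extension so that files of the same type end up next to each other in the stream. The file names are hypothetical, and this is not claimed to be the ordering any particular archiver uses.

```python
import os

# Hypothetical file names as they might be passed to an archiver.
names = ["b.txt", "a.log", "c.TXT", "d.log", "README"]

# Sort by lower-cased extension first, then by name, so that files of the
# same type become neighbours before they are concatenated and compressed.
ordered = sorted(names, key=lambda n: (os.path.splitext(n)[1].lower(), n))

print(ordered)  # ['README', 'a.log', 'd.log', 'b.txt', 'c.TXT']
```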

Technical Explanation

Modern compression programs use a combination of dictionary compression (e.g. LZ77) and entropy coding (e.g. Huffman coding), as is the case in the Deflate algorithm, among others.

The aim of dictionary compression is to replace repeated occurrences of byte sequences so that each sequence needs to be stored only once. The dictionary is usually built up step by step during compression and decompression, starting from an initially empty dictionary, so that it does not have to be transmitted or stored separately. Only literals, i.e. byte sequences not yet in the dictionary, and references to existing dictionary entries are transmitted. Every transmitted literal is immediately added to the dictionary by both the encoder and the decoder. This leads to a "warm-up" phase at the beginning of the compression process: the dictionary must first be filled before dictionary references can actually save any data (see, e.g., LZ77 and LZ78).
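The following toy encoder is a simplified LZ77-style sketch and not the scheme of any real archiver. It emits either a literal byte or a (distance, length) reference into the data already seen, and the decoder rebuilds the same history on its own. The function names and the parameters window and min_match are chosen for the illustration only.

```python
def lz77_encode(data: bytes, window: int = 4096, min_match: int = 3):
    """Toy LZ77-style encoder: emit literals (int) or (distance, length) references."""
    tokens, i = [], 0
    while i < len(data):
        best_len, best_dist = 0, 0
        # Search the already-seen data (the implicit dictionary) for the longest match.
        for j in range(max(0, i - window), i):
            length = 0
            while i + length < len(data) and data[j + length] == data[i + length]:
                length += 1
            if length > best_len:
                best_len, best_dist = length, i - j
        if best_len >= min_match:
            tokens.append((best_dist, best_len))
            i += best_len
        else:
            tokens.append(data[i])  # literal: nothing useful in the dictionary yet
            i += 1
    return tokens


def lz77_decode(tokens) -> bytes:
    """Rebuild the data; the decoder grows the same history as the encoder."""
    out = bytearray()
    for token in tokens:
        if isinstance(token, tuple):
            distance, length = token
            for _ in range(length):
                out.append(out[-distance])  # byte-wise copy allows overlapping matches
        else:
            out.append(token)
    return bytes(out)


tokens = lz77_encode(b"abcabcabc")
print(tokens)  # [97, 98, 99, (3, 6)]
```

The first three tokens are literals, which is the "warm-up": only once "abc" has been seen can the rest be expressed as a single reference.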

Without solid compression, only redundancies within each file can be removed. Redundancies between files are ignored, because compression starts over with an empty dictionary for each file. The compression also suffers a little from the repeated "warm-up". With solid compression, by contrast, the same dictionary is used continuously across all files. This is a great advantage especially for files with similar content. For small files the advantage is even more pronounced, because the "warm-up" phase accounts for a larger share of each file.
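The warm-up effect can be made visible with a small sketch that feeds several similar files through a single zlib compressor and records how many bytes each file adds to the output. The file contents are hypothetical. Z_SYNC_FLUSH is used only to make the per-file output measurable and adds a few bytes of its own to every chunk.

```python
import zlib

# Hypothetical small, similar files.
files = [
    f"<config><name>service-{i}</name><enabled>true</enabled></config>\n".encode()
    for i in range(20)
]

compressor = zlib.compressobj(9)
sizes = []
for data in files:
    # Z_SYNC_FLUSH emits all pending output but keeps the history (dictionary),
    # so later files can still reference earlier ones.
    chunk = compressor.compress(data) + compressor.flush(zlib.Z_SYNC_FLUSH)
    sizes.append(len(chunk))

print(sizes)
```

The first file is the most expensive one because the dictionary is still empty. Later files cost far fewer bytes, since most of their content can be expressed as references to earlier data.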

The above considerations apply in a similar way to the adaptive context models used in entropy coding (cf. PPMd or LZMA).

Disadvantages

Since a solid archive consists of a single, continuously compressed data stream, random access to individual files is not possible. To unpack a particular file, all files stored before it in the archive must be decompressed first. This usually happens only in memory, since just part of that data is needed later to decompress the wanted file and the rest can be discarded. If the archive is damaged, an error may also extend beyond the affected file. How far it extends depends on the compression ratio, but it can lead to the loss of all data from the error position onward.
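The lack of random access can be sketched as follows. The archive layout (one zlib stream plus an index of offsets into the uncompressed data) and the names members, index and extract are assumptions made for this illustration, not the layout of a real archive format.

```python
import zlib

# Hypothetical archive contents.
members = {"a.txt": b"alpha " * 100, "b.txt": b"beta " * 100, "c.txt": b"gamma " * 100}

# Build one solid stream plus an index of (offset, length) in the uncompressed data.
index, offset = {}, 0
for name, data in members.items():
    index[name] = (offset, len(data))
    offset += len(data)
solid = zlib.compress(b"".join(members.values()))


def extract(name, chunk_size=16):
    """Return one member; everything stored before it has to be decompressed too."""
    start, length = index[name]
    decomp = zlib.decompressobj()
    pos, wanted = 0, bytearray()
    for i in range(0, len(solid), chunk_size):
        out = decomp.decompress(solid[i:i + chunk_size])
        # Keep only the bytes that overlap the wanted member and discard the rest.
        lo = max(start - pos, 0)
        hi = max(min(start + length - pos, len(out)), 0)
        wanted += out[lo:hi]
        pos += len(out)
        if pos >= start + length:
            break  # data after the wanted member is not needed
    return bytes(wanted)


print(extract("c.txt")[:12])
```

Extracting "c.txt" forces the decompressor to work through "a.txt" and "b.txt" first, even though their bytes are thrown away immediately.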

Further files can only be appended at the end of the archive.

Deleting files from the archive is only possible by decompressing the entire data stream, removing the files to be deleted, and then compressing the remaining files again as a solid stream.

To avoid having to decompress the entire archive each time, the length of the contiguously compressed data can be limited as a compromise, which creates independently compressed blocks.
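A minimal sketch of this compromise, assuming a simple size limit per block: files are grouped until the limit would be exceeded, and each group is compressed as its own independent stream. The function name compress_in_blocks and the parameter max_block_size are hypothetical.

```python
import zlib


def compress_in_blocks(files, max_block_size):
    """Group files into solid blocks of at most max_block_size uncompressed bytes.

    Returns a list of (names, compressed_bytes). Each block is an independent
    stream, so reading one file only requires decompressing its own block.
    A single file larger than the limit gets a block of its own.
    """
    blocks, names, buf = [], [], bytearray()
    for name, data in files:
        if buf and len(buf) + len(data) > max_block_size:
            blocks.append((names, zlib.compress(bytes(buf), 9)))
            names, buf = [], bytearray()
        names.append(name)
        buf += data
    if buf:
        blocks.append((names, zlib.compress(bytes(buf), 9)))
    return blocks


# Hypothetical input: ten files of 950 bytes each.
files = [(f"log{i}.txt", f"entry {i}: status ok\n".encode() * 50) for i in range(10)]
blocks = compress_in_blocks(files, max_block_size=3000)

for names, payload in blocks:
    print(names, len(payload), "bytes")
```

A smaller limit means faster access to single files and less data lost after a damaged byte, at the price of a lower compression ratio.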

Use

Solid compression is supported by the archive formats 7z, RAR, ACE and ARC, among others.

In Unix environments, separate tools are traditionally used for archiving and compression (see Unix philosophy). Usually all files are first combined into an uncompressed archive with the tool tar, and this archive is then compressed. The compression is done with, for example, gzip (resulting in .tar.gz), bzip2 (resulting in .tar.bz2) or xz (resulting in .tar.xz). This procedure amounts to solid compression.
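The same two-step idea can be sketched with Python's standard tarfile module, which writes a tar stream and passes it through gzip as a whole. The in-memory files are hypothetical; a real script would add paths from disk instead.

```python
import io
import tarfile

# Hypothetical in-memory files.
files = {
    "a.log": b"GET /index.html 200\n" * 200,
    "b.log": b"GET /index.html 404\n" * 200,
}

buffer = io.BytesIO()
# Mode "w:gz" bundles the members with tar and gzips the resulting stream,
# so the compressor sees all files as one continuous input.
with tarfile.open(fileobj=buffer, mode="w:gz") as archive:
    for name, data in files.items():
        info = tarfile.TarInfo(name=name)
        info.size = len(data)
        archive.addfile(info, io.BytesIO(data))

tar_gz_bytes = buffer.getvalue()
print(len(tar_gz_bytes), "bytes")
```

Replacing "w:gz" with "w:bz2" or "w:xz" selects bzip2 or xz in the same way.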

The widespread ZIP file format, by contrast, does not support solid archiving. Solid compression can nevertheless be achieved by nesting two ZIP archives: first, all individual files are bundled into a ZIP archive without compression; then this ZIP file is compressed with the desired compression level.
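The nesting trick can be sketched with Python's standard zipfile module. The sample files and the member name "inner.zip" are assumptions made for the illustration, and an ordinary ZIP is built alongside only for comparison.

```python
import io
import zipfile

# Hypothetical sample data: 50 small, similar files.
files = {
    f"log{i:02d}.txt": f"INFO request handled status=200 id={i}\n".encode() * 8
    for i in range(50)
}

# Inner archive: ZIP_STORED bundles the files without compressing them.
inner = io.BytesIO()
with zipfile.ZipFile(inner, "w", compression=zipfile.ZIP_STORED) as zf:
    for name, data in files.items():
        zf.writestr(name, data)

# Outer archive: the inner ZIP is a single member, so Deflate sees all
# files as one continuous stream.
outer = io.BytesIO()
with zipfile.ZipFile(outer, "w", compression=zipfile.ZIP_DEFLATED) as zf:
    zf.writestr("inner.zip", inner.getvalue())

# For comparison: an ordinary ZIP that deflates each file separately.
plain = io.BytesIO()
with zipfile.ZipFile(plain, "w", compression=zipfile.ZIP_DEFLATED) as zf:
    for name, data in files.items():
        zf.writestr(name, data)

print("ordinary ZIP:", len(plain.getvalue()), "bytes")
print("nested ZIP:  ", len(outer.getvalue()), "bytes")
```

The nested archive inherits the drawbacks of solid compression: reading a single file requires unpacking the whole inner archive first.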
