Range encoding
Range encoding (also called range coding) is an entropy coding method used in data compression and is a form of arithmetic coding.
Range encoding is often regarded as an alternative description of arithmetic coding and as essentially identical to it. It is based on the 1979 paper "Range encoding: an algorithm for removing redundancy from a digitised message" by G. N. N. Martin. Because of the age of that paper, implementations of arithmetic coding that follow the method it describes are assumed not to be covered by the patents on arithmetic coding. This has generated interest in the technique, especially in the free software community.
Operation
In principle, range encoding encodes all symbols of a message into a single number. This contrasts with Huffman coding, which assigns each symbol a bit pattern and concatenates those bit patterns. Range encoding is therefore not bound by the need to spend at least one bit per symbol, and it does not suffer from the inefficiency that Huffman coding shows when the symbol probabilities are not exact powers of 1/2. As a result, it can achieve better compression ratios.
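The one-bit-per-symbol limit can be illustrated with a small calculation. The sketch below assumes a hypothetical two-symbol source with probabilities 0.9 and 0.1 (values chosen only for illustration) and compares the cost of a prefix code, which cannot go below one bit per symbol, with the Shannon entropy that an arithmetic or range coder can approach.

```python
import math

def entropy_bits(probabilities):
    """Shannon entropy in bits per symbol."""
    return sum(-p * math.log2(p) for p in probabilities if p > 0)

# Hypothetical, strongly skewed two-symbol source (illustrative values).
probs = [0.9, 0.1]

# With two symbols, a prefix code such as a Huffman code assigns
# one whole bit to each symbol, whatever the probabilities are.
huffman_bits = 1.0
ideal_bits = entropy_bits(probs)

print(f"Prefix code: {huffman_bits:.3f} bits/symbol")
print(f"Entropy:     {ideal_bits:.3f} bits/symbol")
```

For this source the entropy is a little under half a bit per symbol, so a coder that is not tied to whole bits per symbol can need less than half the space.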
In simplified terms, the central idea behind range encoding is as follows: given a sufficiently large range of integers and an estimate of the symbol probabilities, the initial range can be divided into subranges whose sizes are proportional to the probabilities of the symbols they represent. Each symbol of the message is then encoded by reducing the current range to the subrange that corresponds to that symbol. The decoder must have the same probability estimates as the encoder. These can be transmitted beforehand, derived from data already transferred, or built into both the encoder and the decoder.
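A single narrowing step can be sketched as follows. The function name `narrow`, the two-symbol model and the range [0, 1000) are hypothetical and serve only as an illustration; the model stores each symbol's share as a pair of cumulative counts out of a common total.

```python
def narrow(low, high, model, symbol, total):
    """Reduce the half-open range [low, high) to the subrange of `symbol`.

    `model` maps each symbol to (start, end) in cumulative counts,
    where 0 <= start < end <= total.
    """
    width = high - low
    start, end = model[symbol]
    # Integer arithmetic: subrange borders are rounded down.
    return low + width * start // total, low + width * end // total

# Hypothetical model: "x" has probability 3/4, "y" has probability 1/4.
MODEL = {"x": (0, 3), "y": (3, 4)}
TOTAL = 4

first = narrow(0, 1000, MODEL, "x", TOTAL)                 # after "x"
second = narrow(first[0], first[1], MODEL, "y", TOTAL)     # after "xy"
print(first, second)
```

Because the borders are computed with integer division, the subranges are only approximately proportional to the probabilities. The encoder and the decoder must round in the same way.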
Once all symbols have been encoded, identifying the final subrange is enough to communicate the message, provided that the decoder can tell when the message is complete. A single integer is sufficient to identify the subrange, and it may not even be necessary to transmit the whole integer: the first few digits may be enough to identify the original message unambiguously.
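How few digits suffice can be found by searching for the shortest prefix whose completions all lie inside the final subrange. The following is a minimal sketch under the assumption that numbers are written with a fixed count of decimal digits; the function name `shortest_prefix` and the sample range are illustrative only.

```python
def shortest_prefix(low, high, digits):
    """Return the shortest decimal prefix for which every `digits`-digit
    completion lies inside [low, high), or None if the range is empty."""
    for length in range(1, digits + 1):
        scale = 10 ** (digits - length)
        # Smallest multiple of `scale` that is not below `low`.
        first = -(-low // scale) * scale
        # The prefix covers [first, first + scale); it must fit entirely.
        if first + scale <= high:
            return str(first // scale).zfill(length)
    return None

# Illustrative final subrange using five-digit numbers.
prefix = shortest_prefix(25056, 25920, 5)
print(prefix)
```

Here every five-digit number that begins with the returned prefix lies inside the subrange, so the remaining digits do not need to be transmitted.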
Example
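Suppose the message "AABA<EOM>" is to be encoded, where <EOM> is an end-of-message symbol. The initial range is [0, 100000), and the symbol probabilities are 0.6 for A, 0.2 for B and 0.2 for <EOM>. The first symbol divides the initial range into three subranges:
A: [0, 60000)
B: [60000, 80000)
<EOM>: [80000, 100000)
Since the first symbol is A, the range is reduced to [0, 60000). The second symbol divides this range in the same proportions:
AA: [0, 36000)
AB: [36000, 48000)
A<EOM>: [48000, 60000)
The second symbol is again A, which leaves [0, 36000) for the third symbol:
AAA: [0, 21600)
AAB: [21600, 28800)
AA<EOM>: [28800, 36000)
The third symbol is B, which leaves [21600, 28800) for the fourth symbol:
AABA: [21600, 25920)
AABB: [25920, 27360)
AAB<EOM>: [27360, 28800)
The fourth symbol is A, which leaves [21600, 25920) for the final symbol:
AABAA: [21600, 24192)
AABAB: [24192, 25056)
AABA<EOM>: [25056, 25920)
The final symbol is <EOM>, so the message is represented by the range [25056, 25920).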
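The subdivisions listed above can be reproduced with a short sketch. It assumes an initial range of [0, 100000) and probabilities of 0.6 for A and 0.2 for B, with the remaining 0.2 assigned to an end-of-message symbol written `<EOM>` here. The function name `encode_ranges` and the model layout are hypothetical.

```python
def encode_ranges(message, model, total, low=0, high=100000):
    """Yield (symbol, low, high) after each symbol has narrowed the range."""
    for symbol in message:
        width = high - low
        start, end = model[symbol]
        # Both borders are computed from the old value of `low`.
        low, high = low + width * start // total, low + width * end // total
        yield symbol, low, high

# Assumed model: cumulative shares out of 10 (A: 0.6, B: 0.2, <EOM>: 0.2).
MODEL = {"A": (0, 6), "B": (6, 8), "<EOM>": (8, 10)}
message = ["A", "A", "B", "A", "<EOM>"]

steps = list(encode_ranges(message, MODEL, 10))
for symbol, low, high in steps:
    print(f"{symbol:>5}: [{low}, {high})")
```

Under these assumptions the ranges after the first four symbols match the entries for A, AA, AAB and AABA in the listing.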
The main problem might seem to be choosing an initial range large enough to encode all symbols without a subrange shrinking to zero under integer division. In practice this is not a problem, because the encoder can extend the number step by step: leading digits that are already fixed can be output immediately and are no longer needed for the calculation. The encoder therefore works with a small range of numbers at all times, emitting fixed digits on one side and appending new digits on the other. In the example, "2" was already determined as the first digit of the result after the first three symbols had been processed, and it would no longer need to be included in the calculation.
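This gradual output of fixed digits can be sketched as follows. The sketch assumes five decimal digits of working precision and the same model as the example, with the remaining probability of 0.2 assigned to an assumed end-of-message symbol written `<EOM>`. All names are hypothetical, and the handling of ranges that stay narrow while straddling a digit boundary, which a practical range coder needs, is deliberately left out.

```python
def encode_with_renormalization(message, model, total, digits=5):
    """Narrow the range per symbol and emit leading digits once they are fixed."""
    top = 10 ** digits        # the working range is [0, top)
    msd = top // 10           # place value of the leading digit
    low, high = 0, top
    emitted = []
    for symbol in message:
        width = high - low
        start, end = model[symbol]
        low, high = low + width * start // total, low + width * end // total
        # The leading digit is fixed once the smallest and the largest
        # value of the range (high - 1) agree on it.
        while low // msd == (high - 1) // msd:
            digit = low // msd
            emitted.append(str(digit))
            # Drop the fixed digit and make room for one new digit.
            low = (low - digit * msd) * 10
            high = (high - digit * msd) * 10
    return "".join(emitted), low, high

# Assumed model: cumulative shares out of 10 (A: 0.6, B: 0.2, <EOM>: 0.2).
MODEL = {"A": (0, 6), "B": (6, 8), "<EOM>": (8, 10)}
output, low, high = encode_with_renormalization(
    ["A", "A", "B", "A", "<EOM>"], MODEL, 10)
print(output, low, high)
```

In this run the digit "2" is emitted as soon as the third symbol has been processed, after which the calculation continues on the rescaled range. The digits still undecided at the end remain in `low` and `high` and would have to be flushed by a complete encoder.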