Move-to-front transform

Move to front ( "Move Forward" in English, also: rotary encoder ) is a coding method that is well suited to data derived from the Burrows - Wheeler Transform, further processing, so that they can then compress effectively.

Format of the input and output data

The input data for the Move- to- Front are a finite alphabet and a string from this alphabet. In the alphabet can be, for example, ASCII or Unicode, but also to bytes.

Issuing move-to - front is a sequence of natural numbers, wherein each number is less than the length of the alphabet.

Operation

Move -to -front coding:

This process leads to characters that often occur in the input, during encoding are relatively far forward in the alphabet a. In turn, the output contains more small figures than larger, and in turn is useful in order to then compress the sequence of numbers, such as the Huffman coding.

Move-to -front decoding:

The decoding is almost the same as the encoding, except that the position at which the alphabet is changed, already known ( the number of the input ), while it must first be determined in encoding.

Example

The string " Mississippi " should be encoded with the MTF algorithm. The alphabet, which is used here, is ( for brevity ) " ABCIMPSabcimps ".

The following table lists characters in each case the input character, the current alphabet alphabet. The output is the position of the character in the current alphabet ( starting with 0), and alphabet ' is the new alphabet, caused by the fact that the input character is moved to the beginning.

The result of the encoding is the text ( 4,10,13,0,1,1,0,1,13,0,1 ). If you omit the re-sorting of the alphabet, you get to compare the text ( 4,10,13,13,10,13,13,10,12,12,10 ). One can see that, after a short " induction phase " (here 3 characters: 4,10,13 ) relatively frequent small numbers are issued, what is good for subsequent compression.

To decode MTF again, you go the opposite way:

The sequence of numbers ( 4,10,13,0,1,1,0,1,13,0,1 ) to be decoded using the alphabet " ABCIMPSabcimps ". In the following table is the position number of the sequence number to be decoded, and the decoded symbol character. The column alphabet and alphabet ' are exactly the same as in the coding table above.

The output of the decoding is thus as expected "Mississippi ".

584806
de