Prediction by Partial Matching

Prediction by partial matching (PPM) is a family of adaptive statistical data compression algorithms based on context modeling and prediction. PPM models use a set of previous symbols in the uncompressed symbol stream to predict the next symbol in the stream.

Predictions are usually reduced to symbol rankings. The number of previous symbols, n, determines the order of the PPM model, which is denoted as PPM(n). Unbounded variants, where the context has no length limit, also exist and are denoted PPM*. If no prediction can be made from all n context symbols, a prediction is attempted with n − 1 symbols. This process is repeated until a match is found or no symbols remain in the context; at that point a fixed prediction is made. This is the reverse of the approach taken by dynamic Markov compression, which builds up from an order-0 model.
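The fallback from longer to shorter contexts can be sketched as follows. This is a minimal, illustrative model, not any particular published PPM implementation: the class name and method signatures are invented for this example, and it returns a ranked guess rather than a coded probability distribution.

```python
from collections import defaultdict

class SimplePPM:
    """Minimal sketch of PPM-style context fallback for an order-n model."""

    def __init__(self, order):
        self.order = order
        # counts[context][symbol] -> frequency; a context is a tuple of symbols
        self.counts = defaultdict(lambda: defaultdict(int))

    def update(self, history, symbol):
        # Record the symbol under every context of length 0..order.
        for n in range(min(self.order, len(history)) + 1):
            ctx = tuple(history[len(history) - n:])
            self.counts[ctx][symbol] += 1

    def predict(self, history):
        # Try the longest available context first; on a miss, "escape"
        # to the next shorter context, down to the empty (order-0) one.
        for n in range(min(self.order, len(history)), -1, -1):
            ctx = tuple(history[len(history) - n:])
            bucket = self.counts.get(ctx)
            if bucket:
                return max(bucket, key=bucket.get)
        return None  # nothing seen yet: a fixed prediction would be made here
```

Training an order-2 model on "abracadabra" and calling `predict("ab")`, for example, finds the context ('a', 'b') and ranks 'r' highest; an unseen context such as "xy" escapes down to the order-0 statistics.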

Much of the work in optimizing a PPM model is handling inputs that have not yet occurred in the input stream. The obvious way to handle them is to create a "never-seen" symbol which triggers the escape sequence. But what probability should be assigned to a symbol that has never been seen? This is called the zero-frequency problem. One approach assigns the never-seen symbol a fixed pseudocount of one. A variant called PPMd increments the pseudocount of the never-seen symbol every time it is used. (In other words, PPMd estimates the probability of a new symbol as the ratio of the number of unique symbols to the total number of symbols observed.)
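The estimate described above can be written out as a small sketch. This follows the ratio stated in the paragraph (unique symbols contributing one escape pseudocount each); it is an illustration of that idea, not the exact estimator of any particular PPMd release, and the function names are invented for this example.

```python
from collections import Counter

def escape_probability(context_counts):
    """Probability of escaping to a shorter context: each unique symbol
    seen so far contributes one pseudo-occurrence of the escape symbol."""
    unique = len(context_counts)
    total = sum(context_counts.values())
    if total == 0:
        return 1.0  # nothing seen in this context: escape is certain
    return unique / (total + unique)

def symbol_probability(context_counts, symbol):
    """Probability of a seen symbol under the same adjusted total."""
    unique = len(context_counts)
    total = sum(context_counts.values())
    return context_counts.get(symbol, 0) / (total + unique)
```

For a context in which "aab" has been seen, the counts are {a: 2, b: 1}, so the escape probability is 2/5, P(a) = 2/5, and P(b) = 1/5; the three probabilities sum to one, as a coder requires.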

PPM compression implementations vary greatly in other details. The actual symbol selection is usually encoded with arithmetic coding, though Huffman coding or some kind of dictionary coding technique is also possible. The underlying model used in most PPM algorithms can also be extended to predict multiple symbols. It is likewise possible to use non-Markov modeling to either replace or supplement Markov modeling. The symbol size is usually static, typically a single byte, which makes generic handling of any file format easy.

Published research on this family of algorithms can be found as far back as the mid-1980s. Software implementations were not popular until the early 1990s, because PPM algorithms require a significant amount of memory. Recent PPM implementations are among the best-performing lossless compression methods for natural-language text.

Attempts to improve PPM algorithms led to the PAQ series of compression algorithms.
