Entropy estimation

Entropy estimation deals with the various methods for the statistical estimation of the Shannon entropy on the basis of finite samples. The formal computation of the Shannon entropy requires, by definition, knowledge of the probabilities of the underlying message source. In practice, however, these probabilities are usually unknown, and one has to estimate the probabilities of the messages from a given finite sample in order to infer the entropy of the whole. Owing to the statistical fluctuations inherent in finite samples, both systematic and non-systematic deviations of the estimates are to be expected. In the ordinary likelihood estimator of the entropy, the probabilities p_i in the Shannon entropy

H = -\sum_{i=1}^{M} p_i \ln p_i

are replaced by the maximum-likelihood estimator \hat{p}_i = n_i / N. If, in a total of N observations, the event i appears with the absolute frequency n_i, this leads to the likelihood estimator of the entropy commonly used in practice,

\hat{H}_{\mathrm{ML}} = -\sum_{i=1}^{M} \frac{n_i}{N} \ln \frac{n_i}{N} .
This estimator is particularly suitable when the sample is much larger than the number of possible different events, that is, when N ≫ M holds from a statistical point of view. Otherwise, the above estimator often leads to a systematic underestimation of the entropy. This error becomes particularly noticeable when the size of the sample is not very much larger than the number of different messages of the source. In practice, however, it is precisely this regime that is often of particular interest.
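The effect can be illustrated with a minimal Python sketch (not part of the original article; the function name entropy_mle and the toy parameters are our own choices), which computes the plug-in estimator above and shows how it falls below the true entropy when the sample is small compared with the alphabet:

import numpy as np

def entropy_mle(counts):
    # Plug-in (maximum-likelihood) entropy estimate in nats:
    # H_ML = -sum_i (n_i / N) * ln(n_i / N); empty bins contribute nothing.
    counts = np.asarray(counts, dtype=float)
    n = counts[counts > 0]
    p_hat = n / n.sum()
    return -np.sum(p_hat * np.log(p_hat))

rng = np.random.default_rng(0)
M, N = 1000, 500                       # alphabet larger than the sample
counts = rng.multinomial(N, np.full(M, 1.0 / M))
print(entropy_mle(counts), np.log(M))  # estimate falls well below ln M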

"Finite-sample" corrections

There are a number of approaches in the literature that aim to reduce this systematic error successively by means of suitable correction terms. They are usually based on a Taylor series expansion of the entropy. A correction to first order in 1/N, for example, yields the estimator

\hat{H}_{\mathrm{Miller}} = \hat{H}_{\mathrm{ML}} + \frac{M-1}{2N} .
The correction term was first considered by Miller for the examination of medical data. Other applications, for example in the context of genetic research, were made later by Herzel. Calculations of correction terms up to second order were first published by Harris. It turns out that the second-order correction terms are not independent of the probabilities to be estimated. Moreover, substituting the likelihood estimates for the probabilities in these terms does not lead to improvements. For practical purposes, the result of Harris is therefore not very suitable.
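A sketch of the Miller-corrected estimator under the same conventions as above (the function name entropy_miller is our own; taking M from the length of the count vector is an assumption that all possible events are listed):

import numpy as np

def entropy_miller(counts, M=None):
    # Plug-in estimate plus Miller's first-order correction (M - 1) / (2 N).
    counts = np.asarray(counts, dtype=float)
    N = counts.sum()
    M = len(counts) if M is None else M   # number of possible events
    n = counts[counts > 0]
    h_mle = -np.sum((n / N) * np.log(n / N))
    return h_mle + (M - 1) / (2.0 * N)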

Higher-order corrections

An alternative approach, in which only observable contributions enter the correction terms of higher order, was first proposed by Peter Grassberger. For the probabilities to be estimated, the condition p_i ≪ 1 is assumed, and the absolute frequencies n_i are considered independent, Poisson-distributed random variables. In the case studies of interest these assumptions are usually very well satisfied. The starting point in the derivation of the higher-order corrections is the Rényi entropy of order q,

H_q = \frac{1}{1-q} \ln \sum_{i=1}^{M} p_i^q .
The formal connection with the Shannon entropy is given by the limit q → 1, i.e. H = \lim_{q \to 1} H_q. It then appears natural to first look for unbiased estimators of the individual summands p_i^q. For integer values of q such unbiased estimators exist, namely

\widehat{p_i^q} = \frac{n_i^{(q)}}{N^q} ,

with the falling factorial n_i^{(q)} := n_i (n_i - 1) \cdots (n_i - q + 1) for q ≥ 1.
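The unbiasedness for integer q can be checked empirically. The following Monte Carlo sketch (all names and toy values are our own) draws Poisson-distributed frequencies and compares the estimator above with p^q for q = 3:

import numpy as np

# Check that n (n - 1) ... (n - q + 1) / N**q is an unbiased estimator of
# p**q when n is Poisson-distributed with mean N * p (here q = 3).
rng = np.random.default_rng(0)
N, p, q = 1000, 0.01, 3
n = rng.poisson(N * p, size=200_000).astype(float)
falling = n * (n - 1.0) * (n - 2.0)    # falling factorial n^(3)
print(falling.mean() / N**q, p**q)     # both approximately 1e-06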
For a formal passage to the limit, an analytic continuation to arbitrary real values of q is necessary. Grassberger proposed a function for this purpose that continues n_i^{(q)} analytically in q. Although this does not lead to an unbiased estimator of the entropy, it does result in an asymptotically unbiased entropy estimator,

\hat{H}_\psi = \ln N - \frac{1}{N} \sum_{i=1}^{M} n_i \, \psi(n_i) ,
which in practice leads to improvements for finite samples. The function ψ denotes the so-called digamma function. For the interesting case of small probabilities, the systematic error of this estimator is smaller than that of the estimator with the correction proposed by Miller.
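A sketch of this estimator using SciPy's digamma function (the name entropy_grassberger is our own):

import numpy as np
from scipy.special import digamma

def entropy_grassberger(counts):
    # Asymptotically unbiased estimate:  ln N - (1/N) * sum_i n_i * psi(n_i)
    counts = np.asarray(counts, dtype=float)
    N = counts.sum()
    n = counts[counts > 0]
    return np.log(N) - np.sum(n * digamma(n)) / N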

Systematic corrections

In a similar manner, a parameterized family of general entropy estimators can be specified that continues the above estimators or reproduces them asymptotically. Instead of a Poisson distribution, a binomial distribution of the absolute frequencies is assumed here; further restrictions on the probabilities are not imposed. As entropy estimator one thus obtains

\hat{H}_\xi = \psi(N) - \frac{1}{N} \sum_{i=1}^{M} n_i \left( \psi(n_i) + (-1)^{n_i} \int_0^{1/\xi - 1} \frac{t^{\,n_i - 1}}{1 + t} \, dt \right) ,
where the real parameter ξ > 0 parameterizes the different entropy estimators.
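A sketch of this family in Python, evaluating the correction integral by numerical quadrature (the function name entropy_xi is our own; for ξ = 1 the integral vanishes):

import numpy as np
from scipy.special import digamma
from scipy.integrate import quad

def entropy_xi(counts, xi):
    # psi(N) - (1/N) * sum_i n_i * [ psi(n_i)
    #     + (-1)**n_i * integral_0^{1/xi - 1} t**(n_i - 1) / (1 + t) dt ]
    counts = np.asarray(counts, dtype=int)
    N = int(counts.sum())
    upper = 1.0 / xi - 1.0              # upper limit of the correction integral
    total = 0.0
    for n in counts[counts > 0]:
        corr, _ = quad(lambda t, n=n: t ** (n - 1) / (1.0 + t), 0.0, upper)
        total += n * (digamma(n) + (-1) ** n * corr)
    return digamma(N) - total / N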

Examples

1. In the case ξ = 1 the correction term vanishes and one obtains the entropy estimator

\hat{H}_1 = \psi(N) - \frac{1}{N} \sum_{i=1}^{M} n_i \, \psi(n_i) .
A similar estimator was also discussed by Wolpert and Wolf in the context of Bayesian theory. Asymptotically, this estimator corresponds to the Miller-corrected estimator.

2. For ξ = e^{-1/2} the estimator \hat{H}_\xi reproduces the estimator \hat{H}_\psi approximately. Numerical analyses show that the difference between the two is negligible. The systematic error of \hat{H}_{e^{-1/2}} is smaller than the bias of the estimator \hat{H}_\psi.

3. The case ξ = 1/2 corresponds asymptotically to a further entropy estimator derived by Grassberger. The latter has a smaller bias than the estimator \hat{H}_\psi and the Miller-corrected estimator.
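Reusing entropy_xi and numpy from the sketch above, the three example parameters can be compared on a toy sample (the sample and all values are our own illustration):

rng = np.random.default_rng(1)
counts = rng.multinomial(100, np.full(50, 0.02))  # N = 100 draws, M = 50 events

for xi in (1.0, np.exp(-0.5), 0.5):               # the three example parameters
    print(f"xi = {xi:.3f}:  H = {entropy_xi(counts, xi):.4f}")
print("true entropy:", np.log(50))                # uniform source: H = ln M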

Systematic error (bias)

The systematic error of an estimator is by definition the expected deviation between the estimator and the quantity to be estimated. In the present case, this definition gives

b_\xi(N) := E[\hat{H}_\xi] - H .
This expression depends explicitly on the probabilities p_i and on the parameter ξ. For each choice of these quantities a characteristic value of the estimation error results, which can be computed analytically as follows:

b_\xi(N) = - \sum_{i=1}^{M} p_i \, B\!\left(1 - \frac{p_i}{\xi};\, N, 0\right),
\qquad
B(x; a, b) = \int_0^x t^{\,a-1} (1-t)^{\,b-1} \, dt .
The function B on the right-hand side of this formula is an incomplete beta function and belongs to the class of so-called special functions. For the non-systematic (statistical) error, however, no such formula can be derived; in general it must therefore be determined numerically.
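A sketch evaluating this bias formula by numerical quadrature (the function name bias_xi and the toy values are our own; the formula presupposes p_i < ξ, so the upper limit of the integral stays below 1):

import numpy as np
from scipy.integrate import quad

def bias_xi(p, N, xi):
    # b_xi(N) = -sum_i p_i * B(1 - p_i/xi; N, 0), with the incomplete beta
    # function B(x; N, 0) = integral_0^x t**(N - 1) / (1 - t) dt.
    total = 0.0
    for pi in p:
        x = 1.0 - pi / xi
        b, _ = quad(lambda t: t ** (N - 1) / (1.0 - t), 0.0, x)
        total -= pi * b
    return total

p = np.full(20, 0.05)                 # uniform source with M = 20
print(bias_xi(p, N=50, xi=1.0))       # negative: systematic underestimation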
