Sample mean and sample covariance

The corrected (or unbiased) sample variance, written $s^2$, is an estimator for the variance of a random variable, computed from observed values that come from a sample of the population. It is also used in descriptive statistics as a measure of the spread of data. For a sample of size $n$, it differs from the uncorrected sample variance by the correction factor $n / (n - 1)$, which is known as Bessel's correction (after Friedrich Bessel).

Definition

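The corrected sample variance of sample values $x_1, \dots, x_n$ is defined as

$$s^2 = \frac{1}{n-1} \sum_{i=1}^{n} (x_i - \bar{x})^2 .$$

Here $\bar{x} = \frac{1}{n} \sum_{i=1}^{n} x_i$ is the empirical mean, that is, the arithmetic mean of the sample.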

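The corrected sample variance is often called the empirical variance or simply the sample variance. These names are not unambiguous, however: some authors use "empirical variance" and "sample variance" for the quantity

$$s_n^2 = \frac{1}{n} \sum_{i=1}^{n} (x_i - \bar{x})^2 ,$$

where $\bar{x}$ is the arithmetic mean of the sample values $x_1, \dots, x_n$. This is the maximum likelihood estimator of the variance under the assumption of a normal distribution.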
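
As an illustration of the two definitions, the following minimal Python sketch computes both quantities for a small sample and compares them with the standard library. The function names and the data are chosen here for illustration only and are not part of the original text.

```python
import statistics


def corrected_variance(xs):
    """Corrected sample variance: squared deviations divided by n - 1."""
    n = len(xs)
    if n < 2:
        raise ValueError("need at least two values")
    mean = sum(xs) / n
    return sum((x - mean) ** 2 for x in xs) / (n - 1)


def uncorrected_variance(xs):
    """Uncorrected sample variance: squared deviations divided by n."""
    n = len(xs)
    if n < 1:
        raise ValueError("need at least one value")
    mean = sum(xs) / n
    return sum((x - mean) ** 2 for x in xs) / n


# Illustrative data: mean 5, sum of squared deviations 32.
data = [2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0]

print(corrected_variance(data))    # 32 / 7
print(uncorrected_variance(data))  # 32 / 8
print(statistics.variance(data))   # standard library, divisor n - 1
print(statistics.pvariance(data))  # standard library, divisor n
```

The two results differ exactly by the factor $n / (n - 1)$, here $8/7$.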

Calculation without computing the mean first

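Using the shift theorem (the algebraic formula for the variance), the corrected sample variance can be obtained in a single pass over the data, without computing the mean first:

$$s^2 = \frac{1}{n-1} \left( \sum_{i=1}^{n} x_i^2 - \frac{1}{n} \left( \sum_{i=1}^{n} x_i \right)^2 \right) .$$

This formula is numerically inaccurate, however, when the squared mean of the data is much larger than the variance: the subtraction in the formula then suffers from cancellation of significant digits. A possible remedy is to determine an approximation $\tilde{x}$ of the mean beforehand and to compute the corrected sample variance as

$$s^2 = \frac{1}{n-1} \left( \sum_{i=1}^{n} (x_i - \tilde{x})^2 - \frac{1}{n} \left( \sum_{i=1}^{n} (x_i - \tilde{x}) \right)^2 \right) .$$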
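
The cancellation problem and the effect of shifting the data can be sketched in Python as follows. The function name `one_pass_variance`, the choice of the first observation as the approximate mean, and the data are assumptions made for this illustration; the data have a mean near $10^9$ and a corrected sample variance of exactly 30.

```python
def one_pass_variance(xs, shift=0.0):
    """Single-pass corrected sample variance via the shift theorem.

    `shift` is a rough guess for the mean; shift=0.0 gives the plain
    one-pass formula, which is prone to cancellation.
    """
    n = 0
    total = 0.0     # running sum of (x - shift)
    total_sq = 0.0  # running sum of (x - shift)^2
    for x in xs:
        d = x - shift
        n += 1
        total += d
        total_sq += d * d
    if n < 2:
        raise ValueError("need at least two values")
    return (total_sq - total * total / n) / (n - 1)


# Illustrative data: large mean, small spread; the exact variance is 30.
data = [1e9 + 4, 1e9 + 7, 1e9 + 13, 1e9 + 16]

naive = one_pass_variance(data)                   # no shift
shifted = one_pass_variance(data, shift=data[0])  # first value as rough mean

print("unshifted:", naive)
print("shifted:  ", shifted)
```

In double precision the unshifted result is visibly wrong, because the two terms being subtracted are of order $10^{18}$ while their difference is only 90. The shifted version works with small numbers and recovers the exact value for these data.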

Unbiased estimation of the population variance

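The denominator $n - 1$ in the corrected sample variance can be explained as follows. If the $x_i$ are values of independent and identically distributed random variables $X_1, \dots, X_n$ with variance $\sigma^2$, and the mean $\mu$ of the population is known, then each

$$(X_i - \mu)^2$$

is an unbiased estimator for the population variance, and thus

$$\frac{1}{n} \sum_{i=1}^{n} (X_i - \mu)^2$$

is an unbiased estimator for the variance as well. In particular,

$$\operatorname{E}\left( \frac{1}{n} \sum_{i=1}^{n} (X_i - \mu)^2 \right) = \sigma^2 .$$

Usually, however, the mean of the population is not known, and it is therefore estimated by the sample mean

$$\bar{x} = \frac{1}{n} \sum_{i=1}^{n} x_i .$$

Substituting this estimate naively into the formula above gives the following estimate for the population variance:

$$s_n^2 = \frac{1}{n} \sum_{i=1}^{n} (x_i - \bar{x})^2 .$$

To decide whether this estimator is unbiased, one regards it as a realization of the estimator

$$S_n^2 = \frac{1}{n} \sum_{i=1}^{n} (X_i - \bar{X})^2 , \qquad \bar{X} = \frac{1}{n} \sum_{i=1}^{n} X_i ,$$

and computes its expected value:

$$\begin{aligned}
\operatorname{E}(S_n^2)
&= \operatorname{E}\left( \frac{1}{n} \sum_{i=1}^{n} \bigl( (X_i - \mu) - (\bar{X} - \mu) \bigr)^2 \right) \\
&= \operatorname{E}\left( \frac{1}{n} \sum_{i=1}^{n} (X_i - \mu)^2 - (\bar{X} - \mu)^2 \right) \\
&= \frac{1}{n} \sum_{i=1}^{n} \operatorname{E}\bigl( (X_i - \mu)^2 \bigr) - \operatorname{E}\bigl( (\bar{X} - \mu)^2 \bigr) \\
&= \sigma^2 - \frac{\sigma^2}{n} \\
&= \frac{n-1}{n} \, \sigma^2 .
\end{aligned}$$

The second line follows because $\frac{1}{n} \sum_{i=1}^{n} (X_i - \mu) = \bar{X} - \mu$. The penultimate equality uses the definition of the variance and the formula that expresses the standard error of the mean in terms of the population variance and the sample size, $\operatorname{Var}(\bar{X}) = \sigma^2 / n$. It follows that $S_n^2$ is not unbiased, and that multiplying it by the factor $n / (n - 1)$ yields an unbiased estimator for the variance. This leads to the corrected sample variance

$$S^2 = \frac{n}{n-1} \, S_n^2 = \frac{1}{n-1} \sum_{i=1}^{n} (X_i - \bar{X})^2 .$$

For this estimator, regardless of the exact distribution of the $X_i$,

$$\operatorname{E}(S^2) = \sigma^2 .$$

The expected value of the corrected sample variance thus equals the population variance, so the corrected sample variance is an unbiased estimator for the variance.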
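
The two expected values can be checked exactly on a small example by enumerating every possible sample. The sketch below assumes a fair six-sided die as the population and samples of size 3; both choices, and the function name, are illustrative. Exact rational arithmetic is used, so no simulation error is involved.

```python
from fractions import Fraction
from itertools import product


def expected_variance_estimates(population, n):
    """Return (sigma^2, E[S_n^2], E[S^2]) for n i.i.d. draws, each
    uniformly distributed over `population`, by full enumeration."""
    values = [Fraction(v) for v in population]
    mu = sum(values) / len(values)
    sigma2 = sum((v - mu) ** 2 for v in values) / len(values)

    total_uncorrected = Fraction(0)
    total_corrected = Fraction(0)
    count = 0
    # All len(values)**n samples are equally likely.
    for sample in product(values, repeat=n):
        mean = sum(sample) / n
        squares = sum((x - mean) ** 2 for x in sample)
        total_uncorrected += squares / n
        total_corrected += squares / (n - 1)
        count += 1
    return sigma2, total_uncorrected / count, total_corrected / count


sigma2, e_uncorrected, e_corrected = expected_variance_estimates(range(1, 7), 3)
print(sigma2)         # population variance of a fair die
print(e_uncorrected)  # (n - 1) / n times the population variance
print(e_corrected)    # equal to the population variance
```

For this population the uncorrected estimator has expected value $\frac{2}{3} \sigma^2$, while the corrected estimator reproduces $\sigma^2$ exactly.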

Representation using pairwise differences

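The formula for the corrected sample variance can also be written so that it contains no reference to a measure of location such as the mean:

$$s^2 = \frac{1}{n(n-1)} \sum_{i<j} (x_i - x_j)^2 = \frac{1}{2n(n-1)} \sum_{i=1}^{n} \sum_{j=1}^{n} (x_i - x_j)^2 .$$

In particular, for a sample of size $n = 2$:

$$s^2 = \frac{(x_1 - x_2)^2}{2} .$$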

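This formula gives the same value as the definition above, as the following calculation shows, where $\bar{x}$ denotes the arithmetic mean of the sample:

$$\sum_{i=1}^{n} \sum_{j=1}^{n} (x_i - x_j)^2 = \sum_{i=1}^{n} \sum_{j=1}^{n} \bigl( (x_i - \bar{x}) - (x_j - \bar{x}) \bigr)^2 = 2n \sum_{i=1}^{n} (x_i - \bar{x})^2 - 2 \left( \sum_{i=1}^{n} (x_i - \bar{x}) \right)^2 = 2n \sum_{i=1}^{n} (x_i - \bar{x})^2 ,$$

because

$$\sum_{i=1}^{n} (x_i - \bar{x}) = 0 .$$

This fact results from the definition of the mean:

$$\sum_{i=1}^{n} x_i = n \bar{x} .$$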

The formula

$$s^2 = \frac{1}{2n(n-1)} \sum_{i \neq j} (x_i - x_j)^2$$

makes the intention of the corrected sample variance clear. Pairs $(x_i, x_i)$ are not taken into account, since the difference of the corresponding values would always be zero. Including them, which amounts to replacing the divisor $n - 1$ by $n$ in the definition above, would distort the sample variance in a way that depends on the number of data points $n$. With only two data points there would be four pairs, two of which, namely $(x_1, x_1)$ and $(x_2, x_2)$, always have difference zero, and this would pull the sample variance down considerably. With a large amount of data, on the other hand, the proportion of $(x_i, x_i)$ pairs would shrink drastically. The definition of the corrected sample variance therefore takes into account only pairs $(x_i, x_j)$ with $i \neq j$. Only if their differences are zero (that is, if $x_i = x_j$) or at least not large does this indicate a small spread of the data.

Considering only pairs $(x_i, x_j)$ with $i \neq j$ when calculating the corrected sample variance is equivalent to the definition above, and it thus justifies dividing the sum there by $n - 1$.

The definition of the corrected sample variance in terms of pairwise differences given in this section forms the basis for the definition of empirical variogram values, which are often called semivariances by those unaware of the significance of the factor $\tfrac{1}{2}$.
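
The equivalence of the pairwise representation and the mean-based definition can be checked numerically. In the sketch below the function names and the data are illustrative; the third function also includes the pairs $(x_i, x_i)$ and shows that this reproduces the uncorrected variance with divisor $n$.

```python
from itertools import combinations, product


def variance_by_definition(xs, divisor_offset=1):
    """Mean-based variance; divisor is n - divisor_offset."""
    n = len(xs)
    mean = sum(xs) / n
    return sum((x - mean) ** 2 for x in xs) / (n - divisor_offset)


def variance_by_distinct_pairs(xs):
    """Corrected variance from unordered pairs with i < j only."""
    n = len(xs)
    return sum((a - b) ** 2 for a, b in combinations(xs, 2)) / (n * (n - 1))


def variance_by_all_pairs(xs):
    """Half the mean squared difference over all n^2 ordered pairs,
    including the pairs (x_i, x_i) whose difference is always zero."""
    n = len(xs)
    return sum((a - b) ** 2 for a, b in product(xs, repeat=2)) / (2 * n * n)


# Illustrative data: mean 9, sum of squared deviations 144.
data = [3, 7, 7, 19]

print(variance_by_definition(data))      # divisor n - 1
print(variance_by_distinct_pairs(data))  # same value, no mean needed
print(variance_by_definition(data, 0))   # divisor n
print(variance_by_all_pairs(data))       # same value as divisor n
```

Note that the two equal values 7 form a pair with $i \neq j$ and are counted, whereas each value paired with itself is not.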

Sample standard deviation

The square root $s$ of the corrected sample variance is the sample standard deviation. Because unbiasedness is generally lost when a nonlinear function is applied, the sample standard deviation, unlike the corrected sample variance, is not an unbiased estimator of the standard deviation.
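
A small exact example makes the loss of unbiasedness visible. The sketch below assumes a fair coin with outcomes 0 and 1 (so $\sigma^2 = 0.25$ and $\sigma = 0.5$) and samples of size 2; these choices and the function name are illustrative only.

```python
import math
from itertools import product


def expected_s2_and_s(population, n):
    """Average s^2 and s over all equally likely samples of size n
    drawn with replacement from `population`."""
    samples = list(product(population, repeat=n))
    total_s2 = 0.0
    total_s = 0.0
    for sample in samples:
        mean = sum(sample) / n
        s2 = sum((x - mean) ** 2 for x in sample) / (n - 1)
        total_s2 += s2
        total_s += math.sqrt(s2)
    return total_s2 / len(samples), total_s / len(samples)


sigma = 0.5  # standard deviation of a fair 0/1 coin
mean_s2, mean_s = expected_s2_and_s((0.0, 1.0), 2)

print("E[s^2] =", mean_s2, "  sigma^2 =", sigma ** 2)
print("E[s]   =", mean_s, "  sigma   =", sigma)
```

Here the expected value of $s^2$ equals $\sigma^2$, but the expected value of $s$ is smaller than $\sigma$. This direction of the bias is what Jensen's inequality predicts for the concave square root function.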
