Bias of an estimator

Unbiasedness denotes, in mathematical statistics, a property of an estimating function (in short: an estimator). An estimator is called unbiased if its expected value equals the true value of the parameter to be estimated. If an estimator is not unbiased, it is said to be biased. The extent to which its expected value deviates from the true value is called the bias (or distortion). The bias expresses the systematic error of the estimator.

Unbiasedness is, alongside consistency, sufficiency and (asymptotic) efficiency, one of the four common criteria for assessing the quality of estimators.

Importance

Unbiasedness is an important property of an estimator, because the variance of most estimators converges to zero as the sample size increases. That is, the distribution concentrates around the expected value of the estimator, and thus, for unbiased estimators, around the true parameter of the population. For unbiased estimators one can therefore expect that the difference between the estimate computed from the sample and the true parameter becomes smaller as the sample size grows.

Besides its role in the practical assessment of estimator quality, the notion of unbiasedness is of great importance for the mathematical theory of estimation. Within the class of all unbiased estimators it is possible, under suitable assumptions on the underlying distribution model, to prove the existence and uniqueness of best estimators. These are unbiased estimators that have minimum variance among all unbiased estimators.

Basic idea and introductory examples

To estimate an unknown real parameter of a population, mathematical statistics computes an estimate from a random sample by means of a suitably chosen function. In general, suitable estimators can be obtained using estimation methods such as the maximum likelihood method.

Since the sample variables are random variables, the estimator is itself a random variable. It is called unbiased if the expected value of this random variable is always equal to the parameter, no matter which value the parameter actually has.

Example: the sample mean

To estimate the expected value $\mu$ of a population, one usually uses the sample mean

$$\bar{X}_n = \frac{1}{n}\left(X_1 + X_2 + \dotsb + X_n\right).$$

If all sample variables $X_1, \dots, X_n$ are drawn randomly from the population, they all have expected value $\mu$. The expected value of the sample mean is therefore

$$\operatorname{E}\bigl[\bar{X}_n\bigr] = \frac{1}{n}\bigl(\operatorname{E}[X_1] + \dotsb + \operatorname{E}[X_n]\bigr) = \frac{1}{n}\cdot n\mu = \mu.$$

The sample mean is thus an unbiased estimator of the unknown distribution parameter $\mu$.

If the population is normally distributed with expected value $\mu$ and variance $\sigma^2$, the distribution of the sample mean can be stated exactly. In this case

$$\bar{X}_n \sim \mathcal{N}\!\left(\mu, \frac{\sigma^2}{n}\right),$$

that is, the sample mean is also normally distributed, with expected value $\mu$ and variance $\sigma^2/n$. If the sample size is large, this statement about the distribution holds at least approximately by the central limit theorem, even if the population is not normally distributed. The variance of this estimator converges to 0 as the sample size tends to infinity. For increasing sample sizes the distribution of the sample mean therefore contracts more and more around a fixed value, and unbiasedness guarantees that this value is the parameter $\mu$ being sought.
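
This contraction can also be illustrated by simulation. The following minimal Python sketch (the concrete values $\mu = 5$, $\sigma = 2$ and the sample sizes are assumptions chosen only for this illustration) draws many samples for several sample sizes; the average of the sample means stays close to $\mu$, while their spread shrinks roughly like $\sigma/\sqrt{n}$:

    import numpy as np

    rng = np.random.default_rng(0)
    mu, sigma = 5.0, 2.0        # hypothetical true parameters of the population
    repetitions = 100_000       # number of simulated samples per sample size

    for n in (5, 20, 100):
        # draw `repetitions` samples of size n and compute their sample means
        samples = rng.normal(mu, sigma, size=(repetitions, n))
        means = samples.mean(axis=1)
        # unbiasedness: the average of the estimates is close to mu for every n;
        # the spread of the estimates is approximately sigma / sqrt(n)
        print(f"n={n:3d}  mean of estimates={means.mean():.4f}  "
              f"std of estimates={means.std(ddof=1):.4f}  "
              f"sigma/sqrt(n)={sigma / np.sqrt(n):.4f}")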

Example: relative frequency

To estimate the probability $p$ with which a certain feature occurs in the population, a sample of size $n$ is selected at random and the absolute frequency $H_n$ of the feature in the sample is counted. The random variable $H_n$ is then binomially distributed with parameters $n$ and $p$; in particular, its expected value satisfies $\operatorname{E}[H_n] = np$. For the relative frequency

$$h_n = \frac{H_n}{n}$$

it then follows that $\operatorname{E}[h_n] = p$, that is, the relative frequency is an unbiased estimator of the unknown probability $p$.
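
The expected value of the relative frequency can also be computed directly by writing the absolute frequency as a sum of indicator variables (the events $A_i$ are introduced here only for this calculation and denote the occurrence of the feature in the $i$-th draw):

$$H_n = \sum_{i=1}^{n} \mathbf{1}_{A_i},
\qquad
\operatorname{E}[h_n]
  = \frac{1}{n}\sum_{i=1}^{n}\operatorname{E}\bigl[\mathbf{1}_{A_i}\bigr]
  = \frac{1}{n}\sum_{i=1}^{n} P(A_i)
  = \frac{1}{n}\, n p
  = p.$$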

Mathematical definition

In modern, measure-theoretically founded mathematical statistics, a statistical experiment is described by a statistical model. This consists of a set $\mathcal{X}$, the sample space, together with a σ-algebra on it and a family $(P_\theta)_{\theta \in \Theta}$ of probability measures. An estimator for a real-valued characteristic of the distribution parameter, given by a function $g\colon \Theta \to \mathbb{R}$, is a measurable function $T\colon \mathcal{X} \to \mathbb{R}$.

An estimator $T$ is called unbiased if for all $\theta \in \Theta$

$$\operatorname{E}_\theta(T) = g(\theta)$$

holds, where $\operatorname{E}_\theta$ denotes the expected value with respect to the probability measure $P_\theta$.

In applications, $P_\theta$ is often the distribution of a (real- or vector-valued) random variable $X$ on a probability space, with an unknown parameter or parameter vector $\theta$. An estimator for $g(\theta)$ is then given by a measurable function $h$ applied to $X$, and by analogy it is called unbiased if and only if

$$\operatorname{E}_\theta\bigl(h(X)\bigr) = g(\theta) \quad \text{for all } \theta \in \Theta,$$

where the expected value is now taken with respect to the distribution $P_\theta$ of $X$.
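
To connect the abstract definition with the introductory example, one concrete instance can be spelled out (added here for illustration): in the binomial model the parameter set is $\Theta = [0,1]$, under $P_\theta$ the observation $X$ is binomially distributed with parameters $n$ and $\theta$, and the quantity to be estimated is $g(\theta) = \theta$. For the estimator $h(X) = X/n$ one obtains

$$\operatorname{E}_\theta\bigl(h(X)\bigr) = \frac{1}{n}\operatorname{E}_\theta(X) = \frac{n\theta}{n} = \theta = g(\theta)
\quad\text{for all } \theta \in \Theta,$$

so this estimator is unbiased in the sense of the definition above.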

Estimators with bias

It follows from the definition that "good" estimators should at least be approximately unbiased, i.e., they should be distinguished by the fact that on average they lie close to the value to be estimated. Usually, however, unbiasedness is not the only important criterion for the quality of an estimator; for example, it should also have a small variance, i.e., fluctuate as little as possible around the value to be estimated. Combining the two requirements yields the classical criterion of minimal mean squared error for optimal estimators.

The bias of an estimator $T$ is defined as the difference between its expected value and the quantity to be estimated:

$$\operatorname{Bias}_\theta(T) = \operatorname{E}_\theta(T) - g(\theta).$$

Its mean squared error is

$$\operatorname{MSE}_\theta(T) = \operatorname{E}_\theta\!\left[\bigl(T - g(\theta)\bigr)^2\right].$$

The mean squared error is equal to the sum of the squared bias and the variance of the estimator:

$$\operatorname{MSE}_\theta(T) = \bigl(\operatorname{Bias}_\theta(T)\bigr)^2 + \operatorname{Var}_\theta(T).$$
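
This decomposition follows from a short standard calculation (spelled out here for completeness): adding and subtracting $\operatorname{E}_\theta(T)$ inside the square gives

$$\operatorname{E}_\theta\!\left[\bigl(T - g(\theta)\bigr)^2\right]
  = \operatorname{E}_\theta\!\left[\bigl(T - \operatorname{E}_\theta(T)\bigr)^2\right]
    + 2\bigl(\operatorname{E}_\theta(T) - g(\theta)\bigr)\operatorname{E}_\theta\!\left[T - \operatorname{E}_\theta(T)\right]
    + \bigl(\operatorname{E}_\theta(T) - g(\theta)\bigr)^2,$$

and since $\operatorname{E}_\theta\!\left[T - \operatorname{E}_\theta(T)\right] = 0$, the middle term vanishes, leaving $\operatorname{Var}_\theta(T) + \bigl(\operatorname{Bias}_\theta(T)\bigr)^2$.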

In practice, bias can have two causes:

  • a systematic error, for example a non-random measurement error in the measuring apparatus, or
  • a random error whose expected value is not 0.

Random errors may be tolerable if they help to ensure that the estimator has a smaller mean squared error than an unbiased one.
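
A standard illustration of this trade-off (added here as an example, not part of the original text) is the estimation of the variance $\sigma^2$ of a normal distribution: among all estimators of the form

$$T_c = c\sum_{i=1}^{n}\bigl(X_i - \bar{X}_n\bigr)^2,$$

the mean squared error is minimized not by the unbiased choice $c = \tfrac{1}{n-1}$ but by $c = \tfrac{1}{n+1}$; the resulting estimator is biased, yet it has a smaller mean squared error than the unbiased one.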

Asymptotic unbiasedness

In practice it is often not essential that an estimator is exactly unbiased. Most results of mathematical statistics hold only asymptotically anyway, i.e., when the sample size grows to infinity. It is therefore usually sufficient that unbiasedness holds in the limit, i.e., that for a sequence of estimators $T_n$ the convergence statement

$$\lim_{n\to\infty} \operatorname{E}_\theta(T_n) = g(\theta) \quad \text{for all } \theta \in \Theta$$

holds; such estimators are called asymptotically unbiased.

Another example: the sample variance in the normal distribution model

A typical example is given by estimators for the parameters of normal distributions. In this case one considers the parametric family

$$\bigl\{\mathcal{N}(\mu, \sigma^2) \mid \mu \in \mathbb{R},\ \sigma^2 > 0\bigr\},$$

where each parameter pair $(\mu, \sigma^2)$ corresponds to a probability distribution that is normally distributed with expected value $\mu$ and variance $\sigma^2$. Usually observations $X_1, \dots, X_n$ are given that are stochastically independent and each have the distribution $\mathcal{N}(\mu, \sigma^2)$.

As shown above, the sample mean $\bar{X}_n$ is an unbiased estimator of $\mu$.

As maximum likelihood estimator of the variance one obtains

$$\tilde{S}_n^2 = \frac{1}{n}\sum_{i=1}^{n}\bigl(X_i - \bar{X}_n\bigr)^2.$$

However, this estimator is not unbiased, since it can be shown that $\operatorname{E}\bigl(\tilde{S}_n^2\bigr) = \frac{n-1}{n}\,\sigma^2$. The bias is therefore $-\frac{1}{n}\,\sigma^2$. Since it vanishes asymptotically, i.e., for $n \to \infty$, the estimator is nevertheless asymptotically unbiased.
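
The stated expected value can be verified by a short calculation (added here for completeness); it uses $\operatorname{E}\bigl[(X_i - \mu)^2\bigr] = \sigma^2$ and $\operatorname{Var}\bigl(\bar{X}_n\bigr) = \sigma^2/n$:

$$\sum_{i=1}^{n}\bigl(X_i - \bar{X}_n\bigr)^2
  = \sum_{i=1}^{n}\bigl(X_i - \mu\bigr)^2 - n\bigl(\bar{X}_n - \mu\bigr)^2,$$

so taking expected values gives

$$\operatorname{E}\!\left[\sum_{i=1}^{n}\bigl(X_i - \bar{X}_n\bigr)^2\right]
  = n\sigma^2 - n\cdot\frac{\sigma^2}{n}
  = (n-1)\,\sigma^2
\qquad\Longrightarrow\qquad
\operatorname{E}\bigl(\tilde{S}_n^2\bigr) = \frac{n-1}{n}\,\sigma^2.$$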

Moreover, in this case the bias can be stated exactly and therefore corrected by multiplying by $\frac{n}{n-1}$ (the so-called Bessel correction, see corrected sample variance). This yields the estimator

$$S_n^2 = \frac{1}{n-1}\sum_{i=1}^{n}\bigl(X_i - \bar{X}_n\bigr)^2,$$

which is an unbiased estimator of the variance even for small samples.

In general, however, it is not possible to determine the expected bias exactly and thus to correct it completely. There are, however, methods that at least reduce the bias of an asymptotically unbiased estimator for finite samples, for example the so-called jackknife.
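
A minimal Python sketch of jackknife bias reduction is given below (added for illustration; the use of the uncorrected sample variance as the estimator to be corrected and the numerical values are only example assumptions). It estimates the bias from leave-one-out estimates and subtracts it; for the variance with divisor n this reproduces the Bessel-corrected value.

    import numpy as np

    def jackknife_corrected(data, estimator):
        """Reduce the bias of `estimator` using leave-one-out (jackknife) estimates."""
        n = len(data)
        theta_full = estimator(data)
        # estimates with the i-th observation removed
        theta_loo = np.array([estimator(np.delete(data, i)) for i in range(n)])
        bias_estimate = (n - 1) * (theta_loo.mean() - theta_full)
        return theta_full - bias_estimate   # = n*theta_full - (n-1)*theta_loo.mean()

    rng = np.random.default_rng(0)
    x = rng.normal(loc=0.0, scale=2.0, size=10)   # small sample, true variance 4
    print(np.var(x))                              # biased estimator (divisor n)
    print(jackknife_corrected(x, np.var))         # jackknife-corrected estimate
    print(np.var(x, ddof=1))                      # Bessel-corrected (unbiased) value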

Best unbiased estimators

An important application of unbiasedness is that uniformly best estimators can often be constructed if one restricts attention to the class of unbiased estimators. The goal is to find estimators that minimize a given risk function, usually the mean squared error, over a whole class of estimators. In most cases there are no estimators that are optimal over the class of all estimators whatsoever, so one has to restrict to subclasses. A typical subclass is that of the unbiased estimators; for these the mean squared error equals the variance. An unbiased estimator $T^*$ is therefore optimal in this sense if for all unbiased estimators $T$ and for all $\theta \in \Theta$ the inequality

$$\operatorname{Var}_\theta(T^*) \le \operatorname{Var}_\theta(T)$$

holds, i.e., if its variance is uniformly smaller over all parameter values than that of every other unbiased estimator. Best unbiased estimators are therefore also called UMVU estimators (for uniformly minimum variance unbiased).

By the Lehmann–Scheffé theorem, an unbiased estimator is a best estimator exactly when it can be represented as a function of a sufficient and complete statistic. For example, it can be shown that the sum of all sample variables, $\sum_{i=1}^{n} X_i$, is a sufficient and complete statistic for the expected value $\mu$ of a normally distributed population. It follows that the sample mean $\bar{X}_n$, as a function of this statistic, is a best unbiased estimator for $\mu$.
