Standard deviation

  • For the standard deviation of a sample, see: empirical variance.
  • For the standard deviation of the sample mean, see: standard error.

The standard deviation is a concept of statistics and probability theory, introduced by Francis Galton in 1860, and a measure of the dispersion of the values of a random variable around its expected value. For a random variable X it is defined as the square root of the variance and is denoted σ_X.

For a series of observations of length n, the empirical mean and the empirical standard deviation are the two most important measures in statistics for describing the properties of the observation series.

The standard deviation has the same dimension (unit) as the measured values of the observation series. The dimension of the variance, by contrast, is the square of the dimension of the observation values.

As an abbreviation, in applications one often finds s for the empirical standard deviation, as well as SD (from the English "standard deviation") and mF (from the German "mittlerer Fehler", mean error). In applied statistics one often finds the shorthand notation of the form "Ø 21 ± 4", which is read as "mean 21 with a standard deviation of 4".


Definition

The standard deviation of a random variable X is defined as the square root of its variance:

  σ_X := √Var(X)

Here the variance

  Var(X) = E((X − E(X))²)

is always greater than or equal to 0; the symbol E denotes the expected value.
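The definition can be illustrated with a short Python sketch for a finite discrete random variable; the helper name `std_dev` and the value/probability-pair representation are illustrative, not from the source:

```python
# Sketch of the definition: for a discrete random variable given as
# (value, probability) pairs, the standard deviation is the square
# root of the variance E[(X - E[X])^2].
from math import sqrt

def std_dev(distribution):
    """distribution: list of (value, probability) pairs."""
    mean = sum(x * p for x, p in distribution)
    variance = sum((x - mean) ** 2 * p for x, p in distribution)
    return sqrt(variance)

# A fair coin paying out -1 or 1: expected value 0, variance 1.
print(std_dev([(-1, 0.5), (1, 0.5)]))  # 1.0
```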

Examples and rules of thumb

Normal distribution

One-dimensional normal distributions are completely described by specifying the expected value μ and the variance σ². Thus, if X is a normally distributed random variable with these parameters, in symbols X ~ N(μ, σ²), its standard deviation is simply σ_X = √σ² = σ.

Dispersion intervals

From the table of the standard normal distribution it can be seen that for normally distributed random variables approximately

  • 68.27 % of the values lie in the interval μ ± σ,
  • 95.45 % lie in the interval μ ± 2σ,
  • 99.73 % lie in the interval μ ± 3σ.

Since in practice many random variables are approximately normally distributed, these values from the normal distribution are often used as a rule of thumb. For example, σ is often taken as an estimate of the half-width of the interval that contains the middle two thirds of the values in a sample; see quantile.

Values outside two to three times the standard deviation are often treated as outliers. Outliers can be an indication of gross errors in data collection, but the data may also be based on a highly skewed distribution. With a normal distribution, on the other hand, on average about every 20th measured value lies outside twice the standard deviation and about every 370th measured value lies outside three times the standard deviation.

Since the share of values outside six times the standard deviation is a vanishingly small approximately 2 ppb, such an interval is considered a good measure of nearly complete coverage of all values. This is used in quality management by the Six Sigma method, in which the process requirements dictate tolerance limits of at least 6σ. There, however, a long-term mean shift of 1.5 standard deviations is assumed, so that the permissible error share increases to 3.4 ppm. This error share corresponds to four and a half times the standard deviation (4.5σ).

For intervals (μ − zσ, μ + zσ) the following shares of the values lie within or outside the interval:

  z       within         outside
  1       68.27 %        31.73 %
  1.96    95.00 %         5.00 %
  2       95.45 %         4.55 %
  2.58    99.00 %         1.00 %
  3       99.73 %         0.27 %
  4       99.994 %        0.006 %
  6       99.9999998 %    ~2 ppb
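These shares follow directly from the cumulative distribution function of the standard normal distribution: the share within (μ − zσ, μ + zσ) equals erf(z/√2). A short Python sketch (the helper name `share_inside` is illustrative):

```python
# Share of a normal distribution within (mu - z*sigma, mu + z*sigma),
# computed from the error function: inside = erf(z / sqrt(2)).
from math import erf, sqrt

def share_inside(z):
    return erf(z / sqrt(2))

for z in (1, 2, 3):
    # ~68.27 %, ~95.45 %, ~99.73 % respectively
    print(f"z = {z}: {share_inside(z):.4%} within")
```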

An example (with dispersion intervals)

The body height of humans is approximately normally distributed. In a sample of 1,284 girls and 1,063 boys between 14 and 18 years, the girls had an average height of 166.3 cm (standard deviation 6.39 cm) and the boys an average height of 176.8 cm (standard deviation 7.46 cm).

According to the dispersion intervals above, it can be expected that 68.3 % of the girls have a height in the range 166.3 cm ± 6.39 cm and 95.4 % in the range 166.3 cm ± 12.78 cm.

For the boys, it can be expected that 68 % have a height in the range 176.8 cm ± 7.46 cm and 95 % in the range 176.8 cm ± 14.92 cm.

Discrete distributions, dice

The discrete uniform distribution on the numbers 1, …, n has an expected value of (n + 1)/2 and a standard deviation of √((n² − 1)/12). The result of a throw of a fair six-sided die thus has, for example, an expected value of 3.5 and a standard deviation of about 1.7.
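The die example can be checked numerically, computing the standard deviation both directly from the definition and via the closed form √((n² − 1)/12); this sketch assumes nothing beyond the formulas above:

```python
from math import sqrt

# Standard deviation of the discrete uniform distribution on 1..n,
# computed directly and via the closed form sqrt((n^2 - 1) / 12).
n = 6
values = range(1, n + 1)
mean = sum(values) / n                          # 3.5
variance = sum((x - mean) ** 2 for x in values) / n
print(mean, sqrt(variance))                     # 3.5 and ~1.7078
print(sqrt((n ** 2 - 1) / 12))                  # same via closed form
```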

Binomial distribution

If X is binomially distributed with parameters n (number of repetitions) and p (probability of success), then E(X) = np and Var(X) = np(1 − p), so

  σ = √(np(1 − p))

If one rolls a fair die 500 times, for example, the number of ones is binomially distributed with n = 500 and p = 1/6. The expected value is

  np = 500/6 ≈ 83.3

and the standard deviation

  σ = √(500 · 1/6 · 5/6) ≈ 8.3

Because a binomial distribution with these parameters is approximately normally distributed, the rule of thumb gives that in 68 % of cases the number of ones lies between 75 and 92, and in 95 % of cases between 67 and 100.
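The die-rolling numbers above can be reproduced with a few lines of Python (a plain calculation sketch, nothing here is from a statistics library):

```python
from math import sqrt

# Rule-of-thumb intervals for the number of ones in 500 rolls of a
# fair die: binomial with n = 500, p = 1/6, treated as approximately
# normal.
n, p = 500, 1 / 6
mu = n * p                        # ~83.3
sigma = sqrt(n * p * (1 - p))     # ~8.3
print(f"68 % of cases: {mu - sigma:.0f} to {mu + sigma:.0f}")        # 75 to 92
print(f"95 % of cases: {mu - 2 * sigma:.0f} to {mu + 2 * sigma:.0f}")  # 67 to 100
```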

Estimate of the standard deviation of the population from a sample

General case

Basis of calculation

If the n random variables X_1, …, X_n are independent and identically distributed, for example a random sample, then the standard deviation of the population is often estimated from the sample with the formula

  s = √( (1/(n − 1)) · Σ_{i=1}^{n} (x_i − x̄)² )

Here

  • s is the estimator for the standard deviation of the population,
  • n is the sample size (number of values),
  • x_i is the value of the i-th element of the sample, and
  • x̄ is the empirical mean, i.e. the arithmetic mean of the sample.

This formula is explained by the fact that the corrected sample variance s² is an unbiased estimator of the population variance σ². In contrast, s is not an unbiased estimator of the standard deviation: since the square root is a concave function, Jensen's inequality gives

  E(s) = E(√s²) ≤ √E(s²) = σ

Thus, this estimator underestimates the standard deviation of the population in most cases.
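The corrected sample standard deviation can be sketched in a few lines of Python; the helper name `sample_std` is illustrative, and the computation matches `statistics.stdev` from the standard library:

```python
from math import sqrt

# Corrected sample standard deviation s with the 1/(n - 1) factor.
def sample_std(xs):
    n = len(xs)
    mean = sum(xs) / n
    return sqrt(sum((x - mean) ** 2 for x in xs) / (n - 1))

data = [3, 4, 5, 6, 7]
print(sample_std(data))  # ~1.5811
```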

Example

If one selects one of the numbers −1 or 1 by tossing a fair coin, so each with probability 1/2, this is a random variable with mean 0, variance 1 and standard deviation 1. If one computes from n = 2 independent tosses X_1, X_2 the corrected sample variance

  s² = (1/(n − 1)) · Σ_{i=1}^{2} (X_i − X̄)² = (X_1 − X̄)² + (X_2 − X̄)²

in which

  X̄ = (X_1 + X_2)/2

denotes the sample mean, there are four possible test outcomes, all with probability 1/4 each:

  (X_1, X_2)    X̄     s²    s
  (−1, −1)     −1     0     0
  (−1, 1)       0     2     √2
  (1, −1)       0     2     √2
  (1, 1)        1     0     0

The expected value of the corrected sample variance is therefore

  E(s²) = (0 + 2 + 2 + 0)/4 = 1 = σ²

The corrected sample variance is therefore indeed unbiased. The expected value of the corrected sample standard deviation, however, is

  E(s) = (0 + √2 + √2 + 0)/4 = √2/2 ≈ 0.71 < 1 = σ

Thus, the corrected sample standard deviation underestimates the standard deviation of the population.
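The four-outcome enumeration can be verified directly; this sketch simply averages s² and s over all outcomes of two ±1 tosses:

```python
from itertools import product
from math import sqrt

# Enumerate all four outcomes of two fair +/-1 coin tosses and average
# the corrected sample variance s^2 and standard deviation s over them.
def s_squared(x1, x2):
    mean = (x1 + x2) / 2
    return (x1 - mean) ** 2 + (x2 - mean) ** 2  # 1/(n - 1) = 1 for n = 2

outcomes = list(product([-1, 1], repeat=2))
e_var = sum(s_squared(a, b) for a, b in outcomes) / 4
e_std = sum(sqrt(s_squared(a, b)) for a, b in outcomes) / 4
print(e_var, e_std)  # 1.0 and ~0.7071: s^2 is unbiased, s is biased low
```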

Calculation for accumulating measurements

In systems that continuously collect large quantities of measured values, it is often impractical to cache all the values in order to compute the standard deviation.

In this context it is preferable to use a modified formula that avoids the critical term Σ (x_i − x̄)². This term cannot be computed immediately for each measured value, because the mean x̄ is not constant.

By applying the shift theorem and the definition of the mean, one arrives at the representation

  s = √( (1/(n − 1)) · ( Σ x_i² − (1/n) (Σ x_i)² ) )

which can be updated immediately for each incoming measured value if the sum of the measured values Σ x_i and the sum of their squares Σ x_i² are carried along and continuously updated. This representation is, however, numerically less stable; in particular, the term under the square root can become numerically less than 0 due to rounding errors.
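A minimal Python sketch of this running-sums representation (the class name `RunningStd` is illustrative; the guard against a negative radicand reflects the instability just mentioned):

```python
from math import sqrt

# Running standard deviation via the two carried sums: sum of values
# and sum of squares. Numerically fragile; shown only as a sketch.
class RunningStd:
    def __init__(self):
        self.n = self.sum_x = self.sum_x2 = 0

    def add(self, x):
        self.n += 1
        self.sum_x += x
        self.sum_x2 += x * x

    def std(self):
        # max(..., 0.0) guards against the radicand dropping below 0
        # through rounding errors.
        var = (self.sum_x2 - self.sum_x ** 2 / self.n) / (self.n - 1)
        return sqrt(max(var, 0.0))

r = RunningStd()
for x in [3, 4, 5, 6, 7]:
    r.add(x)
print(r.std())  # ~1.5811
```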

A similar algorithm is described by Donald E. Knuth in The Art of Computer Programming.
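A numerically stable one-pass alternative is Welford's online algorithm, the kind of method Knuth describes; this Python version is an illustration, not taken from the source:

```python
from math import sqrt

# Welford's online algorithm: per value, update the running mean and
# M2 = sum of squared deviations from the current mean. This avoids
# subtracting two large, nearly equal sums.
def welford_std(stream):
    n, mean, m2 = 0, 0.0, 0.0
    for x in stream:
        n += 1
        delta = x - mean
        mean += delta / n
        m2 += delta * (x - mean)
    return sqrt(m2 / (n - 1))

print(welford_std([3, 4, 5, 6, 7]))  # ~1.5811
```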

Normally distributed random variables

Basis of calculation

For the case of normally distributed random variables, however, an unbiased estimator can be given:

  σ̂ = √((n − 1)/2) · ( Γ((n − 1)/2) / Γ(n/2) ) · s

Here

  • σ̂ is the unbiased estimate of the standard deviation,
  • s is the corrected sample standard deviation, and
  • Γ is the gamma function.

Example

In a sample from a normally distributed random variable, the five values 3, 4, 5, 6, 7 were measured. The estimate of the standard deviation is now to be computed.

The corrected sample variance is:

  s² = (1/(5 − 1)) · ((3 − 5)² + (4 − 5)² + (5 − 5)² + (6 − 5)² + (7 − 5)²) = 2.5

The correction factor in this case is

  √((5 − 1)/2) · Γ((5 − 1)/2)/Γ(5/2) = √2 · Γ(2)/Γ(5/2) ≈ 1.0638

and the unbiased estimate of the standard deviation is thus approximately

  σ̂ ≈ 1.0638 · √2.5 ≈ 1.68
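The example can be checked with the gamma function from the standard library; the helper name `unbiased_std` is illustrative:

```python
from math import gamma, sqrt

# Unbiased estimate for normally distributed data: the corrected
# sample standard deviation s times the gamma-function correction.
def unbiased_std(xs):
    n = len(xs)
    mean = sum(xs) / n
    s = sqrt(sum((x - mean) ** 2 for x in xs) / (n - 1))
    factor = sqrt((n - 1) / 2) * gamma((n - 1) / 2) / gamma(n / 2)
    return factor * s

print(unbiased_std([3, 4, 5, 6, 7]))  # ~1.68
```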
