Standard deviation
- For the standard deviation of a sample, see: empirical variance.
- For the standard deviation of the sample mean, see: standard error.
The standard deviation is a concept of statistics and probability theory introduced by Francis Galton in 1860. It is a measure of the dispersion of the values of a random variable around its expected value, and is defined for a random variable as the square root of its variance.
For a series of observations of length $n$, the empirical mean and the empirical standard deviation are the two most important measures in statistics for describing the properties of the observation series.
The standard deviation has the same dimension as the measured values of the observation series. The dimension of the variance, by contrast, is the square of the dimension of the observed values.
In applications, one often finds the abbreviation $s$ or SD (from English *standard deviation*) for the empirical standard deviation, and mF for the mean error. In applied statistics one also frequently finds shorthand notation of the type "Ø 21 ± 4", which is read as "mean 21 with a standard deviation of 4".
Definition
The standard deviation of a random variable $X$ is defined as the square root of its variance:
$$\sigma_X = \sqrt{\operatorname{Var}(X)}.$$
Here, the variance
$$\operatorname{Var}(X) = \operatorname{E}\bigl((X - \operatorname{E}(X))^2\bigr)$$
is always greater than or equal to 0; the symbol $\operatorname{E}$ denotes the expected value.
Examples and rules of thumb
Normal distribution
One-dimensional normal distributions are completely described by specifying the expected value $\mu$ and the variance $\sigma^2$. Thus, if $X$ is a $\mathcal{N}(\mu, \sigma^2)$-distributed random variable (in symbols $X \sim \mathcal{N}(\mu, \sigma^2)$), its standard deviation is simply $\sigma_X = \sigma$.
Dispersion intervals
From the table of the standard normal distribution it can be seen that, for normally distributed random variables, approximately
- 68.27 % of the values lie in the interval $\mu \pm \sigma$,
- 95.45 % in the interval $\mu \pm 2\sigma$, and
- 99.73 % in the interval $\mu \pm 3\sigma$.
Since in practice many random variables are approximately normally distributed, these values from the normal distribution are often used as a rule of thumb. For example, $\sigma$ is often taken as an estimate of the half-width of the interval that contains the middle two thirds of the values in a sample; see quantile.
Values outside two to three times the standard deviation are often treated as outliers. Outliers can be an indication of gross errors in data collection, but the data may also follow a highly skewed distribution. On the other hand, under a normal distribution, on average about every 20th measured value lies outside two times the standard deviation, and about every 500th measured value lies outside three times the standard deviation.
Since the proportion of values outside six times the standard deviation is vanishingly small at about 2 ppb, such an interval is considered a good measure of nearly complete coverage of all values. This is used in quality management by the Six Sigma method, in which the process requirements dictate tolerance limits of at least $6\sigma$. There, however, one assumes a long-term mean shift of 1.5 standard deviations, so that the allowable error proportion rises to 3.4 ppm. This error proportion corresponds to four and a half times the standard deviation ($4.5\sigma$).
For dispersion intervals $(-z\sigma, z\sigma)$ around the expected value, the following shares of the values lie within or outside the interval:

| $z$  | within  | outside |
|------|---------|---------|
| 1    | 68.27 % | 31.73 % |
| 1.96 | 95.00 % | 5.00 %  |
| 2    | 95.45 % | 4.55 %  |
| 2.58 | 99.00 % | 1.00 %  |
| 3    | 99.73 % | 0.27 %  |
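These shares follow directly from the error function: for a normal distribution, $P(|X - \mu| < z\sigma) = \operatorname{erf}(z/\sqrt{2})$. A minimal sketch in Python (the function name `share_within` is our own choice):

```python
import math

def share_within(z: float) -> float:
    """Share of a normal distribution within z standard deviations of the mean.

    P(|X - mu| < z*sigma) = erf(z / sqrt(2)) for X ~ N(mu, sigma^2).
    """
    return math.erf(z / math.sqrt(2))

for z in (1, 2, 3, 6):
    print(f"z = {z}: {share_within(z):.9f} within, {1 - share_within(z):.2e} outside")
```

For $z = 6$ the share outside is about $2 \cdot 10^{-9}$, i.e. the roughly 2 ppb quoted above.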
An example (with dispersion intervals)
Human body height is approximately normally distributed. In a sample of 1,284 girls and 1,063 boys between 14 and 18 years, the girls had an average height of 166.3 cm (standard deviation 6.39 cm) and the boys an average height of 176.8 cm (standard deviation 7.46 cm).
According to the dispersion intervals above, it can be expected that 68.3 % of the girls have a body height in the range 166.3 cm ± 6.39 cm and 95.4 % in the range 166.3 cm ± 12.78 cm,
and that 68 % of the boys have a body height in the range 176.8 cm ± 7.46 cm and 95 % in the range 176.8 cm ± 14.92 cm.
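The interval arithmetic can be spelled out in a few lines of Python (the helper `dispersion_intervals` is our own name; the means and standard deviations are the sample values from the text):

```python
def dispersion_intervals(mean_cm: float, sd_cm: float):
    """Return the 1-sigma and 2-sigma intervals around the mean."""
    return ((mean_cm - sd_cm, mean_cm + sd_cm),
            (mean_cm - 2 * sd_cm, mean_cm + 2 * sd_cm))

girls = dispersion_intervals(166.3, 6.39)
boys = dispersion_intervals(176.8, 7.46)
print(girls)   # 1-sigma: about 159.91..172.69 cm, 2-sigma: about 153.52..179.08 cm
print(boys)
```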
Discrete distributions, dice
The discrete uniform distribution on the numbers $1, 2, \dots, n$ has the expected value $(n+1)/2$ and the standard deviation $\sqrt{(n^2-1)/12}$. The result of a throw of a fair six-sided die thus has, for example, the expected value 3.5 and a standard deviation of about 1.7.
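For the die, the expected value and standard deviation can be computed directly from the definition and compared against the closed form; a short sketch:

```python
import math

faces = range(1, 7)                             # a fair six-sided die
mean = sum(faces) / 6                           # E(X) = 3.5
var = sum((x - mean) ** 2 for x in faces) / 6   # Var(X) = E((X - E(X))^2)
sd = math.sqrt(var)

# Matches the closed form sqrt((n^2 - 1) / 12) for n = 6.
print(mean, sd)   # 3.5 and about 1.7078
```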
Binomial distribution
For a binomially distributed random variable $X$ with parameters $n$ (number of repetitions) and $p$ (probability of success), $\operatorname{E}(X) = np$ and $\operatorname{Var}(X) = np(1-p)$, so
$$\sigma = \sqrt{np(1-p)}.$$
If, for example, you roll a fair die 500 times, the number of ones is binomially distributed with $n = 500$ and $p = \tfrac{1}{6}$. The expected value is
$$\operatorname{E}(X) = 500 \cdot \tfrac{1}{6} \approx 83.3$$
and the standard deviation
$$\sigma = \sqrt{500 \cdot \tfrac{1}{6} \cdot \tfrac{5}{6}} \approx 8.3.$$
Because a binomial distribution with these parameters is approximately normal, the rule of thumb says one can expect that in about 68 % of cases the number of ones lies between 75 and 92, and in about 95 % of cases between 67 and 100.
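The rule of thumb can be checked against the exact binomial probabilities; a sketch that sums the probability mass function over the one- and two-sigma intervals:

```python
import math

n, p = 500, 1 / 6
mu = n * p                            # about 83.3
sigma = math.sqrt(n * p * (1 - p))    # about 8.3

def pmf(k: int) -> float:
    """Exact binomial probability P(X = k)."""
    return math.comb(n, k) * p ** k * (1 - p) ** (n - k)

within_1 = sum(pmf(k) for k in range(n + 1) if abs(k - mu) <= sigma)
within_2 = sum(pmf(k) for k in range(n + 1) if abs(k - mu) <= 2 * sigma)
print(within_1, within_2)   # close to the normal-approximation values 0.68 and 0.95
```

The exact shares deviate slightly from 68 % and 95 % because the binomial distribution is discrete and only approximately normal.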
Estimate of the standard deviation of the population from a sample
General case
Basis of calculation
If the $n$ random variables $X_1, \dots, X_n$ are independent and identically distributed, for example a random sample, the standard deviation of the population is often estimated from the sample with the formula
$$s = \sqrt{\frac{1}{n-1} \sum_{i=1}^{n} (x_i - \bar{x})^2}.$$
Here
- $s$ is the estimator for the standard deviation of the population,
- $n$ is the sample size (number of values),
- $x_i$ is the value of the $i$-th element of the sample, and
- $\bar{x}$ is the empirical mean, i.e. the arithmetic mean of the sample.
This formula is explained by the fact that the corrected sample variance $s^2$ is an unbiased estimator of the population variance. In contrast, $s$ is not an unbiased estimator of the standard deviation: since the square root is a concave function, Jensen's inequality yields
$$\operatorname{E}(s) = \operatorname{E}\bigl(\sqrt{s^2}\bigr) \le \sqrt{\operatorname{E}(s^2)} = \sigma.$$
Thus this estimator underestimates the standard deviation of the population in most cases.
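The estimator with the $n-1$ denominator is what Python's standard library computes in `statistics.stdev`; a minimal sketch with arbitrary example values:

```python
import math
import statistics

sample = [3, 4, 5, 6, 7]   # arbitrary example values
n = len(sample)
mean = sum(sample) / n

# Corrected sample standard deviation, computed from the formula above.
s = math.sqrt(sum((x - mean) ** 2 for x in sample) / (n - 1))

# statistics.stdev uses the same n-1 denominator.
assert math.isclose(s, statistics.stdev(sample))
print(s)   # sqrt(2.5), about 1.5811
```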
Example
If one selects one of the numbers $-1$ or $1$ by tossing a fair coin, i.e. both with probability $\tfrac{1}{2}$, this is a random variable with mean 0, variance 1 and standard deviation 1. If one calculates from $n = 2$ independent tosses $X_1, X_2$ the corrected sample variance
$$s^2 = \frac{1}{n-1} \sum_{i=1}^{n} (X_i - \bar{X})^2,$$
in which
$$\bar{X} = \frac{X_1 + X_2}{2}$$
denotes the sample mean, then there are four possible test outcomes, each of which has probability $\tfrac{1}{4}$:

| $X_1$ | $X_2$ | $\bar{X}$ | $s^2$ |
|-------|-------|-----------|-------|
| $-1$  | $-1$  | $-1$      | $0$   |
| $-1$  | $1$   | $0$       | $2$   |
| $1$   | $-1$  | $0$       | $2$   |
| $1$   | $1$   | $1$       | $0$   |

The expected value of the corrected sample variance is therefore
$$\operatorname{E}(s^2) = \tfrac{1}{4}(0 + 2 + 2 + 0) = 1 = \sigma^2.$$
The corrected sample variance is therefore indeed unbiased. The expected value of the corrected sample standard deviation, however, is
$$\operatorname{E}(s) = \tfrac{1}{4}\bigl(0 + \sqrt{2} + \sqrt{2} + 0\bigr) = \frac{\sqrt{2}}{2} \approx 0.71 < 1 = \sigma.$$
Thus the corrected sample standard deviation underestimates the standard deviation of the population.
Calculation for accumulating measurements
In systems that continuously acquire large quantities of measurements, it is often impractical to cache all measured values in order to calculate the standard deviation.
In this situation it is preferable to use a modified formula that bypasses the critical term $(x_i - \bar{x})^2$. This term cannot be evaluated immediately for each measured value, because the mean $\bar{x}$ is not constant.
By applying the shift theorem and the definition of the mean, one obtains the representation
$$s = \sqrt{\frac{1}{n-1} \left( \sum_{i=1}^{n} x_i^2 - \frac{1}{n} \Bigl( \sum_{i=1}^{n} x_i \Bigr)^2 \right)},$$
which can be updated immediately for each incoming reading if the sum of the measured values $\sum x_i$ and the sum of their squares $\sum x_i^2$ are carried along and continuously updated. However, this representation is numerically less stable; in particular, the term under the square root can become numerically less than 0 due to rounding errors.
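A streaming estimator based on these running sums can be sketched as follows (the class name `RunningStd` is our own; note the guard against the term under the root going slightly negative):

```python
import math

class RunningStd:
    """Streaming standard deviation via running sums (numerically fragile)."""

    def __init__(self):
        self.n = 0
        self.sum_x = 0.0    # running sum of the measured values
        self.sum_x2 = 0.0   # running sum of their squares

    def add(self, x: float) -> None:
        self.n += 1
        self.sum_x += x
        self.sum_x2 += x * x

    def std(self) -> float:
        # max(..., 0.0) guards against rounding errors driving the
        # term under the square root below zero, as noted in the text.
        ss = self.sum_x2 - self.sum_x ** 2 / self.n
        return math.sqrt(max(ss, 0.0) / (self.n - 1))

rs = RunningStd()
for x in [3, 4, 5, 6, 7]:
    rs.add(x)
print(rs.std())   # sqrt(2.5), about 1.5811
```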
A similar algorithm is described by Donald E. Knuth in The Art of Computer Programming.
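The incremental method Knuth describes is commonly known as Welford's algorithm; it updates the mean and the sum of squared deviations on each reading and avoids the unstable difference of large sums. A sketch:

```python
import math

def welford_std(values) -> float:
    """Welford's online algorithm for the corrected sample standard deviation."""
    n = 0
    mean = 0.0
    m2 = 0.0    # running sum of squared deviations from the current mean
    for x in values:
        n += 1
        delta = x - mean
        mean += delta / n
        m2 += delta * (x - mean)   # uses the updated mean
    return math.sqrt(m2 / (n - 1))

print(welford_std([3, 4, 5, 6, 7]))   # sqrt(2.5), about 1.5811
```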
Normally distributed random variables
Basis of calculation
For the case of normally distributed random variables, however, an unbiased estimator can be given:
$$\hat{\sigma} = \sqrt{\frac{n-1}{2}} \cdot \frac{\Gamma\!\left(\frac{n-1}{2}\right)}{\Gamma\!\left(\frac{n}{2}\right)} \cdot s.$$
Here
- $\hat{\sigma}$ is the unbiased estimate of the standard deviation and
- $\Gamma$ is the gamma function.
Example
Suppose the five values 3, 4, 5, 6, 7 were measured in a sample from a normally distributed random variable, and the estimate of the standard deviation is to be calculated.
The corrected sample variance is
$$s^2 = \frac{(3-5)^2 + (4-5)^2 + (5-5)^2 + (6-5)^2 + (7-5)^2}{5-1} = 2.5.$$
The correction factor in this case is
$$\sqrt{\frac{4}{2}} \cdot \frac{\Gamma(2)}{\Gamma(2.5)} \approx 1.0638,$$
and the unbiased estimate of the standard deviation is thus approximately
$$1.0638 \cdot \sqrt{2.5} \approx 1.68.$$
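The whole calculation, including the gamma-function correction factor, can be reproduced with the standard library (the function name `unbiased_normal_std` is our own):

```python
import math

def unbiased_normal_std(sample) -> float:
    """Unbiased estimate of sigma for a sample from a normal distribution."""
    n = len(sample)
    mean = sum(sample) / n
    # Corrected sample standard deviation.
    s = math.sqrt(sum((x - mean) ** 2 for x in sample) / (n - 1))
    # Gamma-function correction factor from the formula above.
    c = math.sqrt((n - 1) / 2) * math.gamma((n - 1) / 2) / math.gamma(n / 2)
    return c * s

print(unbiased_normal_std([3, 4, 5, 6, 7]))   # about 1.682
```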