Statistical dispersion

In descriptive statistics and stochastics, dispersion (also called scatter or spread) is a collective term for various measures that describe how widely the values of a frequency distribution or probability distribution are spread around a suitable location parameter. The measures differ mainly in how strongly they are influenced by outliers. The dispersion of the sampling distribution of a statistic is called the standard error.

Metrics

Range (span)

The range (also called span) is calculated as the difference between the largest and the smallest measured value:

$$R = x_{\max} - x_{\min}$$

Since the range is calculated only from the two extreme values, it is not robust against outliers.

See also: moving range
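
The sensitivity of the range to a single extreme value can be illustrated with a minimal sketch. The function name `value_range` and the data are made up for this example.

```python
def value_range(values):
    """Return the range (span): largest value minus smallest value."""
    xs = list(values)
    if not xs:
        raise ValueError("value_range() needs at least one value")
    return max(xs) - min(xs)


data = [12, 15, 11, 14, 13]        # made-up measurements
with_outlier = data + [95]         # the same data plus one extreme value

print(value_range(data))           # uses only the values 15 and 11
print(value_range(with_outlier))   # dominated by the single outlier
```

Adding one value changes the result from 4 to 84, although the other five measurements are unchanged.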

Quantile distance

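The quantile distance is the difference between the $(1-p)$-quantile and the $p$-quantile:

$$QA_p = Q_{1-p} - Q_p, \qquad 0 < p < 0.5$$

The interval between $Q_p$ and $Q_{1-p}$ contains $100 \cdot (1 - 2p)$ percent of all measured values.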
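
As a rough sketch, the quantile distance of a sample could be computed as follows. The helper names `quantile` and `quantile_distance` are hypothetical, and the interpolation rule (position $(n-1)p$ in the sorted data) is an assumption; other quantile conventions exist and give slightly different numbers.

```python
import math


def quantile(values, p):
    """Empirical p-quantile, linearly interpolated between order statistics.

    Assumption: the quantile sits at position h = (n - 1) * p of the sorted data.
    """
    xs = sorted(values)
    if not xs:
        raise ValueError("need at least one value")
    if not 0.0 <= p <= 1.0:
        raise ValueError("p must lie in [0, 1]")
    h = (len(xs) - 1) * p
    lo = math.floor(h)
    hi = math.ceil(h)
    return xs[lo] + (h - lo) * (xs[hi] - xs[lo])


def quantile_distance(values, p):
    """Return Q_(1-p) - Q_p for 0 < p < 0.5."""
    if not 0.0 < p < 0.5:
        raise ValueError("p must lie strictly between 0 and 0.5")
    return quantile(values, 1.0 - p) - quantile(values, p)


data = list(range(1, 102))             # 1, 2, ..., 101 (illustrative values)
print(quantile_distance(data, 0.10))   # width of the central 80 % of the values
print(quantile_distance(data, 0.25))   # width of the central 50 % of the values
```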

Interquartile range

The interquartile range, abbreviated IQR, is calculated as the difference between the quartiles $Q_{0.75}$ and $Q_{0.25}$:

$$IQR = Q_{0.75} - Q_{0.25}$$

The IQR contains 50 % of all measured values. Like the median $Q_{0.50}$, it is insensitive to outliers. It can be shown that it has a breakdown point of 25 %.

The interquartile range is equal to the quantile distance $QA_{0.25}$.
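
The robustness of the interquartile range can be illustrated with a short sketch based on Python's standard library. The function name `interquartile_range` is made up here, and the choice of `method="inclusive"` is an assumption; other quartile conventions give slightly different values.

```python
import statistics


def interquartile_range(values):
    """Return Q_0.75 - Q_0.25 of a sample.

    Assumption: quartiles are taken from statistics.quantiles with
    method="inclusive".
    """
    q1, _median, q3 = statistics.quantiles(values, n=4, method="inclusive")
    return q3 - q1


data = [1, 2, 3, 4, 5, 6, 7, 8, 9]             # illustrative values
with_outlier = [1, 2, 3, 4, 5, 6, 7, 8, 900]   # largest value replaced

print(interquartile_range(data))
print(interquartile_range(with_outlier))       # unchanged by the outlier
print(max(with_outlier) - min(with_outlier))   # the range, by contrast, changes drastically
```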

Mean absolute deviation

The mean absolute deviation of a random variable $X$ from its expected value $\mu = E(X)$ is defined by

$$d_{\mu} = E\left(|X - \mu|\right)$$

For a concrete sample $x_1, \dots, x_n$ with sample mean $\bar{x}$ it is calculated by

$$d_{\bar{x}} = \frac{1}{n} \sum_{i=1}^{n} |x_i - \bar{x}|$$

In mathematical statistics the mean absolute deviation is usually avoided in favor of the squared deviation, which is easier to handle analytically. The absolute value function used in the definition is not differentiable everywhere, which complicates finding a minimum.

By the inequality between the arithmetic and the quadratic mean, the mean absolute deviation is less than or equal to the standard deviation $\sigma$:

$$E\left(|X - \mu|\right) \le \sigma$$

Equality holds only if $|X - \mu|$ is almost surely constant.

For symmetric distributions, i.e. distributions whose density $f$ satisfies $f(\mu + x) = f(\mu - x)$ for all real $x$, with a density that is monotonically decreasing for $x \ge \mu$, the following sharper bound applies:

$$E\left(|X - \mu|\right) \le \frac{\sqrt{3}}{2}\,\sigma$$

For the continuous uniform distribution, equality holds.
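
The sample formula and the comparison with the standard deviation can be checked numerically. The following is a minimal sketch with made-up data; the function name `mean_abs_dev` is hypothetical, and the standard deviation is taken with divisor $n$ so that both quantities describe the same empirical distribution.

```python
import statistics


def mean_abs_dev(values):
    """Mean absolute deviation of a sample from its arithmetic mean."""
    xs = list(values)
    mean = statistics.fmean(xs)
    return sum(abs(x - mean) for x in xs) / len(xs)


data = [2, 4, 4, 4, 5, 5, 7, 9]   # illustrative values with mean 5

md = mean_abs_dev(data)
sd = statistics.pstdev(data)      # standard deviation with divisor n

print(md, sd)
print(md <= sd)                   # the mean absolute deviation never exceeds sd
```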

Mean absolute deviation with respect to the median

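The mean absolute deviation from the median $\tilde{\mu}$ (mean deviation from the median, abbreviated MD) is defined by

$$MD = E\left(|X - \tilde{\mu}|\right)$$

For a concrete sample with sample median $\tilde{x}$ it is calculated by

$$MD = \frac{1}{n} \sum_{i=1}^{n} |x_i - \tilde{x}|$$

Because of the extremal property of the median (it minimizes the expected absolute deviation), the following always holds in comparison with the mean absolute deviation from the expected value $\mu$:

$$E\left(|X - \tilde{\mu}|\right) \le E\left(|X - \mu|\right) \le \sigma$$

That is, the mean absolute deviation with respect to the median is, in particular, no larger than the standard deviation $\sigma$.

For the normal distribution, the following applies:

$$E\left(|X - \tilde{\mu}|\right) = E\left(|X - \mu|\right) = \sqrt{\frac{2}{\pi}}\,\sigma \approx 0.798\,\sigma$$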
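
The extremal property of the median can be illustrated on a small sample. This is only a sketch with made-up data; the helper `mean_abs_dev_about` is a hypothetical name, and the standard deviation uses divisor $n$.

```python
import statistics


def mean_abs_dev_about(values, center):
    """Mean absolute deviation of a sample from an arbitrary center."""
    xs = list(values)
    return sum(abs(x - center) for x in xs) / len(xs)


data = [1, 2, 3, 4, 100]                 # illustrative values with one outlier
median = statistics.median(data)
mean = statistics.fmean(data)

md_median = mean_abs_dev_about(data, median)
md_mean = mean_abs_dev_about(data, mean)
sd = statistics.pstdev(data)             # standard deviation with divisor n

print(md_median, md_mean, sd)
print(md_median <= md_mean <= sd)
```

Replacing `median` by any other center does not give a smaller value of `mean_abs_dev_about` for this sample.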

Median absolute deviation

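The median absolute deviation (also MedMed), abbreviated MAD, of a random variable $X$ with median $\tilde{\mu}$ is defined by

$$MAD = \operatorname{median}\left(|X - \tilde{\mu}|\right)$$

For a concrete sample with sample median $\tilde{x}$ it is calculated by

$$MAD = \operatorname{median}\left(|x_1 - \tilde{x}|, \dots, |x_n - \tilde{x}|\right)$$

For normally distributed data, the definition yields the following relationship to the standard deviation $\sigma$:

$$\sigma = \frac{MAD}{z_{0.75}} \approx 1.4826 \cdot MAD$$

Here $z_{0.75}$ is the 0.75-quantile of the standard normal distribution, which is approximately 0.6745.

Rescaled in this way, the median absolute deviation is a robust estimator of the standard deviation. It can be shown that it has a breakdown point of 50 %.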
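
A minimal sketch of the median absolute deviation and of its rescaling for normally distributed data might look as follows. The names `median_abs_dev`, `robust_sigma` and `Z_075` are made up for this example; the 0.75-quantile of the standard normal distribution is taken from `statistics.NormalDist`.

```python
import statistics

# 0.75-quantile of the standard normal distribution, about 0.6745
Z_075 = statistics.NormalDist().inv_cdf(0.75)


def median_abs_dev(values):
    """Median of the absolute deviations from the sample median."""
    xs = list(values)
    center = statistics.median(xs)
    return statistics.median([abs(x - center) for x in xs])


def robust_sigma(values):
    """MAD rescaled to estimate sigma, assuming roughly normal data."""
    return median_abs_dev(values) / Z_075


data = [1, 2, 3, 4, 100]          # illustrative values with one outlier

print(median_abs_dev(data))
print(robust_sigma(data))
print(statistics.pstdev(data))    # the ordinary standard deviation is dominated by the outlier
```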

Variance and standard deviation

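The variance and the standard deviation are the most important and most widely used measures of dispersion. The mean squared deviation from the mean or from the expected value appears in the following roles:

  • as a measure in descriptive statistics, $s_n^2 = \frac{1}{n} \sum_{i=1}^{n} (x_i - \bar{x})^2$,
  • as an estimate of the population variance, $s_{n-1}^2 = \frac{1}{n-1} \sum_{i=1}^{n} (x_i - \bar{x})^2$,
  • as the variance of a random variable, $\sigma^2 = E\left((X - E(X))^2\right)$.

The corresponding standard deviations are:

  • $s_n = \sqrt{\frac{1}{n} \sum_{i=1}^{n} (x_i - \bar{x})^2}$ or $s_{n-1} = \sqrt{\frac{1}{n-1} \sum_{i=1}^{n} (x_i - \bar{x})^2}$,
  • $\sigma = \sqrt{E\left((X - E(X))^2\right)}$.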
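
The difference between the divisor $n$ (descriptive measure) and the divisor $n - 1$ (estimate of the population variance) can be sketched as follows. The function names and the data are made up for this example; the standard library functions `statistics.pvariance` and `statistics.variance` serve only as a cross-check.

```python
import math
import statistics


def variance_descriptive(values):
    """Mean squared deviation from the sample mean (divisor n)."""
    xs = list(values)
    mean = sum(xs) / len(xs)
    return sum((x - mean) ** 2 for x in xs) / len(xs)


def variance_estimate(values):
    """Estimate of the population variance (divisor n - 1)."""
    xs = list(values)
    if len(xs) < 2:
        raise ValueError("need at least two values")
    mean = sum(xs) / len(xs)
    return sum((x - mean) ** 2 for x in xs) / (len(xs) - 1)


data = [2, 4, 4, 4, 5, 5, 7, 9]   # illustrative values with mean 5

# Variance and standard deviation for each divisor
print(variance_descriptive(data), math.sqrt(variance_descriptive(data)))
print(variance_estimate(data), math.sqrt(variance_estimate(data)))

# Cross-check against the standard library
print(statistics.pvariance(data), statistics.variance(data))
```

For large samples the two variants differ only slightly, since the ratio of the divisors, $(n-1)/n$, approaches 1.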

Coefficient of variation

The coefficient of variation of a random variable $X$ with $E(X) \neq 0$ is defined as the ratio of its standard deviation to its expected value:

$$CV(X) = \frac{\sqrt{\operatorname{Var}(X)}}{E(X)}$$

If, instead of the distribution of the random variable, a concrete series of measured values is given, the empirical coefficient of variation is formed as the quotient of the empirical standard deviation $s$ and the arithmetic mean $\bar{x}$:

$$v = \frac{s}{\bar{x}}$$
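
Because it is a ratio, the empirical coefficient of variation does not depend on the unit of measurement. The following sketch illustrates this with made-up data; the function name is hypothetical, and using the standard deviation with divisor $n - 1$ (`statistics.stdev`) is an assumption, since the divisor $n$ is also in use.

```python
import statistics


def coefficient_of_variation(values):
    """Empirical coefficient of variation: standard deviation / arithmetic mean.

    Assumption: the standard deviation uses the divisor n - 1.
    """
    xs = list(values)
    mean = statistics.fmean(xs)
    if mean == 0:
        raise ValueError("coefficient of variation is undefined for mean 0")
    return statistics.stdev(xs) / mean


heights_cm = [150.0, 160.0, 170.0, 180.0, 190.0]   # made-up heights in cm
heights_m = [x / 100 for x in heights_cm]          # the same heights in m

print(coefficient_of_variation(heights_cm))
print(coefficient_of_variation(heights_m))         # same value up to rounding
```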

Graphical representations

  • Box-and-whisker plot
  • Streuungsfächerkarte (dispersion fan map)
  • Scatter plot