Median

The median, or central value, is an average (a measure of central tendency) for distributions in statistics. The median of a list of numerical values is the value in the middle position when the values are sorted by size. For example, for the values 4, 1, 37, 2, 1 the median is 2, the middle number of the sorted list 1, 1, 2, 4, 37.

In general, a median divides a data set or a sample into two halves such that the values in one half are not greater than the median and the values in the other half are not smaller. If, for example, the median height of all emperor penguins were 1.26 m, then half of all measured heights of these penguins would be at most 1.26 m.


Definition

The median divides a list of values into two halves. It can be determined as follows:

  • All values are sorted in ascending order.
  • If the number of values is odd, the value in the middle is the median.
  • If the number of values is even, the median is usually defined as the arithmetic mean of the two middle values. These two values are then called the lower and upper median.
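The three steps above can be sketched in Python as follows. The function name `median` is our own. Python's standard library already provides the same behaviour as `statistics.median`.

```python
def median(values):
    """Return the median of a non-empty sequence of numbers.

    Sorts a copy of the data. For an odd count it returns the middle value;
    for an even count it returns the arithmetic mean of the two middle values.
    """
    if not values:
        raise ValueError("the median of empty data is undefined")
    ordered = sorted(values)      # step 1: sort ascending (input is not modified)
    n = len(ordered)
    mid = n // 2
    if n % 2 == 1:                # odd count: the single middle value
        return ordered[mid]
    # even count: mean of the lower and upper median
    return (ordered[mid - 1] + ordered[mid]) / 2


print(median([4, 1, 37, 2, 1]))   # 2, as in the introductory example
print(median([1, 2, 3, 4]))       # 2.5
```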

An important property of the median is its robustness against outliers.

  • Example: measured values 1, 2, 4, 4, 4, 5, 15. The median (and also the lower and upper median) is the value in the middle position, that is, 4. If one of the 4s is replaced by 46 because of an error, the median changes little or not at all: for 1, 2, 4, 4, 5, 15, 46 it is still 4. The arithmetic mean, in contrast, jumps from 5 to 11.
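The outlier example can be reproduced with Python's `statistics` module. This is a small sketch that uses exactly the values from the example.

```python
from statistics import mean, median

clean = [1, 2, 4, 4, 4, 5, 15]
faulty = [1, 2, 4, 4, 5, 15, 46]   # one 4 replaced by the erroneous value 46

print(median(clean), mean(clean))    # median 4, mean 5
print(median(faulty), mean(faulty))  # median still 4, mean jumps to 11
```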

Whether the median or the arithmetic mean is more meaningful depends on the question being asked. For an income distribution, taxpayers are mostly interested in how much a typical citizen earns, and the median reflects this more clearly.

Comparison with other measures of central tendency

The median belongs to the group of quantiles and can be regarded as the 0.5-quantile. Other important measures of central tendency are the arithmetic mean and the mode.

Compared with the arithmetic mean, which is often simply called the average, the median is more robust against outliers (extremely deviating values) and can also be applied to ordinally scaled variables. The term median (Latin medianus, 'lying in the middle', from medius, 'the middle one') comes from geometry, where it likewise denotes a boundary between two halves of equal size.

Applications

In contrast to the arithmetic mean, the median can be used for ordinal variables, such as grades on a rating scale, whose levels stand in no quantitative relationship to each other. The median can also be applied to interval- and ratio-scaled data, where it has both advantages and disadvantages compared with the arithmetic mean as a measure of central tendency. For purely nominally scaled variables, whose values have no natural order (such as the variable country of birth), the median cannot be applied. Here the mode is the only measure of central tendency that can be used.
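For an ordinal variable, averaging two middle categories is not meaningful. A sketch can therefore use the lower median, which always returns one of the observed values, via `statistics.median_low`. The `Grade` scale and the survey answers below are hypothetical. `IntEnum` is used only so that the categories have a defined order.

```python
from enum import IntEnum
from statistics import median_low


class Grade(IntEnum):
    """Hypothetical ordinal scale: the order matters, differences do not."""
    POOR = 1
    FAIR = 2
    GOOD = 3
    VERY_GOOD = 4


# Hypothetical survey answers.
answers = [Grade.GOOD, Grade.POOR, Grade.VERY_GOOD,
           Grade.GOOD, Grade.FAIR, Grade.VERY_GOOD]

# median_low sorts the data and returns an actual element, never an average.
print(median_low(answers).name)   # GOOD
```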

The median is used in statistics and probability theory in three different meanings:

Median of a sample

A value is a median of a sample if at least half of the observations in the sample are less than or equal to this value and at least half are greater than or equal to it.

Suppose the observed values are sorted by size, so that one considers the ordered sample $x_{(1)} \le x_{(2)} \le \dots \le x_{(n)}$. For an odd number of observations, the median is the value of the observation in the middle of this sequence. For an even number of observations, there is no single middle element but two. In that case, the values of the two middle observations and all values between them (even if they never occurred as an observation) are medians of the sample, because all of these values satisfy the condition above.
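The defining condition and the resulting interval of sample medians can be checked directly. In the sketch below, the helper names `is_median` and `median_interval` are our own. The data set is hypothetical and was chosen with an even number of observations.

```python
def is_median(c, values):
    """Check the defining condition: at least half of the observations
    are <= c and at least half are >= c."""
    n = len(values)
    at_most = sum(1 for x in values if x <= c)
    at_least = sum(1 for x in values if x >= c)
    return 2 * at_most >= n and 2 * at_least >= n


def median_interval(values):
    """Return (lo, hi) such that every c with lo <= c <= hi is a sample median."""
    ordered = sorted(values)
    n = len(ordered)
    if n % 2 == 1:                     # odd count: the median is unique
        m = ordered[n // 2]
        return m, m
    return ordered[n // 2 - 1], ordered[n // 2]


data = [7, 1, 3, 10]                   # hypothetical sample, sorted: 1, 3, 7, 10
print(median_interval(data))           # (3, 7)
print([c for c in (2, 3, 5, 7, 8) if is_median(c, data)])  # [3, 5, 7]
```

The value 5 never occurs in the sample, yet it satisfies the condition and is therefore a median.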

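For cardinally scaled variables (that is, when it is meaningful to compute differences between measured values), the arithmetic mean of the two middle observations is usually used when the number of observations is even. The median of an ordered sample $x_{(1)} \le x_{(2)} \le \dots \le x_{(n)}$ of $n$ measured values is therefore

$$\tilde{x} = \begin{cases} x_{\left(\frac{n+1}{2}\right)} & \text{if } n \text{ is odd},\\[4pt] \frac{1}{2}\left(x_{\left(\frac{n}{2}\right)} + x_{\left(\frac{n}{2}+1\right)}\right) & \text{if } n \text{ is even}. \end{cases}$$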

This definition has the advantage that, for symmetric distributions, the arithmetic mean and the median are identical.

Upper and lower median

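Often, however, one wants to ensure that the median is always one of the elements of the sample. In this case, for an even number of elements, the alternative is to use either the lower median $\tilde{x}_{\text{low}} = x_{(n/2)}$ or the upper median $\tilde{x}_{\text{up}} = x_{(n/2+1)}$ of the ordered sample $x_{(1)} \le \dots \le x_{(n)}$ as the median. For an odd number of observations, both coincide with $x_{((n+1)/2)}$.

Using Gauss brackets (the floor function), this definition can be written more compactly as

$$\tilde{x}_{\text{low}} = x_{\left(\left\lfloor \frac{n+1}{2} \right\rfloor\right)}, \qquad \tilde{x}_{\text{up}} = x_{\left(\left\lfloor \frac{n}{2} \right\rfloor + 1\right)}.$$

In general, the lower median is at most, and the upper median at least, the median $\tilde{x}$:

$$\tilde{x}_{\text{low}} \le \tilde{x} \le \tilde{x}_{\text{up}}.$$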

This way of determining the median plays an important role, for example, in database systems, such as in selection queries that use the median-of-medians method.
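The Gauss-bracket index formulas for the lower and upper median translate directly into integer division. The following sketch uses 1-based order-statistic indices as in the text and converts them to Python's 0-based indexing. The function name is our own. The results agree with the standard library's `statistics.median_low` and `statistics.median_high`.

```python
from statistics import median, median_high, median_low


def lower_upper_median(values):
    """Return (lower median, upper median) of a non-empty sample."""
    ordered = sorted(values)
    n = len(ordered)
    low_idx = (n + 1) // 2        # floor((n + 1) / 2), 1-based
    up_idx = n // 2 + 1           # floor(n / 2) + 1, 1-based
    # shift the 1-based order-statistic indices to 0-based list indices
    return ordered[low_idx - 1], ordered[up_idx - 1]


for sample in ([4, 1, 3, 2], [5, 1, 3]):
    lo, hi = lower_upper_median(sample)
    assert (lo, hi) == (median_low(sample), median_high(sample))
    assert lo <= median(sample) <= hi   # lower <= median <= upper
    print(sample, lo, hi)
```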

Properties

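The median $\tilde{x}$, and for an even number of observations every value between the two middle values, minimizes the sum of absolute deviations. That is, for every $c \in \mathbb{R}$:

$$\sum_{i=1}^{n} |x_i - \tilde{x}| \le \sum_{i=1}^{n} |x_i - c|.$$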

The method of least absolute deviations and robust regression are based on the median. The arithmetic mean, in contrast, minimizes the sum of squared deviations and underlies the method of least squares and regression analysis. It is mathematically easier to handle but is not robust against outliers.
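Both minimization properties can be checked numerically with a brute-force grid search. This sketch reuses the data with the outlier 46 from the robustness example. The grid search serves only as an illustration and is not a practical way to compute either statistic.

```python
from statistics import mean, median

data = [1, 2, 4, 4, 5, 15, 46]


def sum_abs(c):
    """Sum of absolute deviations from the candidate centre c."""
    return sum(abs(x - c) for x in data)


def sum_sq(c):
    """Sum of squared deviations from the candidate centre c."""
    return sum((x - c) ** 2 for x in data)


# Evaluate both loss functions on a grid of candidates 0.00, 0.01, ..., 50.00.
grid = [i / 100 for i in range(0, 5001)]
best_abs = min(grid, key=sum_abs)
best_sq = min(grid, key=sum_sq)

print(best_abs, median(data))   # the L1 minimiser is the median (4)
print(best_sq, mean(data))      # the L2 minimiser is the mean (11)
```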

As described above, the median can be determined algorithmically by sorting the measured values. Because sorting takes $O(n \log n)$ time, special algorithms that determine quantiles in linear time, or estimation methods such as the Cornish-Fisher method, are generally used instead. The arithmetic mean can also be determined in linear time.
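One well-known linear-time method is the median-of-medians selection algorithm (Blum, Floyd, Pratt, Rivest and Tarjan), the same method named above in the database context. The sketch below is a plain, unoptimised version with function names of our own. It is not necessarily the algorithm any particular database system uses. It finds the k-th smallest element in worst-case linear time without sorting the whole list.

```python
def select(values, k):
    """Return the k-th smallest element (0-based) of values, using the
    median-of-medians pivot rule (worst-case linear time)."""
    if len(values) <= 5:
        return sorted(values)[k]
    # 1. Split into groups of at most five and take the median of each group.
    medians = []
    for i in range(0, len(values), 5):
        group = sorted(values[i:i + 5])
        medians.append(group[len(group) // 2])
    # 2. Recursively take the median of these medians as the pivot.
    pivot = select(medians, len(medians) // 2)
    # 3. Three-way partition around the pivot (handles repeated values).
    lows = [x for x in values if x < pivot]
    pivots = [x for x in values if x == pivot]
    highs = [x for x in values if x > pivot]
    # 4. Recurse only into the part that contains position k.
    if k < len(lows):
        return select(lows, k)
    if k < len(lows) + len(pivots):
        return pivot
    return select(highs, k - len(lows) - len(pivots))


def median_low_linear(values):
    """Lower median x_((n+1)/2) (1-based, rounded down) via selection."""
    if not values:
        raise ValueError("the median of empty data is undefined")
    return select(list(values), (len(values) - 1) // 2)


print(median_low_linear([9, 2, 7, 4, 4, 8, 1, 6]))   # 4
```

In practice, randomised quickselect is often preferred because its expected running time is also linear and its constant factors are smaller.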

Median of grouped data

Especially in the social sciences, the median often has to be estimated because the data are not available explicitly and exactly, but only grouped into intervals. For example, surveys rarely ask for the exact salary, only for the income class, that is, the range in which the income lies. If only the frequencies of the individual classes are known, the median of the sample can in general only be determined approximately. Let $n$ be the total number of data points, $n_i$ the number of data points in the $i$-th group, and $a_i$ and $b_i$ the corresponding lower and upper limits of the interval. First, the median class (or median group) is determined. This is the group into which the median (according to the conventional definition above) falls, say the $m$-th group. The number $m$ is determined by

$$\sum_{i=1}^{m-1} n_i < \frac{n}{2} \quad\text{but}\quad \sum_{i=1}^{m} n_i \ge \frac{n}{2}.$$

If no further information on the distribution of the data within the classes is given, one postulates, for example, a uniform distribution. Linear interpolation can then be used to obtain an estimate of the median of the grouped data:

$$\tilde{x} \approx a_m + \frac{\frac{n}{2} - \sum_{i=1}^{m-1} n_i}{n_m}\,(b_m - a_m).$$

If no further information about the distribution of the data is given, any distribution other than the uniform distribution is also possible. Consequently, any other value in the $m$-th interval could be the median.

In contrast to the conventional definition of the median, this estimate need not be an element of the actual data set, which in most cases is not known anyway.
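The interpolation estimate for grouped data can be sketched as follows. Like the text, it assumes that the data are spread uniformly within the median class. The function name, the argument layout (a list of class limits plus a list of class frequencies) and the example classes are hypothetical.

```python
def grouped_median(bounds, counts):
    """Estimate the median of grouped data by linear interpolation.

    bounds: class limits a_1 < b_1 = a_2 < b_2 = ... (len(counts) + 1 values)
    counts: class frequencies n_i
    Assumes the data are uniformly distributed within the median class.
    """
    n = sum(counts)
    if n <= 0:
        raise ValueError("counts must contain at least one positive frequency")
    half = n / 2
    cumulative = 0                      # sum of n_i over the classes before class i
    for i, n_i in enumerate(counts):
        if cumulative + n_i >= half:    # first class whose cumulative count reaches n/2
            a, b = bounds[i], bounds[i + 1]
            return a + (half - cumulative) / n_i * (b - a)
        cumulative += n_i


# Hypothetical classes [0, 10), [10, 20), [20, 30] with 2, 5 and 3 observations.
print(grouped_median([0, 10, 20, 30], [2, 5, 3]))   # 16.0
```

Here $n/2 = 5$ and the first class holds only 2 observations, so the second class is the median class. The estimate is $10 + \frac{5 - 2}{5} \cdot 10 = 16$.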

Example

Income:

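Computing $\frac{n}{2}$ and comparing it with the cumulative class frequencies shows that the median lies in the 2nd class (that is, $m = 2$), since the first class contains only 160 elements. The interpolation formula then gives the estimate $\tilde{x} \approx 2081.25$ for the median.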

Since the actual distribution of the data within the intervals is unknown, any other value in the second interval may be the median. The calculated value of 2081.25 can therefore be up to 581.25 too large and up to 418.75 too small, so the error of the estimate can be up to 28%.

This method of determining the median of grouped data can be illustrated graphically using the cumulative frequency curve. Here one looks for the abscissa value that belongs to the ordinate value $\frac{n}{2}$. For small, even $n$, a different ordinate value can be chosen instead.

Median of a distribution

The concept is generalized by considering a real-valued random variable and its distribution, or its distribution function.

Definition

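Let $X$ be a real-valued random variable with distribution $P$ and distribution function $F(x) = P(X \le x)$. Then every element of the set

$$\left\{ m \in \mathbb{R} : P(X \le m) \ge \tfrac{1}{2} \text{ and } P(X \ge m) \ge \tfrac{1}{2} \right\}$$

is called a median of $X$.

Obviously, every $m$ with $F(m) = \frac{1}{2}$ belongs to this set, i.e. is a median of $X$.

If no such $m$ exists, the generalized inverse distribution function

$$F^{-1}(p) = \inf\{ x \in \mathbb{R} : F(x) \ge p \}$$

still yields a median of $X$ for $p = \frac{1}{2}$, and indeed the smallest possible one. If uniqueness matters, the median is defined as $\tilde{x} = F^{-1}\left(\frac{1}{2}\right)$. This corresponds to the approach used in the definition of quantiles: the median is then the 50% quantile.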
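For a discrete distribution, both the set of medians and the generalized inverse can be computed exactly. The sketch below uses a fair six-sided die as a hypothetical example and `fractions.Fraction` to avoid rounding. The helper names are our own.

```python
from fractions import Fraction

# Hypothetical example: a fair die, P(X = k) = 1/6 for k = 1, ..., 6.
support = [1, 2, 3, 4, 5, 6]            # sorted support points
probs = [Fraction(1, 6)] * 6
HALF = Fraction(1, 2)


def cdf(x):
    """Distribution function F(x) = P(X <= x)."""
    return sum(p for k, p in zip(support, probs) if k <= x)


def is_median(m):
    """m is a median iff P(X <= m) >= 1/2 and P(X >= m) >= 1/2."""
    p_ge = sum(p for k, p in zip(support, probs) if k >= m)
    return cdf(m) >= HALF and p_ge >= HALF


def generalized_inverse(p):
    """F^{-1}(p) = inf{x : F(x) >= p}. F is a step function jumping only at
    support points, so the infimum is the first support point reaching p."""
    for k in support:
        if cdf(k) >= p:
            return k
    raise ValueError("p must lie in (0, 1]")


print(generalized_inverse(HALF))                      # 3, the smallest median
print([m for m in (2.5, 3, 3.5, 4, 4.5) if is_median(m)])  # [3, 3.5, 4]
```

Every value in $[3, 4]$ is a median here, because $F(m) = \frac{1}{2}$ on $[3, 4)$. The generalized inverse selects the smallest of them.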

Properties

Like the expected value and the mode, a median is a location parameter of a probability distribution.

In contrast to the expected value, the median always exists. For example, the median of the standard Cauchy distribution is 0, while its expected value does not exist.
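A simulation makes the Cauchy example tangible. Standard Cauchy samples can be drawn through the inverse distribution function $\tan(\pi(u - \tfrac{1}{2}))$ for uniform $u$. The sample median settles near 0, while the sample mean has no expected value to converge to. The seed and the sample size are arbitrary choices made for reproducibility.

```python
import math
import random
from statistics import median

rng = random.Random(12345)    # fixed seed so that the run is reproducible
# Inverse-CDF sampling of the standard Cauchy distribution.
sample = [math.tan(math.pi * (rng.random() - 0.5)) for _ in range(100_001)]

print(median(sample))               # close to 0, the median of the distribution
print(sum(sample) / len(sample))    # erratic: dominated by a few huge values
```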

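Consider a symmetrically distributed random variable $X$, that is, one for which $X - a$ and $a - X$ have the same distribution for some $a \in \mathbb{R}$. For such a variable, the point of symmetry $a$ is a median. If the expected value exists, it also equals $a$, so median and expected value coincide.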

For continuous distributions on the positive real numbers with a monotonically decreasing density $f$ (i.e. $f(x) \ge f(y)$ for $0 < x \le y$), the median is at most the expected value, $\tilde{x} \le E(X)$. Equality holds only for the continuous uniform distributions. A typical example of this situation is the exponential distribution.

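Cantelli's inequality $P(X - \mu \ge a) \le \frac{\sigma^2}{\sigma^2 + a^2}$, valid for all $a \ge 0$, implies the following relation between the expected value $\mu = E(X)$, a median $\tilde{x}$ and the standard deviation $\sigma$:

$$|\mu - \tilde{x}| \le \sigma.$$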

The equality holds for the discrete random variable X with.
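As a quick numerical sanity check, the bound $|\mu - \tilde{x}| \le \sigma$ can be tested on the empirical distributions of a few hand-picked samples. The empirical distribution divides by $n$, so $\sigma$ here is the population standard deviation (`statistics.pstdev`). The helper name is our own.

```python
from statistics import mean, median, pstdev


def mean_median_gap(values):
    """Return (|mean - median|, population standard deviation) of the
    empirical distribution of values."""
    return abs(mean(values) - median(values)), pstdev(values)


for data in ([1, 2, 4, 4, 5, 15, 46], [0, 0, 0, 1], [-1, 1]):
    gap, sigma = mean_median_gap(data)
    print(data, round(gap, 3), round(sigma, 3), gap <= sigma)
```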

Examples

  • In the triangular distribution
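  • For the exponential distribution with parameter $\lambda > 0$, the distribution function is $F(x) = 1 - e^{-\lambda x}$ for $x \ge 0$. Solving $F(\tilde{x}) = \frac{1}{2}$ gives the median $\tilde{x} = \frac{\ln 2}{\lambda}$.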
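The closed form for the exponential distribution can be cross-checked by evaluating $F^{-1}(\tfrac{1}{2})$ numerically. The sketch below uses simple bisection on the distribution function. The rate $\lambda = 2$ and the search interval are arbitrary choices. The comparison with $1/\lambda$ also illustrates the inequality median ≤ expected value for decreasing densities.

```python
import math


def exp_cdf(x, lam):
    """Distribution function of the exponential distribution with rate lam."""
    return 1.0 - math.exp(-lam * x) if x >= 0 else 0.0


def median_by_bisection(cdf, lo, hi, tol=1e-12):
    """Approximate inf{x : cdf(x) >= 1/2} by bisection on [lo, hi].
    Assumes cdf is non-decreasing with cdf(lo) < 1/2 <= cdf(hi)."""
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if cdf(mid) >= 0.5:
            hi = mid        # mid already reaches 1/2: the infimum is at or left of mid
        else:
            lo = mid
    return hi


lam = 2.0
approx = median_by_bisection(lambda x: exp_cdf(x, lam), 0.0, 100.0)
print(approx, math.log(2) / lam)   # both about 0.3466
print(1 / lam)                     # expected value 0.5, larger than the median
```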

Alternatives

  • The welfare function is an alternative to the median income as a measure of a given income distribution.
  • Besides the median, another way to deal with extreme values is the trimmed mean. It is computed after the smallest and largest values have been removed (typically, 5% of the values are omitted).
  • According to Butler, there is also a stricter (and less common) definition of the median. Under it, the median is the value for which the number of smaller values in the series equals the number of larger values in the series. For special cases such as 3, 3, 3, 3, 4 or 1, 2, 3, 3, 3, there is a method by which a unique median can be calculated while keeping the stricter definition.
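A trimmed mean can be sketched as follows. The text leaves open whether the typical 5% refers to each end or to both ends together. This sketch assumes the proportion is cut from each end and rounds the number of removed observations down. The function name and parameter are our own. SciPy's `scipy.stats.trim_mean` uses the same per-end convention.

```python
import math
from statistics import mean


def trimmed_mean(values, proportion=0.05):
    """Arithmetic mean after dropping `proportion` of the observations from
    each end of the sorted data (count rounded down to whole observations)."""
    if not 0 <= proportion < 0.5:
        raise ValueError("proportion must be in [0, 0.5)")
    ordered = sorted(values)
    k = math.floor(len(ordered) * proportion)   # observations removed per end
    return mean(ordered[k:len(ordered) - k])


data = list(range(1, 20)) + [1000]   # 20 values with one extreme outlier
print(mean(data))                    # 59.5, pulled up by the outlier
print(trimmed_mean(data, 0.05))      # 10.5: drops 1 and 1000, averages 2..19
```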