Empirical distribution function

An empirical distribution function, also called the cumulative frequency function or the distribution function of the sample, is, in descriptive statistics and stochastics, a function that assigns to every real number $x$ the proportion of sample values that are less than or equal to $x$. The definition of the empirical distribution function can be written in several notations.


Definition

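If $x_1, \dots, x_n$ are the observations in the sample, the empirical distribution function is defined as

$$F_n(x) = \frac{1}{n} \sum_{i=1}^{n} \mathbf{1}_{(-\infty, x]}(x_i),$$

where $\mathbf{1}_{(-\infty, x]}(x_i) = 1$ if $x_i \le x$ and zero otherwise; that is, $\mathbf{1}_{(-\infty, x]}$ denotes the indicator function of the set $(-\infty, x]$.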
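The definition amounts to counting the share of observations that do not exceed a given point. A minimal sketch in Python follows; the function name `ecdf` and the four-element sample are illustrative assumptions, not part of the original text.

```python
from typing import Sequence


def ecdf(sample: Sequence[float], x: float) -> float:
    """Proportion of sample values that are less than or equal to x."""
    if not sample:
        raise ValueError("sample must be non-empty")
    # Sum the indicator values 1{x_i <= x} and divide by the sample size n.
    return sum(1 for xi in sample if xi <= x) / len(sample)


# Hypothetical sample, chosen only for illustration.
sample = [3.0, 1.0, 2.0, 2.0]

print(ecdf(sample, 0.5))  # 0.0: no observation is <= 0.5
print(ecdf(sample, 2.0))  # 0.75: three of the four observations are <= 2
print(ecdf(sample, 3.0))  # 1.0: every observation is <= 3
```

Each call scans the whole sample, which is fine for a single evaluation; for many evaluations, sorting the sample once is the usual choice.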

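Alternatively, the empirical distribution function can be defined using the distinct characteristic values $a_1 < a_2 < \dots < a_k$ and the corresponding relative frequencies $f(a_1), \dots, f(a_k)$ in the sample:

$$F_n(x) = \sum_{j \,:\, a_j \le x} f(a_j).$$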

The function is therefore a monotonically non-decreasing step function.
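The formulation in terms of characteristic values and relative frequencies can be sketched as follows. The helper names `ecdf_steps` and `evaluate` are hypothetical, and the sample is made up for illustration. Cumulative counts are divided by the sample size, which is the same as summing the relative frequencies but avoids accumulating rounding errors.

```python
from bisect import bisect_right
from collections import Counter
from itertools import accumulate


def ecdf_steps(sample):
    """Return the sorted distinct values and the ECDF value at each of them."""
    n = len(sample)
    counts = Counter(sample)
    values = sorted(counts)
    # Cumulative count / n equals the cumulative sum of relative frequencies.
    cumulative = [c / n for c in accumulate(counts[v] for v in values)]
    return values, cumulative


def evaluate(values, cumulative, x):
    """Evaluate the step function at an arbitrary point x."""
    k = bisect_right(values, x)  # number of distinct values <= x
    return 0.0 if k == 0 else cumulative[k - 1]


# Hypothetical sample, chosen only for illustration.
values, cumulative = ecdf_steps([3, 1, 2, 2])
print(values)      # distinct characteristic values in increasing order
print(cumulative)  # height of the step function at each of those values
print(evaluate(values, cumulative, 2.5))  # constant between 2 and 3
```

The function only changes at the distinct characteristic values, so storing those values and the step heights is enough to evaluate it anywhere.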

Binned data

Sometimes data are available only in binned form; that is, $k$ classes are given with lower class limits $l_i$, upper class limits $u_i$, and relative class frequencies $f_i$ ($i = 1, \dots, k$).

Then the distribution function is defined as

$$F(x) = \begin{cases} 0, & x \le l_1, \\ \displaystyle\sum_{j=1}^{i-1} f_j + f_i \, \frac{x - l_i}{u_i - l_i}, & l_i < x \le u_i \quad (i = 1, \dots, k), \\ 1, & x > u_k. \end{cases}$$

At the upper and lower class limits, this definition agrees with the definition for unbinned data. Between the class limits, the function is linearly interpolated. This assumes that the observations are evenly distributed within each class.
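The linear interpolation between class limits can be illustrated with a short sketch. It assumes contiguous classes (the upper limit of one class is the lower limit of the next) and relative frequencies that sum to one. The function name `binned_cdf` and the example classes are hypothetical.

```python
def binned_cdf(edges, rel_freq, x):
    """Distribution function for binned data with linear interpolation.

    edges    -- k + 1 increasing class limits (classes assumed contiguous)
    rel_freq -- k relative class frequencies (assumed to sum to 1)
    """
    if len(edges) != len(rel_freq) + 1:
        raise ValueError("need exactly one more class limit than classes")
    if x <= edges[0]:
        return 0.0
    if x >= edges[-1]:
        return 1.0
    cumulative = 0.0  # sum of the frequencies of all classes below x
    for lower, upper, f in zip(edges, edges[1:], rel_freq):
        if x <= upper:
            # Uniform spread within the class: add the covered share of f.
            return cumulative + f * (x - lower) / (upper - lower)
        cumulative += f
    return 1.0  # not reached for valid input; kept as a safe fallback


# Hypothetical classes (0, 10], (10, 20], (20, 40] with made-up frequencies.
edges = [0.0, 10.0, 20.0, 40.0]
rel_freq = [0.2, 0.5, 0.3]

for x in (5.0, 10.0, 15.0, 30.0):
    print(x, binned_cdf(edges, rel_freq, x))
```

At a class limit the result equals the cumulative relative frequency up to that limit; in between, it rises in proportion to the share of the class that lies below `x`.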

Examples

The horse kick data of Ladislaus von Bortkiewicz serve as an example. In the period 1875-1894, a total of 196 soldiers in 14 cavalry regiments of the Prussian army died from horse kicks:

Tabulating the characteristic values together with their relative frequencies gives:

The last row contains the value of the empirical distribution function at the corresponding point, obtained by summing the relative frequencies of all characteristic values up to and including that point.
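The calculation behind such a table can be sketched as follows. The counts below are the commonly cited frequency table for Bortkiewicz's 14-unit, 20-year data (number of unit-years with 0 to 4 deaths). They are not reproduced in the text above and are an assumption here. They are at least consistent with the stated total of 196 deaths and with 14 × 20 = 280 unit-years.

```python
from itertools import accumulate

# Assumed data: deaths per unit and year, and the number of unit-years
# in which that many deaths occurred (commonly cited values).
deaths = [0, 1, 2, 3, 4]
unit_years = [144, 91, 32, 11, 2]

n = sum(unit_years)  # total number of unit-years
total_deaths = sum(d * c for d, c in zip(deaths, unit_years))

# Relative frequencies and their running sum (the ECDF values).
rel_freq = [c / n for c in unit_years]
ecdf_values = [c / n for c in accumulate(unit_years)]

print("unit-years:", n, "deaths:", total_deaths)
for d, f, F in zip(deaths, rel_freq, ecdf_values):
    print(f"{d}  {f:.5f}  {F:.5f}")
```

Under these assumed counts, the value at 2 is (144 + 91 + 32) / 280 = 267 / 280, roughly 0.954.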

Binned data

Binning the data gives the following table. The corresponding graph can be found in the definition section.

The last row contains the value of the distribution function at the corresponding point; values between the class limits follow by linear interpolation.

Convergence properties

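For independent and identically distributed observations, the strong law of large numbers guarantees that the estimator $F_n(x)$ converges almost surely to the true distribution function $F(x)$ for each value $x$:

$$P\left( \lim_{n \to \infty} F_n(x) = F(x) \right) = 1.$$

That is, the estimator is consistent, and the empirical distribution function converges pointwise to the true distribution function. A stronger result, the Glivenko-Cantelli theorem, states that this convergence is in fact uniform:

$$P\left( \lim_{n \to \infty} \sup_{x \in \mathbb{R}} \left| F_n(x) - F(x) \right| = 0 \right) = 1.$$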

This property is the mathematical justification for describing data with an empirical distribution function at all.
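The uniform convergence can be observed in a small simulation. The sketch below draws standard normal samples of increasing size and computes the largest distance between the empirical and the true distribution function. The function names, the seed, and the sample sizes are illustrative assumptions. The code does not print fixed expected values because the results depend on the random number generator.

```python
import math
import random


def std_normal_cdf(x: float) -> float:
    """Distribution function of the standard normal distribution."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))


def sup_distance(sample) -> float:
    """sup over x of |F_n(x) - F(x)| for a standard normal F."""
    xs = sorted(sample)
    n = len(xs)
    d = 0.0
    for i, x in enumerate(xs):
        F = std_normal_cdf(x)
        # F_n jumps from i/n to (i+1)/n at x, so the supremum is attained
        # just before or at one of the jump points.
        d = max(d, (i + 1) / n - F, F - i / n)
    return d


rng = random.Random(0)  # fixed seed so that repeated runs agree
for n in (50, 500, 5000, 50000):
    sample = [rng.gauss(0.0, 1.0) for _ in range(n)]
    print(n, round(sup_distance(sample), 4))
```

The printed distances should shrink as the sample size grows, although a single run need not decrease at every step.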

Ogive

Ogive originally denoted an element of Gothic architecture: the pointed arch, or the reinforced ribs of a vault. In statistics, the term was first used for a distribution function in 1875 by Francis Galton:

"When the objects are marshalled in the order of their magnitude along a level base at equal distances apart, a line drawn freely through the tops of the ordinates ... will form a curve of double curvature ... Such a curve is called, in the phraseology of architects, an 'ogive'."

Here the ordered (often binned) characteristic values are plotted on the horizontal axis of the coordinate system, and the relative cumulative frequencies, in percent, on the vertical axis.

The figure shows the cumulative distribution function of a theoretical standard normal distribution. If the right part of the curve is mirrored at $x = 0$ (dashed red line), the resulting figure looks like an ogive.

Below it, an empirical distribution function is shown. For this figure, 50 random numbers were drawn from a standard normal distribution. The more random numbers are drawn, the more closely the empirical distribution function approaches the theoretical one.
