Pareto distribution

The Pareto distribution, named after Vilfredo Pareto (1848-1923), is a continuous probability distribution on an interval that is unbounded to the right. It is scale-invariant and follows a power law. For small exponents it is one of the heavy-tailed distributions.

The distribution was first used to describe the distribution of income in Italy. Pareto distributions typically arise when random positive values extend over several orders of magnitude and result from the combined action of many independent factors. Distributions with similar characteristics are Zipf's law and Benford's law.

Conceptual history

In the second volume of his Cours d'économie politique (1897), Vilfredo Pareto shows that the number of persons in a state whose income exceeds a threshold $x$ is approximately proportional to $1/x^k$, where the parameter $k$ is about 1.5 across countries. Up to a scaling factor, this specification defines the probability distribution named after Pareto (via its cumulative distribution function). Numerous other empirical distributions are well described by a Pareto distribution, for example city sizes or claim amounts in actuarial mathematics.

Definition

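A continuous random variable $X$ is called Pareto-distributed with parameters $x_\min > 0$ and $k > 0$ if it has the probability density

$$f(x) = \begin{cases} \dfrac{k\,x_\min^k}{x^{k+1}} & \text{for } x \ge x_\min, \\ 0 & \text{for } x < x_\min. \end{cases}$$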

Here $x_\min$ is a parameter describing the minimum value of the distribution. It is also the mode of the distribution, i.e. the point at which the probability density is maximal. As the distance between $x$ and $x_\min$ grows, the density at $x$ decreases. This distance is measured by the quotient $x/x_\min$, i.e. by the ratio of the two quantities.

$k$ is a parameter that describes the size ratio of the random values as a function of their frequency. Writing the density as $f(x) = \frac{k}{x_\min}\left(\frac{x_\min}{x}\right)^{k+1}$ shows that the quotient is raised to the power $k+1$. With a larger $k$ the curve falls off considerably more steeply, i.e. the random variable takes large values with lower probability.

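The probability that the random variable takes a value less than or equal to $x$ is given by the distribution function

$$F(x) = P(X \le x) = 1 - \left(\frac{x_\min}{x}\right)^k \quad \text{for } x \ge x_\min,$$

with $F(x) = 0$ for $x < x_\min$. The probability that the random variable takes a value larger than $x$ is accordingly

$$P(X > x) = 1 - F(x) = \left(\frac{x_\min}{x}\right)^k \quad \text{for } x \ge x_\min.$$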
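
The density, the distribution function and the tail probability $P(X > x)$ can be evaluated directly. The following is a minimal Python sketch; the names `pareto_pdf`, `pareto_cdf` and `pareto_sf` are hypothetical helpers written for this illustration (not a library API), and their docstrings restate the formulas used. The last lines are a crude midpoint-rule check, for the arbitrary example values $x_\min = 1$ and $k = 1.5$, that integrating the density from $x_\min$ to 3 reproduces $F(3)$.

```python
def pareto_pdf(x: float, x_min: float, k: float) -> float:
    """Density f(x) = k * x_min**k / x**(k + 1) for x >= x_min, and 0 below x_min."""
    if x < x_min:
        return 0.0
    return k * x_min**k / x ** (k + 1)


def pareto_cdf(x: float, x_min: float, k: float) -> float:
    """Distribution function F(x) = P(X <= x) = 1 - (x_min / x)**k for x >= x_min."""
    if x < x_min:
        return 0.0
    return 1.0 - (x_min / x) ** k


def pareto_sf(x: float, x_min: float, k: float) -> float:
    """Tail probability P(X > x) = 1 - F(x) = (x_min / x)**k for x >= x_min."""
    if x < x_min:
        return 1.0
    return (x_min / x) ** k


# Crude check: integrate the density from x_min to 3 with the midpoint rule
# and compare the result with F(3).
x_min, k = 1.0, 1.5
n = 100_000
h = (3.0 - x_min) / n
integral = sum(pareto_pdf(x_min + (i + 0.5) * h, x_min, k) for i in range(n)) * h
print(integral, pareto_cdf(3.0, x_min, k))
```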

Properties

Expected value

The expected value is

$$E(X) = \frac{k\,x_\min}{k-1} \quad \text{for } k > 1;$$

for $k \le 1$ the expected value is infinite.
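
The expected value $E(X) = k\,x_\min/(k-1)$ can be checked by simulation. The sketch below draws samples by inverse-transform sampling, $X = x_\min U^{-1/k}$ with $U$ uniform on $(0, 1]$, which follows from solving $F(x) = 1 - (x_\min/x)^k$ for $x$. The helper names, the seed and the parameters $x_\min = 2$, $k = 3$ are arbitrary choices for illustration; $k = 3$ keeps the variance finite so that the sample mean settles quickly, whereas for $1 < k \le 2$ it converges very slowly.

```python
import math
import random


def pareto_mean(x_min: float, k: float) -> float:
    """E(X) = k * x_min / (k - 1) for k > 1; the mean is infinite for k <= 1."""
    if k <= 1:
        return math.inf
    return k * x_min / (k - 1)


def sample_pareto(x_min: float, k: float, rng: random.Random) -> float:
    """Inverse-transform sampling: X = x_min * U**(-1/k) with U uniform on (0, 1]."""
    u = 1.0 - rng.random()  # rng.random() lies in [0, 1), so u lies in (0, 1]
    return x_min * u ** (-1.0 / k)


rng = random.Random(42)
x_min, k = 2.0, 3.0
n = 200_000
sample_mean = sum(sample_pareto(x_min, k, rng) for _ in range(n)) / n
print(pareto_mean(x_min, k), sample_mean)
```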

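Quantiles

The $p$-quantile follows from inverting the distribution function:

$$x_p = x_\min\,(1-p)^{-1/k}, \qquad 0 < p < 1.$$

Median

The median is given by

$$\tilde{x} = x_{0.5} = x_\min \sqrt[k]{2}.$$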
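
Solving $F(x_p) = p$ with $F(x) = 1 - (x_\min/x)^k$ gives the $p$-quantile $x_p = x_\min (1-p)^{-1/k}$; the median ($p = 0.5$) and the fourth quintile ($p = 0.8$) are special cases. A small sketch, with a hypothetical helper name and example parameters $x_\min = 1$, $k = 2$:

```python
def pareto_quantile(p: float, x_min: float, k: float) -> float:
    """p-quantile x_p = x_min * (1 - p)**(-1/k), obtained by solving F(x_p) = p."""
    if not 0.0 < p < 1.0:
        raise ValueError("p must lie strictly between 0 and 1")
    return x_min * (1.0 - p) ** (-1.0 / k)


x_min, k = 1.0, 2.0
median = pareto_quantile(0.5, x_min, k)           # should equal x_min * 2**(1/k)
fourth_quintile = pareto_quantile(0.8, x_min, k)  # should equal x_min * 5**(1/k)
print(median, x_min * 2 ** (1 / k))
print(fourth_quintile, x_min * 5 ** (1 / k))
```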

Testing the Pareto principle

Analogously, the fourth quintile (the 80% quantile) that the Pareto principle refers to is

$$x_{0.8} = x_\min \sqrt[k]{5}.$$

For $k > 1$, the part of the expected value contributed by values above the fourth quintile is

$$\int_{x_{0.8}}^{\infty} x\,f(x)\,dx = \frac{k\,x_\min^k}{k-1}\,x_{0.8}^{\,1-k} = 5^{-(k-1)/k}\,E(X).$$

For $k = 1.5$, the value Pareto regarded as typical, this gives $5^{-1/3}\,E(X) \approx 0.58\,E(X)$, i.e. about 58% of the total expected value. If the incomes of a country's population followed a Pareto distribution with parameter 1.5, the 20% with the highest incomes would earn only 58% of total income, not 80% as the Pareto principle suggests. By contrast, the 80-20 rule holds exactly for $k = \log_4 5 \approx 1.16$.
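
The calculation for the top 20% generalizes to an arbitrary top fraction $p$: the threshold is the $(1-p)$-quantile $x_\min\,p^{-1/k}$, and integrating $x f(x)$ above it gives $p^{(k-1)/k}\,E(X)$, independent of $x_\min$. The sketch below (the helper name `top_share` is hypothetical) evaluates this share for $k = 1.5$ and for $k = \log_4 5$.

```python
import math


def top_share(p_top: float, k: float) -> float:
    """Fraction of E(X) contributed by the largest fraction p_top of values (k > 1).

    The threshold is the (1 - p_top)-quantile x_min * p_top**(-1/k); integrating
    x * f(x) above it gives E(X) * p_top**((k - 1) / k), so x_min cancels out.
    """
    if k <= 1:
        raise ValueError("the expected value is finite only for k > 1")
    return p_top ** ((k - 1) / k)


share_typical = top_share(0.2, 1.5)        # Pareto's "typical" exponent
k_80_20 = math.log(5) / math.log(4)        # k = log_4(5)
share_80_20 = top_share(0.2, k_80_20)      # share of the top 20% for that k
print(share_typical, k_80_20, share_80_20)
```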

Variance

The variance is given by

$$\operatorname{Var}(X) = \frac{x_\min^2\,k}{(k-1)^2\,(k-2)} \quad \text{for } k > 2.$$

Standard deviation

From the variance, the standard deviation is

$$\sigma(X) = \frac{x_\min}{k-1}\sqrt{\frac{k}{k-2}} \quad \text{for } k > 2.$$

Coefficient of variation

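From the expected value and the standard deviation, the coefficient of variation follows immediately:

$$\operatorname{CV}(X) = \frac{\sigma(X)}{E(X)} = \frac{1}{\sqrt{k(k-2)}} \quad \text{for } k > 2.$$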
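
A minimal sketch of the variance, standard deviation and coefficient of variation, returning infinity where the moments do not exist ($k \le 2$). The function names are illustrative only; the last lines confirm for $x_\min = 2$, $k = 3$ that $\sigma / E(X)$ matches $1/\sqrt{k(k-2)}$.

```python
import math


def pareto_variance(x_min: float, k: float) -> float:
    """Var(X) = x_min**2 * k / ((k - 1)**2 * (k - 2)); infinite for k <= 2."""
    if k <= 2:
        return math.inf
    return x_min**2 * k / ((k - 1) ** 2 * (k - 2))


def pareto_std(x_min: float, k: float) -> float:
    """Standard deviation x_min / (k - 1) * sqrt(k / (k - 2)), the root of the variance."""
    return math.sqrt(pareto_variance(x_min, k))  # math.sqrt(math.inf) is inf


def pareto_cv(k: float) -> float:
    """Coefficient of variation 1 / sqrt(k * (k - 2)); it does not depend on x_min."""
    if k <= 2:
        return math.inf
    return 1.0 / math.sqrt(k * (k - 2))


x_min, k = 2.0, 3.0
mean = k * x_min / (k - 1)  # E(X) for k > 1
print(pareto_variance(x_min, k), pareto_std(x_min, k), pareto_std(x_min, k) / mean, pareto_cv(k))
```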

Skewness

For $k > 3$ the skewness is

$$v(X) = \frac{2(1+k)}{k-3}\sqrt{\frac{k-2}{k}}.$$
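
The skewness formula can be cross-checked against the raw moments $E(X^n) = k\,x_\min^n/(k-n)$, which are finite for $k > n$, using $E[(X-\mu)^3] = E(X^3) - 3\mu E(X^2) + 2\mu^3$. The sketch below does this for a few values of $k$; the helper names are hypothetical.

```python
import math


def pareto_raw_moment(n: int, x_min: float, k: float) -> float:
    """E(X**n) = k * x_min**n / (k - n); infinite for k <= n."""
    if k <= n:
        return math.inf
    return k * x_min**n / (k - n)


def pareto_skewness(k: float) -> float:
    """Closed form 2 (1 + k) / (k - 3) * sqrt((k - 2) / k), valid for k > 3."""
    if k <= 3:
        raise ValueError("the skewness is finite only for k > 3")
    return 2.0 * (1.0 + k) / (k - 3.0) * math.sqrt((k - 2.0) / k)


def skewness_from_moments(x_min: float, k: float) -> float:
    """Skewness E[(X - mu)^3] / sigma^3 assembled from the raw moments."""
    m1 = pareto_raw_moment(1, x_min, k)
    m2 = pareto_raw_moment(2, x_min, k)
    m3 = pareto_raw_moment(3, x_min, k)
    variance = m2 - m1**2
    third_central = m3 - 3.0 * m1 * m2 + 2.0 * m1**3
    return third_central / variance**1.5


for k in (3.5, 4.0, 10.0):
    print(k, pareto_skewness(k), skewness_from_moments(1.0, k))
```

Because the skewness does not depend on $x_\min$, any positive $x_\min$ gives the same result in `skewness_from_moments`.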

Characteristic function

The characteristic function is given by

$$\varphi_X(t) = k\,(-i\,x_\min t)^k\,\Gamma(-k, -i\,x_\min t).$$

Here $\Gamma(a, z)$ is the (upper) incomplete gamma function.

Moment generating function

The moment generating function $E(e^{tX})$ does not exist for the Pareto distribution: for every $t > 0$ the defining integral diverges, because $e^{tx}$ grows faster than any power of $x$.

Entropy

The entropy (in nats) is given by

$$H(X) = \ln\frac{x_\min}{k} + \frac{1}{k} + 1.$$
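
As a check of the entropy formula (natural logarithm, i.e. in nats), the sketch below compares it with a numerical evaluation of $-E[\ln f(X)]$. The expectation is written as $-\int_0^1 \ln f(x_p)\,dp$ over the quantile function $x_p = x_\min (1-p)^{-1/k}$ and approximated with the midpoint rule; the helper names and the parameters $x_\min = 2$, $k = 3$ are illustrative.

```python
import math


def pareto_entropy(x_min: float, k: float) -> float:
    """Differential entropy in nats: ln(x_min / k) + 1/k + 1."""
    return math.log(x_min / k) + 1.0 / k + 1.0


def entropy_numeric(x_min: float, k: float, n: int = 200_000) -> float:
    """Approximate -E[ln f(X)] = -(integral over p in (0, 1) of ln f(x_p)) by the midpoint rule."""
    total = 0.0
    for i in range(n):
        p = (i + 0.5) / n
        x = x_min * (1.0 - p) ** (-1.0 / k)  # p-quantile of the distribution
        log_f = math.log(k) + k * math.log(x_min) - (k + 1) * math.log(x)  # ln f(x)
        total -= log_f
    return total / n


x_min, k = 2.0, 3.0
print(pareto_entropy(x_min, k), entropy_numeric(x_min, k))
```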

Zipf's law

Zipf's law is mathematically identical to the Pareto distribution, with the $x$ and $y$ axes swapped. Whereas the Pareto distribution considers the probability of particular random values, Zipf's law focuses on the probability of occupying a particular position when the random values are ranked by their frequency.

Relationship to other distributions

Relationship to the exponential distribution

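If $X$ is a Pareto-distributed random variable with parameters $x_\min$ and $k$, then $Y = \ln(X/x_\min)$ is exponentially distributed with parameter $k$, since $P(Y > y) = P(X > x_\min e^y) = e^{-ky}$ for $y \ge 0$.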
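
The relationship can be illustrated by simulation: transform Pareto samples to $Y = \ln(X/x_\min)$ and compare the sample mean with $1/k$ and the fraction of values above $y = 0.5$ with $e^{-0.5k}$. Sampling uses $X = x_\min U^{-1/k}$ with $U$ uniform on $(0, 1]$; the seed and the parameters $x_\min = 3$, $k = 2.5$ are arbitrary. Since this gives $Y = -\ln(U)/k$ exactly, the simulation mainly confirms the algebra.

```python
import math
import random

rng = random.Random(0)
x_min, k = 3.0, 2.5
n = 100_000

ys = []
for _ in range(n):
    u = 1.0 - rng.random()            # uniform on (0, 1]
    x = x_min * u ** (-1.0 / k)       # Pareto(x_min, k) sample
    ys.append(math.log(x / x_min))    # expected to be exponential with rate k

mean_y = sum(ys) / n
tail_fraction = sum(y > 0.5 for y in ys) / n
print(mean_y, 1.0 / k)
print(tail_fraction, math.exp(-0.5 * k))
```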

Relationship with the shifted Pareto distribution

If $X$ is a Pareto-distributed random variable, then $Y = X - x_\min$ follows a shifted Pareto distribution on $[0, \infty)$, with $P(Y > y) = \left(\frac{x_\min}{x_\min + y}\right)^k$.
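
Here the shifted Pareto distribution is taken to be the distribution of $Y = X - x_\min$ on $[0, \infty)$, whose tail probability follows directly from that of $X$: $P(Y > y) = P(X > x_\min + y) = (x_\min/(x_\min + y))^k$. The sketch compares this with empirical tail fractions of simulated values; the helper name, the seed and the parameters $x_\min = 1$, $k = 2$ are illustrative assumptions.

```python
import random


def shifted_pareto_sf(y: float, x_min: float, k: float) -> float:
    """P(Y > y) for Y = X - x_min, i.e. (x_min / (x_min + y))**k for y >= 0."""
    if y < 0:
        return 1.0
    return (x_min / (x_min + y)) ** k


rng = random.Random(1)
x_min, k = 1.0, 2.0
n = 100_000
# Pareto samples by inverse transform, shifted so that the support starts at 0
ys = [x_min * (1.0 - rng.random()) ** (-1.0 / k) - x_min for _ in range(n)]

for y in (0.5, 1.0, 4.0):
    empirical = sum(v > y for v in ys) / n
    print(y, empirical, shifted_pareto_sf(y, x_min, k))
```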

Inequality measures and the Pareto Principle

Since the probability density of the Pareto distribution has a single maximum at its smallest value, Pareto-distributed quantities exhibit the unequal distribution known as the Pareto principle (also called the 80-20 rule): small values are quite common, whereas large values are very rare. How pronounced this effect is depends on the parameter $k$.

In the cities example (see the figure in the introduction), a few large cities contribute disproportionately to the total population, while a very large number of small towns each have only a few inhabitants.

Various inequality measures exist to quantify this phenomenon. For computing them, distributions of the form "A:B" describe two quantiles in which the width (population share) of the first quantile equals the height (resource share) of the second, and the height of the first quantile equals the width of the second. A frequently quoted example of this representation is the "80-20" principle. It applies, for example, if 80% of a group hold 20% of the group's resources and the remaining 20% of the group hold 80% of the resources.

In the Lorenz curve this situation appears as one "standing" and one "lying" quantile segment; $A$ and $B$ must each lie in the range from 0 to 1, and $A + B = 1$. The Gini coefficient and the Hoover index are equal in this case:

$$G = H = |A - B|.$$

For an 80:20 distribution this gives a Gini coefficient and a Hoover index of 0.6, i.e. 60%.
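
For an "A:B" distribution the Lorenz curve is piecewise linear through $(0, 0)$, $(A, B)$ and $(1, 1)$, so both measures can be computed directly. The sketch below (helper name hypothetical, assuming $A \ge 0.5$ and $B = 1 - A$) takes the Gini coefficient as one minus twice the area under the Lorenz curve and the Hoover index as half the sum of the absolute differences between resource share and population share over the two groups.

```python
def two_quantile_gini_hoover(a: float) -> tuple[float, float]:
    """Gini coefficient and Hoover index of an "A:B" distribution with B = 1 - A.

    A share A of the population holds a share B of the resources, and the
    remaining share B of the population holds the share A (assumes A >= 0.5).
    """
    b = 1.0 - a
    # Lorenz curve (0, 0) -> (A, B) -> (1, 1); area below it from two trapezoids
    area = a * (0.0 + b) / 2 + (1.0 - a) * (b + 1.0) / 2
    gini = 1.0 - 2.0 * area
    # Hoover index: half the sum of |resource share - population share| over groups
    hoover = 0.5 * (abs(b - a) + abs(a - b))
    return gini, hoover


print(two_quantile_gini_hoover(0.8))
```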

The Theil index (an entropy measure) is also easy to calculate for such two-quantile distributions:

$$T = A\ln\frac{A}{B} + B\ln\frac{B}{A} = (A - B)\ln\frac{A}{B}.$$

The Pareto principle can serve as a mnemonic for the range of values of the Theil index. For an even distribution of 0.5:0.5 (50% to 50%) the index is 0, and at about 0.82:0.18 (82% to 18%) it reaches 1. This is very close to the 80%:20% distribution. Beyond about 82%:18%, the Theil index exceeds 1.
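
The Theil index of a two-quantile distribution can be evaluated group by group as the sum of resource share times the logarithm of resource share over population share. The sketch below assumes the natural logarithm, which reproduces the values quoted above, and uses simple bisection to locate the split at which the index reaches 1; the function names are hypothetical.

```python
import math


def two_quantile_theil(a: float) -> float:
    """Theil index of an "A:B" distribution with B = 1 - A (0 < A < 1).

    One group (population share B) holds resource share A, the other
    (population share A) holds B; each term is
    resource_share * ln(resource_share / population_share).
    """
    b = 1.0 - a
    return a * math.log(a / b) + b * math.log(b / a)


def split_where_theil_is_one() -> float:
    """Bisection on A in [0.5, 0.99], where the index rises from 0 to above 1."""
    lo, hi = 0.5, 0.99
    for _ in range(60):
        mid = (lo + hi) / 2
        if two_quantile_theil(mid) < 1.0:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2


print(two_quantile_theil(0.5), two_quantile_theil(0.8), split_where_theil_is_one())
```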

Detection of Pareto distributions

Whether data follow a Pareto distribution can be estimated graphically using a log-log plot of the distribution.

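The probability density of the Pareto distribution can be written as a power law:

$$f(x) = C\,x^{-(k+1)} \quad \text{with } C = k\,x_\min^k.$$

The probability $P(X > x)$ can likewise be brought into the form

$$P(X > x) = C'\,x^{-k} \quad \text{with } C' = x_\min^k.$$

Taking the logarithm of such a power law gives

$$\ln P(X > x) = \ln C' - k \ln x.$$

If the $x$ axis is also logarithmic, i.e. one plots against $u = \ln x$ (in practice the axis is usually labelled directly with the values of $x$), one obtains

$$\ln P(X > x) = \ln C' - k\,u,$$

which is a straight line with slope $-k$.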

The accompanying diagram shows this double-logarithmic plot for the cities example. The graph is clearly straight over large parts of its range; its slope equals $-k$ and thus yields the parameter $k$.

Accordingly, the exponent of the density function, $-(k+1)$, agrees well with the literature.

$P(X > x)$ was used for the plot because it is a cumulative measure, obtained by summing (in theory, integrating) many individual values, so that the scatter of individual values carries relatively less weight. With a histogram, by contrast, summing over many values could only be achieved by using fewer intervals, which would make the distribution unrealistically coarse.
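
The graphical check can be sketched in code: sort a sample, compute the empirical fraction of values $\ge x$ at each sample point, and fit a least-squares line to the log-log points; the negative slope estimates $k$. For comparison, the sketch also computes the maximum-likelihood estimate $\hat{k} = n / \sum_i \ln(x_i/x_\min)$ for known $x_\min$, a standard estimator that is not the graphical method described here. The data are simulated (the seed, $x_\min = 1$ and $k = 1.5$ are arbitrary), and the helper name `loglog_slope` is hypothetical.

```python
import math
import random


def loglog_slope(sample: list[float]) -> float:
    """Least-squares slope of ln(fraction of values >= x) against ln(x)."""
    xs = sorted(sample)
    n = len(xs)
    log_x = [math.log(x) for x in xs]
    # Empirical fraction of values >= the i-th smallest value: (n - i) / n, never zero
    log_s = [math.log((n - i) / n) for i in range(n)]
    mean_x = sum(log_x) / n
    mean_s = sum(log_s) / n
    cov = sum((a - mean_x) * (b - mean_s) for a, b in zip(log_x, log_s))
    var = sum((a - mean_x) ** 2 for a in log_x)
    return cov / var


rng = random.Random(7)
x_min, k = 1.0, 1.5
n = 20_000
# Simulated Pareto data via inverse transform: x_min * U**(-1/k), U uniform on (0, 1]
sample = [x_min * (1.0 - rng.random()) ** (-1.0 / k) for _ in range(n)]

k_graphical = -loglog_slope(sample)
k_mle = n / sum(math.log(x / x_min) for x in sample)
print(k_graphical, k_mle)
```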
