Poisson distribution

The Poisson distribution (named after the mathematician Siméon Denis Poisson) is a discrete probability distribution that models the number of events occurring at a constant average rate and independently of one another within a fixed time interval or spatial region. It is also a frequently occurring limiting case of the binomial distribution for infinitely many trials. Like the binomial distribution, the Poisson distribution predicts the expected outcome of a series of Bernoulli experiments, that is, random experiments with only two possible outcomes (for example, "success" and "failure"). If the temporal or spatial observation interval is subdivided more and more finely, the number of trials increases. The progressive subdivision lowers the probability of success in each trial in such a way that the product of the number of trials and the success probability converges to a finite limit. Accordingly, the binomial distribution approaches the mathematically somewhat simpler Poisson distribution.

During the observation, which can be divided into arbitrarily many instants (Bernoulli trials), almost always nothing happens, and only now and then something does. The Poisson distribution is therefore sometimes called the distribution of rare events (see also law of small numbers). The symmetry between success and failure that exists in the binomial distribution, where each has its own specified probability, is lost here. For example, the Poisson distribution does allow one to calculate the probability that no lightning strikes, but the question of how often lightning does not strike is meaningless because the observation is continuous.

The probability distribution, denoted $P_\lambda$, is determined by the parameter $\lambda$, which is simultaneously the expected value and the variance of the distribution. It assigns to the natural numbers $k = 0, 1, 2, \ldots$ the probabilities

$$P_\lambda(k) = \frac{\lambda^k}{k!} \, e^{-\lambda},$$

where $e$ is Euler's number (the base of the natural exponential function), $\lambda$ is a positive real number and $k!$ denotes the factorial of $k$.
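The definition $P_\lambda(k) = \lambda^k e^{-\lambda} / k!$ can be checked numerically. The following is a minimal sketch; the function name `poisson_pmf`, the value $\lambda = 2.5$ and the truncation at 60 terms are illustrative assumptions, not part of the original text.

```python
import math

def poisson_pmf(k: int, lam: float) -> float:
    """Return P_lambda(k) = lam**k / k! * exp(-lam) for k = 0, 1, 2, ..."""
    if k < 0:
        return 0.0
    return lam ** k / math.factorial(k) * math.exp(-lam)

lam = 2.5                                   # illustrative parameter
probs = [poisson_pmf(k, lam) for k in range(60)]   # tail beyond 60 is negligible here

total = sum(probs)                                          # should be close to 1
mean = sum(k * p for k, p in enumerate(probs))              # should be close to lam
variance = sum((k - mean) ** 2 * p for k, p in enumerate(probs))  # also close to lam
print(total, mean, variance)
```

The sum of the probabilities, the mean and the variance come out as approximately 1, $\lambda$ and $\lambda$, in line with the statement that $\lambda$ is both expected value and variance.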

The parameter $\lambda$ of the Poisson distribution can be understood as the expected number of events once a particular observation interval $w$ has been fixed:

$$\lambda = g \cdot w,$$

where $g$ denotes the constant event rate (number per unit interval).

Thus, the Poisson distribution predicts the number ($k$) of occurrences of independent events that happen in random sequence within a given interval, provided it is already known from prior observation how many events are expected on average within that interval ($\lambda$).

The increments of a Poisson process are Poisson-distributed random variables.

Poisson published his thoughts on this distribution in 1837, together with his probability theory, in the book "Recherches sur la probabilité des jugements en matière criminelle et en matière civile" ("Research on the probability of judgments in criminal and civil matters").

Extensions of the Poisson distribution, such as the generalized Poisson distribution and the mixed Poisson distribution, are applied mainly in actuarial science.


Derivation

The Poisson distribution arises on the one hand as a limiting case of the binomial distribution; on the other hand it can be derived from basic properties of the process (the Poisson assumptions). If these properties hold for a phenomenon to a good approximation, the number of events will be Poisson distributed.

One considers a space or time continuum $w$ (the Bernoulli experiment is, as it were, performed at every point of the continuum) on which countable events occur with a constant mean number $g$ per unit interval. Now focus on a "sufficiently small" continuum interval $\Delta w$, which, depending on the experiment, can represent a time interval, a distance, an area or a volume. What happens there determines the global distribution on the continuum.

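The three Poisson assumptions are:

  • Assumption 1: within a sufficiently small interval $\Delta w$ at most one event occurs; two or more events at the same point are ruled out.
  • Assumption 2: the probability of finding an event in $\Delta w$ is proportional to the length of the interval.
  • Assumption 3: whether an event occurs in $\Delta w$ is independent of the events in other, non-overlapping intervals.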

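With Assumptions 1 and 2, and with $g$ the mean number of events per unit interval, the probability of finding one event in a small interval $\Delta w$ is

$$p_1(\Delta w) = g \cdot \Delta w,$$

and the probability of an empty interval is

$$p_0(\Delta w) = 1 - g \cdot \Delta w.$$

According to Assumption 3, the probability of an empty interval $\Delta w$ is independent of the occurrence of any events in the preceding region $w$. The probability of no event up to the point $w + \Delta w$ is therefore

$$p_0(w + \Delta w) = p_0(w) \cdot p_0(\Delta w) = p_0(w) - g \, p_0(w) \, \Delta w.$$

This yields approximately the differential equation $\mathrm{d}p_0(w)/\mathrm{d}w = -g \, p_0(w)$ with the solution

$$p_0(w) = e^{-g w}$$

under the boundary condition $p_0(0) = 1$. Similarly, the probability of $k$ events up to the point $w + \Delta w$ is

$$p_k(w + \Delta w) = p_k(w) \, p_0(\Delta w) + p_{k-1}(w) \, p_1(\Delta w) = p_k(w) - g \, p_k(w) \, \Delta w + g \, p_{k-1}(w) \, \Delta w.$$

By Assumption 1, each appended interval $\Delta w$ may contain either no event or exactly one. The corresponding differential equation $\mathrm{d}p_k(w)/\mathrm{d}w = -g \, p_k(w) + g \, p_{k-1}(w)$ has the solution

$$p_k(w) = \frac{(g w)^k}{k!} \, e^{-g w}.$$

If one now identifies the parameter $g \cdot w$ in this expression, which describes the probability of $k$ events occurring in the continuum region $w$, with $\lambda$, it agrees with the formula of the Poisson distribution. In many problems the number $\lambda$ arises as the product of a rate (number of events per unit interval) and a multiple of the unit interval.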
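The derivation can be followed numerically by stepping the differential equations $\mathrm{d}p_k/\mathrm{d}w = -g \, p_k + g \, p_{k-1}$ forward in small increments $\Delta w$ and comparing the result with $(g w)^k e^{-g w} / k!$. This is only a sketch: the explicit Euler scheme, the function name `integrate_counts` and the values of `g`, `w` and the step count are illustrative assumptions.

```python
import math

def integrate_counts(g: float, w: float, k_max: int, steps: int = 20_000):
    """Euler-integrate dp_k/dw = -g*p_k + g*p_{k-1} from 0 to w for k = 0..k_max."""
    dw = w / steps
    p = [1.0] + [0.0] * k_max            # boundary condition: p_0(0) = 1, p_k(0) = 0
    for _ in range(steps):
        prev = p[:]
        # empty interval appended to an empty region
        p[0] = prev[0] - g * prev[0] * dw
        for k in range(1, k_max + 1):
            # either k events already present and none added,
            # or k-1 events present and exactly one added
            p[k] = prev[k] - g * prev[k] * dw + g * prev[k - 1] * dw
    return p

g, w = 2.0, 1.5                           # illustrative rate and interval length
lam = g * w
numeric = integrate_counts(g, w, k_max=6)
exact = [lam ** k / math.factorial(k) * math.exp(-lam) for k in range(7)]
max_error = max(abs(a - b) for a, b in zip(numeric, exact))
print(max_error)
```

Each Euler step is one Bernoulli trial with success probability $g \, \Delta w$, so the stepped values form a binomial distribution that approaches the Poisson probabilities as the step size shrinks.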

Properties

  • The Poisson distribution $P_\lambda$ is fully characterized by the parameter $\lambda$.
  • The Poisson distribution is stationary, that is, it does not depend on time.
  • In a Poisson process, the random number of events up to a given point in time is Poisson distributed, while the random time until the $n$-th event is Erlang distributed.
  • The Poisson distribution is a special case of the Panjer distribution.

Simple recursive calculation

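First one determines $P_\lambda(0) = e^{-\lambda}$, then one obtains successively $P_\lambda(k) = \frac{\lambda}{k} \, P_\lambda(k-1)$ for $k = 1, 2, \ldots$. The probabilities increase with growing $k$ as long as $k < \lambda$. Once $k > \lambda$, they shrink. The mode, i.e. the value with the highest probability, is therefore $\lfloor \lambda \rfloor$ if $\lambda$ is not an integer; otherwise there are two adjacent modes, $\lambda - 1$ and $\lambda$.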
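The recursion $P_\lambda(k) = \frac{\lambda}{k} P_\lambda(k-1)$, starting from $P_\lambda(0) = e^{-\lambda}$, can be sketched as follows. The function name `poisson_table` and the example parameters are illustrative assumptions.

```python
import math

def poisson_table(lam: float, k_max: int) -> list:
    """Return [P(0), ..., P(k_max)] using P(k) = P(k-1) * lam / k."""
    probs = [math.exp(-lam)]              # P(0)
    for k in range(1, k_max + 1):
        probs.append(probs[-1] * lam / k)
    return probs

# Non-integer parameter: a single mode at floor(lam)
table = poisson_table(3.7, 20)
mode = max(range(len(table)), key=table.__getitem__)
print(mode)                               # index of the largest probability

# Integer parameter: P(lam - 1) and P(lam) coincide
table_int = poisson_table(4.0, 20)
print(table_int[3], table_int[4])
```

The recursion avoids computing large powers and factorials separately, which is why it is convenient for tabulating the distribution.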

Approximation

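If the calculation of $P_\lambda(k)$ causes problems because the values of $\lambda$ and $k$ are too large, the following approximation, obtained from Stirling's formula $k! \approx \sqrt{2\pi k} \, (k/e)^k$, can help:

$$P_\lambda(k) \approx \frac{1}{\sqrt{2\pi k}} \, \exp\!\left(k \left(1 + \ln\frac{\lambda}{k}\right) - \lambda\right).$$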

Distribution function

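The distribution function $F_\lambda$ of the Poisson distribution is

$$F_\lambda(n) = \sum_{k=0}^{n} P_\lambda(k) = e^{-\lambda} \sum_{k=0}^{n} \frac{\lambda^k}{k!} = Q(n+1, \lambda)$$

and gives the probability of finding at most $n$ events when $\lambda$ are expected on average. Here $Q(a, x) = \Gamma(a, x)/\Gamma(a)$ is the regularized incomplete gamma function with lower integration limit $x$.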

Expected value, variance, moment

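If the random variable $X$ is Poisson distributed, i.e. $X \sim P_\lambda$, then $\lambda$ is simultaneously the expected value, the variance and the third central moment, because:

Expected value

$$\operatorname{E}(X) = \sum_{k=0}^{\infty} k \, \frac{\lambda^k}{k!} \, e^{-\lambda} = \lambda \, e^{-\lambda} \sum_{k=1}^{\infty} \frac{\lambda^{k-1}}{(k-1)!} = \lambda \, e^{-\lambda} \, e^{\lambda} = \lambda$$

Variance

$$\operatorname{E}(X^2) = \sum_{k=0}^{\infty} k(k-1) \, \frac{\lambda^k}{k!} \, e^{-\lambda} + \sum_{k=0}^{\infty} k \, \frac{\lambda^k}{k!} \, e^{-\lambda} = \lambda^2 + \lambda$$

By the shift formula it then follows that:

$$\operatorname{Var}(X) = \operatorname{E}(X^2) - \operatorname{E}(X)^2 = \lambda^2 + \lambda - \lambda^2 = \lambda$$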

Alternative calculation of expected value and variance

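Let $X_1, \ldots, X_n$ be independent Bernoulli random variables with success probability $p = \lambda/n$, and let $X := X_1 + \cdots + X_n$. For $n \to \infty$, $X \sim P_\lambda$ holds, and

$$\operatorname{E}(X) = \operatorname{E}(X_1) + \cdots + \operatorname{E}(X_n) = n \cdot \frac{\lambda}{n} = \lambda,$$

$$\operatorname{Var}(X) = \operatorname{Var}(X_1) + \cdots + \operatorname{Var}(X_n) = n \cdot \frac{\lambda}{n} \left(1 - \frac{\lambda}{n}\right) = \lambda \left(1 - \frac{\lambda}{n}\right) \to \lambda.$$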

Coefficient of variation

From the expected value and the variance one immediately obtains the coefficient of variation

$$\frac{\sqrt{\operatorname{Var}(X)}}{\operatorname{E}(X)} = \frac{1}{\sqrt{\lambda}}.$$

Skewness and kurtosis

The skewness is given by

$$\frac{1}{\sqrt{\lambda}}.$$

The kurtosis can likewise be given in closed form as

$$3 + \frac{1}{\lambda},$$

so the excess kurtosis is $1/\lambda$.

Characteristic function

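The characteristic function has the form

$$\varphi_X(s) = \sum_{k=0}^{\infty} e^{iks} \, \frac{\lambda^k}{k!} \, e^{-\lambda} = e^{-\lambda} \sum_{k=0}^{\infty} \frac{(\lambda e^{is})^k}{k!} = e^{\lambda (e^{is} - 1)}.$$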

Generating function

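For the probability generating function one obtains

$$m_X(s) = \sum_{k=0}^{\infty} s^k \, \frac{\lambda^k}{k!} \, e^{-\lambda} = e^{\lambda (s - 1)}.$$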

Moment generating function

The moment generating function of the Poisson distribution is

$$M_X(s) = e^{\lambda (e^{s} - 1)}.$$

Reproductivity

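The Poisson distribution is reproductive, that is, the sum $X_1 + X_2$ of two stochastically independent Poisson-distributed random variables $X_1$ and $X_2$ with parameters $\lambda_1$ and $\lambda_2$ is again Poisson distributed, with parameter $\lambda_1 + \lambda_2$. This is because:

$$P(X_1 + X_2 = n) = \sum_{k=0}^{n} \frac{\lambda_1^k}{k!} \, e^{-\lambda_1} \cdot \frac{\lambda_2^{n-k}}{(n-k)!} \, e^{-\lambda_2} = \frac{e^{-(\lambda_1 + \lambda_2)}}{n!} \sum_{k=0}^{n} \binom{n}{k} \lambda_1^k \lambda_2^{n-k} = \frac{(\lambda_1 + \lambda_2)^n}{n!} \, e^{-(\lambda_1 + \lambda_2)}$$

This can be generalized to several stochastically independent Poisson-distributed random variables $X_i \sim P_{\lambda_i}$. Here $X_1 + \cdots + X_n \sim P_{\lambda_1 + \cdots + \lambda_n}$. The Poisson distribution is therefore infinitely divisible.

By a theorem of the Soviet mathematician D. A. Raikov, the converse also holds: if a Poisson-distributed random variable $X$ is the sum of two independent random variables $X_1$ and $X_2$, then the summands $X_1$ and $X_2$ are also Poisson distributed. A Poisson-distributed random variable can thus be decomposed only into Poisson-distributed independent summands. This theorem is an analogue of Cramér's theorem for the normal distribution.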
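Reproductivity can be checked numerically by convolving two Poisson distributions and comparing the result with the Poisson distribution whose parameter is the sum of the two. The helper names `pmf` and `convolve` and the parameters 1.2 and 2.3 are illustrative assumptions.

```python
import math

def pmf(k: int, lam: float) -> float:
    """Poisson probability P_lam(k)."""
    return lam ** k / math.factorial(k) * math.exp(-lam)

def convolve(lam1: float, lam2: float, n: int) -> float:
    """P(X1 + X2 = n) for independent X1 ~ P_lam1 and X2 ~ P_lam2."""
    return sum(pmf(k, lam1) * pmf(n - k, lam2) for k in range(n + 1))

lam1, lam2 = 1.2, 2.3                     # illustrative parameters
diffs = [abs(convolve(lam1, lam2, n) - pmf(n, lam1 + lam2)) for n in range(15)]
print(max(diffs))                         # differences are at rounding-error level
```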

Symmetry

The Poisson distribution $P_\lambda$ has a strongly asymmetric shape for small mean values $\lambda$. As the mean grows, $P_\lambda$ becomes more symmetric, and for large $\lambda$ it can be represented to a good approximation by the Gaussian distribution.

Relationship to other distributions

Relationship to the binomial distribution

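The probability mass function of the binomial distribution $B_{n,p}$ is

$$B_{n,p}(k) = \binom{n}{k} \, p^k \, (1-p)^{n-k}.$$

The Poisson distribution can be derived from the binomial distribution. It is the limiting distribution of the binomial distribution for a very small proportion $p$ of the feature of interest and a very large sample size $n$: $n \to \infty$ and $p \to 0$ under the constraint that the product $n p = \lambda$ takes a value that is neither zero nor infinite. $\lambda$ is then the expected value of all binomial distributions considered in the limiting process as well as of the resulting Poisson distribution.

The value of a Poisson-distributed random variable at the point $k$ is the limit $n \to \infty$ of a binomial distribution with $p = \lambda/n$ at the point $k$:

$$\lim_{n \to \infty} \binom{n}{k} \left(\frac{\lambda}{n}\right)^k \left(1 - \frac{\lambda}{n}\right)^{n-k} = \lim_{n \to \infty} \frac{\lambda^k}{k!} \cdot \frac{n (n-1) \cdots (n-k+1)}{n^k} \cdot \left(1 - \frac{\lambda}{n}\right)^{n} \cdot \left(1 - \frac{\lambda}{n}\right)^{-k} = \frac{\lambda^k}{k!} \, e^{-\lambda},$$

since the second and fourth factors tend to 1 and the third tends to $e^{-\lambda}$.

For large samples and small $p$, the binomial distribution can therefore be approximated well by the Poisson distribution.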

Since the quantities $n$ and $p$ are mostly unknown when calculating the probabilities for the number of events within an interval, one simply takes as an estimate of $\lambda = n p$ the number of events that occurred in the interval in a sample measurement, postulating an occurrence probability that is normalized to the unit interval and assumed to be constant.
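The convergence of the binomial distribution with $p = \lambda/n$ towards the Poisson distribution can be observed directly. In this sketch the values $\lambda = 3$, $k = 2$ and the chosen sample sizes are illustrative assumptions.

```python
import math

def binom_pmf(k: int, n: int, p: float) -> float:
    """Binomial probability B_{n,p}(k)."""
    return math.comb(n, k) * p ** k * (1 - p) ** (n - k)

def poisson_pmf(k: int, lam: float) -> float:
    """Poisson probability P_lam(k)."""
    return lam ** k / math.factorial(k) * math.exp(-lam)

lam, k = 3.0, 2                           # illustrative values
sizes = (10, 100, 1000, 10000)
errors = []
for n in sizes:
    b = binom_pmf(k, n, lam / n)          # keep n * p = lam fixed
    errors.append(abs(b - poisson_pmf(k, lam)))
    print(n, b)
print("Poisson limit:", poisson_pmf(k, lam))
```

The absolute difference shrinks roughly in proportion to $1/n$ for these values.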

Relation to the normal distribution

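For large $\lambda$ the Poisson distribution resembles a Gaussian distribution with $\mu = \lambda$ and $\sigma^2 = \lambda$:

$$P_\lambda(k) \approx \frac{1}{\sqrt{2\pi\lambda}} \, \exp\!\left(-\frac{(k - \lambda)^2}{2\lambda}\right)$$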

Relationship with the Erlang distribution

  • In a Poisson process, the random number of events in a fixed interval follows the Poisson distribution $P_\lambda$. The random distance (in space or time) until the arrival of the $n$-th event, and likewise the distance between the events $m$ and $m + n$, are by contrast Erlang distributed, $\operatorname{Erl}(g, n)$. It is also said that the Poisson distribution and the Erlang distribution are mutually conjugate distributions. In the case $n = 1$ the Erlang distribution turns into an exponential distribution, $\operatorname{Erl}(g, 1) = \operatorname{Exp}(g)$. Here $g$ denotes the number of expected events per unit interval. $g \, e^{-g x}$ is then the probability density of the distance $x$ that passes until the arrival of the next event, as well as of the distance between two consecutive events.
  • For the distribution functions of the Erlang distribution and the Poisson distribution with $\lambda = g w$, the relation $F_{\operatorname{Erl}(g, n+1)}(w) + F_{\lambda}(n) = 1$ holds.

Relationship to the exponential distribution

The distance (in time or space) until the first random event, as well as the distance between two consecutive events, of a Poisson process with intensity $g$ is exponentially distributed, $\operatorname{Exp}(g)$.
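The connection between exponentially distributed gaps and Poisson-distributed counts can be illustrated by a small simulation. This is a sketch under stated assumptions: the rate 3.0, the unit interval length, the fixed seed and the sample size are all illustrative choices.

```python
import random

def count_events(rate: float, length: float, rng: random.Random) -> int:
    """Count events in [0, length) when gaps between events are Exp(rate)."""
    t = rng.expovariate(rate)             # distance to the first event
    n = 0
    while t < length:
        n += 1
        t += rng.expovariate(rate)        # distance to the next event
    return n

rng = random.Random(0)                    # fixed seed for reproducibility
rate, length = 3.0, 1.0                   # so lambda = rate * length = 3
counts = [count_events(rate, length, rng) for _ in range(20_000)]

mean = sum(counts) / len(counts)
var = sum((c - mean) ** 2 for c in counts) / len(counts)
print(mean, var)                          # both should be near lambda = 3
```

Sample mean and sample variance of the counts both lie close to $\lambda$, as expected for a Poisson-distributed count.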

Relationship with the chi -square distribution

The distribution functions $F_\lambda$ of the Poisson distribution and $F_m$ of the chi-square distribution with $m$ degrees of freedom are related in the following way:

The probability of finding $n$ or more events within an interval in which $\lambda$ events are expected on average equals the probability that the value of $\chi^2_{2n}$ is at most $2\lambda$. It is therefore

$$1 - F_\lambda(n-1) = F_{2n}(2\lambda).$$

This follows from $1 - Q(n, \lambda) = P(n, \lambda)$ with $Q$ and $P$ as the regularized gamma functions.

Application Examples

The Poisson distribution is a typical distribution for the number of phenomena that occur within a unit.

It is therefore often used to describe temporal events. Given are a random event that occurs on average once in a time interval $t_1$, and a second time interval $t_2$ to which this event is to be related.

The Poisson distribution $P_\lambda(n)$ with $\lambda = t_2 / t_1$ gives the probability that exactly $n$ events take place in the period $t_2$. In other words, $\lambda$ is the average frequency of occurrence of an event.

Department store customers

On a Saturday, for example, a customer enters a department store on average every 10 seconds ($t_1 = 10$ s). If the people newly arriving are counted over an interval of one minute, one would expect on average 6 persons ($\lambda = 6$ persons/minute) to enter the department store. $P_6(n)$ gives the probability that exactly $n$ customers enter the department store in the next minute ($t_2 = 60$ s).

With a probability of about 4.5 %, exactly 2 people enter the department store in one minute. With a probability of almost 92 %, 0 to 9 persons (cumulatively) enter. Consequently, the probability that more than 9 people enter in one minute is about 8 %.

In a table listing $n$, $P_6(n)$ and the cumulative probability, each value of $P_6(n)$ is obtained from the value above it by multiplying by $6/n$.

The length of the interval is chosen by the observer. If one chose one hour as the observation interval, the result would be $\lambda = 6 \cdot 60 = 360$; with an interval of 1 second it would be $\lambda = 1/10 = 0.1$. The relative fluctuation in the number of customers ($\sqrt{\lambda}/\lambda = 1/\sqrt{\lambda}$) decreases as the interval, and consequently $\lambda$, becomes larger. A longer interval thus in principle permits a more accurate observation through longer averaging, but it involves more effort and cannot capture changes in conditions occurring within the interval (e.g. the arrival of a bus with tourists eager to shop).
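The figures quoted for the department store can be reproduced with the recursion $P_6(n) = \frac{6}{n} P_6(n-1)$. The sketch below tabulates $n$, $P_6(n)$ and the cumulative probability; the variable names and the cut-off at $n = 12$ are illustrative assumptions.

```python
import math

lam = 6.0                                 # expected customers per minute
p = math.exp(-lam)                        # P_6(0)
cumulative = p
rows = [(0, p, cumulative)]
for n in range(1, 13):
    p *= lam / n                          # P_6(n) from P_6(n-1)
    cumulative += p
    rows.append((n, p, cumulative))

# columns: n, P_6(n) in percent, cumulative probability in percent
for n, p_n, c_n in rows:
    print(f"{n:2d}  {100 * p_n:6.2f} %  {100 * c_n:6.2f} %")
```

The row for $n = 2$ shows about 4.5 %, and the cumulative value at $n = 9$ is just under 92 %.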

Radioactive decay

In nature, for example, the number of decays of a radioactive substance in a given time interval follows Poisson statistics, provided the decay rate does not decrease appreciably during the measurement (that is, the measurement time is short compared with the half-life). The times between individual decay events are then exponentially distributed. In addition, the activity follows the exponential decay determined by the half-life.

Counting experiment

The measurement of a Poisson-distributed number of events $N$ will, on frequent repetition, scatter about the mean $\lambda$ with standard deviation $\sqrt{\lambda}$. If counting is done only once (without repeating the experiment), the result $N$ serves

as the best estimate for the mean ($\lambda \approx N$) of the underlying Poisson distribution, and $\sqrt{N}$ as the uncertainty (standard deviation) of the number obtained. To achieve a relative accuracy of 1 % here, one needs "high statistics" of over 10,000 events!

The expected scatter of the count results over multiple samples can also be calculated without explicitly assuming an underlying Poisson distribution: each count divides the observed events into two categories, the counted and the not counted. A study is interested, for example, in the proportion $p$ of people whose height is between 1.70 m and 1.71 m. For this purpose a sample of $n$ people is measured, and the $k$ people who satisfy the counting criterion are counted. The probability that the next person examined falls into the counting class is then estimated as $p = k/n$. The statistics of this measurement are binomial, that is, $B_{n,p}(k)$ describes the probability of obtaining a count $k$. The variance is therefore

$$\sigma^2 = n \, p \, (1 - p)$$

and the measurement error (standard deviation) thus

$$\sigma = \sqrt{n \, p \, (1 - p)} = \sqrt{k \left(1 - \frac{k}{n}\right)}.$$

If $p \ll 1$, the binomial distribution approaches a Poisson distribution, and

$$\sigma \approx \sqrt{k}.$$

Inefficient count

An observer of a Poisson-distributed random variable with parameter $\lambda$ may not register it completely, but only with a probability $p$. So if $n$ events are originally present, only $k$ events are found, in accordance with the binomial distribution $B_{n,p}(k)$. In this case the true value $n$ is unknown and lies between the measured value $k$ (all existing events seen) and infinity (there were more events than were seen). The probability of a measured value $k$ is then found as the product of the probability of a successful measurement, $B_{n,p}(k)$, and the original Poisson distribution $P_\lambda(n)$, summed over all possible values $n$:

$$\sum_{n=k}^{\infty} B_{n,p}(k) \, P_\lambda(n) = \sum_{n=k}^{\infty} \binom{n}{k} p^k (1-p)^{n-k} \, \frac{\lambda^n}{n!} \, e^{-\lambda} = \frac{(p\lambda)^k}{k!} \, e^{-p\lambda} = P_{p\lambda}(k)$$

The values $k$ found with detection probability $p$ are thus again Poisson distributed. The detection probability $p$ reduces the parameter $\lambda$ of the original Poisson distribution to $p\lambda$.
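The statement that inefficient counting again yields a Poisson distribution, with parameter $p\lambda$, can be checked by carrying out the sum over all possible true values $n$ numerically. The function names, the values $\lambda = 8$ and $p = 0.25$, and the truncation of the sum at $n = 100$ are illustrative assumptions.

```python
import math

def poisson_pmf(k: int, lam: float) -> float:
    """Poisson probability, computed in log space to stay stable for larger k."""
    return math.exp(k * math.log(lam) - lam - math.lgamma(k + 1))

def binom_pmf(k: int, n: int, p: float) -> float:
    """Probability of detecting k out of n events with detection probability p."""
    return math.comb(n, k) * p ** k * (1 - p) ** (n - k)

def observed_pmf(k: int, lam: float, p: float, n_max: int = 100) -> float:
    """Sum of B_{n,p}(k) * P_lam(n) over the possible true values n = k..n_max."""
    return sum(binom_pmf(k, n, p) * poisson_pmf(n, lam) for n in range(k, n_max + 1))

lam, p = 8.0, 0.25                        # illustrative values, so p * lam = 2
diffs = [abs(observed_pmf(k, lam, p) - poisson_pmf(k, p * lam)) for k in range(10)]
print(max(diffs))                         # close to zero
```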

Lightning strikes

The lightning frequency in Germany is 10 strikes per km² per year, which corresponds to 0.1 strikes per hectare per year. What is the probability that lightning strikes a plot of 1 ha within one year?

$\lambda = 0.1$ strikes per hectare per year. The probability of at least one strike is therefore $1 - P_{0.1}(0) = 1 - e^{-0.1} \approx 9.5\,\%$.

Statistically, it is not surprising if lightning strikes the same place twice within 200 years, although it is extremely unlikely that the location can be predicted (see also birthday paradox).

Scattered grains of rice

The picture shows a section of a floor with square tiles on which rice grains were scattered. The $n = 49$ squares each contain $k$ grains of rice, and in total $N = 66$ grains of rice lie in the section considered. The comparison between the experiment and the calculated Poisson distribution $P_\lambda(k)$, where $\lambda = N/n = 66/49 \approx 1.35$ rice grains per square, shows good agreement:

Grains per square k    Squares counted    Squares predicted (49 · P(k))
0                      15                 12.7
1                      15                 17.2
2                      11                 11.6
3                      5                  5.2
4                      1                  1.7
5                      2                  0.5

The probability that a particular square is empty is about 26 %:

$$P_\lambda(0) = e^{-66/49} \approx 0.26$$
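The comparison for the rice grains can be recomputed from the observed counts alone. In the sketch below the dictionary `observed` holds the counted squares per number of grains as listed above; the variable names are illustrative assumptions.

```python
import math

# number of grains in a square -> number of squares observed with that count
observed = {0: 15, 1: 15, 2: 11, 3: 5, 4: 1, 5: 2}

squares = sum(observed.values())                      # total number of squares
grains = sum(k * n for k, n in observed.items())      # total number of grains
lam = grains / squares                                # mean grains per square

# expected number of squares with k grains: squares * P_lam(k)
expected = {
    k: squares * lam ** k / math.factorial(k) * math.exp(-lam)
    for k in observed
}
for k in observed:
    print(k, observed[k], round(expected[k], 1))

p_empty = math.exp(-lam)                              # probability of an empty square
print(round(p_empty, 2))
```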

Sports results

The (temporal) constancy of the probability of an event, an essential prerequisite for applying Poisson statistics (see the Poisson assumptions above), is of course at best approximately given for sports results. Many influences that cannot be isolated in detail act together and yield a chance of scoring points or goals which, for lack of better knowledge, we simply assume to be constant. Whether goals, for example, are scored independently of one another is also questionable. Using these assumptions, one can nevertheless test in retrospect how well the data agree with the Poisson distribution. There is room for judgment here, and no unique answer.

In many sports, winning a competition means achieving more scoring events than the opponent within a given period. The physicist Metin Tolan examined the applicability of the Poisson distribution in sport in detail in his book on football.

Example

During the group stage of the 2010 FIFA World Cup (men) in South Africa, the average number of goals per game and team was 1.05 (101 goals in 48 games). With this value, the Poisson distribution can be used to approximate the distribution of goals and the distribution of the final scores of the matches. The probability of a given final score is the product of the two opponents' probabilities for the corresponding numbers of goals. Again, the sum of all probabilities is 100 %.

The following table shows the calculated proportions of the final scores on the left and the actual proportions of the final scores on the right. The agreement is good, and the deviations between actual and calculated results for a particular final score are in the very low single-digit percentage range. One game corresponds to a share of 1/48 (= 2.083 %) of all games. In only one case (final score 0:1) is the deviation between the calculation and the actual number of games 2 (or 3.81 %); in all other cases it is at most one.

Apart from the high 7:0 victory, summarizing the games with a winner from the perspective of the winning team (i.e. without distinguishing between home and away teams) yields the following even smaller deviations between the game results and the statistical calculation:

Limit value is exceeded

Upper limit

The number $n_o$ of Poisson-distributed events that is not exceeded with a given probability $p$ can be calculated by inverting the distribution function:

$$n_o = F_\lambda^{-1}(p)$$

Here $F_\lambda$ can again be expressed in terms of the regularized gamma function. No elementary closed form of the inverse of the distribution function is known. Apart from computing the inverse point by point (creating a table of the probabilities as a function of the candidate values $n_o$), there is the following approximate approach:

Is found for that, for example, the following expressions of the distribution function hardly ( <1%) of dependent:

General is for high values ​​of the distribution function at very close, with the one-sided quantile of the standard normal distribution represents and is determined as a function of the probability by. The right side of the equation is produced from the reverse function of the error function. You can take over after the prospect for in this table from there the blue shaded border.

The approach is to initially motivated by the fact that the Poisson distribution for large in a normal distribution merges with the upper limit. The additional improves the consistency of the distribution function at small. Now if that is true, it can be specified in the next paragraph the context of and read.

For mean values ​​with probability ( 99%) is more than 1 event occur. Is larger, then the expected probability with greatest frequency of events to a good approximation calculated from the simple formula

It is advisable to round up the result (as in the formula for performed). Thus, with multiple repetitions ( or in other words: in the long run ), the probability of remaining with the number of events under the limit, increased somewhat. With (equivalent ) and are therefore not to be expected as events.

Lower limit

The lower limit $n_u$ of the number of events, below which the count does not fall with the corresponding probability $p$, is given by a similar expression:

With probability $p$, at least $n_u$ events thus occur. With 99 % confidence, at least one event is to be expected only from $\lambda \geq 4.6$ upward (for larger $\lambda$ the probability of no event is less than 1 %).

Interval

Considering the upper and lower limits at the same time, the number of expected events lies, for example, with about 95 % probability within the interval $[n_u, n_o]$

when $n_u$ and $n_o$ are each calculated with $p = 0.975$ (97.5 %). Since the limits are rounded conservatively (i.e. outward), the interval tends, especially at low expected values $\lambda$, to contain somewhat more events than the stated 95 %. The interval, which is skewed at small $\lambda$, becomes more symmetric as $\lambda$ grows and approaches the width one would expect for a normal distribution.

Median

With the two formulas for the upper and lower limits, one can guess where the median lies, that is, the number that in repeated observations is exceeded as often as it is undershot ($p = 0.5$). The correct value can only be calculated approximately, and is

$$n_{\text{median}} \approx \left\lfloor \lambda + \frac{1}{3} - \frac{0.02}{\lambda} \right\rfloor.$$

The right-skewed Poisson distribution is thus, for some parameter values (e.g. for $\lambda$ in the interval $(\ln 2, 1)$), an example showing that in right-skewed distributions the median can well be greater than the expected value!

2/3 law in roulette

The Poisson distribution gives a good estimate of how many different numbers come up in 37 roulette spins.
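One way to make this estimate concrete is sketched below, under the assumption (not spelled out in the text) that the number of times a given pocket is hit in 37 spins is treated as Poisson distributed with $\lambda = 37/37 = 1$. The expected number of distinct numbers is then $37 \, (1 - e^{-1})$, which can be compared with the exact expectation for 37 equally likely pockets. All variable names are illustrative.

```python
import math

numbers, spins = 37, 37
lam = spins / numbers                     # expected hits per number (= 1)

# Poisson estimate: a number appears at least once with probability 1 - P(0)
poisson_distinct = numbers * (1 - math.exp(-lam))

# Exact expectation: a number is missed in every spin with prob. (36/37)**37
exact_distinct = numbers * (1 - (1 - 1 / numbers) ** spins)

print(poisson_distinct, exact_distinct)
print(poisson_distinct / numbers)         # fraction of numbers that appear
```

Both values lie between 23 and 24 distinct numbers, i.e. a little under two thirds of the 37 pockets.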

Random numbers

Random numbers for the Poisson distribution are usually generated using the inversion method.
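A minimal sketch of the inversion method: draw a uniform random number $u$ and return the smallest $k$ whose cumulative probability $F_\lambda(k)$ is at least $u$. The function name `poisson_inversion`, the seed and the sample size are illustrative assumptions; practical libraries typically switch to other algorithms for large $\lambda$.

```python
import math
import random

def poisson_inversion(lam: float, rng: random.Random) -> int:
    """Draw one Poisson(lam) variate by inverting the distribution function."""
    u = rng.random()                      # uniform on [0, 1)
    k = 0
    p = math.exp(-lam)                    # P(0)
    cdf = p
    # advance until the cumulative probability reaches u;
    # the p > 0 guard stops the loop if p underflows to zero
    while u > cdf and p > 0.0:
        k += 1
        p *= lam / k                      # P(k) from P(k-1)
        cdf += p
    return k

rng = random.Random(1)                    # fixed seed for reproducibility
lam = 4.0
sample = [poisson_inversion(lam, rng) for _ in range(20_000)]
sample_mean = sum(sample) / len(sample)
print(sample_mean)                        # should be near lam
```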
