Q–Q plot

A quantile-quantile plot (Q-Q plot) is an exploratory graphical tool in which the quantiles of two statistical variables are plotted against each other in order to compare their distributions.

A probability-probability plot (P-P plot) is an exploratory graphical tool in which the distribution functions of two statistical variables are plotted against each other in order to compare their distributions.

Q-Q plot

Comparison of the distribution of two statistical features

The observed values of the two features whose distributions are to be compared are each sorted by size. The sorted values are combined into pairs and plotted in a coordinate system. If the points lie (approximately) on a straight line, one can presume that both features follow the same type of distribution. The procedure becomes problematic when the two features have different numbers of observations. This can be remedied by interpolation.
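
The pairing of sorted values, including interpolation for samples of unequal size, could be sketched roughly as follows. This is a minimal illustration using only the Python standard library; the function names `quantile` and `qq_pairs` and the choice of evenly spaced probability levels are assumptions made for this example, not a prescribed method.

```python
def quantile(sorted_values, p):
    """Linearly interpolated quantile of already sorted data, 0 <= p <= 1."""
    n = len(sorted_values)
    pos = p * (n - 1)              # fractional index into the sorted data
    lo = int(pos)
    hi = min(lo + 1, n - 1)
    frac = pos - lo
    return sorted_values[lo] + frac * (sorted_values[hi] - sorted_values[lo])


def qq_pairs(x, y):
    """Return (x-quantile, y-quantile) pairs for a two-sample Q-Q plot."""
    if not x or not y:
        raise ValueError("both samples must be non-empty")
    xs, ys = sorted(x), sorted(y)
    m = min(len(xs), len(ys))      # number of plotted points (assumption)
    # Evenly spaced probability levels from 0 to 1; with equal sample sizes
    # this reduces to pairing the sorted values directly.
    probs = [i / (m - 1) for i in range(m)] if m > 1 else [0.5]
    return [(quantile(xs, p), quantile(ys, p)) for p in probs]


if __name__ == "__main__":
    # Unequal sample sizes: the larger sample is interpolated at 3 levels.
    for pair in qq_pairs([1, 2, 3, 4, 5], [10, 20, 30]):
        print(pair)
```

Using the smaller sample size as the number of points avoids inventing more quantiles than the smaller sample can support; other choices are possible.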

The example presented here uses data on about 110 warships at the outbreak of the Second World War, for which the variables length and width were recorded. The scatter plot shows two obviously different groups that stand out clearly as clusters. For the quantile-quantile plot, the data were standardized to make them comparable. The gap at which the curve of points breaks apart shows that the data fall into two clusters. For the cluster at the bottom left, the type of distribution appears to be the same for both variables. For the second cluster, at the top right, the width tends to be larger than in the first cluster. The "bulge" in the plot shows that the distributions of length and width are not the same here.

Review of the distribution of a feature

The observed values of a feature are sorted by size. For comparison, the quantiles of the theoretical distribution that belong to the corresponding cumulative probabilities are used. If the feature values come from the comparison distribution, the empirical and the theoretical quantiles approximately agree, i.e. the points lie on a diagonal.

However, the quantile-quantile plot cannot replace a distribution test.

Formal definition

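For each of the $n$ observations $x_i$, an empirical cumulative fraction $p_i$ (the estimated proportion of values falling below it) is determined. With the help of the inverse distribution function (or quantile function) $F^{-1}$ of the theoretical distribution, the quantile

$$z_i = F^{-1}(p_i)$$

is calculated. The observation $x_i$ is then plotted against $z_i$, i.e. the points $(z_i, x_i)$ are drawn.

The cumulative fraction $p_i$ is determined from the rank $r_i$ of the observation $x_i$, for example according to Blom:

$$p_i = \frac{r_i - 3/8}{n + 1/4}$$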
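
As a rough sketch of this definition, the following Python code computes cumulative fractions from ranks and the matching theoretical quantiles. It assumes a standard normal comparison distribution and ignores ties (ranks are simply positions in the sorted data). The helper names are hypothetical. Besides Blom's formula, the table lists the Rankit, Tukey and Van der Waerden conventions as commonly cited alternatives.

```python
from statistics import NormalDist

# (a, b) such that p_i = (r_i - a) / (n + b); well-known conventions,
# listed here for illustration.
PLOTTING_POSITIONS = {
    "blom": (3 / 8, 1 / 4),
    "rankit": (1 / 2, 0.0),
    "tukey": (1 / 3, 1 / 3),
    "van_der_waerden": (0.0, 1.0),
}


def cumulative_fractions(n, method="blom"):
    """Empirical cumulative fractions p_1..p_n for ranks 1..n."""
    a, b = PLOTTING_POSITIONS[method]
    return [(r - a) / (n + b) for r in range(1, n + 1)]


def normal_qq_points(data, method="blom"):
    """Return points (z_i, x_i): theoretical quantile vs. sorted observation."""
    xs = sorted(data)              # rank r_i = position in sorted order (no tie handling)
    dist = NormalDist()            # assumption: standard normal comparison distribution
    ps = cumulative_fractions(len(xs), method)
    return [(dist.inv_cdf(p), x) for p, x in zip(ps, xs)]


if __name__ == "__main__":
    for z, x in normal_qq_points([0.3, -1.2, 0.8, 2.1, -0.4]):
        print(f"{z:+.3f}  {x:+.3f}")
```

All of these conventions keep $p_i$ strictly between 0 and 1, which matters because the normal quantile function is not finite at 0 or 1.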

Detrended Q-Q plot

In the detrended quantile-quantile plot, the points $(z_i, x_i - z_i)$ are plotted instead of the points $(z_i, x_i)$, where $x_i$ is an observed value and $z_i$ the corresponding theoretical quantile. If the empirical and the theoretical distribution agree, all points lie on the zero line. Deviations from the zero line, measured parallel to the Y axis, stem only from differences between the empirical and the theoretical distribution. In the ordinary quantile-quantile plot, by contrast, the points always run from the bottom left to the top right of the diagram, so displacements parallel to the Y axis (or the X axis) do not arise solely from differences between the two distributions. The detrended Q-Q plot therefore gives a clearer view of the deviations than the Q-Q plot.
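
A minimal sketch of the detrended variant, under the assumptions that Blom's formula supplies the cumulative fractions and that the deviation is taken as observed value minus theoretical quantile (the function name `detrended_qq_points` is hypothetical):

```python
from statistics import NormalDist


def detrended_qq_points(data, dist=NormalDist()):
    """Return points (z_i, x_i - z_i) for a detrended Q-Q plot.

    `dist` is the comparison distribution; the standard normal default
    is an assumption made for this example.
    """
    xs = sorted(data)
    n = len(xs)
    points = []
    for r, x in enumerate(xs, start=1):
        p = (r - 3 / 8) / (n + 1 / 4)   # Blom cumulative fraction (assumption)
        z = dist.inv_cdf(p)             # theoretical quantile
        points.append((z, x - z))       # deviation from the diagonal
    return points


if __name__ == "__main__":
    for z, dev in detrended_qq_points([0.3, -1.2, 0.8, 2.1, -0.4]):
        print(f"{z:+.3f}  {dev:+.3f}")
```

If the data were measured on a different scale than the comparison distribution, they would need to be standardized first, or `dist` would need suitable parameters, for the differences to be meaningful.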

P-P plot

Review of the distribution of a feature

For the observations $x_i$, the empirical cumulative fractions $p_i$ are determined from the ranks, e.g. according to Blom. For comparison, the observed values are inserted into the theoretical cumulative distribution function $F$, which yields the theoretical cumulative fractions $F(x_i)$. If the feature values come from the comparison distribution, the values of $p_i$ and $F(x_i)$ approximately coincide, i.e. the points lie on a diagonal.

In contrast to the Q-Q plot, the tails of the distribution have less visual impact in the P-P plot. However, the probability-probability plot cannot replace a distribution test.
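
The construction of the P-P plot can be sketched along the following lines. The standard normal comparison distribution, the use of Blom's formula, and the function name `pp_points` are assumptions made for this example.

```python
from statistics import NormalDist


def pp_points(data, dist=NormalDist()):
    """Return points (F(x_i), p_i): theoretical vs. empirical cumulative fraction."""
    xs = sorted(data)
    n = len(xs)
    points = []
    for r, x in enumerate(xs, start=1):
        p_emp = (r - 3 / 8) / (n + 1 / 4)   # Blom cumulative fraction (assumption)
        p_theo = dist.cdf(x)                # theoretical cumulative fraction
        points.append((p_theo, p_emp))
    return points


if __name__ == "__main__":
    for p_theo, p_emp in pp_points([0.3, -1.2, 0.8, 2.1, -0.4]):
        print(f"{p_theo:.3f}  {p_emp:.3f}")
```

Both coordinates are confined to the interval from 0 to 1, which is one way to see why extreme observations are compressed and the tails carry less visual weight than in a Q-Q plot.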

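Detrended P-P plot

In the detrended probability-probability plot, the points $(F(x_i), p_i - F(x_i))$ are plotted instead of the points $(F(x_i), p_i)$, where $p_i$ is the empirical and $F(x_i)$ the theoretical cumulative fraction of the observation $x_i$. If the empirical and the theoretical distribution agree, all points lie on the zero line. As with the detrended Q-Q plot, this graph gives a clearer view of the deviations.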
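
A corresponding sketch for the detrended P-P plot, again assuming Blom's formula and taking the deviation as empirical minus theoretical cumulative fraction (the name `detrended_pp_points` is hypothetical):

```python
from statistics import NormalDist


def detrended_pp_points(data, dist=NormalDist()):
    """Return points (F(x_i), p_i - F(x_i)) for a detrended P-P plot."""
    xs = sorted(data)
    n = len(xs)
    points = []
    for r, x in enumerate(xs, start=1):
        p_emp = (r - 3 / 8) / (n + 1 / 4)       # Blom cumulative fraction (assumption)
        p_theo = dist.cdf(x)                    # theoretical cumulative fraction
        points.append((p_theo, p_emp - p_theo)) # deviation from the diagonal
    return points


if __name__ == "__main__":
    for p_theo, dev in detrended_pp_points([0.3, -1.2, 0.8, 2.1, -0.4]):
        print(f"{p_theo:.3f}  {dev:+.3f}")
```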

Application Examples

  • Comparison of an empirical frequency distribution with a theoretical or hypothetical distribution, e.g. graphical inspection of regression residuals for normality
  • Visual inspection of distributional assumptions before applying a parametric test