Shapiro–Wilk test

The Shapiro–Wilk test is a statistical significance test of the hypothesis that the population underlying a sample is normally distributed. The test was developed by Samuel Shapiro and Martin Wilk and first introduced in 1965.

The null hypothesis H0 assumes that the population is normally distributed. The alternative hypothesis H1 states that it is not. If the value of the test statistic W is larger than the critical value, the null hypothesis is not rejected and a normal distribution is assumed.

If the p-value of the test is determined instead, the null hypothesis is, as a rule, not rejected if the p-value is greater than the specified significance level α.

The test procedure was published in 1965 by the American Samuel Shapiro and the Canadian Martin Wilk. It grew out of their original idea of summarizing the graphical information of a normal probability plot, used to assess normality, in a single measure.

The test can be used to check univariate samples with 3 to 5000 observations. A further development of the test, Royston's H test, allows multidimensional samples to be checked for multivariate normality.

Among the well-known tests for normality, such as the Kolmogorov–Smirnov test or the chi-square test, the Shapiro–Wilk test stands out for its relatively high statistical power in many test situations, in particular when checking smaller samples with n < 50.

The Shapiro–Wilk test, or modifications of it such as the Ryan–Joiner test, is available in popular commercial and non-commercial statistical software packages.

Properties

Pre-test for further analyses

Some inferential methods (such as analysis of variance, the t-test or linear regression) assume that the prediction errors (residuals) come from a normally distributed population, at least for small sample sizes with n < 30. The Shapiro–Wilk test for normality can therefore also be regarded as a pre-test for further analyses.

Not a general goodness-of-fit test

Some normality tests, such as the Kolmogorov–Smirnov test or the chi-square test, are general goodness-of-fit tests that can check a sample against various hypothetical distributions (including the normal distribution). The Shapiro–Wilk test, by contrast, is designed solely for testing normality. Unlike the general goodness-of-fit tests, which usually need at least 50 to 100 observations to give meaningful results, the Shapiro–Wilk test often requires fewer observations.

Character as an omnibus test

The Shapiro–Wilk test is an omnibus test, i.e. it can only determine whether or not there is a significant deviation from the normal distribution. It cannot describe the form that the deviation takes. For example, it makes no statement as to whether the distribution is left-skewed or right-skewed, whether it is heavy-tailed, or possibly both.

Sample size up to 5000 observations

Originally, the test was only able to examine

High statistical power

In general, all normality tests have lower power for small sample sizes than for larger ones, because the standard error is relatively large for small samples. Only as the sample size grows does the standard error shrink and the power increase. Even for small sample sizes with n < 50, the Shapiro–Wilk test has relatively high power compared with other tests. For example, with a sample of 20 observations whose actual distribution is a chi-square distribution, the Shapiro–Wilk test has a power of 54%, compared with 29% for the D'Agostino test of 1970.

Operation

The test statistic W is a ratio of two variance estimates. The numerator contains an estimate of what the variance of the sample would be if it came from a normally distributed population. The test compares this "expected" variance with a second estimate in the denominator, the actual variance of the sample. If the population underlying the sample is in fact normally distributed, the two estimates should independently arrive at about the same result. The less the two variance estimates differ, the more plausible it is that the population is in fact normally distributed.

The Shapiro–Wilk test is therefore based on an analysis of variance (ANOVA) of the sample, as the original title of the publication, "An Analysis of Variance Test for Normality (Complete Samples)", makes clear.

The estimator for the variance of the sample in the denominator is the usual corrected sample variance.

The expected variance in the numerator, for a sample drawn from a normally distributed population (H0 is thus assumed to be true), is estimated by the method of least squares from the slope of the regression line in the Q-Q plot, which plots the ordered observations of the sample against the corresponding expected order statistics of a normal distribution.

The ordinary linear model is

$x_{(i)} = \mu + \sigma m_i + \varepsilon_i$

where

  • $\sigma$ is the slope of the regression line, whose estimate appears in the numerator of the test statistic,
  • $\mu$ is the point of intersection with the y-axis and the estimate of the mean,
  • $m_i$ are the expected order statistics of a normal distribution,
  • $x_{(i)}$ are the order statistics of the sample,
  • $\varepsilon_i$ is the disturbance term, which represents influences that cannot be captured.
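The regression idea can be sketched in a few lines of Python, assuming the model x_(i) = μ + σ·m_i + ε_i. This is only an illustration, not the exact procedure of the test: the expected normal order statistics m_i are replaced here by Blom's approximation, and the function name `qq_regression` is hypothetical.

```python
from statistics import NormalDist, mean


def qq_regression(sample):
    """Least-squares line through the Q-Q plot points (m_i, x_(i)).

    Returns (intercept, slope): the intercept estimates the mean,
    the slope estimates the standard deviation.
    """
    x = sorted(sample)  # order statistics x_(1) <= ... <= x_(n)
    n = len(x)
    # Assumption: Blom's approximation stands in for the exact
    # expected order statistics m_i of a standard normal distribution.
    m = [NormalDist().inv_cdf((i - 0.375) / (n + 0.25)) for i in range(1, n + 1)]
    m_bar, x_bar = mean(m), mean(x)
    sxy = sum((mi - m_bar) * (xi - x_bar) for mi, xi in zip(m, x))
    sxx = sum((mi - m_bar) ** 2 for mi in m)
    slope = sxy / sxx
    intercept = x_bar - slope * m_bar
    return intercept, slope


data = [200, 545, 290, 165, 190, 355, 185, 205, 175, 255]
mu_hat, sigma_hat = qq_regression(data)
print(mu_hat, sigma_hat)
```

Because the approximate m_i are symmetric around zero, the intercept coincides with the sample mean up to rounding. The slope is the regression-based estimate of the standard deviation that the test compares with the ordinary sample standard deviation.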

In this approach the test differs from several other methods, such as the Jarque–Bera test, which checks how closely the distribution of the sample agrees with specific characteristics of the shape of the normal distribution, described by moments such as skewness and kurtosis.

Requirements

  • The observations of the sample must be independent of each other.
  • The sample size must be at least n = 3 and at most n = 5,000.
  • The random variable must be measured on a metric scale.

Calculation of the test statistic

The test checks the hypothesis that a sample was drawn from a normally distributed population by comparing the test statistic W with a critical value that bounds the rejection region (taken from the distribution of the test statistic).

1 Setting up the hypotheses and determining the level of significance

The null hypothesis states that the population is normally distributed, and the alternative hypothesis states that it is not. At the same time a significance level is chosen, usually 5%.

2 Creation of the order statistics

All observations in the sample are sorted in ascending order and each value is assigned a rank.

This yields the order statistics of the sample, $x_{(1)} \le x_{(2)} \le \dots \le x_{(n)}$, where $x_{(i)}$ is defined as the i-th order statistic.
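A minimal Python sketch of this step follows. It sorts a sample and forms the pairs of largest and smallest values that the next step works with. The function names are hypothetical, and the data are the ten observations from the article's practical example.

```python
def order_statistics(sample):
    """Return the sample sorted in ascending order: x_(1), ..., x_(n)."""
    return sorted(sample)


def outer_pairs(ordered):
    """Pair the i-th largest with the i-th smallest value for i = 1..k, k = n // 2."""
    n = len(ordered)
    return [(ordered[n - i], ordered[i - 1]) for i in range(1, n // 2 + 1)]


data = [200, 545, 290, 165, 190, 355, 185, 205, 175, 255]
x = order_statistics(data)
pairs = outer_pairs(x)
print(x)
print(pairs)
```

For an odd sample size the middle observation is left unpaired, which is why integer division is used for the number of pairs.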

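3 Calculation of the estimators b² and s²

The estimator $b$ is the sum over $k$ pairs of order statistics, where the difference within each pair is multiplied by a corresponding coefficient $a_{n-i+1}$ (also referred to as a weight). If the number of observations $n$ in the sample is even, $k = n/2$; if it is odd, $k = (n-1)/2$. Thus, the following applies:

$b = \sum_{i=1}^{k} a_{n-i+1} \left( x_{(n-i+1)} - x_{(i)} \right)$

where the coefficients are given by the components of the vector

$(a_1, \dots, a_n) = \dfrac{m^{T} V^{-1}}{\left( m^{T} V^{-1} V^{-1} m \right)^{1/2}}$

with $m = (m_1, \dots, m_n)^{T}$ representing the expected order statistics of a normal distribution

and $V$ the covariance matrix of these order statistics.

The coefficients are also commonly tabulated for the first 50 pairs of numbers in many statistics books.

The variance and the mean of the sample are calculated as

$s^2 = \dfrac{1}{n-1} \sum_{i=1}^{n} (x_i - \bar{x})^2, \qquad \bar{x} = \dfrac{1}{n} \sum_{i=1}^{n} x_i$

The test statistic is then $W = \dfrac{b^2}{(n-1)\, s^2}$.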
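The calculation can be sketched in Python as follows, assuming the test statistic is W = b² / ((n − 1)·s²), which reproduces the figures of the article's practical example. The function name `shapiro_wilk_w` is hypothetical. The weights are the tabulated values for n = 10 quoted in that example and must be replaced for other sample sizes.

```python
from statistics import mean

# Tabulated weights for n = 10, in the order a_n, a_(n-1), ..., a_(n-k+1),
# taken from the article's practical example.
A_N10 = [0.5739, 0.3291, 0.2141, 0.1224, 0.0399]


def shapiro_wilk_w(sample, weights):
    """Return (b, s2, w) for a sample and its k = n // 2 tabulated weights."""
    x = sorted(sample)
    n = len(x)
    if len(weights) != n // 2:
        raise ValueError("need exactly n // 2 weights")
    # b: weighted sum of differences between the i-th largest and i-th smallest value
    b = sum(a * (x[n - 1 - i] - x[i]) for i, a in enumerate(weights))
    x_bar = mean(x)
    ss = sum((xi - x_bar) ** 2 for xi in x)  # equals (n - 1) * s^2
    s2 = ss / (n - 1)                        # corrected sample variance
    w = b * b / ss
    return b, s2, w


data = [200, 545, 290, 165, 190, 355, 185, 205, 175, 255]
b, s2, w = shapiro_wilk_w(data, A_N10)
print(round(b, 2), round(s2 ** 0.5, 2), round(w, 2))
```

Statistical packages compute the weights internally. Passing them in explicitly here keeps the sketch close to the hand calculation with a printed table.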

4 Compare the test statistic with a critical value

The value of the test statistic W is compared with a critical value for the given sample size n and the predefined significance level α. For n < 50, tables of the critical values exist and are reprinted in many statistics books. Critical values for samples with n > 50 can be determined by Monte Carlo simulation.
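The simulation idea can be illustrated with a short Python sketch. It is demonstrated for n = 10 rather than for n > 50, because the weights for n = 10 are quoted in the article's practical example, whereas larger samples would need weights from another source. The function names, the number of repetitions and the seed are arbitrary assumptions.

```python
import random

# Tabulated weights for n = 10 (a_n, ..., a_(n-k+1)) from the practical example.
A_N10 = [0.5739, 0.3291, 0.2141, 0.1224, 0.0399]


def w_statistic(sample, weights):
    """W = b^2 / sum of squared deviations from the mean."""
    x = sorted(sample)
    n = len(x)
    b = sum(a * (x[n - 1 - i] - x[i]) for i, a in enumerate(weights))
    x_bar = sum(x) / n
    return b * b / sum((xi - x_bar) ** 2 for xi in x)


def simulated_critical_value(n, weights, alpha=0.05, reps=20000, seed=1):
    """Empirical lower alpha-quantile of W under normality."""
    rng = random.Random(seed)
    ws = sorted(
        w_statistic([rng.gauss(0.0, 1.0) for _ in range(n)], weights)
        for _ in range(reps)
    )
    return ws[int(alpha * reps)]


crit = simulated_critical_value(10, A_N10)
print(crit)
```

With enough repetitions the simulated value should land close to the tabulated critical value for n = 10 and α = 0.05. The lower quantile is taken because the test rejects for small values of W.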

Evaluation of results

If the value of the test statistic is larger than the critical value, the null hypothesis is not rejected, i.e. a normal distribution is assumed. The test statistic can be interpreted like a correlation coefficient that takes values between 0 and 1, similar to the coefficient of determination. The closer the test statistic is to 1, the less the actual variance deviates from the hypothetical variance under the assumption of normality. If, however, there are statistically significant deviations, i.e. the test statistic is smaller than the critical value (significantly small), the null hypothesis is rejected in favor of the alternative hypothesis and it is assumed that the population is not normally distributed. The Shapiro–Wilk test thus differs from many other normality tests, which reject the null hypothesis when their test statistic is larger than the critical value.
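The correlation interpretation can be checked numerically. If the full weight vector is antisymmetric (so that it sums to zero) and its squares sum to one, W equals the squared Pearson correlation between the weights and the ordered sample. The Python sketch below uses the rounded table weights for n = 10 from the practical example, so the agreement holds only up to rounding. The helper names are hypothetical.

```python
from math import sqrt

A_N10 = [0.5739, 0.3291, 0.2141, 0.1224, 0.0399]  # a_n, ..., a_(n-k+1) for n = 10


def full_weight_vector(half_weights, n):
    """Expand the k tabulated weights to the antisymmetric vector a_1, ..., a_n."""
    middle = [0.0] if n % 2 else []
    lower = [-a for a in half_weights]      # a_1 = -a_n, a_2 = -a_(n-1), ...
    upper = list(reversed(half_weights))    # ..., a_(n-1), a_n
    return lower + middle + upper


def pearson_r(u, v):
    """Ordinary Pearson correlation coefficient."""
    n = len(u)
    u_bar, v_bar = sum(u) / n, sum(v) / n
    cov = sum((ui - u_bar) * (vi - v_bar) for ui, vi in zip(u, v))
    var_u = sum((ui - u_bar) ** 2 for ui in u)
    var_v = sum((vi - v_bar) ** 2 for vi in v)
    return cov / sqrt(var_u * var_v)


data = [200, 545, 290, 165, 190, 355, 185, 205, 175, 255]
x = sorted(data)
a = full_weight_vector(A_N10, len(x))

r_squared = pearson_r(a, x) ** 2
x_bar = sum(x) / len(x)
w = sum(ai * xi for ai, xi in zip(a, x)) ** 2 / sum((xi - x_bar) ** 2 for xi in x)
print(r_squared, w)
```

The two printed values differ only slightly, because the squares of the four-decimal table weights do not sum to exactly one.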

Analysis using p- value

As an alternative to the test statistic, many computer programs additionally report the p-value.

The p-value indicates the probability of obtaining a sample like the one drawn, under the assumption that the sample actually comes from a normally distributed population (i.e. the null hypothesis is true).

  • The smaller the p-value, the smaller the probability that such a sample would occur with a normally distributed population.
  • A p-value of 0 means that it is 0% likely, and a p-value of 1 that it is 100% likely, to draw such a sample if it came from a normal distribution.
  • In general, the null hypothesis is rejected if the p-value is less than the predetermined significance level.

The method for calculating the p-value depends on the sample size n. For n = 3 the probability distribution of W is known. For samples with n > 3, W is transformed to an approximately normally distributed quantity:

$Z = \dfrac{(1 - W)^{\gamma} - \mu}{\sigma}$

The values of σ, γ and μ for the respective sample sizes are calculated by Monte Carlo simulation.
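The transformation parameters are not reproduced here. As a stand-in, the following Python sketch estimates the p-value directly by simulation, as the share of W values from simulated normal samples that are at most as large as the observed W. This is a plain Monte Carlo substitute, not the transformation method described above. The function names, repetition count and seed are assumptions, and the weights are those for n = 10 from the practical example.

```python
import random

A_N10 = [0.5739, 0.3291, 0.2141, 0.1224, 0.0399]  # tabulated weights for n = 10


def w_statistic(sample, weights):
    """W = b^2 / sum of squared deviations from the mean."""
    x = sorted(sample)
    n = len(x)
    b = sum(a * (x[n - 1 - i] - x[i]) for i, a in enumerate(weights))
    x_bar = sum(x) / n
    return b * b / sum((xi - x_bar) ** 2 for xi in x)


def simulated_p_value(sample, weights, reps=20000, seed=1):
    """Estimate P(W <= observed W) under the null hypothesis of normality."""
    rng = random.Random(seed)
    n = len(sample)
    w_obs = w_statistic(sample, weights)
    hits = sum(
        w_statistic([rng.gauss(0.0, 1.0) for _ in range(n)], weights) <= w_obs
        for _ in range(reps)
    )
    # Add-one correction keeps the estimate strictly positive.
    return w_obs, (hits + 1) / (reps + 1)


data = [200, 545, 290, 165, 190, 355, 185, 205, 175, 255]
w_obs, p_value = simulated_p_value(data, A_N10)
print(w_obs, p_value, p_value < 0.05)
```

Only the lower tail is counted, because small values of W indicate a deviation from normality.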

Practical Example

The following 10 observations (n = 10) of a sample are to be checked for normality:

  • 200, 545, 290, 165, 190, 355, 185, 205, 175, 255

The ordered sample is:

  • 165, 175, 185, 190, 200, 205, 255, 290, 355, 545

The number of observations is even, so k = n/2 = 5 pairs are formed. The corresponding weights are taken from a table.

  • b = 0.5739 (545 − 165) + 0.3291 (355 − 175) + 0.2141 (290 − 185) + 0.1224 (255 − 190) + 0.0399 (205 − 200) = 218.08 + 59.24 + 22.48 + 7.96 + 0.20 = 307.96

For the sample, s = 117.59.

As a result, $W = \dfrac{b^2}{(n-1)\, s^2} = \dfrac{307.96^2}{9 \cdot 117.59^2} \approx 0.76$.

The critical value for n = 10 at a significance level of α = 0.05 is taken from a table and is 0.842.

Since W is smaller than the critical value (0.76 < 0.842), W falls in the rejection region and the null hypothesis is rejected. Consequently, it is assumed that the sample does not come from a normally distributed population. The density function of the W test statistic is strongly left-skewed, and the rejection region of the test lies at the small end of the distribution.

Pros and Cons

Benefits

  • Compared with a rather subjective visual inspection for normality using a histogram or a Q-Q plot, the Shapiro–Wilk test, as a statistical significance test, makes it possible to assess normality by objective standards.
  • In many test situations the test has high power, especially for smaller samples with n < 50.
  • The mean and variance of the hypothetical normal distribution need not be known beforehand.
  • The test can be used for samples with 3 ≤ n ≤ 5000.
  • Many common statistical software packages such as SAS, SPSS, Minitab and R have implemented the test.

Disadvantages

  • The test is very sensitive to outliers, both one-sided and two-sided. Outliers can distort the shape of the distribution considerably, so that the assumption of normality may be falsely rejected.
  • Carrying out larger studies with computer programs can, under certain circumstances, lead to wrong test decisions. Data from a normal distribution are real numbers, but the computer calculates with rounded figures. Rounding errors can accumulate quickly and artificially create deviations between theoretical and empirical data which, if large enough, could lead to rejection of the null hypothesis.
  • The test is relatively susceptible to ties, i.e. when there are many identical values, its power is greatly impaired. If the data were originally rounded, the power can be improved with the so-called Sheppard correction, which produces an adjusted W that depends on the rounding difference.
  • The way the test works is highly mathematical and therefore not easy to understand.
  • The test requires special coefficients, the weights, which are available in tabulated form only for smaller sample sizes with n < 50.
  • When the test statistic and the critical values are calculated without a computer program, the computational effort for larger sample sizes is very high.

Alternative methods

Other significance tests

In addition to the Shapiro–Wilk test, there are at least 40 other normality tests or modifications of individual tests.

Normality tests compare one or another characteristic feature of the standard normal distribution, which serves to some extent as a benchmark, with the distribution of the sample. The tests differ in which features they use as the criterion of comparison.

While the Shapiro–Wilk test uses the techniques of regression and correlation and analyzes the correlation in terms of variance, other test methods are based on examining the distribution function (for example, the Kolmogorov–Smirnov test, the Anderson–Darling test and the Cramér–von Mises test).

Other tests focus on comparing skewness and kurtosis (e.g. the D'Agostino–Pearson test, the Jarque–Bera test and the Anscombe–Glynn test).

The power of each normality test varies depending on sample size, the actual distribution and other factors such as outliers and ties. There is no single test that has the highest power in all situations.

Graphical methods

Histograms and normal probability plots such as the Q-Q plot or the P-P plot are often used as tools for visually checking a distribution for normality. They can either confirm or call into question the result of a significance test.
