Mann–Whitney U

The Wilcoxon–Mann–Whitney test (also called the "Mann–Whitney U test", "U test", or "Wilcoxon rank sum test") is a nonparametric statistical test. The U test is a test of homogeneity. It is used to check whether two distributions agree significantly, i.e. whether two independent distributions A and B (for example, an unaffected and an affected one) belong to the same population. The test was developed by Henry Mann and Donald Whitney (1947) and by Frank Wilcoxon (1945). The central idea of the test had already been developed in 1914 by the German educator Gustaf Deuchler.

Assumptions

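  • The random variables $X$ and $Y$ have continuous distribution functions $F_X$ and $F_Y$ that differ from one another only by a shift $a$, that is: $F_X(x) = F_Y(x - a)$.
  • There are independent samples $X_1, \dots, X_m$ of $X$ and $Y_1, \dots, Y_n$ of $Y$ that are also mutually independent.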

Test statistic

To test the hypotheses of the Wilcoxon–Mann–Whitney test,

$$H_0\colon a = 0 \quad \text{vs.} \quad H_1\colon a \neq 0,$$

there are two test statistics: the Mann–Whitney U statistic $U$ and the Wilcoxon rank sum statistic $W_{m,n}$. Because the two statistics are linked by

$$W_{m,n} = U + \frac{m(m+1)}{2},$$

the Wilcoxon rank sum test and the Mann–Whitney U test are equivalent.

Mann-Whitney U statistic

The Mann–Whitney U test statistic is

$$U = \sum_{i=1}^{m} \sum_{j=1}^{n} S(X_i, Y_j),$$

where $S(X, Y) = 1$ if $Y < X$ and $S(X, Y) = 0$ otherwise.
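The U statistic simply counts the pairs in which a Y value is smaller than an X value. A minimal sketch of that count, assuming tie-free data; the function name `mann_whitney_u` and the sample values are made up for illustration:

```python
def mann_whitney_u(xs, ys):
    """Count the pairs (x, y) with y < x, i.e. the double sum of S(X_i, Y_j)."""
    return sum(1 for x in xs for y in ys if y < x)


# Hypothetical, tie-free data chosen only for illustration.
xs = [1.1, 3.4, 5.0, 7.2]
ys = [2.0, 4.1, 6.3]

u = mann_whitney_u(xs, ys)
print(u)  # 0 + 1 + 2 + 3 = 6 pairs with y < x
```

Swapping the roles of the two samples gives the complementary count, so without ties the two values add up to $m \cdot n$.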

Exact critical values

Exact critical values are available only in tabulated form; for small sample sizes they can be found in the table below (for the two-sided and the one-sided test).

Approximate critical values

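For $m > 3$, $n > 3$ and $m + n > 19$, the distribution of the test statistic,

$$U \approx N\!\left(\frac{mn}{2},\ \frac{mn(m+n+1)}{12}\right),$$

can be approximated by the normal distribution. The critical values are then obtained from the critical values of the approximating normal distribution.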
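An approximate critical value can be read off the approximating normal distribution. The sketch below assumes the usual tie-free null moments of U (mean $mn/2$, variance $mn(m+n+1)/12$); the function name `approx_lower_critical_u` is hypothetical, and the result is a real number, not a tabulated integer critical value:

```python
from math import sqrt
from statistics import NormalDist


def approx_lower_critical_u(m, n, alpha):
    """alpha-quantile of the normal distribution that approximates U under H0."""
    mean = m * n / 2
    sd = sqrt(m * n * (m + n + 1) / 12)
    # inv_cdf(alpha) is the alpha-quantile of the standard normal distribution.
    return mean + NormalDist().inv_cdf(alpha) * sd


# Illustrative call: m = n = 10, lower 2.5 % tail.
print(approx_lower_critical_u(10, 10, 0.025))
```

For a two-sided test at level $\alpha$ one would pass $\alpha/2$, as in the call above for $\alpha = 5\,\%$.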

Wilcoxon rank sum statistic

The Wilcoxon rank sum statistic is

$$W_{m,n} = \sum_{i=1}^{m} R(X_i),$$

where $R(X_i)$ is the rank of the $i$-th $X$ in the pooled, ordered sample. In this form the test is often called the Wilcoxon rank sum test.
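The rank sum and the pair count differ only by the constant $m(m+1)/2$, which is easy to check numerically. A small sketch, assuming tie-free data (so that every value has a unique rank); the function name `rank_sum` and the data are illustrative only:

```python
def rank_sum(xs, ys):
    """Sum of the ranks of the xs in the pooled, ordered sample (no ties assumed)."""
    pooled = sorted(xs + ys)
    rank = {value: position + 1 for position, value in enumerate(pooled)}
    return sum(rank[x] for x in xs)


# Hypothetical, tie-free data chosen only for illustration.
xs = [1.1, 3.4, 5.0, 7.2]
ys = [2.0, 4.1, 6.3]
m = len(xs)

w = rank_sum(xs, ys)                          # ranks 1 + 3 + 5 + 7
u = sum(1 for x in xs for y in ys if y < x)   # pairs with y < x
print(w, u + m * (m + 1) // 2)                # both are 16
```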

Exact critical values

The exact distribution of $W_{m,n}$ under the null hypothesis can easily be found by combinatorial considerations. However, the computational effort increases rapidly for large values of $m$ and $n$. The exact critical values $w_\alpha(m, n)$ for the significance level $\alpha$ can be calculated by means of a recursion formula:

$$P(W_{m,n} = w) = \frac{m}{m+n}\, P\bigl(W_{m-1,n} = w - (m+n)\bigr) + \frac{n}{m+n}\, P(W_{m,n-1} = w)$$

The formula is obtained by conditioning on whether the last value in the ordered arrangement is an X ( ... X) or a Y ( ... Y).
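The conditioning argument translates directly into a recursive computation: if the largest pooled value is an X (probability $m/(m+n)$ under the null hypothesis), it contributes the rank $m+n$; otherwise the rank sum of the X values is unchanged. A sketch with exact fractions, assuming no ties; the function names `prob_w` and `cdf_w` and the base cases are choices made for this illustration:

```python
from fractions import Fraction
from functools import lru_cache


@lru_cache(maxsize=None)
def prob_w(m, n, w):
    """P(W_{m,n} = w) under H0, where W is the rank sum of the m X-values."""
    if w < 0:
        return Fraction(0)
    if m == 0:                      # no X values: the rank sum is 0
        return Fraction(1 if w == 0 else 0)
    if n == 0:                      # only X values: the rank sum is 1 + ... + m
        return Fraction(1 if w == m * (m + 1) // 2 else 0)
    # Condition on whether the largest pooled value is an X or a Y.
    return (Fraction(m, m + n) * prob_w(m - 1, n, w - (m + n))
            + Fraction(n, m + n) * prob_w(m, n - 1, w))


def cdf_w(m, n, w):
    """P(W_{m,n} <= w) under H0."""
    smallest = m * (m + 1) // 2
    return sum(prob_w(m, n, k) for k in range(smallest, w + 1))


print(prob_w(2, 2, 3), prob_w(2, 2, 5), cdf_w(2, 2, 4))  # 1/6 1/3 1/3
```

A lower critical value for level $\alpha$ could then be found by increasing $w$ until `cdf_w(m, n, w)` would exceed $\alpha$.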

Approximate critical values

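For $m > 25$ or $n > 25$ (also: $m > 10$ or $n > 10$), the distribution of the test statistic,

$$W_{m,n} \approx N\!\left(\frac{m(m+n+1)}{2},\ \frac{mn(m+n+1)}{12}\right),$$

can be approximated by the normal distribution. The critical values are then obtained from the critical values of the approximating normal distribution.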
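The parameters of the normal approximation for the rank sum follow from those of the U statistic, because the two statistics differ only by a constant. A short derivation, assuming the tie-free null moments $E(U) = mn/2$ and $\operatorname{Var}(U) = mn(m+n+1)/12$:

```latex
\begin{align*}
W_{m,n} &= U + \frac{m(m+1)}{2} \\
E(W_{m,n}) &= \frac{mn}{2} + \frac{m(m+1)}{2} = \frac{m(m+n+1)}{2} \\
\operatorname{Var}(W_{m,n}) &= \operatorname{Var}(U) = \frac{mn(m+n+1)}{12}
\end{align*}
```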

One-sided hypotheses

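The test can also be formulated for the one-sided hypotheses

$$H_0\colon a \le 0 \ \text{ vs. } \ H_1\colon a > 0 \qquad \text{or} \qquad H_0\colon a \ge 0 \ \text{ vs. } \ H_1\colon a < 0.$$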

Derived hypotheses

The test is of particular interest because, under the assumptions above, rejecting or not rejecting its null hypothesis also decides the following pairs of null and alternative hypotheses:

$H_0\colon E(X) = E(Y)$ vs. $H_1\colon E(X) \neq E(Y)$, that is, the means of the distributions A and B differ.

$H_0\colon \operatorname{Med}(X) = \operatorname{Med}(Y)$ vs. $H_1\colon \operatorname{Med}(X) \neq \operatorname{Med}(Y)$, that is, the medians of the distributions A and B differ.

If the assumptions are not met for the hypothesis about the medians, the median test can be used instead.

Example

From the data of the 2006 German General Social Survey (ALLBUS, Allgemeine Bevölkerungsumfrage der Sozialwissenschaften), 20 people were drawn at random and their net income was recorded.

There are two samples: a sample of men with $n_1$ values and a sample of women with $n_2$ values. We can now test whether the incomes of men and women are equally distributed (two-sided test) or whether women's incomes are lower (one-sided test), comparing the distribution function of men's income with the distribution function of women's income. Both tests are considered here.

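First, a test statistic is computed for each of the two rows of values:

$$U_1 = n_1 n_2 + \frac{n_1(n_1+1)}{2} - R_1, \qquad U_2 = n_1 n_2 + \frac{n_2(n_2+1)}{2} - R_2$$

Here $n_1$ and $n_2$ are the numbers of values in each row, and $R_1$ and $R_2$ are the rank sums of the two rows in the pooled, ordered data. The ranks of the values are entered in two separate columns, one per row, and summed to give $R_1$ and $R_2$. If two or more values are equal, each of them is assigned the median (equivalently, the arithmetic mean) of the ranks they share. The tests use the minimum of $U_1$ and $U_2$, i.e. $U = \min(U_1, U_2)$.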
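The hand calculation via rank sums, including averaged ranks for ties, can be sketched as follows. The data are made up for illustration and are not the survey data of the example; the function names `midranks` and `u_from_rank_sums` are hypothetical:

```python
def midranks(values):
    """Ranks 1..N of the values; tied values share the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j to the end of the run of values equal to values[order[i]].
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        shared = (i + j) / 2 + 1    # mean of the ranks i+1, ..., j+1
        for k in range(i, j + 1):
            ranks[order[k]] = shared
        i = j + 1
    return ranks


def u_from_rank_sums(row1, row2):
    """Return (R1, R2, U1, U2, U) computed from the pooled ranks."""
    n1, n2 = len(row1), len(row2)
    ranks = midranks(row1 + row2)
    r1, r2 = sum(ranks[:n1]), sum(ranks[n1:])
    u1 = n1 * n2 + n1 * (n1 + 1) / 2 - r1
    u2 = n1 * n2 + n2 * (n2 + 1) / 2 - r2
    return r1, r2, u1, u2, min(u1, u2)


# Hypothetical data with a three-way tie at the value 5.
r1, r2, u1, u2, u = u_from_rank_sums([3, 5, 5, 8], [1, 2, 5])
print(r1, r2, u1, u2, u)  # 20.0 8.0 2.0 10.0 2.0
```

In this sketch the three tied values occupy ranks 4, 5 and 6 and therefore each receive rank 5.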

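For the example, $U_1$, $U_2$ and hence $U = \min(U_1, U_2)$ are obtained from the two rank sums in this way.

If the calculation is correct, $U_1 + U_2 = n_1 n_2$ and $R_1 + R_2 = \frac{(n_1 + n_2)(n_1 + n_2 + 1)}{2}$ must hold. The test statistic $U$ is then compared with the critical value(s). The example is chosen so that a comparison with both the exact and the approximate critical values is possible.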

Two-sided test

Exact critical values

Looking up $n_1$ and $n_2$ in the table gives the critical value for the two-sided significance level. The null hypothesis is rejected if $U$ falls below this critical value; this is not the case here.

Approximate critical values

Since the test statistic $U$ is approximately normally distributed, it follows that

$$Z = \frac{U - \frac{n_1 n_2}{2}}{\sqrt{\frac{n_1 n_2 (n_1 + n_2 + 1)}{12}}}$$

is approximately standard normally distributed. For a significance level of $\alpha = 5\,\%$, the non-rejection region of the null hypothesis in the two-sided test is bounded by the 2.5 % and 97.5 % quantiles of the standard normal distribution, i.e. approximately $[-1.96,\ 1.96]$. The test value lies within this interval, so the null hypothesis cannot be rejected.
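The two-sided decision based on the normal approximation can be sketched in a few lines. The inputs below are hypothetical (they are not the values of the income example), the function names are illustrative, and the tie-free variance $n_1 n_2 (n_1 + n_2 + 1)/12$ is assumed:

```python
from math import sqrt
from statistics import NormalDist


def z_statistic(u, n1, n2):
    """Standardise U with its tie-free null mean and standard deviation."""
    mean = n1 * n2 / 2
    sd = sqrt(n1 * n2 * (n1 + n2 + 1) / 12)
    return (u - mean) / sd


def reject_two_sided(u, n1, n2, alpha=0.05):
    """True if z lies outside the central (1 - alpha) interval of N(0, 1)."""
    bound = NormalDist().inv_cdf(1 - alpha / 2)   # about 1.96 for alpha = 0.05
    return abs(z_statistic(u, n1, n2)) > bound


# Hypothetical values: U = 30 with n1 = n2 = 10.
print(z_statistic(30, 10, 10), reject_two_sided(30, 10, 10))
```

With these made-up inputs the standardised value is roughly $-1.51$, which lies inside the interval, so the sketch reports no rejection.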

One-sided test

Exact critical values

Looking up $n_1$ and $n_2$ in the table gives the critical value for the one-sided significance level (a different significance level from that of the two-sided test). The null hypothesis is rejected if $U$ falls below this critical value; this is not the case here.

Approximate critical values

For a significance level of $\alpha = 5\,\%$, the critical value is the 5 % quantile of the standard normal distribution (about $-1.645$), and the non-rejection region of the null hypothesis consists of the values above it. Here too, the null hypothesis cannot be rejected.

Table of critical values ​​of the Mann-Whitney U statistic

The following table is valid for $\alpha = 0.05$ (two-sided) or $\alpha = 0.025$ (one-sided) with $n_1 \le n_2$. An entry of "-" means that the null hypothesis cannot be rejected at the given significance level in any case.
