Mann–Whitney U
The Wilcoxon–Mann–Whitney test (also called the "Mann–Whitney U test", "U test", or "Wilcoxon rank sum test") is a nonparametric statistical test. The U test is a test of homogeneity. It is used to check the significance of the agreement between two distributions, i.e. whether two independent distributions A and B (for example, an unaffected and an affected group) belong to the same population. The test was developed by Henry Mann and Donald Whitney (1947) and by Frank Wilcoxon (1945). The central idea of the test was developed in 1914 by the German educator Gustaf Deuchler.
Assumptions
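- The random variables $X$ and $Y$ have continuous distribution functions $F_X$ and $F_Y$ that differ only by a shift $a$ from one another, that is: $F_Y(x) = F_X(x - a)$.
- There are independent samples $X_1, \dots, X_m$ of $X$ and $Y_1, \dots, Y_n$ of $Y$ that are also mutually independent.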
Test statistic
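For testing the hypotheses of the Wilcoxon–Mann–Whitney test,
$$H_0: a = 0 \quad \text{vs.} \quad H_1: a \neq 0,$$
where $a$ is the shift between the two distributions, there are two test statistics: the Mann–Whitney U statistic $U$ and the Wilcoxon rank sum statistic $W_{m,n}$. Because of the link between the test statistics,
$$U = W_{m,n} - \frac{m(m+1)}{2},$$
where $m$ is the number of observations of $X$, the Wilcoxon rank sum test and the Mann–Whitney U test are equivalent.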
Mann-Whitney U statistic
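The Mann–Whitney U test statistic is
$$U = \sum_{i=1}^{m} \sum_{j=1}^{n} S(X_i, Y_j),$$
where $S(X, Y) = 1$ if $Y < X$, $S(X, Y) = \tfrac{1}{2}$ if $Y = X$, and $S(X, Y) = 0$ otherwise.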
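As a minimal sketch of this definition, the statistic can be computed by counting pairs directly. The function name `mann_whitney_u` and the sample values are illustrative assumptions, and counting a tie as one half is the usual convention rather than something the continuous model requires (ties have probability zero there).

```python
def mann_whitney_u(xs, ys):
    """Count the pairs (x, y) with y < x; a tie y == x counts one half."""
    u = 0.0
    for x in xs:
        for y in ys:
            if y < x:
                u += 1.0
            elif y == x:
                u += 0.5
    return u


# Made-up illustrative samples (not data from the article).
xs = [1.5, 3.2, 4.8]
ys = [2.0, 2.9, 5.1, 6.3]

u_xy = mann_whitney_u(xs, ys)  # pairs with y < x
u_yx = mann_whitney_u(ys, xs)  # pairs with x < y
print(u_xy, u_yx)
```

Every pair is counted in exactly one of the two directions (or half in each for a tie), so the two counts always add up to the number of pairs, `len(xs) * len(ys)`.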
Exact critical values are available only in tabulated form; for small sample sizes they can be found in the table below (for the two-sided and the one-sided test).
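For sufficiently large sample sizes $m$ and $n$, the statistic $U$ can be approximated by the normal distribution:
$$U \approx N\!\left(\frac{mn}{2},\ \frac{mn(m+n+1)}{12}\right).$$
The critical values are then obtained from the critical values of the approximating normal distribution.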
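The normal approximation can be sketched as follows, using the standard null moments of U, namely mean mn/2 and variance mn(m+n+1)/12. The helper names and the sample sizes are assumptions for illustration, and no continuity or tie correction is applied.

```python
from math import sqrt
from statistics import NormalDist


def u_null_distribution(m, n):
    """Approximating normal distribution of U under the null hypothesis."""
    mean = m * n / 2
    sd = sqrt(m * n * (m + n + 1) / 12)
    return NormalDist(mean, sd)


def approx_lower_critical_u(m, n, tail_prob):
    """Quantile of the approximating normal at the given lower-tail probability."""
    return u_null_distribution(m, n).inv_cdf(tail_prob)


# Hypothetical sample sizes; 0.025 is the lower tail of a two-sided 5 % test.
m, n = 10, 10
dist = u_null_distribution(m, n)
crit = approx_lower_critical_u(m, n, 0.025)
print(dist.mean, dist.stdev, crit)
```

With these sample sizes the approximating distribution has mean 50 and variance 175, and the lower 2.5 % quantile lies slightly above 24.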
The Wilcoxon rank sum statistic is
$$W_{m,n} = \sum_{i=1}^{m} R(X_i),$$
with $R(X_i)$ the rank of the $i$-th $X$ in the pooled ordered sample.
In this form the test is often called the Wilcoxon rank sum test.
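A short sketch of the rank sum, together with a numerical check of its link to the pair-counting U statistic (rank sum of the X values minus m(m+1)/2). The function name `rank_sum` and the data are made up for illustration, and the sketch assumes there are no ties in the pooled sample.

```python
def rank_sum(xs, ys):
    """Sum of the ranks of the xs in the pooled ordered sample (no ties assumed)."""
    pooled = sorted(xs + ys)
    # Rank = 1-based position in the pooled ordered sample.
    rank = {value: position + 1 for position, value in enumerate(pooled)}
    return sum(rank[x] for x in xs)


# Made-up illustrative samples (not data from the article).
xs = [1.5, 3.2, 4.8]
ys = [2.0, 2.9, 5.1, 6.3]
m = len(xs)

w = rank_sum(xs, ys)
# U counted directly as the number of pairs with y < x.
u = sum(1 for x in xs for y in ys if y < x)
print(w, u, w - m * (m + 1) // 2)
```

Here the X values occupy ranks 1, 4 and 5, so the rank sum is 10, and subtracting m(m+1)/2 = 6 reproduces the pair count 4.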
The exact distribution of $W_{m,n}$ under the null hypothesis can be found by combinatorial considerations.
However, the computational effort increases rapidly for large values of $m$ and $n$.
The exact critical values for a significance level $\alpha$ can be calculated by means of a recursion formula:
$$P(W_{m,n} = w) = \frac{m}{m+n}\, P\big(W_{m-1,n} = w - (m+n)\big) + \frac{n}{m+n}\, P\big(W_{m,n-1} = w\big).$$
The formula arises by conditioning on whether the last value in the ordered arrangement is an X ( ... X) or a Y ( ... Y).
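The recursion can be sketched with exact fractions. It conditions on whether the largest pooled value (rank m + n) is an X, with probability m/(m+n), or a Y. The function names `prob_w` and `lower_tail` are assumptions of this sketch, which treats W as the rank sum of the m X values and assumes no ties.

```python
from fractions import Fraction
from functools import lru_cache


@lru_cache(maxsize=None)
def prob_w(m, n, w):
    """P(W = w) under the null hypothesis, W being the rank sum of the m X values."""
    if w < 0:
        return Fraction(0)
    if m == 0:
        return Fraction(1) if w == 0 else Fraction(0)
    if n == 0:
        # Only X values: the rank sum is fixed at 1 + 2 + ... + m.
        return Fraction(1) if w == m * (m + 1) // 2 else Fraction(0)
    total = m + n
    # Largest value is an X (it carries rank m + n) or a Y.
    return (Fraction(m, total) * prob_w(m - 1, n, w - total)
            + Fraction(n, total) * prob_w(m, n - 1, w))


def lower_tail(m, n, w):
    """P(W <= w) under the null hypothesis."""
    smallest = m * (m + 1) // 2
    return sum((prob_w(m, n, k) for k in range(smallest, w + 1)), Fraction(0))


print(prob_w(2, 2, 5), lower_tail(3, 3, 6))
```

For m = n = 2 there are six equally likely rank assignments, two of which give W = 5; for m = n = 3 only one of the twenty assignments gives the smallest value W = 6. An exact lower critical value for level α would be the largest w whose lower tail does not exceed α.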
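For sufficiently large $m$ or $n$, the test statistic $W_{m,n}$ can be approximated by the normal distribution:
$$W_{m,n} \approx N\!\left(\frac{m(m+n+1)}{2},\ \frac{mn(m+n+1)}{12}\right).$$
The critical values are then obtained from the critical values of the approximating normal distribution.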
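The mean m(m+n+1)/2 and variance mn(m+n+1)/12 used by the normal approximation of the rank sum can be checked by brute-force enumeration for small samples. This is only a sketch: the function name `rank_sum_moments` and the sizes m = 4, n = 5 are arbitrary choices, and the check assumes that all rank assignments are equally likely under the null hypothesis.

```python
from fractions import Fraction
from itertools import combinations


def rank_sum_moments(m, n):
    """Exact null mean and variance of the rank sum of m out of m + n ranks."""
    sums = [sum(c) for c in combinations(range(1, m + n + 1), m)]
    count = len(sums)
    mean = Fraction(sum(sums), count)
    var = Fraction(sum(s * s for s in sums), count) - mean ** 2
    return mean, var


m, n = 4, 5
mean, var = rank_sum_moments(m, n)
print(mean, var)
print(Fraction(m * (m + n + 1), 2), Fraction(m * n * (m + n + 1), 12))
```

The enumerated moments agree exactly with the closed-form expressions; only the shape of the distribution is approximated by the normal curve.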
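The test can also be formulated for the one-sided hypotheses
$$H_0: a \leq 0 \quad \text{vs.} \quad H_1: a > 0 \qquad \text{or} \qquad H_0: a \geq 0 \quad \text{vs.} \quad H_1: a < 0,$$
where $a$ is the shift between the two distributions.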
The test is especially interesting because, under the above assumptions, the decision between the null and the alternative hypothesis also decides the following derived pairs of hypotheses:
$$H_0: \mathrm{E}(X) = \mathrm{E}(Y) \quad \text{vs.} \quad H_1: \mathrm{E}(X) \neq \mathrm{E}(Y).$$
That is, under the alternative the mean values of the distributions A and B are different.
$$H_0: \mathrm{med}(X) = \mathrm{med}(Y) \quad \text{vs.} \quad H_1: \mathrm{med}(X) \neq \mathrm{med}(Y).$$
That is, under the alternative the medians of the distributions A and B differ.
If the conditions for the hypothesis about the medians are not met, one can resort to the median test.
From the data of the German General Social Survey (ALLBUS) 2006, 20 people were drawn at random and their net income was recorded.
This gives two samples: a sample of men with $m$ values and a sample of women with $n$ values.
We can now test whether the incomes of men and women are equal (two-sided test) or whether the income of women is lower (one-sided test), with $F_M$ the distribution function of the income of men and $F_F$ the distribution function of the income of women.
We consider here both tests: the two-sided test of equal income distributions against differing ones, and the one-sided test against the alternative that the income of women is lower.
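First, a test statistic $U$ is formed from the two data series:
$$U_M = mn + \frac{m(m+1)}{2} - R_M, \qquad U_F = mn + \frac{n(n+1)}{2} - R_F.$$
Here $m$ and $n$ are the numbers of values in the series of men and of women, and $R_M$ and $R_F$ are the rank sums of the two series.
To obtain them, all values are ranked jointly, and the ranks are then summed separately for the two series in two columns.
If two or more values are equal (ties), each of them is assigned the median (or, equivalently, the arithmetic mean) of the ranks they jointly occupy.
For the tests, one needs the minimum of $U_M$ and $U_F$, i.e. $U = \min(U_M, U_F)$.
For our example, $U_M$, $U_F$ and $U$ are obtained in this way.
If the calculation is correct, $U_M + U_F = mn$ and $R_M + R_F = \frac{(m+n)(m+n+1)}{2}$ must hold.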
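The calculation steps can be sketched as follows: joint ranking with mean ranks for ties, the two rank sums, the two U values, and the consistency checks. The numbers are made-up illustrative values and not the survey data; the names `midranks` and `u_statistics` are assumptions of this sketch.

```python
def midranks(values):
    """Ranks in the pooled sample; tied values share the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1  # mean of the ranks i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def u_statistics(first, second):
    """Return (u_first, u_second, r_first, r_second) for two samples."""
    m, n = len(first), len(second)
    ranks = midranks(first + second)
    r_first, r_second = sum(ranks[:m]), sum(ranks[m:])
    u_first = m * n + m * (m + 1) / 2 - r_first
    u_second = m * n + n * (n + 1) / 2 - r_second
    return u_first, u_second, r_first, r_second


# Made-up values with one three-way tie (not the data from the example).
first = [10, 12, 12, 15]
second = [9, 12, 14]
u1, u2, r1, r2 = u_statistics(first, second)
u = min(u1, u2)
print(u1, u2, r1, r2, u)
```

The three tied values occupy ranks 3 to 5 and each receive rank 4. The two U values add up to the product of the sample sizes and the rank sums add up to N(N+1)/2 with N = 7, which is the consistency check described above.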
The test statistic is then compared with the critical value(s).
The example is chosen so that a comparison with both the exact critical values and the approximate values is possible.
Looking up the table with the two sample sizes gives the critical value for a significance level of $\alpha = 5\%$.
The null hypothesis is rejected when the test statistic $U$ falls below this critical value;
this is not the case here.
Since the test statistic $U$ is approximately normally distributed, it follows that
$$Z = \frac{U - \frac{mn}{2}}{\sqrt{\frac{mn(m+n+1)}{12}}}$$
is approximately standard normally distributed.
For a significance level of $\alpha = 5\%$, the non-rejection region of the null hypothesis in the two-sided test is bounded by the 2.5 % and 97.5 % quantiles of the standard normal distribution, i.e. it is approximately $[-1.96,\ 1.96]$.
It turns out that the test value lies within this interval, so the null hypothesis cannot be rejected.
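A minimal sketch of the two-sided decision with the normal approximation. The value u = 30 and the sample sizes m = n = 10 are hypothetical stand-ins and not the figures from the example; the function name `two_sided_decision` is also an assumption.

```python
from math import sqrt
from statistics import NormalDist


def two_sided_decision(u, m, n, alpha=0.05):
    """Standardize U and compare it with the two-sided non-rejection interval."""
    z = (u - m * n / 2) / sqrt(m * n * (m + n + 1) / 12)
    lower = NormalDist().inv_cdf(alpha / 2)      # 2.5 % quantile for alpha = 5 %
    upper = NormalDist().inv_cdf(1 - alpha / 2)  # 97.5 % quantile
    reject = not (lower <= z <= upper)
    return z, (lower, upper), reject


# Hypothetical inputs for illustration only.
z, (lower, upper), reject = two_sided_decision(u=30, m=10, n=10)
print(round(z, 3), round(lower, 3), round(upper, 3), reject)
```

With these hypothetical inputs the standardized value is about -1.51, which lies inside the interval of roughly ±1.96, so the null hypothesis would not be rejected.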
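Looking up the table with the two sample sizes gives the critical value for a significance level of $\alpha = 2.5\%$ (a different significance level from that of the two-sided test).
The null hypothesis is rejected when the test statistic $U$ falls below this critical value;
this is not the case here.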
For a significance level of $\alpha = 5\%$, the critical value is the 5 % quantile of the standard normal distribution, approximately $-1.645$, and the non-rejection region of the null hypothesis is $[-1.645,\ \infty)$.
It turns out that the test value lies in this region, so
the null hypothesis cannot be rejected.
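The one-sided decision can be sketched in the same way. The sketch assumes that small values of the standardized statistic speak against the null hypothesis; the inputs are hypothetical, not the figures from the example, and the function name `one_sided_lower_decision` is an assumption.

```python
from math import sqrt
from statistics import NormalDist


def one_sided_lower_decision(u, m, n, alpha=0.05):
    """Reject when the standardized U falls below the alpha quantile of N(0, 1)."""
    z = (u - m * n / 2) / sqrt(m * n * (m + n + 1) / 12)
    critical = NormalDist().inv_cdf(alpha)  # about -1.645 for alpha = 5 %
    p_value = NormalDist().cdf(z)           # lower-tail probability of z
    return z, critical, p_value, z < critical


# Hypothetical inputs for illustration only.
z_a, crit, p_a, reject_a = one_sided_lower_decision(u=30, m=10, n=10)
z_b, _, p_b, reject_b = one_sided_lower_decision(u=25, m=10, n=10)
print(round(z_a, 3), round(crit, 3), reject_a)
print(round(z_b, 3), reject_b)
```

For u = 30 the standardized value of about -1.51 stays above the critical value, so the null hypothesis is kept; for u = 25 it drops to about -1.89 and the null hypothesis would be rejected at the 5 % level.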
The following table is valid for $\alpha = 5\%$ (two-sided) or $\alpha = 2.5\%$ (one-sided).
A "-" entry means that the null hypothesis cannot be rejected at the given significance level in any case.
For example,
is:
Exact critical values
Approximate critical values
Wilcoxon rank sum statistic
Exact critical values
Approximate critical values
One-sided hypotheses
Derived hypotheses
Example
Two-sided test
Exact critical values
Approximate critical values
One-sided test
Exact critical values
Approximate critical values
Table of critical values of the Mann-Whitney U statistic