Wald–Wolfowitz runs test

The runs test (also called the Wald–Wolfowitz test, after Abraham Wald and Jacob Wolfowitz, the iteration test, or the Geary test) is a nonparametric test for the randomness of a sequence. The starting point is an urn model with two kinds of balls (a dichotomous population). n balls are drawn, and the hypothesis to be tested is that the drawing was random.

Method

n balls are drawn from a dichotomous population, and the results are recorded in the order in which they were drawn. All adjacent results of the same kind are then combined into a run. If the sequence is truly random, there should be neither too few nor too many runs.
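Counting runs can be sketched in a few lines of Python. The function name count_runs and the sequence of "A" and "B" draws are hypothetical and chosen only for illustration; they do not come from the article.

```python
from itertools import groupby


def count_runs(sequence):
    """Return the number of runs, i.e. maximal blocks of equal neighbours."""
    # groupby yields one group per block of consecutive equal items.
    return sum(1 for _ in groupby(sequence))


# Hypothetical draw from an urn with two kinds of balls, "A" and "B".
draws = "AABABBBA"
print(count_runs(draws))  # blocks: AA | B | A | BBB | A, so this prints 5
```

An empty sequence has 0 runs and a sequence with only one kind of ball has exactly 1 run.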

The null hypothesis H0 is: the sampling was random.

To determine the numbers of runs at which the hypothesis is rejected, the distribution of the number of runs is needed. Let n1 be the number of balls of the first kind, n2 = n - n1 the number of balls of the second kind, and r the number of runs obtained. By the symmetry principle, every sequence of balls is equally likely under random sampling. There are a total of

$$\binom{n}{n_1}$$

possible ways of drawing.

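For the distribution of the number of runs R, two cases are distinguished:

1. The number of runs r is even:

$$P(R = r) = \frac{2 \binom{n_1 - 1}{r/2 - 1} \binom{n_2 - 1}{r/2 - 1}}{\binom{n}{n_1}}$$

2. The number of runs r is odd, with k = (r + 1)/2:

$$P(R = r) = \frac{\binom{n_1 - 1}{k - 1} \binom{n_2 - 1}{k - 2} + \binom{n_1 - 1}{k - 2} \binom{n_2 - 1}{k - 1}}{\binom{n}{n_1}}$$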
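The even and odd cases can be evaluated exactly with binomial coefficients. The following is a minimal sketch that assumes the standard closed form of the runs distribution; the function name runs_pmf is hypothetical.

```python
from fractions import Fraction
from math import comb


def runs_pmf(n1, n2, r):
    """Exact P(R = r) for n1 balls of the first and n2 of the second kind."""
    if n1 < 1 or n2 < 1 or r < 2:
        return Fraction(0)  # with both kinds present there are at least 2 runs
    total = comb(n1 + n2, n1)  # number of equally likely arrangements
    if r % 2 == 0:
        k = r // 2
        # k runs of each kind; either kind may come first, hence the factor 2
        ways = 2 * comb(n1 - 1, k - 1) * comb(n2 - 1, k - 1)
    else:
        k = (r + 1) // 2
        # k runs of one kind and k - 1 runs of the other kind
        ways = (comb(n1 - 1, k - 1) * comb(n2 - 1, k - 2)
                + comb(n1 - 1, k - 2) * comb(n2 - 1, k - 1))
    return Fraction(ways, total)


print(runs_pmf(4, 5, 2))  # 1/63
```

Working with Fraction keeps the probabilities exact, so that summing them over all possible r gives exactly 1.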

If r is too small or too large, the null hypothesis is rejected. At a significance level α, H0 is rejected if the test statistic r satisfies

$$r \le r(\alpha/2) \quad \text{or} \quad r \ge r(1 - \alpha/2),$$

where r(p) is the quantile of the distribution of R at the point p; the principle of conservative testing is applied here. Since calculating the critical values of r for rejecting the hypothesis is cumbersome, a table is often used.

Simple Example

For a panel discussion with two political parties, the order of the speakers was allegedly determined at random. The draw gave 4 representatives of the party Supination (S) and 5 representatives of the party Toll (T) the following speaking order:

S S S T T T T T S

A representative of Toll complained that S was being favored. A runs test was carried out:

Here n1 = 4 and n2 = 5, and r = 6 runs were obtained.

Clearly, a large number of runs gives no reason to suspect that one of the parties was favored. The null hypothesis is therefore rejected only if there are too few runs. According to the table for the runs test, H0 is rejected if r ≤ 2. The test statistic r = 6 is therefore not in the rejection region; by the criteria of the runs test, one cannot conclude that the order of the speakers is not random.
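The table value can be checked by brute force, since for n1 = 4 and n2 = 5 there are only 126 arrangements. The sketch below enumerates all of them and looks for the largest number of runs whose cumulative probability stays within the significance level. The one-sided level of 0.05 is an assumption, as the text does not state the level behind the table value, and the function names are hypothetical.

```python
from collections import Counter
from fractions import Fraction
from itertools import combinations, groupby


def runs_distribution(n1, n2):
    """Exact distribution of the number of runs, by enumerating arrangements."""
    n = n1 + n2
    counts = Counter()
    for positions in combinations(range(n), n1):
        chosen = set(positions)
        seq = [i in chosen for i in range(n)]  # True marks the first kind
        counts[sum(1 for _ in groupby(seq))] += 1
    total = sum(counts.values())
    return {r: Fraction(c, total) for r, c in sorted(counts.items())}


def lower_critical_value(n1, n2, alpha):
    """Largest c with P(R <= c) <= alpha (conservative), or None if none exists."""
    cumulative, critical = Fraction(0), None
    for r, p in runs_distribution(n1, n2).items():
        cumulative += p
        if cumulative > alpha:
            break
        critical = r
    return critical


# Assumed one-sided significance level of 0.05.
print(lower_critical_value(4, 5, Fraction(5, 100)))  # 2
```

Here P(R ≤ 2) = 2/126 is about 0.016, while P(R ≤ 3) = 9/126 is about 0.071 and already exceeds 0.05, which agrees with the rejection region r ≤ 2.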

Incidentally, in the following case,

S S S S T T T T T

with r = 4 runs, the null hypothesis would also not be rejected, although almost everyone would suspect that Supination had been favored. Because of the relatively small number of observations, however, the possibility that the result is due to chance cannot be ruled out.

Supplements

Parameters of the distribution of R

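The expected value of R is

$$E(R) = \frac{2 n_1 n_2}{n} + 1$$

and the variance is

$$\operatorname{Var}(R) = \frac{2 n_1 n_2 (2 n_1 n_2 - n)}{n^2 (n - 1)}.$$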

Population with more than two levels of the feature

If a finite sequence of real numbers of a metric feature is given, the sequence is dichotomized: first the median z of the sequence is determined. Values x ≥ z are then interpreted as balls of the first kind and values x < z as balls of the second kind. The resulting dichotomous sequence can then be tested for randomness (see the example below).

If a non-numeric sequence of symbols with more than two levels is given, a numerical sequence must first be generated from it. A problem may arise here if the symbols cannot be ordered.

Normal Approximation

For sample sizes n1, n2 > 20, the number of runs R is approximately normally distributed with the expected value and variance given above. This gives the standardized test statistic

$$z = \frac{r - E(R)}{\sqrt{\operatorname{Var}(R)}}.$$

The hypothesis is rejected if

$$|z| > z(1 - \alpha/2),$$

where z(1 - α/2) is the quantile of the standard normal distribution for the probability 1 - α/2.
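The normal approximation can be sketched as follows. The code assumes the standard closed forms for the expected value and variance of the number of runs, E(R) = 2 n1 n2 / n + 1 and Var(R) = 2 n1 n2 (2 n1 n2 - n) / (n² (n - 1)). The function name runs_z_test and the input values in the usage line are hypothetical.

```python
from math import sqrt
from statistics import NormalDist


def runs_z_test(n1, n2, r, alpha=0.05):
    """Two-sided runs test based on the normal approximation.

    Returns the statistic z, the critical value and the rejection decision.
    """
    n = n1 + n2
    mean = 2 * n1 * n2 / n + 1
    variance = 2 * n1 * n2 * (2 * n1 * n2 - n) / (n**2 * (n - 1))
    z = (r - mean) / sqrt(variance)
    # Quantile of the standard normal distribution at 1 - alpha/2.
    critical = NormalDist().inv_cdf(1 - alpha / 2)
    return z, critical, abs(z) > critical


# Hypothetical sample: 25 and 30 balls of the two kinds, 20 observed runs.
z, critical, reject = runs_z_test(25, 30, 20)
print(round(z, 2), round(critical, 2), reject)
```

For alpha = 0.05 the critical value is about 1.96; in the hypothetical call above the 20 observed runs lie well below the roughly 28 expected, so the hypothesis of randomness would be rejected.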

Applications

The runs test can be used to check for stationarity or for the absence of correlation in a time series or other sequence, particularly when the distribution of the feature is unknown. The null hypothesis here is that successive values are uncorrelated.

The runs test can be combined with the chi-square test, since the two test statistics are asymptotically independent.

Example of a metric feature

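The following sequence is given:

13 3 14 14 1 14 3 8 14 17 9 14 13 2 16 1 3 12 13 14

It is dichotomized at the median z = 13: values x ≥ z are marked +, values x < z are marked -.

Differences x - z:

0 -10 1 1 -12 1 -10 -5 1 4 -4 1 0 -11 3 -12 -10 -1 0 1

Signs:

+ - + + - + - - + + - + + - + - - - + +

With n1 = 11 (+) and n2 = 9 (-), r = 13 runs are obtained. R is approximately normally distributed with expected value

$$E(R) = \frac{2 \cdot 11 \cdot 9}{20} + 1 = 10.9$$

and variance

$$\operatorname{Var}(R) = \frac{2 \cdot 11 \cdot 9 \, (2 \cdot 11 \cdot 9 - 20)}{20^2 \cdot 19} \approx 4.6374.$$

The test statistic z is then calculated as

$$z = \frac{13 - 10.9}{\sqrt{4.6374}} \approx 0.9752.$$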

At a significance level of 0.05, H0 is rejected if |z| > 1.96. This is not the case.

Decision: The hypothesis is not rejected. The elements of the sample are assumed to have been drawn at random.
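The example can be retraced step by step in Python. This is a sketch under two assumptions: values equal to the median are assigned to the first kind (+), which matches n1 = 11, and the expected value and variance use the standard closed forms for the number of runs. The variable names are chosen for illustration only.

```python
from itertools import groupby
from math import sqrt
from statistics import median

data = [13, 3, 14, 14, 1, 14, 3, 8, 14, 17,
        9, 14, 13, 2, 16, 1, 3, 12, 13, 14]

z_med = median(data)                  # median of the sequence
signs = [x >= z_med for x in data]    # True is "+", False is "-"
n1 = sum(signs)                       # number of "+"
n2 = len(signs) - n1                  # number of "-"
r = sum(1 for _ in groupby(signs))    # number of runs

n = n1 + n2
mean = 2 * n1 * n2 / n + 1
variance = 2 * n1 * n2 * (2 * n1 * n2 - n) / (n**2 * (n - 1))
z = (r - mean) / sqrt(variance)

print(z_med, n1, n2, r)
print(round(mean, 1), round(variance, 4), round(z, 4))
print(abs(z) > 1.96)                  # False: H0 is not rejected
```

The script gives a median of 13, n1 = 11, n2 = 9 and r = 13 runs, with a test statistic of about 0.975.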
