Behrens–Fisher problem

The Behrens–Fisher problem is a problem in mathematical statistics whose exact solution has been shown to have undesirable properties, which is why approximations are preferred.

What is sought is a non-randomized similar test of the null hypothesis of equal expected values, $\mu_1 = \mu_2$, of two normally distributed populations whose variances $\sigma_1^2$ and $\sigma_2^2$ are unknown and are not assumed to be equal. Similarity of the test means that, when the null hypothesis is true, it is rejected with probability exactly $\alpha$, the specified significance level, however large and however different the unknown variances $\sigma_1^2$ and $\sigma_2^2$ may be. For reasons of power, the test is based on the following "Behrens–Fisher" test statistic:

$$T = \frac{\bar{x}_1 - \bar{x}_2}{\sqrt{\dfrac{s_1^2}{n_1} + \dfrac{s_2^2}{n_2}}}$$

Here $\bar{x}_1$ and $\bar{x}_2$ are the means and $s_1$ and $s_2$ the standard deviations of the two samples; $n_1$ and $n_2$ denote their respective sizes.
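As a minimal sketch of how the Behrens–Fisher test statistic is computed from two samples, the following Python snippet may help. The function name and the example data are illustrative assumptions, not part of the original text.

```python
from math import sqrt
from statistics import mean, variance


def behrens_fisher_statistic(sample1, sample2):
    """Return T = (mean1 - mean2) / sqrt(s1^2/n1 + s2^2/n2)."""
    n1, n2 = len(sample1), len(sample2)
    # statistics.variance is the sample variance (denominator n - 1).
    s1_sq, s2_sq = variance(sample1), variance(sample2)
    return (mean(sample1) - mean(sample2)) / sqrt(s1_sq / n1 + s2_sq / n2)


# Hypothetical example data, chosen only for illustration.
x = [1.0, 2.0, 3.0, 4.0, 5.0]   # mean 3, sample variance 2.5
y = [2.0, 4.0, 6.0, 8.0]        # mean 5, sample variance 20/3
t_stat = behrens_fisher_statistic(x, y)
print(t_stat)  # -2 / sqrt(13/6), approximately -1.359
```

Note that the two sample variances enter separately and are not pooled, which is what distinguishes this statistic from that of the ordinary two-sample t-test.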

The Behrens–Fisher problem generalizes the t-test for two independent samples; the latter presupposes that the variances of the two populations are equal.

Origin

Ronald Fisher introduced "fiducial inference" in 1935 as a solution to this problem. In doing so he referred to an earlier work by W. V. Behrens from 1929. Behrens and Fisher proposed determining the distribution of the above-mentioned test statistic $T$.

Fisher approximated this distribution by ignoring the randomness of the relative size of the standard deviations, $\dfrac{s_1/\sqrt{n_1}}{\sqrt{s_1^2/n_1 + s_2^2/n_2}}$. Consequently, the resulting test did not have the desired property of rejecting the null hypothesis with probability $\alpha$ whenever it is true. This provoked a controversy that is commonly known as the Behrens–Fisher problem.

Non-existence of a desirable solution

Linnik (1968, Theorem 8.3.1) showed that there is no continuous function for the boundary between the acceptance and rejection regions of the above-mentioned Behrens–Fisher test statistic $T$ that depends only on the quotient of the empirical variances of the means, $\dfrac{s_1^2/n_1}{s_2^2/n_2}$ (as well as on constants, $n_1$, $n_2$ and the significance level $\alpha$). The boundary between the acceptance and rejection regions of any exact solution of the Behrens–Fisher problem is necessarily discontinuous in this quotient. More than that: an exact solution requires that the rejection region of the Behrens–Fisher test statistic contain neighborhoods of points for which $\bar{x}_1 = \bar{x}_2$, an untenable property (Linnik, 1968). That Linnik refers to $\bar{x}_1 - \bar{x}_2$, $s_1^2$ and $s_2^2$ rather than to $T$ and the variance quotient is immaterial, since the problem can be described equivalently in terms of the former.

Best approximation by means of a non-convergent series expansion

One work that Linnik (1968) never mentions is that of B. L. Welch (1947). Two decades earlier, Welch (1947), who like Fisher worked at University College London, had proposed an approach to the exact solution of the Behrens–Fisher problem that would describe the boundary between the acceptance and rejection regions of the test statistic as a continuous function of $s_1^2$ and $s_2^2$. For a given significance level $\alpha$, Welch (1947) first specifies this boundary precisely for the empirical mean difference $\bar{x}_1 - \bar{x}_2$ as a function of the empirical variances $s_1^2$ and $s_2^2$, in the form of a partial differential equation of infinite order. He also describes a method for approaching the solution with arbitrary precision by means of three Taylor series.

The expansion of this function shows that it can be factored into a product of the estimated standard deviation of the mean difference, $\sqrt{s_1^2/n_1 + s_2^2/n_2}$, and a function that depends only on the variance quotient (and on constants). Thus the function standardized in accordance with the test statistic $T$ depends, as desired, only on the variance quotient. If Welch's series expansion converged uniformly, so that the function were infinitely differentiable and hence also continuous, this would contradict Linnik's proof, according to which no such function exists. It follows that Welch's expansion cannot converge uniformly.

Graphical representations of the function for expansions carried to different orders, with very small and also somewhat larger $n_1$ and $n_2$, make this conclusion quite plausible, although for $n_1$ and $n_2$ that are not too small the results are remarkable in terms of smoothness and of the accuracy of the numerically computed type I error probabilities. Aspin's (1948) development of Welch's series expansion up to the fourth power in the reciprocals of the degrees of freedom $n_1 - 1$ and $n_2 - 1$ provides by far the most accurate approximation, unless $n_1$ and $n_2$ are much smaller than usual. The resulting Welch–Aspin test is described in detail, and in German, in Bachmaier (2000).

The approximation in the so-called Welch test

There are several approximate approaches to the Behrens–Fisher problem. One of the most widely used approximations (for example, in Microsoft Excel) also comes from Welch. The test based on this approximation is therefore called the Welch test.

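The variance of the mean difference $\bar{d} = \bar{x}_1 - \bar{x}_2$ is $\sigma_{\bar{d}}^2 = \sigma_1^2/n_1 + \sigma_2^2/n_2$. Welch (1938) approximates the distribution of its estimate $s_{\bar{d}}^2 = s_1^2/n_1 + s_2^2/n_2$ by the Pearson type III curve (a scaled chi-square distribution) whose first two moments (mean and variance) agree with those of $s_{\bar{d}}^2$. This holds for the following number of degrees of freedom (d.f.) $\nu$, which in general takes non-integer values:

$$\nu = \frac{(\gamma_1 + \gamma_2)^2}{\dfrac{\gamma_1^2}{n_1 - 1} + \dfrac{\gamma_2^2}{n_2 - 1}}, \qquad \gamma_i = \frac{\sigma_i^2}{n_i}$$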
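The moment matching behind this number of degrees of freedom can be spelled out as follows. This is a sketch of the standard argument, with the shorthand $g_i = s_i^2/n_i$ and $\gamma_i = \sigma_i^2/n_i$ and the scale factor $a$ introduced here for illustration; it assumes independent normal samples, so that $(n_i - 1)\,s_i^2/\sigma_i^2$ is chi-square distributed with $n_i - 1$ degrees of freedom.

```latex
\begin{align*}
\operatorname{E}[g_i] &= \gamma_i, &
\operatorname{Var}(g_i) &= \frac{2\gamma_i^2}{n_i - 1}, \\
\operatorname{E}[g_1 + g_2] &= \gamma_1 + \gamma_2, &
\operatorname{Var}(g_1 + g_2) &= 2\left(\frac{\gamma_1^2}{n_1 - 1} + \frac{\gamma_2^2}{n_2 - 1}\right).
\end{align*}
% A scaled chi-square variable a * chi^2_nu has mean a*nu and variance 2*a^2*nu.
% Equating both moments and eliminating a gives:
\begin{equation*}
\nu = \frac{2\,\bigl(\operatorname{E}[g_1 + g_2]\bigr)^2}{\operatorname{Var}(g_1 + g_2)}
    = \frac{(\gamma_1 + \gamma_2)^2}{\dfrac{\gamma_1^2}{n_1 - 1} + \dfrac{\gamma_2^2}{n_2 - 1}}.
\end{equation*}
```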

If the null hypothesis of equal expected values, $\mu_1 = \mu_2$, is true, the distribution of the Behrens–Fisher test statistic $T$ introduced at the outset, which depends somewhat on the ratio of the standard deviations $\sigma_1/\sigma_2$, could be approximated by Student's t-distribution with these $\nu$ degrees of freedom. However, $\nu$ itself contains the variances $\sigma_i^2$, which are unknown. The following estimate of the degrees of freedom has eventually prevailed; it simply replaces the population variances by the sample variances:

$$\hat{\nu} = \frac{(g_1 + g_2)^2}{\dfrac{g_1^2}{n_1 - 1} + \dfrac{g_2^2}{n_2 - 1}}, \qquad g_i = \frac{s_i^2}{n_i}$$

This estimate, however, turns the estimated number of degrees of freedom $\hat{\nu}$ into a random variable, and a t-distribution with a random number of degrees of freedom does not exist. Nevertheless, this is no obstacle to comparing the test statistic $T$ with the corresponding quantile of the t-distribution with $\hat{\nu}$ degrees of freedom. In this way, an infinitely differentiable function of the empirical variances $s_i^2$ arises as the boundary between the acceptance and rejection regions of the test statistic $T$.
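The procedure just described can be sketched in Python as follows. All function names and the example data are illustrative assumptions. To stay within the standard library, the sketch obtains the two-sided p-value by numerically integrating the t density with Simpson's rule; rejecting when this p-value is below the significance level is equivalent to comparing the absolute value of the test statistic with the corresponding t quantile. In practice a statistics library would normally supply the t-distribution.

```python
from math import exp, lgamma, log, log1p, pi, sqrt
from statistics import mean, variance


def welch_statistic_and_df(sample1, sample2):
    """Return the test statistic T and the estimated degrees of freedom."""
    n1, n2 = len(sample1), len(sample2)
    g1 = variance(sample1) / n1  # s1^2 / n1
    g2 = variance(sample2) / n2  # s2^2 / n2
    t_stat = (mean(sample1) - mean(sample2)) / sqrt(g1 + g2)
    df_hat = (g1 + g2) ** 2 / (g1 ** 2 / (n1 - 1) + g2 ** 2 / (n2 - 1))
    return t_stat, df_hat


def student_t_pdf(x, df):
    """Density of Student's t-distribution; df may be non-integer."""
    log_norm = lgamma((df + 1) / 2) - lgamma(df / 2) - 0.5 * log(df * pi)
    return exp(log_norm - (df + 1) / 2 * log1p(x * x / df))


def two_sided_p_value(t_stat, df, steps=2000):
    """P(|t_df| > |t_stat|) via Simpson's rule on [0, |t_stat|] (steps even)."""
    upper = abs(t_stat)
    h = upper / steps
    total = student_t_pdf(0.0, df) + student_t_pdf(upper, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * student_t_pdf(i * h, df)
    central_mass = 2 * total * h / 3  # P(|t_df| <= |t_stat|), by symmetry
    return max(0.0, 1.0 - central_mass)


# Hypothetical example data and significance level, for illustration only.
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.0, 4.0, 6.0, 8.0]
alpha = 0.05

t_stat, df_hat = welch_statistic_and_df(x, y)
p_value = two_sided_p_value(t_stat, df_hat)
reject = p_value < alpha
print(t_stat, df_hat, p_value, reject)
```

Because the estimated degrees of freedom are generally non-integer, the density is written with the log-gamma function rather than with factorials.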

This procedure does not hold the significance level exactly, but it does not deviate too far from it. Only when the population variances $\sigma_1^2$ and $\sigma_2^2$ are identical or, in the case of rather small sample sizes, can at least be assumed to be nearly identical is Student's ordinary t-test the better choice.
