Statistical hypothesis testing

In mathematical statistics, a statistical test is used to make a reasoned decision, on the basis of the available observations, about the validity or invalidity of a hypothesis. Formally, a test is thus a mathematical function that assigns a decision to an observation result. Since the available data are realizations of random variables, in most cases one cannot say with certainty whether a hypothesis is true or not. One therefore tries to control the probabilities of wrong decisions, which corresponds to a test at a given significance level. For this reason one also speaks of a hypothesis test or a test of significance.


Interpretation of a statistical test

In principle, a statistical test procedure can be compared to a court trial. The purpose of the trial is (usually) to determine whether there is sufficient evidence to convict the accused. A suspect is always presumed innocent, and as long as serious doubts about the evidence for an actual offense remain, the accused is acquitted. Only if the indications of the defendant's guilt clearly prevail does a conviction follow.

At the beginning of the proceedings there are accordingly two hypotheses: "the suspect is innocent" and "the suspect is guilty". The former is called the null hypothesis and is provisionally assumed to hold. The second is called the alternative hypothesis; it is the one that one attempts to "prove".

In order not to convict an innocent person too easily, the hypothesis of innocence is only rejected when an error is very unlikely. One speaks of controlling the probability of a type I error (i.e. convicting an innocent person). This unbalanced approach naturally makes the probability of a type II error (i.e. acquitting a guilty person) "large". Because of the stochastic structure of the test problem, wrong decisions cannot be avoided in principle, just as in a trial. In statistics, however, one tries to construct optimal tests that minimize the error probabilities.

An introductory example

Suppose we want to develop a test for clairvoyant abilities.

A subject is shown the back of a randomly chosen playing card 25 times and is asked each time which of the four suits (clubs, spades, hearts, diamonds) the card belongs to. We denote the number of hits by X.

Since we want to test the person's clairvoyant abilities, we provisionally start from the null hypothesis that the test person is not clairvoyant. The alternative hypothesis is accordingly: the test person is gifted with clairvoyance.

What does this mean for our test? If the null hypothesis is correct, the subject can only try to guess each suit; for each card there is then a probability of 1/4 of answering correctly. If the alternative hypothesis is correct, the person has a probability greater than 1/4 for each card. We call the probability of a correct prediction p.

The hypotheses are then:

H0: p = 1/4

and

H1: p > 1/4.

If the test subject names all 25 cards correctly, we will consider them clairvoyant and of course reject the null hypothesis, and likewise with 24 or 23 hits. With only 5 or 6 hits, on the other hand, there is no reason to do so. But what about 12 hits? What about 17 hits? Where is the critical number of hits c above which we can no longer believe the result to be pure luck?

So how do we determine the critical value c? With c = 25 (so that we only credit the gift of clairvoyance when every card has been identified correctly) one is more critical than with, say, c = 24. In the first case one will hardly ever consider a person clairvoyant, in the second case somewhat more often.

In practice, it therefore depends on how critical one wants to be, i.e. how often one allows a wrong decision of the first kind. With c = 25, the probability of such a wrong decision is

P(X = 25 | p = 1/4) = (1/4)^25 ≈ 10^-15,

which is very small. It is the probability that the test subject guesses correctly 25 times by pure chance.

Less critically, with c = 24, we obtain from the binomial distribution

P(X ≥ 24 | p = 1/4) = 25 · (1/4)^24 · (3/4) + (1/4)^25 ≈ 10^-13,

a much greater, but still tiny, probability.

Before the test, a probability α for the type I error is fixed. Typical values lie between 1% and 5%. Depending on α, the critical value c (here for a significance level of 1%) can then be determined such that

P(X ≥ c | p = 1/4) ≤ α

holds. Among all numbers c satisfying this property, one chooses the smallest, in order to keep the probability of a type II error small. In this concrete example this yields c = 13. A test of this kind is called a binomial test, since the number of hits is binomially distributed under the null hypothesis.
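The critical value in this example can be checked with a short script. This is only a sketch based on the setup described above (25 cards, hit probability 1/4 under the null hypothesis, significance level 1%); the function names are chosen for the example and belong to no standard library.

```python
from math import comb

def binomial_tail(n, p, c):
    """Exact P(X >= c) for X ~ Bin(n, p)."""
    return sum(comb(n, k) * p**k * (1 - p)**(n - k) for k in range(c, n + 1))

def critical_value(n, p, alpha):
    """Smallest c with P(X >= c | H0) <= alpha."""
    for c in range(n + 2):
        if binomial_tail(n, p, c) <= alpha:
            return c

c = critical_value(25, 0.25, 0.01)
print(c, binomial_tail(25, 0.25, c))
```

The tail probability P(X ≥ 12) lies just above 1%, so the smallest admissible critical value is c = 13, with an actual type I error probability of roughly 0.3%.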

Possible wrong decisions

Although it is desirable that the test decide "correctly" on the basis of the available data, wrong decisions remain possible. In the mathematical model, one commits a type I error (α error) if the null hypothesis is correct but one decides for the alternative. If one retains the null hypothesis although it is not true, one commits a type II error (β error).

In statistical practice, this ostensibly symmetric problem is made asymmetric: one fixes a significance level α that provides an upper bound on the probability of a type I error. Tests with this property are called tests of level α. One then tries to obtain an optimal test for the given level: among all tests of level α, one looks for the one with the lowest probability of a type II error.
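The asymmetry between the two error types can be made concrete with the card example from above. The sketch below assumes the rejection rule X ≥ 13 (which keeps the type I error below 1% for X ~ Bin(25, 1/4)) and, purely for illustration, a hypothetical clairvoyant whose true hit probability is 1/2.

```python
from math import comb

def pmf(n, p, k):
    """P(X = k) for X ~ Bin(n, p)."""
    return comb(n, k) * p**k * (1 - p)**(n - k)

n, c = 25, 13                                          # reject H0 when X >= c
alpha = sum(pmf(n, 0.25, k) for k in range(c, n + 1))  # type I error under H0
beta  = sum(pmf(n, 0.50, k) for k in range(c))         # type II error at p = 0.5
print(alpha, beta)
```

The type I error probability is tiny (about 0.3%), but the type II error probability at p = 1/2 is a full 50%: the unbalanced approach protects the null hypothesis at the expense of power.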

The formal procedure

Generally, one proceeds in the application of a test in the following steps:

  • Formulate a null hypothesis H0 and an alternative hypothesis H1.
  • Fix a significance level α.
  • Choose a test statistic T and determine the critical region K.
  • Compute the realized value t of the test statistic from the observations.
  • If t does not lie in K, then H0 is retained.
  • If t lies in K, one decides in favor of H1.
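As an illustration, these steps can be traced for a one-sample Gauss test (z-test) with known standard deviation. All numbers here are invented for the example; the quantile 1.6449 is the 95% point of the standard normal distribution, so the test has level α = 5%.

```python
from math import sqrt

# Step 1: hypotheses H0: mu = mu0 versus H1: mu > mu0 (one-sided)
mu0, sigma = 2.0, 0.3          # sigma assumed known
# Step 2: level alpha = 5% and critical region K = {z : z > 1.6449}
z_crit = 1.6449                # 95% quantile of N(0, 1)
# Step 3: observations and realized test statistic
data = [2.1, 2.4, 1.9, 2.6, 2.3, 2.5, 2.2, 2.8]
n = len(data)
mean = sum(data) / n
z = (mean - mu0) / (sigma / sqrt(n))
# Step 4: decision
reject_h0 = z > z_crit
print(z, reject_h0)
```

Here z ≈ 3.30 lies in the critical region, so the null hypothesis is rejected at the 5% level.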

Formal definition of a statistical test

Parametric statistical test

Let X be a random variable that maps from a probability space (Ω, A, P) into a measurement space (Z, B). Suppose additionally that a parametric distribution assumption holds, i.e. there is a family of probability measures P_θ indexed by θ ∈ Θ, one of which is the distribution of X. Here Θ is the parameter space, which in practice is usually a subset of R^k. A decomposition of Θ into two disjoint sets Θ0 and Θ1 defines the test problem:

  • H0: θ ∈ Θ0,
  • H1: θ ∈ Θ1,

the null hypothesis and the alternative hypothesis respectively.

A measurable function φ from the observation space into {0, 1} is called a test. The test function admits the following interpretation:

  • φ(x) = 1: reject the null hypothesis,
  • φ(x) = 0: retain the null hypothesis.

The set K = {x : φ(x) = 1} of all observation results that lead to a rejection of H0 is called the critical region of the test.

Now let α ∈ (0, 1) be a significance level. Then φ is called a test of level α for the test problem H0 against H1 (also: level-α test) if

sup_{θ ∈ Θ0} P_θ(φ(X) = 1) ≤ α

holds. Usually one searches for a critical region K that satisfies P_θ(X ∈ K) ≤ α for all θ ∈ Θ0 and, for all θ ∈ Θ1, the optimality condition that P_θ(X ∈ K) is as large as possible (maximal power).

Neyman-Pearson tests

Statements about the error probabilities of the first and second kind are most informative when null and alternative hypothesis are each characterized by a single parameter value: H0: θ = θ0 against H1: θ = θ1. The reason is that in this simplest possible case no supremum has to be formed to determine the significance level or an upper bound for the type II error probability. Consequently, tests designed for this situation can be compared with one another particularly well, so that optimal tests can be found.

A so-called Neyman-Pearson test is a likelihood ratio test that rejects the null hypothesis if

f_{θ1}(x) / f_{θ0}(x) ≥ k

holds for a suitable constant k, where f_{θ0} and f_{θ1} denote the densities under the two hypotheses. By the Neyman-Pearson lemma, such a test is a best test of its level.

Neyman-Pearson tests can be extended to one-sided hypotheses of the form H0: θ ≤ θ0 and H1: θ > θ0 if the distribution family has monotone likelihood ratios.
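For two simple hypotheses the likelihood ratio can be written down directly. The sketch below uses the textbook case of a single observation from N(θ, 1) with H0: θ = 0 against H1: θ = 1 (an assumed example, not one from this article); the ratio is increasing in x, so rejecting for a large ratio is the same as rejecting for a large observation.

```python
from math import exp, sqrt, pi

def normal_density(x, mu):
    """Density of N(mu, 1) at x."""
    return exp(-(x - mu) ** 2 / 2) / sqrt(2 * pi)

def likelihood_ratio(x, mu0=0.0, mu1=1.0):
    """f_theta1(x) / f_theta0(x); here it simplifies to exp(x - 1/2)."""
    return normal_density(x, mu1) / normal_density(x, mu0)

# Rejecting when the ratio exceeds k is equivalent to x >= 0.5 + log(k),
# i.e. to a one-sided threshold test on the observation itself.
print(likelihood_ratio(0.5), likelihood_ratio(2.0))
```

Because the ratio is monotone in x, the Neyman-Pearson test here reduces to the familiar one-sided Gauss test.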

Asymptotic behavior of the test

In most cases the exact probability distribution of the test statistic under the null hypothesis is unknown. One then faces the problem that no critical region for the prescribed level can be specified. In these cases the class of admissible tests is extended to those that hold the correct level asymptotically. Formally, this means choosing the critical regions K_n so that for all θ ∈ Θ0 the condition

lim_{n → ∞} P_θ(X ∈ K_n) ≤ α

is satisfied. As a rule, such asymptotic tests are obtained via normal approximation; one tries to transform the test statistic so that it converges to a normal distribution.

Simple examples are the one-sample and two-sample tests for expected values. Here the asymptotic distribution follows directly from the central limit theorem applied to the arithmetic mean. Beyond that, there are a number of statistical methods that allow the derivation of the asymptotic normal distribution even for more complicated functionals. Among them is the delta method for nonlinear, differentiable transformations of asymptotically normally distributed random variables:

Let g be a differentiable function and let T_n be an estimator that is √n-normally distributed with asymptotic covariance matrix Σ; then √n (g(T_n) − g(θ)) has the following asymptotic distribution: N(0, g'(θ) Σ g'(θ)^T).
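The delta method statement can be checked numerically. The following sketch is purely illustrative: it simulates means of n Uniform(0, 1) draws (so μ = 1/2, σ² = 1/12), applies g(t) = t², and compares the Monte Carlo standard deviation of g of the sample mean with the delta method prediction |g'(μ)| · σ/√n = √(1/(12n)).

```python
import random
from math import sqrt

random.seed(1)
n, reps = 1000, 2000
delta_sd = sqrt(1.0 / (12 * n))       # delta method: |g'(mu)| * sigma / sqrt(n)

vals = []
for _ in range(reps):
    xbar = sum(random.random() for _ in range(n)) / n
    vals.append(xbar ** 2)            # g(t) = t^2 applied to the sample mean

m = sum(vals) / reps
mc_sd = sqrt(sum((v - m) ** 2 for v in vals) / (reps - 1))
print(mc_sd, delta_sd)                # the two should agree closely
```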

Furthermore, the nonparametric delta method (also: influence function method) has brought some progress:

Let T be a functional that depends on the distribution F. Let L be the Gâteaux derivative of the statistic at F (the influence function), and let T be Hadamard differentiable with respect to F; then √n (T(F_n) − T(F)) has the following asymptotic distribution: N(0, ∫ L(x)^2 dF(x)).

The delta method thus provides normal approximations for nonlinear, differentiable transformations of (asymptotically) normally distributed random variables, while the influence function method provides such approximations for many interesting characteristics of a distribution. These include the moments (for instance variance and kurtosis), but also functions of these moments (for instance the correlation coefficient).

Another important requirement of a good test is that it becomes more sensitive with increasing sample size. In statistical terms this means that for a consistent test statistic, the probability that the null hypothesis is actually rejected in favor of the alternative when it is false increases with the sample size. Especially when the difference between the actual behavior of the random variables and the hypothesis is very small, it will only be detected with a correspondingly large sample size. Whether such differences are of practical significance and justify the expense of a large sample depends on the aspect being examined.
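For the binomial card test this effect can be computed exactly. The sketch below reuses the setup from the introductory example, with an assumed true hit probability of 1/2, and determines the 1%-level critical value for two sample sizes in order to compare the resulting power.

```python
from math import comb

def tail(n, p, c):
    """P(X >= c) for X ~ Bin(n, p)."""
    return sum(comb(n, k) * p**k * (1 - p)**(n - k) for k in range(c, n + 1))

def critical_value(n, p0, alpha):
    """Smallest critical value holding level alpha under H0: p = p0."""
    return min(c for c in range(n + 2) if tail(n, p0, c) <= alpha)

def power(n, p0, p1, alpha):
    """Probability of rejecting H0: p = p0 when in truth p = p1."""
    return tail(n, p1, critical_value(n, p0, alpha))

p_small = power(25, 0.25, 0.5, 0.01)     # 25 cards
p_large = power(100, 0.25, 0.5, 0.01)    # 100 cards, same level
print(p_small, p_large)
```

With 25 cards the alternative p = 1/2 is detected only half the time; with 100 cards at the same level the power rises close to one.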

Problem of model selection

Most mathematical results rely on assumptions made about certain properties of the observed random variables. Depending on the situation, different test statistics are chosen, whose (asymptotic) properties depend essentially on the requirements placed on the underlying distribution family. As a rule, these model assumptions must themselves be checked empirically before the test can be applied at all. Critical here above all is that the typical testing procedures are subject to strict assumptions that are rarely satisfied in practice.

Types and properties of tests

Parametric and nonparametric tests

Parametric tests (parametric test methods)

Parametric tests are concerned with concrete values such as the variance or the mean. A parametric test method thus makes statements about population parameters or about constants appearing in the distribution function of a study variable. For this, all parameters of the population must be known (which is often not the case). In a parametric test, every possible sample has the same chance of realization. Parametric tests assume that the observed sample data come from a population in which the variables or characteristics have a certain scale level and a certain probability distribution, frequently interval scale level and normal distribution. In these cases one is thus interested in testing hypotheses about certain parameters of the distribution.

If the distributional assumptions made are incorrect, the results of the test are in most cases useless. In particular, the probability of a type II error can then no longer be meaningfully minimized. One says that the power of the test decreases for many alternatives.

Nonparametric tests

In nonparametric tests (also called parameter-free tests or distribution tests), the type of the random distribution itself is checked: one decides whether a null hypothesis about the distribution established for the population is consistent with the observations or frequency distributions obtained from a random sample. Nonparametric tests thus make do with weaker assumptions; the set of distributions admitted under the null and alternative hypotheses cannot be described by a single parameter.

Typical examples:

  • Tests on a specific distribution function, such as the Kolmogorov-Smirnov test.
  • The Wilcoxon-Mann-Whitney test compares the location of two independent samples.
  • The Kruskal-Wallis test compares the location of three or more groups of independent samples.
  • The Wilcoxon signed-rank test compares the location of two dependent samples (e.g. pairwise comparisons).
  • The Friedman test compares the location of three or more groups of dependent samples.

However, since parametric tests often offer better power than nonparametric ones even when their assumptions are violated, the latter are rarely used.

Decision scheme parametric / nonparametric test

Basically, a parametric test is preferred to its nonparametric alternative. A parametric test uses more information than a nonparametric test, which increases the quality of the test (provided the additional information is correct). The following scheme can be applied to the selection of a parametric test or its nonparametric alternative; when STOP is reached, the algorithm terminates.
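Since the original pseudocode is not reproduced here, the following is only a plausible sketch of such a decision scheme for location tests, assembled from the nonparametric alternatives listed above; the exact branching of the article's algorithm may differ.

```python
def choose_location_test(interval_scaled, normal, paired, n_groups):
    """Pick a location test: parametric if its assumptions hold,
    otherwise the matching nonparametric alternative (each return is a STOP)."""
    if interval_scaled and normal:
        if n_groups == 2:
            return "paired t-test" if paired else "two-sample t-test"
        return "analysis of variance (F-test)"        # three or more groups
    # assumptions violated: fall back to rank-based tests
    if n_groups == 2:
        return "Wilcoxon signed-rank test" if paired else "Wilcoxon-Mann-Whitney test"
    return "Friedman test" if paired else "Kruskal-Wallis test"

print(choose_location_test(True, True, False, 2))
print(choose_location_test(False, False, True, 3))
```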

Distribution-free and distribution-bound tests

In distribution-bound or parametric tests, the test statistic depends on the distribution of the sample variables, i.e. on their distribution in the population. A normal distribution is often assumed. An example of a distribution-bound test is the F-test for comparing the variances of two normally distributed populations.

Distribution-free tests, also called nonparametric or parameter-free tests, are those whose test statistic does not depend on the distribution of the sample variables. An example of a distribution-free test is the Levene test for comparing the variances of two arbitrarily distributed populations.

Conservative test

A conservative test has the property that, for every sample, the probability of a type I error (accepting the alternative hypothesis as a result of the test decision although the null hypothesis is true) is smaller than the prescribed significance level. The consequence is that the non-rejection region of the null hypothesis is wider than actually necessary; the null hypothesis is thus rejected more rarely than the significance level prescribes. The test behaves conservatively and favors acceptance of the null hypothesis.

An example of a conservative test is the binomial test (a test on a proportion, e.g. H0: p = p0 against H1: p ≠ p0). Because of the discreteness of the test statistic, the equality P(X ≥ c) = α cannot be achieved exactly at the critical value c; instead one requires P(X ≥ c) ≤ α. As critical value one therefore generally chooses the value that leads to a significance level of at most α. The prescribed significance level can thus be undershot considerably in practice.

Exact test

An exact test is a test in which the distribution of the test statistic is not approximated by a different distribution to simplify the calculation; the exact sampling distribution is used instead. Exact tests are, for example, Fisher's exact test or the binomial test.

An example here is again the binomial test (a test on a proportion, e.g. H0: p = p0 against H1: p ≠ p0). Thanks to the central limit theorem, the binomially distributed test statistic can be approximated by the normal distribution, for example when n·p·(1−p) > 9 holds. Under certain circumstances the use of a continuity correction is then necessary to improve the approximation.
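The quality of the normal approximation can be examined directly. The sketch below compares the exact binomial tail probability with its continuity-corrected normal approximation for n = 100 and p = 1/2 (so that n·p·(1−p) = 25 > 9); the numbers are chosen only for illustration.

```python
from math import comb, erf, sqrt

def binom_tail(n, p, c):
    """Exact P(X >= c) for X ~ Bin(n, p)."""
    return sum(comb(n, k) * p**k * (1 - p)**(n - k) for k in range(c, n + 1))

def normal_tail(n, p, c):
    """Normal approximation of P(X >= c) with continuity correction."""
    mu, sd = n * p, sqrt(n * p * (1 - p))
    z = (c - 0.5 - mu) / sd
    return 0.5 * (1.0 - erf(z / sqrt(2.0)))

exact = binom_tail(100, 0.5, 60)
approx = normal_tail(100, 0.5, 60)
print(exact, approx)
```

For these values the two tail probabilities agree to about three decimal places; without the continuity correction (i.e. using c instead of c − 0.5) the approximation would be noticeably worse.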

Overview of tests

The most important tests can be characterized according to various criteria.

Unless otherwise stated, it is assumed for all tests in the following table that the observations are independent and identically distributed. The following abbreviations are used:

  • GG: population
  • ZGS: Central Limit Theorem

Non-parametric tests are presented with a yellow background.

Tests of location parameters (mean, median)

Tests of dispersion

Tests of correlation and association parameters

Goodness-of-fit or distribution tests

Tests in regression and time series analysis

Various tests

Others

Special forms of these tests are:
