The p-value (also exceedance probability or significance value; the p stands for probability) is a measure used to evaluate statistical tests. It is closely related to the significance level, but it cannot easily be tabulated, so its practical use only became feasible with the introduction of computers and statistical software.

The p-value is a probability and therefore takes a value between zero and one. Its value is determined by the sample drawn. It indicates how likely it is to obtain such a sample result, or a more extreme one, if the null hypothesis is true. A common misconception is to equate this statement with the false claim that the p-value indicates how likely the null hypothesis is, given this sample result. The p-value thus indicates how extreme the result is: the smaller the p-value, the more strongly the result speaks against the null hypothesis. Various scientific disciplines have established conventional thresholds, such as 5%, 1% or 0.1%, that are used to decide whether the null hypothesis can be rejected. If the null hypothesis is rejected, the result is called statistically significant. Significant in this context only means that the result goes beyond what chance alone would be expected to produce. The size of the p-value says nothing about the magnitude of the true effect.

Mathematical formulation

In a statistical test, an assumption (the null hypothesis $H_0$) is checked by carrying out a suitable random experiment that yields the random variables $X_1, X_2, \dots, X_n$. These random variables are summarized into a single number, called the test statistic:

$$T = t(X_1, X_2, \dots, X_n)$$

For a concrete outcome $(x_1, x_2, \dots, x_n)$ of the experiment one obtains the value

$$t = t(x_1, x_2, \dots, x_n)$$

The p-value of this outcome is the probability that, if the null hypothesis holds, a random experiment turns out at least as "extreme" as the one observed. The choice of test statistic is therefore very important.

In a right-sided test:

$$p = P(T \ge t \mid H_0)$$

In a left-sided test:

$$p = P(T \le t \mid H_0)$$

And in a two-sided test:

$$p = 2 \cdot \min\{P(T \le t \mid H_0),\ P(T \ge t \mid H_0)\}$$

The p-value indicates how extreme the value of the test statistic computed from the collected data is. It corresponds to the probability of obtaining the computed value of the test statistic, or a more extreme one, if the null hypothesis is true. For composite null hypotheses, this conditional probability can only be bounded from above.
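The right-sided, left-sided and two-sided cases can be sketched for a discrete test statistic as follows. This is a minimal illustration, not a reference implementation: the helper names `binomial_pmf` and `p_values` are hypothetical, the null distribution is passed in as a plain dictionary, and the two-sided value uses the "twice the smaller tail" convention (capped at 1), which is one common choice among several.

```python
from math import comb


def binomial_pmf(n, prob):
    """Return {k: P(K = k)} for K ~ Binomial(n, prob)."""
    return {k: comb(n, k) * prob**k * (1 - prob) ** (n - k) for k in range(n + 1)}


def p_values(pmf, t):
    """p-values for an observed statistic t, given the null distribution
    of the test statistic as a dict {value: probability}."""
    right = sum(p for value, p in pmf.items() if value >= t)  # P(T >= t | H0)
    left = sum(p for value, p in pmf.items() if value <= t)   # P(T <= t | H0)
    # Assumed convention: twice the smaller tail, capped at 1.
    two_sided = min(1.0, 2 * min(left, right))
    return {"right": right, "left": left, "two_sided": two_sided}


# Illustrative null distribution (an assumption): T ~ Binomial(10, 1/2), observed t = 8.
null_pmf = binomial_pmf(10, 0.5)
result = p_values(null_pmf, 8)
print(result)
```

In this illustration the right tail is the smaller one, so the two-sided p-value is twice the right-sided one.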

The smaller the p-value, the more reason there is to reject the null hypothesis. Usually a significance level $\alpha$ is fixed before the test, and the null hypothesis is rejected if the p-value is less than or equal to $\alpha$.

From the frequentist point of view, the p-value introduced by R. A. Fisher carries no further information; only the fact of whether it is smaller than a predetermined level $\alpha$ is of interest. In this form, $p \le \alpha$ is just another way of saying that the observed value of the test statistic lies in the critical region, and it adds nothing new to the Neyman-Pearson theory of hypothesis tests.


Example

Consider a coin. The null hypothesis to be tested is that the coin is fair, so that heads and tails are equally likely; the alternative hypothesis is that one outcome is more likely, without specifying which of the two. The random experiment for testing the null hypothesis consists of tossing the coin twenty times. Let $K$ denote the number of tosses that come up heads. With a fair coin, heads would be expected ten times. A sensible choice of test statistic is therefore

$$T = |K - 10|$$

Suppose the experiment yields heads $k$ times, i.e. $t = |k - 10|$. Under the null hypothesis, the number of heads is binomially distributed with $n = 20$ and $p = 1/2$. The p-value for this outcome is therefore

$$P(T \ge t \mid H_0) = P(|K - 10| \ge |k - 10|) = \frac{1}{2^{20}} \sum_{j:\ |j - 10| \ge |k - 10|} \binom{20}{j}$$

At a significance level of 5%, one would not reject the null hypothesis for this outcome; that is, one cannot conclude from the data that the coin is unfair.

If the experiment had instead yielded heads $k'$ times, with $t' = |k' - 10|$, the p-value for that outcome would be

$$P(T \ge t' \mid H_0) = P(|K - 10| \ge |k' - 10|)$$

At a significance level of 5%, one would therefore reject the null hypothesis in this case and conclude that the coin is not fair; at a significance level of 1%, however, further tests would be needed. (More precisely: one would regard the data as insufficient to justify the conclusion that the coin is not fair. Taking this as evidence that the coin is fair would, however, be wrong.)
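The coin example can be worked through numerically with the binomial distribution. The sketch below assumes two head counts, 14 and 15 out of 20 tosses; these numbers are illustrative assumptions, chosen only because they reproduce the two situations described (no rejection at 5% in the first case; rejection at 5% but not at 1% in the second). The function name `coin_p_value` is likewise hypothetical.

```python
from math import comb

N_TOSSES = 20  # from the text: the coin is tossed twenty times


def coin_p_value(k, n=N_TOSSES):
    """Two-sided p-value P(|K - n/2| >= |k - n/2|) for K ~ Binomial(n, 1/2)."""
    t = abs(k - n / 2)
    # Count all outcomes j that are at least as far from n/2 as the observed k.
    favourable = sum(comb(n, j) for j in range(n + 1) if abs(j - n / 2) >= t)
    return favourable / 2**n


# Assumed head counts for illustration only.
p_14 = coin_p_value(14)
p_15 = coin_p_value(15)

for k, p in ((14, p_14), (15, p_15)):
    print(f"k = {k}: p = {p:.4f}, reject at 5%: {p <= 0.05}, reject at 1%: {p <= 0.01}")
```

Under these assumptions the p-value is about 0.115 for 14 heads and about 0.041 for 15 heads, which matches the two decisions described in the example.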

Relationship with the level of significance

A test procedure that computes the p-value is equivalent to a procedure with a predetermined significance level. The p-value is computed from the observed value $t$ of the test statistic, and the critical value $t_{1-\alpha}$ follows from the significance level $\alpha$. For a right-sided test, for example:

$$p = P(T \ge t \mid H_0) \quad \text{and} \quad \alpha = P(T \ge t_{1-\alpha} \mid H_0)$$
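The equivalence described above can be checked numerically for a discrete, right-sided test statistic. The sketch below reuses the coin setting as an assumption ($T = |K - 10|$ with $K \sim \text{Binomial}(20, 1/2)$), and it defines the critical value as the smallest $c$ with $P(T \ge c \mid H_0) \le \alpha$. Because the statistic is discrete, $\alpha$ is then an upper bound on the rejection probability rather than its exact value. The function names are hypothetical.

```python
from math import comb


def tail_prob(c, n=20):
    """P(T >= c | H0) for T = |K - n/2| with K ~ Binomial(n, 1/2)."""
    return sum(comb(n, j) for j in range(n + 1) if abs(j - n / 2) >= c) / 2**n


def critical_value(alpha, n=20):
    """Smallest c with P(T >= c | H0) <= alpha (assumed definition)."""
    c = 0
    while tail_prob(c, n) > alpha:
        c += 1
    return c


alpha = 0.05
c_alpha = critical_value(alpha)

# For every possible value t of the statistic, both rules give the same decision.
decisions_agree = all(
    (t >= c_alpha) == (tail_prob(t) <= alpha) for t in range(0, 11)
)
print(c_alpha, decisions_agree)
```

Comparing $t$ with the critical value and comparing $p$ with $\alpha$ lead to the same decision for every possible outcome in this setting.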


When a test is carried out, statistical software reports the p-value, for example in a line labelled "Asymptotic significance" (the last line of the output box). If the p-value is smaller than the predetermined significance level $\alpha$, the null hypothesis is rejected.

On the one hand, by reporting the p-value the software relieves the user of having to supply a significance level in order to reach a test decision. On the other hand, there is a danger that the researcher adjusts the significance level, which should be fixed in advance, in order to obtain the desired result.

Other properties

If the test statistic has a continuous distribution, then under a simple (point) null hypothesis the p-value is uniformly distributed on the interval $[0, 1]$.
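This property can be illustrated by simulation. The sketch below assumes a right-sided test whose statistic is standard normal under the null hypothesis, so that $p = 1 - \Phi(t)$; the choice of distribution, the number of repetitions and the seed are all illustrative assumptions.

```python
import random
from statistics import NormalDist


def simulate_p_values(n_sim=20000, seed=0):
    """Draw test statistics T ~ N(0, 1) under H0 and return the
    right-sided p-values p = 1 - Phi(t)."""
    rng = random.Random(seed)
    std_normal = NormalDist()
    return [1.0 - std_normal.cdf(rng.gauss(0.0, 1.0)) for _ in range(n_sim)]


p_vals = simulate_p_values()

# Under a uniform distribution on [0, 1], about 5% of the p-values
# should fall at or below 0.05, and their mean should be close to 0.5.
frac_below_005 = sum(p <= 0.05 for p in p_vals) / len(p_vals)
mean_p = sum(p_vals) / len(p_vals)
print(frac_below_005, mean_p)
```

A consequence is that, when the null hypothesis is true, a test at level $\alpha$ rejects it in roughly a fraction $\alpha$ of repetitions.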

Typical misinterpretations

Goodman formulated 12 statements about p-values that are very widespread and yet wrong. In particular, the following statements are false:

  • Wrong: If p = 0.05, the chance that the null hypothesis is true is only 5%.
  • Wrong: A non-significant difference in a comparison of means between two groups implies that the means are equal.
  • Also wrong: A significant difference by itself means that the result is important in reality, for example in clinical use.
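The last point, that significance does not imply practical importance, can be illustrated with a small calculation. The sketch assumes a two-sided z-test with known standard deviation, in which the observed mean difference equals the stated effect; the effect sizes and sample sizes are invented for illustration, and `two_sided_z_p_value` is a hypothetical helper.

```python
from math import erfc, sqrt


def two_sided_z_p_value(mean_diff, sigma, n):
    """Two-sided p-value of a z-test with known sigma:
    p = 2 * (1 - Phi(|z|)) = erfc(|z| / sqrt(2)), with z = mean_diff * sqrt(n) / sigma."""
    z = mean_diff * sqrt(n) / sigma
    return erfc(abs(z) / sqrt(2))


# Tiny effect, very large sample (assumed values).
p_tiny_effect = two_sided_z_p_value(mean_diff=0.01, sigma=1.0, n=250_000)

# Large effect, small sample (assumed values).
p_large_effect = two_sided_z_p_value(mean_diff=0.5, sigma=1.0, n=10)

print(p_tiny_effect, p_large_effect)
```

Under these assumptions the negligible effect yields the far smaller p-value, because the p-value depends on the sample size as well as on the effect.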