Bayes' theorem

Bayes' theorem is a mathematical theorem from probability theory that describes the calculation of conditional probabilities. It is named after the English mathematician Thomas Bayes, who first described a special case of it in the treatise An Essay Towards Solving a Problem in the Doctrine of Chances, published posthumously in 1763. It is also called Bayes' formula or Bayes' rule.

Formula

For two events $A$ and $B$ with $P(B) > 0$, the probability of $A$ under the condition that $B$ has occurred can be expressed through the probability of $B$ under the condition that $A$ has occurred:

$$P(A \mid B) = \frac{P(B \mid A) \cdot P(A)}{P(B)}$$

Here $P(A \mid B)$ denotes the probability of $A$ under the condition that $B$ has occurred, $P(B \mid A)$ the probability of $B$ under the condition that $A$ has occurred, and $P(A)$ and $P(B)$ the a priori probabilities of the events $A$ and $B$.

For a finite number of events, Bayes' theorem reads as follows: if $A_i$, $i = 1, \dots, N$, is a partition of the sample space into disjoint events, then the a posteriori probability $P(A_i \mid B)$ satisfies

$$P(A_i \mid B) = \frac{P(B \mid A_i) \cdot P(A_i)}{P(B)} = \frac{P(B \mid A_i) \cdot P(A_i)}{\sum_{j=1}^{N} P(B \mid A_j) \cdot P(A_j)}$$

The last transformation step is also referred to as marginalization.

Since an event $A$ and its complement $\bar{A}$ always form a partition of the sample space, in particular

$$P(A \mid B) = \frac{P(B \mid A) \cdot P(A)}{P(B \mid A) \cdot P(A) + P(B \mid \bar{A}) \cdot P(\bar{A})}$$
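
As an illustration, the following minimal Python sketch evaluates this last form; the function name bayes_posterior is our own choice for the example:

```python
def bayes_posterior(p_b_given_a, p_a, p_b_given_not_a):
    """Return P(A|B), expanding P(B) by the law of total probability."""
    p_b = p_b_given_a * p_a + p_b_given_not_a * (1 - p_a)
    return p_b_given_a * p_a / p_b

# With P(B|A) = 0.7, P(A) = 0.5 and P(B|A-bar) = 0.1 (calculation example 1 below):
print(bayes_posterior(0.7, 0.5, 0.1))  # 0.875
```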

Proof

The theorem follows immediately from the definition of conditional probability:

$$P(A \mid B) = \frac{P(A \cap B)}{P(B)} = \frac{P(B \mid A) \cdot P(A)}{P(B)}$$

The relationship

$$P(B) = \sum_{i=1}^{N} P(B \mid A_i) \cdot P(A_i)$$

is an application of the law of total probability.

Interpretation

Bayes' theorem permits, in a certain sense, the reversal of inferences: one starts from a known value $P(B \mid A)$ but is actually interested in the value $P(A \mid B)$. For example, it is of interest how large the probability is that someone has a particular disease when a rapid test developed for it shows a positive result. From empirical studies, one typically knows the probability $P(B \mid A)$ with which the test produces a positive result in a person affected by the disease. The desired conversion is only possible if one knows the prevalence of the disease, that is, the (absolute) probability with which the disease occurs in the overall population (see calculation example 2).

A decision tree or a fourfold table can help in understanding. The procedure is also known as reverse induction.

Sometimes one encounters the fallacy of inferring directly from $P(B \mid A)$ to $P(A \mid B)$ without taking the a priori probability $P(A)$ into account, for example by assuming that the two conditional probabilities must be of approximately the same size (see prevalence error). As Bayes' theorem shows, however, that is the case only if $P(A)$ and $P(B)$ are also of approximately the same size.

Note also that conditional probabilities alone are not suitable for proving a specific causal relationship.

Areas of application

  • Statistics: all questions of learning from experience, in which an a priori probability assessment is updated into an a posteriori distribution on the basis of observations (cf. Bayesian statistics).
  • Data mining: Bayesian classifiers are theoretical decision rules with provably minimal error rate.
  • Spam detection: Bayesian filters infer the spam property (event B) from the characteristic words in an e-mail (event A); see the sketch after this list.
  • Artificial intelligence: here Bayes' theorem is used to draw conclusions even in domains with "uncertain" knowledge. Such conclusions are not deductive, and thus not always correct, but rather abductive in nature; nevertheless they have proven useful for hypothesis formation and learning in such systems.
  • Quality management: assessment of the predictive value of a test series.
  • Decision theory / information economics: determination of the expected value of additional information.
  • Basic model of traffic distribution.
  • Bioinformatics: determination of the functional similarity of sequences; reconstruction of phylogenetic trees and the ages of their nodes.
  • Communication theory: solution of detection and decoding problems.
  • Econometrics: Bayesian econometrics.
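
A toy illustration of the spam-filter item above; all words, frequencies, and probabilities here are invented for the example:

```python
p_spam = 0.4  # assumed a priori probability that a mail is spam
p_word_given_spam = {"offer": 0.30, "meeting": 0.02}  # invented likelihoods
p_word_given_ham = {"offer": 0.01, "meeting": 0.20}

def p_spam_given_word(word):
    """P(spam | word) by Bayes' theorem with a two-class denominator."""
    num = p_word_given_spam[word] * p_spam
    den = num + p_word_given_ham[word] * (1 - p_spam)
    return num / den

print(p_spam_given_word("offer"))    # ~0.95: "offer" points to spam
print(p_spam_given_word("meeting"))  # ~0.06: "meeting" points to ham
```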

Calculation example 1

Two urns, A and B, each contain ten balls. In A there are seven red and three white balls, in B one red and nine white. First an urn is chosen at random, then a ball is drawn from it at random; in other words, it is a priori equally likely that the draw is made from urn A or from urn B. The result of the draw: the ball is red. We seek the probability that this red ball comes from urn A.

Let $A$ denote the event "the ball comes from urn A", $B$ the event "the ball comes from urn B", and $R$ the event "the ball is red".

Then:

$$P(A) = P(B) = \frac{1}{2}$$ (both urns are a priori equally likely),

$$P(R \mid A) = \frac{7}{10}$$ (in urn A there are 10 balls, 7 of them red),

$$P(R \mid B) = \frac{1}{10}$$ (in urn B there are 10 balls, 1 of them red),

$$P(R) = P(R \mid A) \cdot P(A) + P(R \mid B) \cdot P(B) = \frac{7}{10} \cdot \frac{1}{2} + \frac{1}{10} \cdot \frac{1}{2} = \frac{2}{5}$$ (total probability of drawing a red ball).

This gives

$$P(A \mid R) = \frac{P(R \mid A) \cdot P(A)}{P(R)} = \frac{\frac{7}{10} \cdot \frac{1}{2}}{\frac{2}{5}} = \frac{7}{8}$$

The conditional probability that the drawn red ball comes from urn A is thus $\frac{7}{8} = 0.875$.

The result of the Bayesian formula in this simple example can also be seen directly: since both urns are a priori equally likely to be selected and both contain the same number of balls, all balls, and thus all eight red balls, have the same probability of being drawn. If one repeatedly draws a ball from a randomly chosen urn and returns it to the correct urn, one draws a red ball on average in eight of 20 cases and a white ball in twelve of 20 cases (hence the total probability of drawing a red ball equals $\frac{8}{20} = \frac{2}{5}$). Of these eight red balls, on average seven come from urn A and one from urn B. The probability that a drawn red ball comes from urn A is therefore equal to $\frac{7}{8}$.
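
This frequency argument can be checked by a short simulation; a sketch in Python with the urn contents as stated above:

```python
import random

urns = {"A": ["red"] * 7 + ["white"] * 3,
        "B": ["red"] * 1 + ["white"] * 9}

red_total = red_from_a = 0
for _ in range(100_000):
    urn = random.choice("AB")        # both urns a priori equally likely
    if random.choice(urns[urn]) == "red":
        red_total += 1
        red_from_a += (urn == "A")

print(red_from_a / red_total)  # converges to 7/8 = 0.875
```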

Calculation example 2

A certain disease occurs with a prevalence of 20 per 100,000 persons. Let $K$ denote the event that a person carries this disease; its probability is $P(K) = 0.0002$.

A screening test is intended to determine which persons carry the disease. Let $T$ denote the event that the test is positive for a person. The manufacturer asserts that the test detects the presence of the disease with 95 percent probability (sensitivity $P(T \mid K) = 0.95$) and that, in the absence of the disease, it correctly gives a negative result in 99 of 100 cases (specificity $P(\bar{T} \mid \bar{K}) = 0.99$), i.e. it incorrectly signals the disease in only one percent of cases in which it is not present. The probability of a false positive test result is therefore $P(T \mid \bar{K}) = 0.01$. Here $\bar{K}$ denotes the complement of $K$, i.e. the event that an examined person is not ill; its probability is $P(\bar{K}) = 0.9998$.

Thus the probability that the test is positive when the disease is present is known: $P(T \mid K) = 0.95$. The question is: how probable is the presence of the disease if the test is positive? That is, we seek the positive predictive value $P(K \mid T)$.

Calculation using Bayes' theorem

$$P(K \mid T) = \frac{P(T \mid K) \cdot P(K)}{P(T \mid K) \cdot P(K) + P(T \mid \bar{K}) \cdot P(\bar{K})} = \frac{0.95 \cdot 0.0002}{0.95 \cdot 0.0002 + 0.01 \cdot 0.9998} = \frac{0.00019}{0.010188} \approx 0.0186$$
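
The same arithmetic as a brief Python check, with the values given above:

```python
p_k = 0.0002   # prevalence P(K)
p_t_k = 0.95   # sensitivity P(T | K)
p_t_nk = 0.01  # false positive rate P(T | K-bar)

p_t = p_t_k * p_k + p_t_nk * (1 - p_k)  # total probability of a positive test
print(p_t_k * p_k / p_t)                # positive predictive value, ~0.0186
```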

Calculation using a tree diagram

Problems with few classes and simple distributions can be displayed clearly in a tree diagram of frequency allocations. If one passes from frequencies to relative frequencies or to (conditional) probabilities, the tree diagram becomes an event tree, a special case of the decision tree.

From the information above, the following absolute frequencies result for 100,000 persons: 20 persons are actually ill, 99,980 persons are healthy. Among the 20 ill persons, the test correctly diagnoses the condition in 19 cases (95 percent sensitivity); in one case the test fails and does not indicate the existing illness (false negative). Because of its specificity of 99 percent, the test incorrectly indicates a disease in one percent of the 99,980 healthy persons, i.e. in about 1,000 persons (false positive); in the remaining 98,980 healthy persons it correctly indicates no disease. Of the 1,019 persons in total who tested positive, only 19 are actually ill. The probability that a person who tests positive is actually ill is therefore

$$P(K \mid T) = \frac{19}{19 + 1000} = \frac{19}{1019} \approx 0.0186$$
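
The same tree-diagram reasoning in natural frequencies, as a small Python sketch (rounding as in the text above):

```python
population = 100_000
sick = 20                           # prevalence: 20 per 100,000
healthy = population - sick         # 99,980

true_pos = round(sick * 0.95)       # 19 ill persons correctly detected
false_pos = round(healthy * 0.01)   # ~1,000 healthy persons flagged anyway

print(true_pos, false_pos)                 # 19 1000
print(true_pos / (true_pos + false_pos))   # 19/1019, ~0.0186
```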

Interpretation of results

The probability that a person who tests positive is actually ill is thus only about 1.86 percent in this example; in other words, a person who tests positive still has a more than 98 percent chance of being healthy, even though the test has classified him as ill. In view of the specificity of 99 percent, intuitively misread as the "reliability" of the test, this is hard to believe. The reason is that the a priori probability of actually being ill (0.02 percent) is only one-fiftieth of the probability of a false positive test result (one percent); false positives are therefore far more frequent than actual cases of the disease.

This problem of a high false positive rate always occurs when the property being tested for, as here the presence of a particular disease, occurs in the overall population only with a small prevalence.

This problem and its consequences are described in detail by Gerd Gigerenzer in his book Das Einmaleins der Skepsis (published in English as Calculated Risks).

Bayesian statistics

Bayesian statistics uses Bayes' theorem within inductive statistics for the estimation of parameters and the testing of hypotheses.

Problem

The following situation is given: an unknown environmental state $\theta$ (e.g. a parameter of a probability distribution) is to be estimated on the basis of an observation $x$ of a random variable $X$. Further knowledge is given in the form of an a priori probability distribution of the unknown parameter $\theta$. This a priori distribution contains all the information about the environmental state $\theta$ that is available before the sample is observed.

Depending on the context and philosophical school, the a priori distribution is understood as

  • a mathematical model of subjective degrees of belief (subjective concept of probability),
  • an adequate representation of general prior knowledge (where probabilities are understood as a natural extension of Aristotelian logic to uncertainty, in the sense of Cox's postulates),
  • a probability distribution of an actually random parameter, known from preliminary investigations, or
  • a specifically chosen distribution that corresponds in an ideal way to complete ignorance of the parameter (objective a priori distributions, e.g. obtained by the maximum entropy method).

The conditional distribution of $X$ under the condition that the parameter assumes the value $\theta$ is denoted below by $P(X = x \mid \Theta = \theta)$. This probability distribution can be determined and, regarded as a function of $\theta$ for a fixed observed sample $x$, is also referred to as the likelihood of the parameter value $\theta$.

The a posteriori probability can be calculated using Bayes' theorem. In the special case of a discrete a priori distribution one obtains

$$P(\Theta = \theta \mid X = x) = \frac{P(X = x \mid \Theta = \theta) \cdot P(\Theta = \theta)}{\sum_{\theta'} P(X = x \mid \Theta = \theta') \cdot P(\Theta = \theta')}$$

If the set of all possible environmental states is finite, the a posteriori distribution at the value $\theta$ can be interpreted as the probability one assigns to the environmental state $\theta$ after observing the sample and taking the prior knowledge into account.

As an estimate, followers of the subjectivist school of statistics usually use the expected value of the a posteriori distribution, in some cases its modal value.
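
A minimal sketch of this discrete updating step in Python; the function name discrete_posterior is our own:

```python
def discrete_posterior(prior, likelihood):
    """Posterior over finitely many parameter values from a discrete
    prior and the likelihood of the observed sample under each value."""
    joint = [p * l for p, l in zip(prior, likelihood)]
    evidence = sum(joint)  # marginal probability of the sample
    return [j / evidence for j in joint]

# Two equally likely states with likelihoods 0.7 and 0.1 (urn example above):
print(discrete_posterior([0.5, 0.5], [0.7, 0.1]))  # [0.875, 0.125]
```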

Example

Similarly to above, consider again an urn filled with ten balls, but now it is unknown how many of them are red. The number $K$ of red balls is here the unknown environmental state, and as a priori distribution we assume that all possible values from zero to ten are equally probable, i.e. $P(K = k) = \frac{1}{11}$ for all $k = 0, \dots, 10$.

Now a ball is drawn from the urn five times with replacement, and let $X$ denote the random variable that counts how many of them are red. Given $K = k$, the variable $X$ is binomially distributed with parameters $n = 5$ and $p = \frac{k}{10}$; hence

$$P(X = x \mid K = k) = \binom{5}{x} \left(\frac{k}{10}\right)^{x} \left(1 - \frac{k}{10}\right)^{5 - x}$$

for $x = 0, \dots, 5$.

For example, for $x = 2$, i.e. if two of the five balls drawn were red, the following values result (rounded to three decimal places):

  k                    0      1      2      3      4      5      6      7      8      9      10
  P(K = k)           0.091  0.091  0.091  0.091  0.091  0.091  0.091  0.091  0.091  0.091  0.091
  P(K = k | X = 2)   0.000  0.044  0.123  0.185  0.207  0.188  0.138  0.079  0.031  0.005  0.000

One sees that, in contrast to the a priori distribution in the second row, in which all values of $K$ were assumed to be equally probable, the a posteriori distribution in the third row has its largest probability at $K = 4$, which is thus the a posteriori mode.

As the expected value of the a posteriori distribution one obtains here:

$$E(K \mid X = 2) = \sum_{k=0}^{10} k \cdot P(K = k \mid X = 2) \approx 4.287$$
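
The table, the mode, and the expected value can be reproduced with a few lines of Python under the assumptions above:

```python
from math import comb

n, x = 5, 2                    # five draws with replacement, two of them red
ks = range(11)                 # possible numbers of red balls in the urn
prior = [1 / 11] * 11          # uniform a priori distribution

lik = [comb(n, x) * (k / 10) ** x * (1 - k / 10) ** (n - x) for k in ks]
evidence = sum(l * p for l, p in zip(lik, prior))
post = [l * p / evidence for l, p in zip(lik, prior)]

print([round(p, 3) for p in post])            # the a posteriori row above
print(max(ks, key=lambda k: post[k]))         # a posteriori mode: 4
print(sum(k * p for k, p in zip(ks, post)))   # a posteriori mean: ~4.287
```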
