Naive Bayes classifier

A Bayesian classifier (pronounced [beɪz], named after the English mathematician Thomas Bayes) is a classifier derived from Bayes' theorem. It assigns each object to the class to which it belongs with the highest probability, or whose assignment incurs the least cost. Strictly speaking, it is a mathematical function that assigns a class to each point of a feature space.

To define the Bayesian classifier, a cost measure is required that assigns a cost to each possible classification. The Bayesian classifier is exactly the classifier that minimizes the total cost incurred by all resulting classifications. The cost measure is sometimes also called a risk function; one then says that the Bayesian classifier minimizes the risk of a wrong decision and is defined by the minimum-risk criterion.
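
In one standard notation, with $L(c', c)$ denoting the cost of deciding for class $c'$ when the true class is $c$, and $P(c \mid x)$ the probability that a point $x$ belongs to class $c$, the minimum-risk criterion can be written as

$$ k(x) = \operatorname*{arg\,min}_{c' \in C} \sum_{c \in C} L(c', c)\, P(c \mid x). $$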

If a primitive cost measure is used that incurs a cost only for wrong decisions, the Bayesian classifier minimizes the probability of a wrong decision. It is then said to be defined by the maximum a posteriori criterion.
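
With such a 0-1 cost measure, the criterion reduces to choosing the class with the highest posterior probability:

$$ k(x) = \operatorname*{arg\,max}_{c \in C} P(c \mid x). $$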

Both forms assume that the probability that a point of the feature space belongs to a given class is known, i.e. that each class is described by a probability density. In reality, however, these density functions are not known; they must be estimated. To this end, one assumes for each class a type of probability distribution, usually a normal distribution, and tries to estimate its parameters from the available data.
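
A minimal sketch of this procedure in Python, under simplifying assumptions (one real-valued feature, one normal distribution per class; the function names and the toy data are illustrative, not from the article):

```python
import math
from collections import defaultdict

def fit_gaussians(points, labels):
    """Estimate mean, variance, and class prior for each class from labeled data."""
    by_class = defaultdict(list)
    for x, c in zip(points, labels):
        by_class[c].append(x)
    params = {}
    for c, xs in by_class.items():
        mean = sum(xs) / len(xs)
        var = sum((x - mean) ** 2 for x in xs) / len(xs)
        params[c] = (mean, var, len(xs) / len(points))  # (mean, variance, prior)
    return params

def classify(x, params):
    """Maximum a posteriori rule: pick the class maximizing prior * density."""
    def posterior(c):
        mean, var, prior = params[c]
        density = math.exp(-(x - mean) ** 2 / (2 * var)) / math.sqrt(2 * math.pi * var)
        return prior * density
    return max(params, key=posterior)

# Illustrative data: class 0 clusters around 1, class 1 around 4.
points = [0.8, 1.1, 1.3, 0.9, 3.8, 4.2, 4.0, 4.4]
labels = [0, 0, 0, 0, 1, 1, 1, 1]
params = fit_gaussians(points, labels)
print(classify(2.0, params))  # -> 0
```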

Far more often, however, the Bayesian classifier is used to evaluate other classifiers: one artificially creates a few classes and their probability densities, generates a random sample from this model, and lets the other classifier assign the objects in this sample to the classes. The result is compared with the classification the Bayesian classifier would have produced. Since the Bayesian classifier is optimal in this setting, one obtains an estimate of how close the other classifier comes to the optimum. At the same time, the Bayesian classifier provides a lower bound for the error probability of all other classifiers in this scenario; none of them can be better than the optimal Bayesian classifier.
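
The following sketch illustrates this evaluation scheme under assumed distributions (two equally likely classes, N(0, 1) and N(2, 1)): for this model the Bayes-optimal rule is a threshold at the midpoint, and the error rate of any other classifier on samples from the model can be measured against it.

```python
import random

random.seed(0)

# Artificial model: two equally likely classes, N(0, 1) and N(2, 1).
def sample(n):
    data = []
    for _ in range(n):
        c = random.randint(0, 1)
        data.append((random.gauss(2 * c, 1.0), c))
    return data

# Bayes-optimal classifier for this model: threshold at the midpoint 1.0.
bayes = lambda x: 1 if x > 1.0 else 0

# Some other classifier to evaluate, here a deliberately shifted threshold.
other = lambda x: 1 if x > 1.5 else 0

def error_rate(clf, data):
    return sum(clf(x) != c for x, c in data) / len(data)

data = sample(100_000)
print("Bayes error:", error_rate(bayes, data))  # lower bound, about 0.16
print("Other error:", error_rate(other, data))  # >= Bayes error (up to sampling noise)
```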

Naive Bayes classifier

Due to its fast computability combined with a good recognition rate, the naive Bayes classifier is very popular. Using a naive Bayes classifier, it is possible to determine the membership of an object in a class (class attribute). It is based on Bayes' theorem. One can view a naive Bayes classifier as a star-shaped Bayesian network.

The basic assumption is that each attribute depends only on the class attribute. Although this is rarely true in reality, naive Bayes classifiers often achieve good results in practical applications as long as the attributes are not strongly correlated.
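
Under this assumption, with class $c$ and attribute values $a_1, \dots, a_n$ (notation chosen here for illustration), the joint probability factorizes as

$$ P(c, a_1, \dots, a_n) = P(c) \prod_{i=1}^{n} P(a_i \mid c), $$

so only the class priors and the per-attribute conditional probabilities need to be estimated.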

For the case of strong dependencies between attributes, an extension of the naive Bayesian classifier by a tree connecting the attributes is useful. The result is called a tree-augmented naive Bayes classifier.

Mathematical definition

A Bayesian classifier is a function that maps vectors from the $N$-dimensional real-valued feature space to a set of classes $C$:

$$ k\colon \mathbb{R}^N \to C $$

In general, $C = \{0, 1\}$ holds for the case that two classes are considered, or $C = \{1, \dots, m\}$ if $m$ classes are considered.

Classification with normally distributed classes

If two classes are each described by a normal distribution, the decision boundary between them resulting from the Bayesian classifier is quadratic. If the normal distributions are moreover described by the same covariance matrix, the decision boundary between them is even linear. In these two cases the discriminant function can be written down very simply, which makes the classification easy and efficient to compute.
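
A minimal sketch of the equal-covariance case in one dimension (the parameter values are illustrative): with a shared variance, the log-ratio of the two class posteriors is linear in $x$, so classification reduces to comparing a linear function against zero.

```python
import math

# Two classes, each a 1-D normal with the same variance (illustrative values).
mu0, mu1, sigma2 = 0.0, 2.0, 1.0   # means and shared variance
p0, p1 = 0.5, 0.5                  # class priors

def linear_discriminant(x):
    """Log posterior ratio log(P(c1|x)/P(c0|x)); linear in x for equal variances."""
    return (mu1 - mu0) / sigma2 * x - (mu1**2 - mu0**2) / (2 * sigma2) + math.log(p1 / p0)

def classify(x):
    return 1 if linear_discriminant(x) > 0 else 0

print(classify(0.3), classify(1.7))  # -> 0 1 (decision boundary at x = 1)
```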

Example

In e-mail programs, (learning) naive Bayes filters can filter spam mails very efficiently. There are two classes of e-mails: spam ($S$) and non-spam ($\bar{S}$) e-mails. An e-mail consists of single words. From old, already classified e-mails, one can estimate for every word $w$ the probability that it occurs in a spam or a non-spam e-mail, that is:

$$ P(w \mid S) \quad \text{and} \quad P(w \mid \bar{S}). $$

For a new e-mail $M$, the question to answer is now: is the probability $P(S \mid M)$ greater than or less than the probability $P(\bar{S} \mid M)$? If $P(\bar{S} \mid M) > P(S \mid M)$, the new e-mail is classified as not spam, otherwise as spam.

By Bayes' theorem, the following holds for the probability $P(S \mid M)$:

$$ P(S \mid M) = \frac{P(M \mid S)\, P(S)}{P(M)}. $$

The probability $P(M)$ that this particular e-mail occurs at all, however, can hardly be estimated, since each e-mail usually occurs only once. E-mail programs therefore consider the ratio

$$ \frac{P(S \mid M)}{P(\bar{S} \mid M)} = \frac{P(M \mid S)\, P(S)}{P(M \mid \bar{S})\, P(\bar{S})}, $$

in which $P(M)$ cancels out.

If this ratio is greater than 1, the e-mail is classified as spam, otherwise as non-spam. The probabilities $P(S)$ and $P(\bar{S})$ that an arbitrary e-mail is spam or not spam can again be estimated from the old e-mails.

If the e-mail $M$ consists of the words $w_1, \dots, w_n$, and these words occur independently of each other, the following applies:

$$ P(M \mid S) = \prod_{i=1}^{n} P(w_i \mid S), \qquad P(M \mid \bar{S}) = \prod_{i=1}^{n} P(w_i \mid \bar{S}). $$

The probabilities $P(w_i \mid S)$ and $P(w_i \mid \bar{S})$ were already estimated above, and thus the overall ratio can be calculated:

$$ \frac{P(S \mid M)}{P(\bar{S} \mid M)} = \frac{P(S) \prod_{i=1}^{n} P(w_i \mid S)}{P(\bar{S}) \prod_{i=1}^{n} P(w_i \mid \bar{S})}. $$
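
A minimal sketch of such a filter in Python; the training data are toy examples, and the add-one (Laplace) smoothing used to avoid zero probabilities is an added assumption, not part of the derivation above:

```python
from collections import Counter

# Toy training data: already classified e-mails, each a list of words (assumed).
spam = [["cheap", "pills", "now"], ["win", "money", "now"], ["cheap", "money"]]
ham = [["meeting", "tomorrow", "noon"], ["project", "report", "tomorrow"]]
vocab = {w for mail in spam + ham for w in mail}

def word_probs(mails):
    """Estimate P(w | class) for every vocabulary word, with add-one smoothing."""
    counts = Counter(w for mail in mails for w in mail)
    total = sum(counts.values())
    return {w: (counts[w] + 1) / (total + len(vocab)) for w in vocab}

p_w_spam = word_probs(spam)                   # P(w | S)
p_w_ham = word_probs(ham)                     # P(w | not S)
p_spam = len(spam) / (len(spam) + len(ham))   # P(S), estimated from old e-mails
p_ham = 1 - p_spam                            # P(not S)

def spam_ratio(mail):
    """P(S|M) / P(not-S|M), assuming the words occur independently given the class."""
    ratio = p_spam / p_ham
    for w in mail:
        if w in vocab:        # unseen words carry no information here
            ratio *= p_w_spam[w] / p_w_ham[w]
    return ratio

print(spam_ratio(["cheap", "money", "now"]) > 1)  # True: classified as spam
print(spam_ratio(["project", "meeting"]) > 1)     # False: classified as non-spam
```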
