Decision tree

Decision trees are ordered, directed trees that serve to represent decision rules. They illustrate hierarchically successive decisions. They are important in numerous areas in which data are classified automatically or in which formal rules are derived or presented from empirical knowledge.


Properties and applications

Decision trees are a method for the automatic classification of data objects and thus for solving decision problems. They are also used for the clear presentation of formal rules. A decision tree always consists of a root node, any number of internal nodes, and at least two leaves. Each node represents a logical rule and each leaf an answer to the decision problem.

The complexity and semantics of the rules are not restricted a priori. In binary decision trees, every rule expression can take only one of two values. All decision trees can be converted into binary decision trees.

Decision trees are used in many areas, for example:

  • In stochastics, as probability trees to illustrate conditional probabilities (or absolute frequencies)
  • In machine learning as a method for the automatic classification of objects
  • In data mining
  • In decision theory
  • In medical decision-making and in emergency medicine
  • In business rule management systems (BRMS) for the definition of rules

Operation

Classification with decision trees

To read off the classification of a single data object, one descends the tree from the root node. At each node an attribute is queried and a decision is made about which node to visit next. This procedure continues until a leaf is reached. The leaf corresponds to the classification. A tree always contains rules for answering exactly one question.
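As a rough illustration of this traversal, the following minimal sketch assumes a simple node structure in which each internal node queries one attribute; the class and attribute names are illustrative and not taken from any particular library.

```python
# Minimal sketch of classification by tree traversal (illustrative structure).
from dataclasses import dataclass
from typing import Dict, Union

@dataclass
class Leaf:
    answer: str                                  # classification returned at this leaf

@dataclass
class Node:
    attribute: str                               # attribute queried at this node
    children: Dict[str, Union["Node", Leaf]]     # one child per possible attribute value

def classify(tree: Union[Node, Leaf], obj: Dict[str, str]) -> str:
    """Descend from the root node until a leaf is reached."""
    while isinstance(tree, Node):
        tree = tree.children[obj[tree.attribute]]  # the answer selects the next node
    return tree.answer
```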

Example: A decision tree with low complexity

The adjacent binary decision tree answers the question of whether an apple tree will bear fruit. As input, the tree requires a vector with information about the attributes of an apple tree; an apple tree might, for example, have the attributes old, natural variety and rich soil. Starting at the root node, the decision rules are applied to the input vector. In the example tree, each node queries one attribute of the input vector, at the root node for instance the age of the apple tree. The answer determines the next node and thus the next decision rule to apply, here the question about the variety and then the question about the soil. If, after a sequence of evaluated rules, one arrives at a leaf, one obtains the answer to the original question. Not all levels of the decision tree always have to be traversed. For the apple tree described above, the answer is yes, so the tree will bear fruit. This decision process is formally called classification.
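Using the traversal sketch above, the apple-tree example could be encoded roughly as follows; the concrete attribute values and the exact shape of the tree are made up for illustration, since the text only names the attributes.

```python
# Hypothetical encoding of the apple-tree example (values chosen for illustration).
apple_tree = Node("age", {
    "old": Node("variety", {
        "natural":  Node("soil", {"rich": Leaf("yes"), "poor": Leaf("no")}),
        "cultivar": Leaf("no"),
    }),
    "young": Leaf("no"),
})

sample = {"age": "old", "variety": "natural", "soil": "rich"}
print(classify(apple_tree, sample))   # -> "yes": this apple tree will bear fruit
```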

Induction of decision trees

Decision trees can either be drawn up manually by experts or induced automatically from empirical data with machine learning techniques. Several competing algorithms exist for this.

Induction of decision trees usually proceeds recursively according to the top-down principle. It is crucial that a suitable data set with reliable empirical values for the decision problem (the so-called training data set) is available. This means that the classification of the target attribute must be known for every object in the training data set. In each induction step, the attribute is sought with which the training data can best be classified in this step with respect to the target attribute. Entropy, the Gini index, or other measures are used to determine the best classification. The identified attribute is then used to split the data. The procedure is applied recursively to the resulting subsets until each subset contains only objects of a single classification. In the end, a decision tree has been created that describes the empirical knowledge of the training data set in formal rules.
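The following sketch illustrates this recursive principle in the style of ID3, reusing the Node and Leaf classes from the traversal sketch above and using entropy as the selection measure; it is a simplified illustration, not a faithful implementation of any particular published algorithm.

```python
# Sketch of recursive top-down induction with entropy as the split measure.
import math
from collections import Counter

def entropy(labels):
    """Entropy of the target attribute over a set of training objects."""
    counts = Counter(labels)
    total = len(labels)
    return -sum((c / total) * math.log2(c / total) for c in counts.values())

def best_attribute(rows, labels, attributes):
    """Choose the attribute whose split yields the lowest weighted entropy."""
    def split_entropy(attr):
        groups = {}
        for row, label in zip(rows, labels):
            groups.setdefault(row[attr], []).append(label)
        return sum(len(g) / len(labels) * entropy(g) for g in groups.values())
    return min(attributes, key=split_entropy)

def induce(rows, labels, attributes):
    """Return a Leaf or Node (see the traversal sketch above)."""
    if len(set(labels)) == 1:                 # subset is pure: stop the recursion
        return Leaf(labels[0])
    if not attributes:                        # no attributes left: majority class
        return Leaf(Counter(labels).most_common(1)[0][0])
    attr = best_attribute(rows, labels, attributes)
    children = {}
    for value in {row[attr] for row in rows}:
        subset = [(r, l) for r, l in zip(rows, labels) if r[attr] == value]
        sub_rows, sub_labels = zip(*subset)
        children[value] = induce(list(sub_rows), list(sub_labels),
                                 [a for a in attributes if a != attr])
    return Node(attr, children)
```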

The trees can now be used to automatically classify other data sets, or the resulting rules can be interpreted and analyzed.

Algorithms compared

The algorithms for the automatic induction of decision trees are all based on the same recursive top-down principle. They differ only in their criteria for selecting the attributes and values of the rules at the nodes of the tree, in their criteria for terminating the induction process, and in possible post-processing steps that subsequently optimize an already computed branch of a tree (or a whole tree) on the basis of various criteria.

In practice, different families of algorithms are distinguished:

  • The oldest family are the CHAID algorithms (Chi-square Automatic Interaction Detectors) from 1964. They can generate trees on the basis of discrete attributes.
  • The CART family (Classification And Regression Trees) extends CHAID mainly by the ability to handle real-valued attributes: a split criterion is used to divide real-valued attributes into discrete categories (see the sketch after this list). In addition, some optimization strategies such as pruning were introduced.
  • The more recent algorithms are ID3 and its successors C4.5 and C5.0. Formally they belong to the CART family, but they are often regarded as a separate group because they introduce a number of new optimization strategies compared to CART.
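For real-valued attributes, such a split criterion typically searches for a threshold that separates the classes as cleanly as possible. The following sketch shows a simple threshold search using the Gini index; it illustrates the idea only and is not the exact CART procedure.

```python
# Sketch of a threshold search for one real-valued attribute (Gini-based).
from collections import Counter

def gini(labels):
    counts = Counter(labels)
    total = len(labels)
    return 1.0 - sum((c / total) ** 2 for c in counts.values())

def best_threshold(values, labels):
    """Try a split between each pair of adjacent values and keep the best one."""
    pairs = sorted(zip(values, labels))
    best = (None, float("inf"))
    for i in range(1, len(pairs)):
        if pairs[i - 1][0] == pairs[i][0]:
            continue                              # no split point between equal values
        threshold = (pairs[i - 1][0] + pairs[i][0]) / 2
        left = [l for v, l in pairs if v <= threshold]
        right = [l for v, l in pairs if v > threshold]
        score = (len(left) * gini(left) + len(right) * gini(right)) / len(pairs)
        if score < best[1]:
            best = (threshold, score)
    return best  # (threshold, weighted Gini impurity)

# e.g. best_threshold([1.2, 3.5, 2.0, 4.8], ["no", "yes", "no", "yes"]) -> (2.75, 0.0)
```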

Error rate and effectiveness

The error rate of a decision tree is the number of incorrectly classified data objects in relation to all data objects in a data set. This number is determined regularly, either on the training data used or, better, on a set of correctly classified data objects that is as disjoint from the training data as possible.

Depending on the application, it may be particularly important to keep either the number of false positive or the number of false negative classifications especially low. In emergency medicine, for example, it is far less harmful to treat a healthy patient than not to treat a sick one. The effectiveness of decision trees is therefore always a context-dependent quantity.
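A small sketch of how the error rate and the two kinds of misclassification could be counted on a held-out test set, reusing the classify function from the traversal sketch above; the positive class label is only illustrative.

```python
# Count the error rate and false positives/negatives on a test set (illustrative).
def evaluate(tree, test_rows, test_labels, positive="sick"):
    errors = false_pos = false_neg = 0
    for row, truth in zip(test_rows, test_labels):
        predicted = classify(tree, row)       # traversal sketch from above
        if predicted != truth:
            errors += 1
            if predicted == positive:
                false_pos += 1                # e.g. healthy patient treated as sick
            elif truth == positive:
                false_neg += 1                # e.g. sick patient not treated
    return errors / len(test_labels), false_pos, false_neg
```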

Pros and Cons

Interpretability and comprehensibility

A major advantage of decision trees is that they are easy to explain and to understand. This allows the user to evaluate the result and to recognize key attributes. It is useful especially when the basic properties of the data are not known in advance. The induction of decision trees is therefore also an important technique of data mining.

Quality of classification in real-valued data spaces

An often-cited disadvantage of decision trees is their relatively low classification quality in real-valued data spaces when the trees are used for automatic classification. Because of their discrete rule sets, trees perform somewhat worse on most real-world classification problems than other classification techniques such as neural networks or support vector machines. This means that trees can generate rules that are easy for people to understand, but these understandable rules often do not have the best possible statistical quality for real-world classification problems.

Size of the decision trees and overfitting

Another disadvantage is the potential size of the decision trees when no simple rules can be induced from the training data. This can have several negative effects: on the one hand, a human viewer quickly loses the overview of the context of the many rules; on the other hand, such large trees usually also lead to overfitting to the training data set, so that new data sets are classified automatically only with errors. So-called pruning methods have therefore been developed to shorten decision trees to a reasonable size. For example, one can limit the maximum depth of the trees or set a minimum number of objects per node.
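In practice, such limits are usually passed as parameters to the induction algorithm. Assuming scikit-learn is used, a maximum depth and a minimum number of objects per leaf could be set roughly like this (the parameter values are only illustrative):

```python
# Limiting tree size to counteract overfitting (scikit-learn, illustrative values).
from sklearn.tree import DecisionTreeClassifier

clf = DecisionTreeClassifier(
    max_depth=4,           # never grow the tree deeper than four levels
    min_samples_leaf=10,   # every leaf must cover at least ten training objects
)
# clf.fit(X_train, y_train) would then induce a tree that respects these limits.
```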

Further processing of the results

Decision trees are often used only as an intermediate step towards a more efficient representation of the rule set. To arrive at the rules, different decision trees are generated with different methods, and the frequently occurring rules are extracted. The optimizations are superimposed to obtain a robust, general and correct rule set. A disadvantage of this method is that the rules have no relation to one another and that contradictory rules can be generated.

Further developments and extensions

Decision forests

One method to increase the classification quality when using decision trees is to use sets of decision trees instead of individual trees. Such sets of decision trees are called decision forests.

The use of decision forests belongs to the so-called ensemble techniques of machine learning. The idea behind decision forests is that a single decision tree need not provide an optimal classification, but the majority decision of a set of suitable trees can very well do so. Common methods for generating suitable forests are boosting, bagging and arcing.
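A minimal sketch of the bagging variant, reusing the induce and classify functions from the sketches above: each tree is induced on a bootstrap sample of the training data, and the forest classifies by majority vote. This only illustrates the idea and is not a specific library implementation.

```python
# Sketch of bagging: induce several trees on bootstrap samples, then majority-vote.
import random
from collections import Counter

def bagged_forest(rows, labels, attributes, n_trees=25):
    forest = []
    n = len(rows)
    for _ in range(n_trees):
        idx = [random.randrange(n) for _ in range(n)]   # bootstrap sample with replacement
        forest.append(induce([rows[i] for i in idx],
                             [labels[i] for i in idx],
                             list(attributes)))
    return forest

def forest_classify(forest, obj):
    votes = Counter(classify(tree, obj) for tree in forest)
    return votes.most_common(1)[0][0]                   # majority decision of the trees
```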

A disadvantage of decision forests is that it is no longer as easy for people to interpret the rules contained in all the trees in their entirety in a simple way, as would be possible with individual trees.

Combination with neural networks

Decision trees can be combined with neural networks.

This makes it possible to replace inefficient branches of a tree with neural networks in order to achieve a classification quality that cannot be reached with the trees alone. The advantages of both classification methods can be exploited by mapping substructures into the other methodology: trees do not need as much training data for induction as neural networks, but they can be quite inaccurate, especially when they are small. Neural networks, on the other hand, classify more accurately but require more training data. One therefore tries to use the properties of decision trees to generate parts of neural networks: so-called TBNNs (Tree-Based Neural Networks) translate the rules of decision trees into neural networks.

Practical Applications

Application programs

There are several application programs that implement decision trees, for example the statistical software packages GNU R, SPSS and SAS. The latter two packages, like most other data-mining software packages, use the CHAID algorithm.

Customer classification

A bank wants to sell a new service through a direct mailing campaign. To maximize the profit, the campaign should address those households that match the combination of demographic variables declared optimal by the corresponding decision tree. This procedure is called data segmentation or segmentation modeling.

The decision tree thus provides good indications of who is likely to respond positively to the mailing. This allows the bank to write only to those households that match the target group.

Decision trees in business administration (BWL)

In the field of decision theory within business administration, decision trees are drawn from left to right (and calculated from right to left) in order to compute payoffs from expenditures and revenues and to optimize the decision for maximum payoff.

In these decision trees, squares represent decisions, circles represent chance events (options), and triangles terminate the branches. Decision trees of this type can be combined with other methods; in the example above, a linear distribution assumption (Investment #2) and a PERT three-point estimate (Investment #1) are used together with the net present value (NPV).
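A minimal sketch of this right-to-left calculation (backward induction): expected values are formed at chance nodes and the best alternative is chosen at decision nodes. The tree structure and the payoffs below are made up purely for illustration.

```python
# Roll back a decision tree from right to left: expected values at chance nodes,
# maxima at decision nodes (illustrative structure and numbers).
def rollback(node):
    kind = node["type"]
    if kind == "payoff":                      # triangle: end of a branch
        return node["value"]
    if kind == "chance":                      # circle: probability-weighted expectation
        return sum(p * rollback(child) for p, child in node["branches"])
    if kind == "decision":                    # square: choose the best alternative
        return max(rollback(child) for child in node["options"])

invest = {
    "type": "decision",
    "options": [
        {"type": "chance", "branches": [(0.6, {"type": "payoff", "value": 120}),
                                        (0.4, {"type": "payoff", "value": -30})]},
        {"type": "payoff", "value": 40},      # e.g. the safe alternative
    ],
}
print(rollback(invest))  # -> 60.0: the risky investment has the higher expected payoff
```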
