Pruning (decision trees)

Pruning is the English term for trimming ( Wing Clip ) of trees and shrubs. In the computer science in the field of machine learning, the term for simplifying, shortening and optimizing decision trees is used.

The idea of ​​pruning originally comes from the attempt to prevent the so-called overfitting in trees that are caused by induced learning. Overfitting indicates the induction of unwanted noise in a tree. Noise referred incorrect attribute values ​​or class affiliations, distort the data sets, thus increasing decision trees unnecessarily. By pruning the trees the unnecessary sub- trees to be cut again.

Pruning in the field of machine learning

Pruning methods can be prepared by two species share ( pre-and post -pruning ).

Pre- pruning method prevent full induction of the training sets by replacing a stop () criterion in the induction algorithm ( eg max. Tree depth or Information Gain ( Attr ) > minGain ). Pre- pruning methods are considered to be more efficient, as not an entire set is induced, but trees from the beginning remain small. Prepruning methods have a common problem, the horizon effect. Among the unwanted, premature Cancel induction by the stop ( ) criterion is to understand.

Post -pruning (or pruning only ) is the method most frequently used to simplify trees. In this case, nodes and sub-trees to be replaced by blades, to improve the complexity. By pruning can significantly reduce, but also improve the classification accuracy of unseen objects not just the size. While it may be the case that the accuracy of classification at the test set is poor, the accuracy of the classification properties of the tree, however, increases overall.

The methods are distinguished by their approach in the tree ( top-down or bottom-up ).

Bottom-Up Pruning

These procedures start on the last node in the tree ( at the lowest point ). Recursive following up, they determine the relevance of each node. Is the relevance for the classification not given, the node falls away and is replaced by a leaf. The advantage is that no relevant sub- trees may be lost by this process. These processes include the Reduced Error Pruning (REP ), the minimum cost -complexity pruning ( MCCP ) or the minimum Error Pruning (MEP).

Top -down pruning

In contrast to bottom-up methods, this methodology is at the root of the tree. The structure down following a relevance check performed which decides whether a node for the classification of all n items is relevant or not. By pruning the tree at an internal node, it may happen that an entire sub- tree ( regardless of relevance) is eliminated. These representatives of the Pesimistic Error Pruning (PEP ), which certainly brings good results in unseen items belongs.

Search method

When search techniques using different pruning methods for Vorwärtsabschneidung of search trees when the algorithm on the basis of collected data white ( or in the case of speculative pruning assumes ) that these subtrees the searched object is not included ( for example used in chess programs ).

Major pruning techniques for minimax or alpha-beta searches that (for example chess like ) can be used for solving two-person zero-sum games with perfect information, are for example:

  • Null move pruning
  • Verified null move pruning
  • Killer heuristic
  • History Heuristic

Pruning is also used in branch-and- bound algorithms in mathematical optimization. Here is a subtree of the search tree is not considered if the bound on the best possible solution in this subtree is worse than an already known solution.

Other areas

At Forum Software Pruning is a setting that causes the automatic deletion of old threads ( topics) to save space, reduce the CPU load and thereby increase the speed of the forum.

Credits

  • LA Breslow and DW Aha, Simplifying Decision Trees: A Survey, The Knowledge Engineering Review, Vol 12 (1 ), 1997, pp. 1-47.
  • JR Quinlan, Induction of Decision Trees, Machine Learning 1, Kluwer Academic Publishers, 1986, pp. 81-106.
  • Theoretical computer science
  • Algorithm
  • Computer Chess
663792
de