CHAID (Chi-square Automatic Interaction Detector) is an algorithm used for decision making, specifically for the construction of decision trees.
The CHAID algorithm was first published in 1964 by J. A. Sonquist and J. N. Morgan and is the oldest of the decision tree algorithms in current use. Anderberg described it in 1973, and J. A. Hartigan specified an implementation in 1975.
The main difference between CHAID and CART or C4.5 is that the CHAID algorithm stops the growth of the tree before it becomes too large. The tree is not first grown without restriction and then trimmed back afterwards with a pruning method. Another difference is that CHAID works with categorically scaled variables, such as color (red, yellow, green) or rating (good, average, poor), rather than with metrically scaled variables, such as height in inches.
Attributes are selected using the chi-square test of independence. CHAID is used when a statement about the dependence of two variables is needed. For this purpose, a measure called the chi-square distance is calculated: the higher this measure, the greater the dependence between the variables under consideration. The variable with the largest chi-square distance to the target variable is chosen as the splitting attribute. To improve the quality of the separation, more than two branches can be created per node, as in the C4.5 algorithm. As a result, the generated trees are more compact than those produced by CART. The same method is used to determine the best subdivisions. Because all possible combinations of category values must be evaluated for these decision trees, runtime problems can occur with large amounts of data. It is therefore advantageous to convert numeric variables into variables with categorical values, even though this requires additional effort. In return, the result should be of better quality.
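The attribute selection described above can be illustrated with a minimal Python sketch. It is not a complete CHAID implementation: it only computes the Pearson chi-square statistic between each categorical attribute and the target, picks the attribute with the largest value, and shows one simple way to turn a numeric variable into categories. The function names (`chi_square_statistic`, `select_attribute`, `discretize`), the sample data, and the cut points are assumptions made for illustration. Stopping rules and the search for the best subdivisions are left out.

```python
import bisect
from collections import Counter


def chi_square_statistic(attribute_values, target_values):
    """Pearson chi-square statistic for two categorical sequences of equal length."""
    n = len(attribute_values)
    observed = Counter(zip(attribute_values, target_values))
    row_totals = Counter(attribute_values)
    col_totals = Counter(target_values)
    statistic = 0.0
    for a, row_total in row_totals.items():
        for t, col_total in col_totals.items():
            # Expected count under independence of attribute and target.
            expected = row_total * col_total / n
            statistic += (observed[(a, t)] - expected) ** 2 / expected
    return statistic


def select_attribute(rows, attributes, target):
    """Return the attribute with the largest chi-square statistic, plus all scores."""
    target_values = [row[target] for row in rows]
    scores = {
        name: chi_square_statistic([row[name] for row in rows], target_values)
        for name in attributes
    }
    return max(scores, key=scores.get), scores


def discretize(values, cut_points, labels):
    """Map numeric values to categories.

    cut_points must be sorted ascending and len(labels) == len(cut_points) + 1.
    """
    return [labels[bisect.bisect_right(cut_points, v)] for v in values]


# Hypothetical sample data: "color" determines the target, "rating" does not.
rows = [
    {"color": "red", "rating": "good", "buys": "yes"},
    {"color": "red", "rating": "poor", "buys": "yes"},
    {"color": "red", "rating": "good", "buys": "yes"},
    {"color": "red", "rating": "poor", "buys": "yes"},
    {"color": "green", "rating": "good", "buys": "no"},
    {"color": "green", "rating": "poor", "buys": "no"},
    {"color": "green", "rating": "good", "buys": "no"},
    {"color": "green", "rating": "poor", "buys": "no"},
]

best, scores = select_attribute(rows, ["color", "rating"], "buys")
print(best, scores)

# Height in inches converted to three categories (cut points are assumed).
print(discretize([55, 66, 75], [60, 72], ["short", "medium", "tall"]))
```

In this sample, `color` is selected because its statistic is the largest, while `rating` is distributed evenly across both target classes and scores zero. Discretizing a numeric variable up front reduces the number of distinct values that have to be compared, which is the trade-off between extra preparation and runtime mentioned above.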