ID3 algorithm

ID3 (Iterative Dichotomiser 3) is an algorithm used in decision making. It generates decision trees from a set of training data.

The Australian researcher J. Ross Quinlan first published the algorithm in 1986. ID3 was very influential in its early years and is still used in some products today. It is regarded as the predecessor of the C4.5 algorithm.

ID3 is used when many different attributes matter in a large amount of data and a decision tree therefore has to be generated without extensive computation. The resulting decision trees are usually simple. However, there is no guarantee that no better tree exists.

The basic structure of ID3 is iterative. For every attribute not yet used, the entropy with respect to the training set is calculated. The attribute with the highest information gain, or equivalently the smallest entropy, is chosen and a new tree node is generated from it. The process terminates when all training instances have been classified, i.e. when every leaf node has been assigned a classification.


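If all the elements of T (the data) belong to one class, a leaf labelled with that class is constructed. Otherwise, the most informative attribute is selected, T is partitioned according to the values of that attribute, and the procedure is applied again to each subset.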
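The procedure described above can be sketched in Python roughly as follows. This is a minimal illustration under stated assumptions, not Quinlan's original formulation: the nested-dict tree representation, the helper names (`entropy`, `split`, `id3`), the majority-class fallback when no attribute is left, and the toy data are all hypothetical choices made for this example.

```python
from collections import Counter
from math import log2


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    return -sum((n / total) * log2(n / total) for n in Counter(labels).values())


def split(rows, labels, attribute):
    """Group rows and labels by the value each row has for `attribute`."""
    groups = {}
    for row, label in zip(rows, labels):
        sub_rows, sub_labels = groups.setdefault(row[attribute], ([], []))
        sub_rows.append(row)
        sub_labels.append(label)
    return groups


def id3(rows, labels, attributes):
    """Return a class label (leaf) or a nested dict {attribute: {value: subtree}}."""
    if len(set(labels)) == 1:
        # All elements of T belong to one class: create a leaf.
        return labels[0]
    if not attributes:
        # Assumption of this sketch: fall back to the majority class.
        return Counter(labels).most_common(1)[0][0]

    def weighted_entropy(attribute):
        groups = split(rows, labels, attribute)
        return sum(len(ls) / len(labels) * entropy(ls) for _, ls in groups.values())

    # The smallest weighted entropy corresponds to the highest information gain.
    best = min(attributes, key=weighted_entropy)
    remaining = [a for a in attributes if a != best]
    return {
        best: {
            value: id3(sub_rows, sub_labels, remaining)
            for value, (sub_rows, sub_labels) in split(rows, labels, best).items()
        }
    }


# Hypothetical toy training set: five examples, two attributes.
rows = [
    {"outlook": "sunny", "windy": "no"},
    {"outlook": "sunny", "windy": "yes"},
    {"outlook": "sunny", "windy": "yes"},
    {"outlook": "rainy", "windy": "no"},
    {"outlook": "rainy", "windy": "yes"},
]
labels = ["play", "play", "play", "play", "stay"]

tree = id3(rows, labels, ["outlook", "windy"])
print(tree)
```

On this toy data the sketch splits on `outlook` first, because that leaves less entropy than splitting on `windy`, and then splits the `rainy` branch on `windy`.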

Selection of attributes

To form the subtrees, the attribute that is most informative in the information-theoretic sense is selected at each step.

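Let T be the set of training examples with their respective classifications, a the attribute to be tested from the set A of available attributes, V(a) the set of possible values of a, and T_v the subset of T for which the attribute a has the value v. With H denoting the entropy of the class distribution in a set, the information gain achieved by selecting the attribute a is then calculated as the entropy of T minus the weighted average entropy of the subsets T_v:

\[ \mathrm{Gain}(T, a) = H(T) - \sum_{v \in V(a)} \frac{|T_v|}{|T|} \, H(T_v) \]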

Finally, an attribute with the greatest gain among the available attributes is chosen as the next attribute.
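The gain computation and the final selection step can be illustrated with a short Python sketch. The function names (`entropy`, `information_gain`, `best_attribute`), the representation of examples as dictionaries, and the toy data are assumptions made for this example only.

```python
from collections import Counter
from math import log2


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    return -sum((n / total) * log2(n / total) for n in Counter(labels).values())


def information_gain(rows, labels, attribute):
    """Entropy of the whole set minus the weighted entropy of its subsets."""
    subsets = {}
    for row, label in zip(rows, labels):
        # Collect the class labels of all examples sharing one attribute value.
        subsets.setdefault(row[attribute], []).append(label)
    remainder = sum(len(s) / len(labels) * entropy(s) for s in subsets.values())
    return entropy(labels) - remainder


def best_attribute(rows, labels, attributes):
    """Attribute with the greatest gain among the available ones."""
    return max(attributes, key=lambda a: information_gain(rows, labels, a))


# Hypothetical toy data: "outlook" determines the class, "windy" does not.
rows = [
    {"outlook": "sunny", "windy": "no"},
    {"outlook": "sunny", "windy": "yes"},
    {"outlook": "rainy", "windy": "no"},
    {"outlook": "rainy", "windy": "yes"},
]
labels = ["play", "play", "stay", "stay"]

for name in ("outlook", "windy"):
    print(name, information_gain(rows, labels, name))
print(best_attribute(rows, labels, ["windy", "outlook"]))
```

Here the two classes are equally frequent, so the entropy of the whole set is 1 bit. Splitting on `outlook` yields pure subsets and a gain of 1 bit, whereas splitting on `windy` leaves the class distribution unchanged and yields a gain of 0, so `outlook` is selected.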

This choice favours attributes with many possible values and therefore leads to a broad tree. To counteract this, the gain can be normalized with respect to the number of possible values.
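The text does not say which normalization is meant. One well-known option, used by the successor algorithm C4.5 under the name gain ratio, divides the information gain by the entropy of the attribute's own value distribution (the "split information"). The following Python sketch assumes that variant; the function names and the toy data are hypothetical.

```python
from collections import Counter
from math import log2


def entropy(items):
    """Shannon entropy (in bits) of a list of labels or attribute values."""
    total = len(items)
    return -sum((n / total) * log2(n / total) for n in Counter(items).values())


def gain_ratio(values, labels):
    """Information gain of one attribute divided by its split information.

    values[i] is the attribute value of example i, labels[i] its class.
    """
    subsets = {}
    for value, label in zip(values, labels):
        subsets.setdefault(value, []).append(label)
    gain = entropy(labels) - sum(
        len(s) / len(labels) * entropy(s) for s in subsets.values()
    )
    # Split information grows with the number of distinct attribute values.
    split_info = entropy(values)
    return gain / split_info if split_info > 0 else 0.0


labels = ["play", "play", "stay", "stay"]
two_valued = ["sunny", "sunny", "rainy", "rainy"]  # separates the classes with 2 values
id_like = ["a", "b", "c", "d"]                     # separates them with 4 values

print(gain_ratio(two_valued, labels))
print(gain_ratio(id_like, labels))
```

Both attributes have the same unnormalized gain of 1 bit, since each separates the classes perfectly. After normalization the two-valued attribute scores 1.0 and the four-valued, identifier-like attribute only 0.5, so the preference for attributes with many values is reduced.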