S_small = {(small,cold,no),(small,warm,no)}
S_good = {(good,cold,no),(good,warm,yes)}
The information entropy of S is E(S) = -(1/6)*log2(1/6) - (5/6)*log2(5/6) ~ 0.65002242164.
The information entropy of the partitions is:
E(S_none) = -(2/2)*log2(2/2) = -log2(1) = 0, since all the instances have the class no.
E(S_small) = 0 for a similar reason.
E(S_good) = -(1/2)*log2(1/2) - (1/2)*log2(1/2) = 1, since half of the instances have the class no and the other half the class yes.
Therefore, the information gain is:
IG(S,swimming suit) = E(S) - [(2/6)*E(S_none) + (2/6)*E(S_small) + (2/6)*E(S_good)]
= 0.65002242164 - (1/3) ~ 0.3166890883
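These figures can be verified with a short Python sketch. The entropy and information_gain helpers and the tuple representation of the set S below are illustrative assumptions, not code from this chapter:

import math
from collections import Counter

def entropy(labels):
    # Information entropy (base 2) of a list of class labels.
    total = len(labels)
    return -sum((count / total) * math.log2(count / total)
                for count in Counter(labels).values())

def information_gain(rows, attr_index, class_index=-1):
    # IG(S, attribute) = E(S) - sum over the partitions S_v induced by the
    # attribute's values of |S_v|/|S| * E(S_v).
    total = len(rows)
    partitions = {}
    for row in rows:
        partitions.setdefault(row[attr_index], []).append(row[class_index])
    weighted = sum((len(labels) / total) * entropy(labels)
                   for labels in partitions.values())
    return entropy([row[class_index] for row in rows]) - weighted

# The six instances (swimming suit, water temperature, class) of the set S:
S = [('none', 'cold', 'no'), ('small', 'cold', 'no'), ('good', 'cold', 'no'),
     ('none', 'warm', 'no'), ('small', 'warm', 'no'), ('good', 'warm', 'yes')]

print(information_gain(S, 0))  # swimming suit: ~0.3166890883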
If we chose the attribute water temperature to partition the set S, what would be the
information gain IG(S,water temperature)? The water temperature partitions the set S into
the following sets:
S_cold = {(none,cold,no),(small,cold,no),(good,cold,no)}
S_warm = {(none,warm,no),(small,warm,no),(good,warm,yes)}
Their entropies are:
E(S_cold) = 0, as all the instances are classified as no.
E(S_warm) = -(2/3)*log2(2/3) - (1/3)*log2(1/3) ~ 0.91829583405
Therefore, the information gain from partitioning the set S by the attribute water
temperature is:
IG(S,water temperature) = E(S) - [(1/2)*E(S_cold) + (1/2)*E(S_warm)]
= 0.65002242164 - 0.5*0.91829583405 ~ 0.19087450461
This is less than IG(S,swimming suit). Therefore, we gain more information about the set S (that is, about the classification of its instances) by partitioning it by the attribute swimming suit rather than by the attribute water temperature. This finding forms the basis of the ID3 algorithm, which constructs a decision tree in the next section.
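Reusing the entropy and information_gain helpers sketched above (again an illustrative assumption, not the book's code), both gains can be compared and the attribute with the larger gain selected, which is the choice ID3 makes at each node:

# Reusing the helpers and the set S defined in the earlier sketch:
attribute_names = {0: 'swimming suit', 1: 'water temperature'}
gains = {name: information_gain(S, index) for index, name in attribute_names.items()}
print(gains)  # swimming suit ~0.3167, water temperature ~0.1909

# ID3 splits the current node on the attribute with the highest information gain:
best_attribute = max(gains, key=gains.get)
print(best_attribute)  # swimming suit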