Page 79 - Data Science Algorithms in a Week
Decision Trees
The possible values of the attribute wind are none, breeze, and strong. Thus we partition the set S into three subsets:
S_none = {(Warm,None,Sunny,Yes), (Hot,None,Sunny,No), (Cold,None,Sunny,Yes), (Warm,None,Cloudy,Yes)}
S_breeze = {(Hot,Breeze,Cloudy,Yes), (Warm,Breeze,Sunny,Yes), (Cold,Breeze,Cloudy,No)}
S_strong = {(Cold,Strong,Cloudy,No), (Warm,Strong,Cloudy,No), (Hot,Strong,Cloudy,Yes)}
The information entropies of these sets are:
E(S_none) = 0.81127812445
E(S_breeze) = 0.91829583405
E(S_strong) = 0.91829583405
Thus,
IG(S,wind) = E(S) - [(|S_none|/|S|)*E(S_none) + (|S_breeze|/|S|)*E(S_breeze) + (|S_strong|/|S|)*E(S_strong)]
= 0.97095059445 - [(4/10)*0.81127812445 + (3/10)*0.91829583405 + (3/10)*0.91829583405]
= 0.09546184424
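The entropy and information-gain arithmetic above can be verified with a short Python sketch; the class labels below (6 Yes, 4 No overall) are transcribed from the wind partitions listed in this section:

```python
from math import log2

def entropy(labels):
    """Shannon entropy of a list of class labels, in bits."""
    n = len(labels)
    return -sum(
        (labels.count(c) / n) * log2(labels.count(c) / n)
        for c in set(labels)
    )

# Class labels (Yes/No) of the whole set S and of the three wind partitions.
s = ['Yes'] * 6 + ['No'] * 4
s_none = ['Yes', 'No', 'Yes', 'Yes']
s_breeze = ['Yes', 'Yes', 'No']
s_strong = ['No', 'No', 'Yes']

# IG(S,wind) = E(S) minus the size-weighted entropies of the partitions.
ig_wind = entropy(s) - sum(
    (len(part) / len(s)) * entropy(part)
    for part in (s_none, s_breeze, s_strong)
)
print(round(ig_wind, 11))  # 0.09546184424
```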
Finally, the third attribute sunshine has two possible values, cloudy and sunny; thus, it
partitions the set S into two sets:
S_cloudy = {(Cold,Strong,Cloudy,No), (Warm,Strong,Cloudy,No), (Hot,Breeze,Cloudy,Yes), (Cold,Breeze,Cloudy,No), (Hot,Strong,Cloudy,Yes), (Warm,None,Cloudy,Yes)}
S_sunny = {(Warm,None,Sunny,Yes), (Hot,None,Sunny,No), (Warm,Breeze,Sunny,Yes), (Cold,None,Sunny,Yes)}
The entropies of these sets are:
E(S_cloudy) = 1
E(S_sunny) = 0.81127812445
Thus,
IG(S,sunshine) = E(S) - [(|S_cloudy|/|S|)*E(S_cloudy) + (|S_sunny|/|S|)*E(S_sunny)]
= 0.97095059445 - [(6/10)*1 + (4/10)*0.81127812445]
= 0.04643934467
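The same procedure generalizes to any attribute. A sketch in Python, using the full ten-record set reconstructed from the partitions above (columns: temperature, wind, sunshine, and the play class):

```python
from math import log2

# The ten records reconstructed from the partitions listed in this section.
records = [
    ('Warm', 'None', 'Sunny', 'Yes'), ('Hot', 'None', 'Sunny', 'No'),
    ('Cold', 'None', 'Sunny', 'Yes'), ('Warm', 'None', 'Cloudy', 'Yes'),
    ('Hot', 'Breeze', 'Cloudy', 'Yes'), ('Warm', 'Breeze', 'Sunny', 'Yes'),
    ('Cold', 'Breeze', 'Cloudy', 'No'), ('Cold', 'Strong', 'Cloudy', 'No'),
    ('Warm', 'Strong', 'Cloudy', 'No'), ('Hot', 'Strong', 'Cloudy', 'Yes'),
]

def entropy(rows):
    """Shannon entropy of the class column (last field), in bits."""
    n = len(rows)
    classes = [row[-1] for row in rows]
    return -sum(
        (classes.count(c) / n) * log2(classes.count(c) / n)
        for c in set(classes)
    )

def information_gain(rows, col):
    """E(S) minus the size-weighted entropy of the partitions on column col."""
    values = set(row[col] for row in rows)
    weighted = sum(
        (len(part) / len(rows)) * entropy(part)
        for part in ([r for r in rows if r[col] == v] for v in values)
    )
    return entropy(rows) - weighted

for name, col in [('temperature', 0), ('wind', 1), ('sunshine', 2)]:
    print(name, round(information_gain(records, col), 11))
```

On this data, sunshine yields the lowest gain (0.04643934467), while temperature and wind tie at 0.09546184424, so sunshine would not be chosen as the splitting attribute.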