Page 79 - Data Science Algorithms in a Week

Decision Trees


            The possible values for the attribute wind are none, breeze, and strong, so we partition the
            set S into three subsets:

            S_none = {(Warm,None,Sunny,Yes), (Hot,None,Sunny,No), (Cold,None,Sunny,Yes),
            (Warm,None,Cloudy,Yes)}

            S_breeze = {(Hot,Breeze,Cloudy,Yes), (Warm,Breeze,Sunny,Yes), (Cold,Breeze,Cloudy,No)}

            S_strong = {(Cold,Strong,Cloudy,No), (Warm,Strong,Cloudy,No), (Hot,Strong,Cloudy,Yes)}

            The information entropies of the subsets are:

            E(S_none) = 0.81127812445

            E(S_breeze) = 0.91829583405

            E(S_strong) = 0.91829583405

            Thus:

            IG(S,wind) = E(S) - [(|S_none|/|S|)*E(S_none) + (|S_breeze|/|S|)*E(S_breeze) + (|S_strong|/|S|)*E(S_strong)]

            = 0.97095059445 - [(4/10)*0.81127812445 + (3/10)*0.91829583405 + (3/10)*0.91829583405]

            = 0.09546184424
            Finally, the third attribute, sunshine, has two possible values, cloudy and sunny; thus, it
            partitions the set S into two subsets:

            S_cloudy = {(Cold,Strong,Cloudy,No), (Warm,Strong,Cloudy,No), (Hot,Breeze,Cloudy,Yes),
            (Cold,Breeze,Cloudy,No), (Hot,Strong,Cloudy,Yes), (Warm,None,Cloudy,Yes)}

            S_sunny = {(Warm,None,Sunny,Yes), (Hot,None,Sunny,No), (Warm,Breeze,Sunny,Yes),
            (Cold,None,Sunny,Yes)}

            The entropies of the subsets are:

            E(S_cloudy) = 1

            E(S_sunny) = 0.81127812445

            Thus:

            IG(S,sunshine) = E(S) - [(|S_cloudy|/|S|)*E(S_cloudy) + (|S_sunny|/|S|)*E(S_sunny)]

            = 0.97095059445 - [(6/10)*1 + (4/10)*0.81127812445] = 0.04643934467
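            The calculations above can be checked with a short Python sketch. The tuple layout
            (temperature, wind, sunshine, play) and the attribute indices below are assumptions made
            for illustration; the ten examples and all numeric results come from the text.

```python
from collections import Counter
from math import log2

# The ten examples of S, as (temperature, wind, sunshine, play) tuples.
S = [
    ("Warm", "None", "Sunny", "Yes"), ("Hot", "None", "Sunny", "No"),
    ("Cold", "None", "Sunny", "Yes"), ("Warm", "None", "Cloudy", "Yes"),
    ("Hot", "Breeze", "Cloudy", "Yes"), ("Warm", "Breeze", "Sunny", "Yes"),
    ("Cold", "Breeze", "Cloudy", "No"), ("Cold", "Strong", "Cloudy", "No"),
    ("Warm", "Strong", "Cloudy", "No"), ("Hot", "Strong", "Cloudy", "Yes"),
]

def entropy(rows):
    """Information entropy of the class labels (the last item of each tuple)."""
    total = len(rows)
    counts = Counter(row[-1] for row in rows)
    return -sum(c / total * log2(c / total) for c in counts.values())

def information_gain(rows, attr):
    """IG(S, attr) = E(S) - sum over values v of (|S_v|/|S|) * E(S_v)."""
    partitions = {}
    for row in rows:
        partitions.setdefault(row[attr], []).append(row)
    total = len(rows)
    remainder = sum(len(p) / total * entropy(p) for p in partitions.values())
    return entropy(rows) - remainder

print(round(entropy(S), 11))              # 0.97095059445  -> E(S)
print(round(information_gain(S, 1), 11))  # 0.09546184424  -> IG(S, wind)
print(round(information_gain(S, 2), 11))  # 0.04643934467  -> IG(S, sunshine)
```

            Since IG(S,wind) > IG(S,sunshine), of these two attributes wind is the more informative
            one to split on at this node.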






