Page 78 - Data Science Algorithms in a Week

Decision Trees


             Hot           Strong Cloudy     Yes
             Warm          None   Cloudy     Yes
             Warm          Strong Sunny      ?
            We want to find out whether our friend would play chess with us outside in the park
            when the day is warm and sunny with a strong wind (the last row, marked "?").
            This time, we use a decision tree to find the answer.

            Analysis:
            The initial set S of data samples is:

                S={(Cold,Strong,Cloudy,No),(Warm,Strong,Cloudy,No),(Warm,None,Sunny,Yes),
                (Hot,None,Sunny,No),(Hot,Breeze,Cloudy,Yes),(Warm,Breeze,Sunny,Yes),
                (Cold,Breeze,Cloudy,No),(Cold,None,Sunny,Yes),(Hot,Strong,Cloudy,Yes),
                (Warm,None,Cloudy,Yes)}

            First, we determine the information gain for each of the three non-classifying attributes:
            temperature, wind, and sunshine. The possible values for temperature are cold, warm, and hot,
            so we partition the set S into three subsets:

                S_cold={(Cold,Strong,Cloudy,No),(Cold,Breeze,Cloudy,No),(Cold,None,Sunny,Yes)}
                S_warm={(Warm,Strong,Cloudy,No),(Warm,None,Sunny,Yes),(Warm,Breeze,Sunny,Yes),
                (Warm,None,Cloudy,Yes)}
                S_hot={(Hot,None,Sunny,No),(Hot,Breeze,Cloudy,Yes),(Hot,Strong,Cloudy,Yes)}
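
The partitioning step can be sketched in Python as follows. This is only an illustration: the tuple layout (temperature, wind, sunshine, play) and the helper name `partition` are assumptions made for the example, not part of the original text.

```python
from collections import defaultdict

# The ten samples of S, each as (temperature, wind, sunshine, play).
S = [
    ("Cold", "Strong", "Cloudy", "No"),
    ("Warm", "Strong", "Cloudy", "No"),
    ("Warm", "None", "Sunny", "Yes"),
    ("Hot", "None", "Sunny", "No"),
    ("Hot", "Breeze", "Cloudy", "Yes"),
    ("Warm", "Breeze", "Sunny", "Yes"),
    ("Cold", "Breeze", "Cloudy", "No"),
    ("Cold", "None", "Sunny", "Yes"),
    ("Hot", "Strong", "Cloudy", "Yes"),
    ("Warm", "None", "Cloudy", "Yes"),
]


def partition(samples, attribute_index):
    """Group samples by the value stored at attribute_index (hypothetical helper)."""
    groups = defaultdict(list)
    for sample in samples:
        groups[sample[attribute_index]].append(sample)
    return dict(groups)


# Index 0 is the temperature attribute in this layout.
by_temperature = partition(S, 0)
for value, subset in by_temperature.items():
    print(value, len(subset))
```

The same helper would partition S by wind or sunshine if called with index 1 or 2.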
            We first calculate the information entropy of S and of each of the three subsets:

            E(S) = -(4/10)*log_2(4/10) - (6/10)*log_2(6/10) = 0.97095059445
            E(S_cold) = -(2/3)*log_2(2/3) - (1/3)*log_2(1/3) = 0.91829583405
            E(S_warm) = -(1/4)*log_2(1/4) - (3/4)*log_2(3/4) = 0.81127812445
            E(S_hot) = -(1/3)*log_2(1/3) - (2/3)*log_2(2/3) = 0.91829583405
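
These four entropies can be checked with a few lines of Python. The sketch below assumes each set is reduced to its list of class labels (Yes/No); the function name `entropy` is chosen for the example only.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy (base 2) of a list of class labels."""
    total = len(labels)
    return -sum(
        (count / total) * math.log2(count / total)
        for count in Counter(labels).values()
    )


# Class labels of S and of the three temperature subsets, read off the sets above.
e_s = entropy(["No"] * 4 + ["Yes"] * 6)        # S: 4 No, 6 Yes
e_cold = entropy(["No", "No", "Yes"])          # cold subset: 2 No, 1 Yes
e_warm = entropy(["No", "Yes", "Yes", "Yes"])  # warm subset: 1 No, 3 Yes
e_hot = entropy(["No", "Yes", "Yes"])          # hot subset: 1 No, 2 Yes

print(e_s, e_cold, e_warm, e_hot)
```

The printed values should agree with the hand calculation up to rounding in the last digits.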
            Thus, IG(S,temperature) = E(S) - [(|S_cold|/|S|)*E(S_cold) + (|S_warm|/|S|)*E(S_warm) + (|S_hot|/|S|)*E(S_hot)]
            = 0.97095059445 - [(3/10)*0.91829583405 + (4/10)*0.81127812445 + (3/10)*0.91829583405]
            = 0.09546184424
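
As a cross-check, the information gain for temperature can be computed directly from the samples. This is a minimal sketch, assuming the samples are stored as (temperature, wind, sunshine, play) tuples; `entropy` and `information_gain` are illustrative names, not functions from the original text.

```python
import math
from collections import Counter, defaultdict

# The ten samples of S, each as (temperature, wind, sunshine, play).
S = [
    ("Cold", "Strong", "Cloudy", "No"),
    ("Warm", "Strong", "Cloudy", "No"),
    ("Warm", "None", "Sunny", "Yes"),
    ("Hot", "None", "Sunny", "No"),
    ("Hot", "Breeze", "Cloudy", "Yes"),
    ("Warm", "Breeze", "Sunny", "Yes"),
    ("Cold", "Breeze", "Cloudy", "No"),
    ("Cold", "None", "Sunny", "Yes"),
    ("Hot", "Strong", "Cloudy", "Yes"),
    ("Warm", "None", "Cloudy", "Yes"),
]


def entropy(labels):
    """Shannon entropy (base 2) of a list of class labels."""
    total = len(labels)
    return -sum(
        (count / total) * math.log2(count / total)
        for count in Counter(labels).values()
    )


def information_gain(samples, attribute_index):
    """E(S) minus the size-weighted entropy of the subsets split on one attribute."""
    labels = [sample[-1] for sample in samples]
    groups = defaultdict(list)
    for sample in samples:
        groups[sample[attribute_index]].append(sample[-1])
    weighted = sum(
        (len(group) / len(samples)) * entropy(group) for group in groups.values()
    )
    return entropy(labels) - weighted


ig_temperature = information_gain(S, 0)  # index 0 is temperature
print(round(ig_temperature, 4))
```

Calling `information_gain(S, 1)` and `information_gain(S, 2)` would give the corresponding values for wind and sunshine, which is the comparison needed to pick the attribute for the root of the tree.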





