


            So the information entropy of the probability space of unbiased coin throws is:

E = -0.5 * log2(0.5) - 0.5 * log2(0.5) = 0.5 + 0.5 = 1
When the coin is biased, with a 25% chance of heads and a 75% chance of tails, then the
information entropy of such a space is:

E = -0.25 * log2(0.25) - 0.75 * log2(0.75) = 0.81127812445
which is less than 1. Thus, for example, if we had a large file in which about 25% of the bits
were 0 and about 75% were 1, a good compression tool should be able to compress it to about
81.12% of its size.
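Both values are easy to verify with a few lines of Python. The following is a minimal sketch of ours, not code from this book; the function name entropy is an assumption:

import math

def entropy(probabilities):
    # E = -sum of p * log2(p) over all outcomes with non-zero probability
    return -sum(p * math.log2(p) for p in probabilities if p > 0)

print(entropy([0.5, 0.5]))    # unbiased coin -> 1.0
print(entropy([0.25, 0.75]))  # biased coin   -> 0.8112781244591328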



            Information gain

The information gain is the amount of information entropy gained as a result of a
certain procedure. For example, if we would like to know the results of three fair coin tosses,
then their information entropy is 3 bits. But if we could look at the third coin, then the
information entropy of the result for the remaining two coins would be 2 bits. Thus, by looking
at the third coin, we gained one bit of information, so the information gain is 1.
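This arithmetic can be checked directly; the few lines of Python below are a sketch of ours, not code from this book:

import math

# Three fair coins: 8 equally likely outcomes, each with probability 1/8.
E_three = -sum((1/8) * math.log2(1/8) for _ in range(8))   # 3.0 bits
# After looking at one coin, 4 equally likely outcomes remain.
E_two = -sum((1/4) * math.log2(1/4) for _ in range(4))     # 2.0 bits
print(E_three - E_two)   # information gained by looking at one coin: 1.0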
We may also gain information entropy by dividing the whole set S into subsets, grouping
the elements by a similar pattern. If we group the elements by their value of an attribute A,
then we define the information gain as:

IG(S, A) = E(S) - Σ_(v in values(A)) (|S_v| / |S|) * E(S_v)

where S_v is the set of those elements of S that have the value v for the attribute A, and
|S_v| / |S| is the fraction of the elements of S that fall into S_v.
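A short Python sketch of this formula follows; the helper names entropy and information_gain, and the row format (tuples whose last element is the class), are our own assumptions rather than code from this book:

import math
from collections import Counter, defaultdict

def entropy(labels):
    # E(S), computed from the relative frequency of each class label in S
    counts = Counter(labels)
    total = len(labels)
    return -sum((c / total) * math.log2(c / total) for c in counts.values())

def information_gain(rows, attribute_index, label_index=-1):
    # IG(S, A) = E(S) - sum over v of (|S_v| / |S|) * E(S_v)
    groups = defaultdict(list)
    for row in rows:
        groups[row[attribute_index]].append(row[label_index])
    weighted = sum((len(subset) / len(rows)) * entropy(subset)
                   for subset in groups.values())
    return entropy([row[label_index] for row in rows]) - weighted

With the six rows of the swim preference example stored as such tuples, calling information_gain(rows, 0) would reproduce the calculation carried out by hand in the next section.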


            Swim preference - information gain calculation

Let us calculate the information gain for the six rows of the swim preference example, taking
the swimming suit as the attribute. Because we are interested in whether a given row of
data is classified as no or yes to the question of whether one should swim, we will use the
swim preference to calculate the entropy and the information gain. We partition the set S by the
attribute swimming suit:

S_none = {(none, cold, no), (none, warm, no)}


