


            So the information entropy of the probability space of unbiased coin throws is:

E = -0.5 * log2(0.5) - 0.5 * log2(0.5) = 0.5 + 0.5 = 1
When the coin is biased, with a 25% chance of heads and a 75% chance of tails, then the
information entropy of such a space is:

E = -0.25 * log2(0.25) - 0.75 * log2(0.75) = 0.81127812445
which is less than 1. Thus, for example, if we had a large file in which about 25% of the bits
were 0 and about 75% were 1, a good compression tool should be able to compress it to about
81.12% of its size.
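Both values are easy to verify with a few lines of Python. The following is a minimal sketch of ours, not code from this book; the function name entropy is an assumption:

import math

def entropy(probabilities):
    # E = -sum of p * log2(p) over all outcomes with non-zero probability
    return -sum(p * math.log2(p) for p in probabilities if p > 0)

print(entropy([0.5, 0.5]))    # unbiased coin -> 1.0
print(entropy([0.25, 0.75]))  # biased coin   -> 0.8112781244591328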



            Information gain

The information gain is the amount of information entropy gained as a result of a
certain procedure. For example, if we would like to know the results of three fair coin tosses,
then their information entropy is 3 bits. But if we could look at the third coin, then the
information entropy of the result for the remaining two coins would be 2 bits. Thus, by looking
at the third coin, we gained one bit of information, so the information gain is 1.
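This arithmetic can be checked directly; the few lines of Python below are a sketch of ours, not code from this book:

import math

# Three fair coins: 8 equally likely outcomes, each with probability 1/8.
E_three = -sum((1/8) * math.log2(1/8) for _ in range(8))   # 3.0 bits
# After looking at one coin, 4 equally likely outcomes remain.
E_two = -sum((1/4) * math.log2(1/4) for _ in range(4))     # 2.0 bits
print(E_three - E_two)   # information gained by looking at one coin: 1.0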
We may also gain information entropy by dividing the whole set S into subsets, grouping
the elements by a similar pattern. If we group the elements by their value of an attribute A,
then we define the information gain as:

IG(S, A) = E(S) - Σ_(v in values(A)) (|S_v| / |S|) * E(S_v)

where S_v is the set of those elements of S that have the value v for the attribute A, and
|S_v| / |S| is the fraction of the elements of S that fall into S_v.
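A short Python sketch of this formula follows; the helper names entropy and information_gain, and the row format (tuples whose last element is the class), are our own assumptions rather than code from this book:

import math
from collections import Counter, defaultdict

def entropy(labels):
    # E(S), computed from the relative frequency of each class label in S
    counts = Counter(labels)
    total = len(labels)
    return -sum((c / total) * math.log2(c / total) for c in counts.values())

def information_gain(rows, attribute_index, label_index=-1):
    # IG(S, A) = E(S) - sum over v of (|S_v| / |S|) * E(S_v)
    groups = defaultdict(list)
    for row in rows:
        groups[row[attribute_index]].append(row[label_index])
    weighted = sum((len(subset) / len(rows)) * entropy(subset)
                   for subset in groups.values())
    return entropy([row[label_index] for row in rows]) - weighted

With the six rows of the swim preference example stored as such tuples, calling information_gain(rows, 0) would reproduce the calculation carried out by hand in the next section.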


            Swim preference - information gain calculation

Let us calculate the information gain for the six rows of the swim preference example, taking
the swimming suit as the attribute. Because we are interested in whether a given row of
data is classified as no or yes to the question of whether one should swim, we will use the
swim preference to calculate the entropy and the information gain. We partition the set S by the
attribute swimming suit:

S_none = {(none, cold, no), (none, warm, no)}


