ID3 algorithm - decision tree construction
The ID3 algorithm constructs a decision tree from the data based on the information gain. In
the beginning, we start with the set S. The data items in the set S have various properties
according to which we can partition the set S. If an attribute A has the values {v1, ..., vn}, then
we partition the set S into the sets S1, ..., Sn, where the set Si is the subset of the set S whose
elements have the value vi for the attribute A.
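As a minimal sketch of this partitioning step in Python (the representation of a data item as a dictionary mapping attribute names to values is an assumption made here for illustration, not code from the book):

def partition_by_attribute(data, attribute):
    # Split the set S into the subsets S1, ..., Sn, one subset per
    # distinct value vi of the given attribute.
    partitions = {}
    for item in data:
        value = item[attribute]
        partitions.setdefault(value, []).append(item)
    return partitions
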
If each element in the set S has attributes A1, ..., Am, then we can partition the set S according
to any of the possible attributes. The ID3 algorithm partitions the set S according to the
attribute that yields the highest information gain. Now suppose that it was attribute A1.
Then for the set S we have the partitions S1, ..., Sn, where A1 has the possible values {v1, ..., vn}.
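Under the usual entropy-based definition of the information gain (the gain of an attribute is the entropy of S minus the weighted entropy of its partitions), this attribute selection could be sketched as follows, reusing the partition_by_attribute helper from the previous sketch; the class label is assumed to be stored under its own attribute name:

import math

def entropy(data, class_attribute):
    # Entropy of the class label distribution in the data set.
    counts = {}
    for item in data:
        label = item[class_attribute]
        counts[label] = counts.get(label, 0) + 1
    total = len(data)
    return -sum((c / total) * math.log2(c / total) for c in counts.values())

def information_gain(data, attribute, class_attribute):
    # Entropy of S minus the weighted entropy of the partitions S1, ..., Sn.
    partitions = partition_by_attribute(data, attribute)
    weighted = sum(len(subset) / len(data) * entropy(subset, class_attribute)
                   for subset in partitions.values())
    return entropy(data, class_attribute) - weighted

def best_attribute(data, attributes, class_attribute):
    # ID3 partitions by the attribute that yields the highest information gain.
    return max(attributes, key=lambda a: information_gain(data, a, class_attribute))
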
Since we have not constructed any tree yet, we first place a root node into the tree. For
every partition of S, we place a new branch from the root. Every branch represents one
value of the selected attribute; a branch carries the data samples with the same value for that
attribute. For every new branch, we define a new node that holds the data samples from
its ancestor branch.
Once we have defined a new node, we choose the remaining attribute with the highest
information gain for the data at that node, partition the data at that node further, and then
define new branches and nodes. This process is repeated until we run out of attributes for a
node, or earlier, once all the data at a node belongs to the same class of interest. In the case
of the swim preference example, there are only two possible classes for the swimming
preference: class no and class yes. The last node on a path is called a leaf node, and it
decides the class of a data item from the data.
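Putting the previous sketches together, the recursive construction described above could be outlined as follows; the nested-dictionary node representation and the majority-class rule for exhausted attributes are assumptions of this sketch, not the book's implementation:

def id3(data, attributes, class_attribute):
    classes = {item[class_attribute] for item in data}
    # Leaf node: all the data at this node has the same class.
    if len(classes) == 1:
        return classes.pop()
    # No attributes left to split on: take the majority class as the leaf.
    if not attributes:
        return max(classes,
                   key=lambda c: sum(1 for item in data if item[class_attribute] == c))
    # Split on the remaining attribute with the highest information gain.
    attribute = best_attribute(data, attributes, class_attribute)
    remaining = [a for a in attributes if a != attribute]
    branches = {value: id3(subset, remaining, class_attribute)
                for value, subset in partition_by_attribute(data, attribute).items()}
    return {attribute: branches}
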
Swim preference - decision tree construction by ID3 algorithm
Here we describe, step by step, how the ID3 algorithm would construct a decision tree from
the given data samples in the swim preference example. The initial set consists of six data
samples:
S={(none,cold,no),(small,cold,no),(good,cold,no),(none,warm,no),(small,warm,no),(good,warm,yes)}
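Expressed in the representation used by the sketches above, the same six samples could be written and passed to the id3 sketch like this; the attribute names swimming_suit, water_temperature and swim are labels chosen here for the three columns, not identifiers from the book's code:

swim_data = [
    {'swimming_suit': 'none',  'water_temperature': 'cold', 'swim': 'no'},
    {'swimming_suit': 'small', 'water_temperature': 'cold', 'swim': 'no'},
    {'swimming_suit': 'good',  'water_temperature': 'cold', 'swim': 'no'},
    {'swimming_suit': 'none',  'water_temperature': 'warm', 'swim': 'no'},
    {'swimming_suit': 'small', 'water_temperature': 'warm', 'swim': 'no'},
    {'swimming_suit': 'good',  'water_temperature': 'warm', 'swim': 'yes'},
]

# Build the tree from the two non-class attributes; the root becomes whichever
# attribute has the higher information gain on the full set S.
tree = id3(swim_data, ['swimming_suit', 'water_temperature'], 'swim')
print(tree)
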