Program output:
We construct a decision tree from the data file swim.csv with the verbosity set to 0. The
reader is encouraged to set the verbosity to 2 to see a detailed explanation of how exactly
the decision tree is constructed:
$ python construct_decision_tree.py swim.csv 0
Root
├── [swimming_suit=Small]
│ ├── [water_temperature=Cold]
│ │ └── [swim=No]
│ └── [water_temperature=Warm]
│ └── [swim=No]
├── [swimming_suit=None]
│ ├── [water_temperature=Cold]
│ │ └── [swim=No]
│ └── [water_temperature=Warm]
│ └── [swim=No]
└── [swimming_suit=Good]
├── [water_temperature=Cold]
│ └── [swim=No]
└── [water_temperature=Warm]
└── [swim=Yes]
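
The tree that the program prints can be pictured as a nested structure. Below is a minimal sketch, assuming a plain nested-dictionary representation in Python; it mirrors the swim.csv output above but is illustrative only, not the representation used inside construct_decision_tree.py:

# Hypothetical nested-dictionary encoding of the printed tree.
# Inner nodes hold an "attribute" to ask about and a "branches" map;
# leaves hold only the resulting "class".
decision_tree = {
    "attribute": "swimming_suit",
    "branches": {
        "Small": {
            "attribute": "water_temperature",
            "branches": {"Cold": {"class": "No"}, "Warm": {"class": "No"}},
        },
        "None": {
            "attribute": "water_temperature",
            "branches": {"Cold": {"class": "No"}, "Warm": {"class": "No"}},
        },
        "Good": {
            "attribute": "water_temperature",
            "branches": {"Cold": {"class": "No"}, "Warm": {"class": "Yes"}},
        },
    },
}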
Classifying with a decision tree
Once we have constructed a decision tree from the data with the attributes A1, ..., Am and
the classes {c1, ..., ck}, we can use this decision tree to classify a new data item with the
attributes A1, ..., Am into one of the classes {c1, ..., ck}.
Given a new data item that we would like to classify, we can think of each node, including
the root, as a question for the data sample: what value does the data sample have for the
selected attribute Ai? Based on the answer, we select a branch of the decision tree and
move on to the next node. There, another question about the data sample is answered, and
another, until the data sample reaches a leaf node. A leaf node has one of the classes
{c1, ..., ck} associated with it, for example, ci; the decision tree algorithm then classifies
the data sample into the class ci. This traversal is sketched in code below.
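
The traversal just described can be expressed in a few lines of Python. The following sketch reuses the hypothetical decision_tree dictionary given after the program output; it illustrates the idea and is not the book's implementation:

def classify(tree, data_item):
    # Walk the tree from the root. At each inner node, answer the
    # question "what value does the data item have for this attribute?"
    # and follow the matching branch until a leaf node is reached.
    node = tree
    while "class" not in node:          # inner node: ask about an attribute
        value = data_item[node["attribute"]]
        node = node["branches"][value]  # follow the branch for that answer
    return node["class"]                # leaf: the associated class ci

# Example: a good swimming suit and warm water is classified as swim=Yes.
print(classify(decision_tree,
               {"swimming_suit": "Good", "water_temperature": "Warm"}))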