
Decision Trees


            Program output:

            We construct a decision tree from the data file swim.csv with the verbosity set to 0. The
            reader is encouraged to set the verbosity to 2 to see a detailed explanation of how exactly
            the decision tree is constructed:

                $ python construct_decision_tree.py swim.csv 0
                Root
                ├── [swimming_suit=Small]
                │   ├── [water_temperature=Cold]
                │   │   └── [swim=No]
                │   └── [water_temperature=Warm]
                │       └── [swim=No]
                ├── [swimming_suit=None]
                │   ├── [water_temperature=Cold]
                │   │   └── [swim=No]
                │   └── [water_temperature=Warm]
                │       └── [swim=No]
                └── [swimming_suit=Good]
                    ├── [water_temperature=Cold]
                    │ └── [swim=No]
                    └── [water_temperature=Warm]
                        └── [swim=Yes]
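
            To make the construction step concrete, the following is a minimal ID3-style sketch
            that grows such a tree by choosing, at each node, the attribute with the highest
            information gain. The in-memory rows are reconstructed from the printed tree (one row
            per attribute combination), and the table, the helper names, and the nested-dictionary
            output are illustrative assumptions; construct_decision_tree.py may represent the tree
            differently. Note also that this sketch stops splitting as soon as a node is pure, so
            it produces a shallower tree than the output above, which expands both attributes on
            every branch.

                import math
                from collections import Counter

                # Rows reconstructed from the printed tree above (an assumption about
                # swim.csv): (swimming_suit, water_temperature, swim).
                rows = [
                    ('None',  'Cold', 'No'), ('None',  'Warm', 'No'),
                    ('Small', 'Cold', 'No'), ('Small', 'Warm', 'No'),
                    ('Good',  'Cold', 'No'), ('Good',  'Warm', 'Yes'),
                ]
                attributes = ['swimming_suit', 'water_temperature']

                def entropy(labels):
                    # Shannon entropy of a list of class labels.
                    total = len(labels)
                    return -sum(c / total * math.log2(c / total)
                                for c in Counter(labels).values())

                def information_gain(rows, attribute):
                    # Entropy reduction achieved by splitting the rows on the attribute.
                    i = attributes.index(attribute)
                    partitions = {}
                    for row in rows:
                        partitions.setdefault(row[i], []).append(row[-1])
                    labels = [row[-1] for row in rows]
                    remainder = sum(len(p) / len(rows) * entropy(p)
                                    for p in partitions.values())
                    return entropy(labels) - remainder

                def build(rows, attrs):
                    labels = [row[-1] for row in rows]
                    if len(set(labels)) == 1 or not attrs:
                        # Leaf: the node is pure, or no attributes remain,
                        # so store the majority class.
                        return Counter(labels).most_common(1)[0][0]
                    # Split on the attribute with the highest information gain.
                    best = max(attrs, key=lambda a: information_gain(rows, a))
                    i = attributes.index(best)
                    node = {}
                    for value in sorted(set(row[i] for row in rows)):
                        subset = [row for row in rows if row[i] == value]
                        node[(best, value)] = build(subset,
                                                    [a for a in attrs if a != best])
                    return node

                print(build(rows, attributes))

            On this data, swimming_suit has the higher information gain, so it is chosen at the
            root, in agreement with the printed tree.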



            Classifying with a decision tree

            Once we have constructed a decision tree from the data with the attributes A₁, ..., Aₘ
            and the classes {c₁, ..., cₖ}, we can use this decision tree to classify a new data item
            with the attributes A₁, ..., Aₘ into one of the classes {c₁, ..., cₖ}.
            Given a new data item that we would like to classify, we can think of each node,
            including the root, as a question about the data sample: what value does the data
            sample have for the selected attribute Aᵢ? Based on the answer, we select the
            corresponding branch of the decision tree and move on to the next node. Another
            question about the data sample is then answered, and another, until the data sample
            reaches a leaf node. A leaf node has one of the classes {c₁, ..., cₖ} associated with
            it, for example cᵢ; the decision tree algorithm then classifies the data sample into
            the class cᵢ.
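
            The walk just described is only a short loop in code. The following is a minimal
            sketch, assuming the tree is stored as a nested dictionary whose internal nodes map
            (attribute, value) pairs to subtrees and whose leaves are class labels; this
            representation mirrors the tree printed above, but it is an illustrative assumption,
            not the internal format used by construct_decision_tree.py.

                # The tree printed above, as a nested dictionary (an assumed
                # representation for illustration).
                tree = {
                    ('swimming_suit', 'Small'): {
                        ('water_temperature', 'Cold'): 'No',
                        ('water_temperature', 'Warm'): 'No',
                    },
                    ('swimming_suit', 'None'): {
                        ('water_temperature', 'Cold'): 'No',
                        ('water_temperature', 'Warm'): 'No',
                    },
                    ('swimming_suit', 'Good'): {
                        ('water_temperature', 'Cold'): 'No',
                        ('water_temperature', 'Warm'): 'Yes',
                    },
                }

                def classify(node, item):
                    # Walk from the root: at each internal node, answer the question
                    # "what value does the data sample have for the tested attribute?"
                    # and follow the matching branch until a leaf (a class label).
                    while isinstance(node, dict):
                        for (attribute, value), subtree in node.items():
                            if item[attribute] == value:
                                node = subtree
                                break
                        else:
                            raise ValueError('no branch for this attribute value')
                    return node

                print(classify(tree, {'swimming_suit': 'Good',
                                      'water_temperature': 'Warm'}))   # Yes
                print(classify(tree, {'swimming_suit': 'Small',
                                      'water_temperature': 'Cold'}))   # No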





