Page 95 - Data Science Algorithms in a Week

Random Forest


                    ├── [swimming_suit=None]
                    │   ├── [water_temperature=Cold]
                    │   │   └── [swim=No]
                    │   └── [water_temperature=Warm]
                    │       └── [swim=No]
                    └── [swimming_suit=Good]
                        ├── [water_temperature=Cold]
                        │   └── [swim=No]
                        └── [water_temperature=Warm]
                            └── [swim=Yes]
                The total number of trees in the random forest=2.
                The maximum number of the variables considered at the node is m=3.


            Classification with random forest

            Because we use only a subset of the original data for the construction of each random
            decision tree, the tree may not contain enough features to classify
            every feature. In such cases, the tree does not return any class for a feature that should be
            classified. Therefore, when voting, we consider only the trees that classify a feature into some specific class.
            The feature we would like to classify is: ['Good', 'Cold', '?']. A random decision tree
            votes for the class to which it classifies a given feature using the same method to classify a
            feature as in the previous chapter on decision trees. Tree 0 votes for the class: No. Tree 1
            votes for the class: No. The class with the maximum number of votes is 'No'. Therefore, the
            constructed random forest classifies the feature ['Good', 'Cold', '?'] into the class
            'No'.
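
            The voting step above can be sketched as follows. For illustration only, each tree is modeled as a plain function that returns a class label, or None when it cannot classify the feature; this representation and the name forest_vote are assumptions, not the book's actual tree structure:

```python
from collections import Counter

def forest_vote(forest, feature):
    """Classify a feature by majority vote over the trees in the forest.

    Each tree is modeled as a callable returning a class label or None
    when it cannot classify the feature (a simplifying assumption).
    """
    votes = [tree(feature) for tree in forest]
    # Only trees that classify the feature into some specific class vote.
    votes = [v for v in votes if v is not None]
    if not votes:
        return None
    # The class with the maximum number of votes wins.
    return Counter(votes).most_common(1)[0][0]

# Mirroring the example: both trees vote 'No' for ['Good', 'Cold', '?'].
tree_0 = lambda feature: 'No'
tree_1 = lambda feature: 'No'
print(forest_vote([tree_0, tree_1], ['Good', 'Cold', '?']))  # prints No
```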



            Implementation of random forest algorithm

            We implement a random forest algorithm using a modified decision tree algorithm from the
            previous chapter. We also add a verbose mode to the program that
            describes the whole process of how the algorithm works on a specific input: how a random
            forest is constructed from its random decision trees, and how the constructed random forest
            is used to classify other features.

            The implementation of a random forest uses the construction of a decision tree from the
            previous chapter. The reader is encouraged to consult the function
            decision_tree.construct_general_tree from the previous chapter:

                # source_code/4/random_forest.py
                import math
                import random
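
                The listing continues below; before it does, the two sources of randomness the chapter relies on can be sketched as follows. The helper names bootstrap_sample and choose_variables are hypothetical illustrations, not functions from the book's source code:

```python
import random

def bootstrap_sample(data):
    # Draw len(data) rows with replacement: each tree is built from a
    # random subset of the original data, so some rows appear several
    # times and others not at all.
    return [random.choice(data) for _ in data]

def choose_variables(variables, m):
    # At each node, consider at most m randomly chosen variables,
    # e.g. m=3 in the example output above.
    return random.sample(variables, min(m, len(variables)))
```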
