


                Tree 15 votes for the class: Yes
                Tree 16 votes for the class: Yes
                Tree 17 votes for the class: No
                Tree 18 votes for the class: No
                Tree 19 votes for the class: No
                The class with the maximum number of votes is 'Yes'. Thus the constructed
                random forest classifies the feature ['Cold', 'None', '?'] into the class
                'Yes'.

            However, we should note that only 12 out of the 20 trees voted for the answer Yes. Thus,
            just as an ordinary decision tree could not decide this case, the random forest's answer,
            although definite, is not entirely certain. But unlike a decision tree, which produced no
            answer because of the inconsistency in the data, the random forest does produce an answer.

            Furthermore, by measuring the voting power behind each individual class, we can measure
            the level of confidence that the answer is correct. In this case, the feature
            ['Cold', 'None', '?'] belongs to the class Yes with a confidence of 12/20, or 60%. To
            determine the level of certainty of the classification more precisely, an even larger
            ensemble of random decision trees would be required.
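
            The majority vote and the confidence level are simple to compute once each tree's vote
            has been collected. The following is a minimal sketch in Python, assuming the 20 votes
            are gathered into a list (the list below merely reproduces the 12-to-8 split from this
            example):

                from collections import Counter

                # Votes cast by the 20 trees: 12 for Yes and 8 for No,
                # as in the example above.
                votes = ['Yes'] * 12 + ['No'] * 8

                # Count the votes per class and take the majority class.
                vote_counts = Counter(votes)
                majority_class, majority_votes = vote_counts.most_common(1)[0]

                # The fraction of trees in the majority is the confidence
                # that the classification is correct.
                confidence = majority_votes / len(votes)

                print(f"Class: {majority_class}, confidence: {confidence:.0%}")
                # Prints: Class: Yes, confidence: 60%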




            Summary

            A random forest is a set of decision trees, where each tree is constructed from a sample
            chosen randomly, with replacement, from the initial data. This process is called
            bootstrap aggregating, and its purpose is to reduce the variance of the classification
            made by the forest. The trees are further decorrelated during their construction by
            considering only a random subset of the variables at each branch of the tree, which
            reduces the variance of the forest even more.
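
            As an illustration, here is a minimal sketch of these two sources of randomness; the
            function names and the toy data are only illustrative, not the book's own
            implementation:

                import random

                def bootstrap_sample(data):
                    # Draw len(data) examples with replacement from the
                    # initial data; each tree is built from one such sample.
                    return [random.choice(data) for _ in data]

                def random_variable_subset(variables, size):
                    # At each branch of a tree, only a random subset of the
                    # variables is considered when choosing the split.
                    return random.sample(variables, size)

                # Hypothetical training examples: (attribute1, attribute2, class).
                data = [('Cold', 'None', 'Yes'), ('Warm', 'Coat', 'No'),
                        ('Cold', 'Coat', 'Yes'), ('Warm', 'None', 'No')]

                sample = bootstrap_sample(data)
                subset = random_variable_subset(['Temperature', 'Clothing'], 1)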

            Once a random forest is constructed, its classification result is the majority vote
            among all the trees in the forest. The size of the majority also determines the level
            of confidence that the answer is correct.

            Since a random forest consists of decision trees, it is a good choice for any problem
            where a decision tree is a good choice. Because a random forest reduces the variance
            present in a single decision tree classifier, it typically outperforms a single
            decision tree.
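
            As a quick illustration of this point, the following sketch compares a single decision
            tree against a random forest using the scikit-learn library (an assumption here: the
            book builds its own forest and does not rely on scikit-learn). On the library's
            built-in Iris dataset, the forest's cross-validated accuracy is typically at least as
            high as the single tree's:

                from sklearn.datasets import load_iris
                from sklearn.ensemble import RandomForestClassifier
                from sklearn.model_selection import cross_val_score
                from sklearn.tree import DecisionTreeClassifier

                X, y = load_iris(return_X_y=True)

                tree = DecisionTreeClassifier(random_state=0)
                forest = RandomForestClassifier(n_estimators=100, random_state=0)

                # Average accuracy over 5-fold cross-validation.
                print('Tree:  ', cross_val_score(tree, X, y, cv=5).mean())
                print('Forest:', cross_val_score(forest, X, y, cv=5).mean())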












