Page 108 - Data Science Algorithms in a Week
Random Forest
Tree 15 votes for the class: Yes
Tree 16 votes for the class: Yes
Tree 17 votes for the class: No
Tree 18 votes for the class: No
Tree 19 votes for the class: No
The class with the maximum number of votes is 'Yes'. Thus the constructed
random forest classifies the feature ['Cold', 'None', '?'] into the class
'Yes'.
However, we should note that only 12 of the 20 trees voted for the answer Yes. So, just as
an ordinary decision tree could not decide this case, the random forest may not be very
certain here either, even though it gives a definite answer. But unlike the decision tree,
which produced no answer because of the inconsistent data, the random forest does produce
an answer.
Furthermore, by measuring the share of the votes that each class receives, we can measure
our level of confidence that the answer is correct. In this case, the feature
['Cold', 'None', '?'] belongs to the class Yes with a confidence of 12/20, or 60%. To
determine the certainty of the classification more precisely, an even larger ensemble of
random decision trees would be required.
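The vote counting described above can be sketched in a few lines of Python. This is a minimal illustration, not the book's implementation: the function name `classify_by_vote` is hypothetical, and it assumes each tree's vote is already available as a class label. Only the totals (12 votes for Yes out of 20) come from the example; the order of the votes in the list is made up.

```python
from collections import Counter

def classify_by_vote(votes):
    """Return the majority class and the share of the votes it received.

    Hypothetical helper: `votes` is a list with one class label per tree.
    """
    counts = Counter(votes)
    winner, winner_votes = counts.most_common(1)[0]
    # Confidence is the fraction of trees that voted for the winning class.
    return winner, winner_votes / len(votes)

# Totals from the example: 12 of the 20 trees voted Yes, 8 voted No.
# The order of the votes is illustrative only.
votes = ['Yes'] * 12 + ['No'] * 8
label, confidence = classify_by_vote(votes)
print(label, confidence)  # Yes 0.6
```

Because `Counter.most_common` keeps classes with equal counts in the order they were first seen, a tie goes to whichever class appears first in `votes`; a fuller implementation might prefer to report ties explicitly.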
Summary
A random forest is a set of decision trees where each tree is constructed from a sample
chosen randomly, with replacement, from the initial data. This process is called bootstrap
aggregating (bagging). Its purpose is to reduce the variance of the classification made by
a random forest, that is, its sensitivity to the particular training data. The variance is
further reduced during the construction of each decision tree by considering only a random
subset of the variables at each branch of the tree, which makes the trees less correlated
with one another.
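The two sources of randomness described above can be sketched with Python's standard `random` module. This is a minimal illustration under stated assumptions, not the book's code: the helper names, the eight placeholder rows and the variable names are hypothetical, and the subset size `m` is a free parameter (for classification, a common heuristic is roughly the square root of the number of variables).

```python
import random

def bootstrap_sample(data, rng):
    """Draw len(data) rows uniformly at random, with replacement.

    Some rows typically appear several times and others not at all,
    so each tree in the forest is built from a slightly different data set.
    """
    return rng.choices(data, k=len(data))

def random_variable_subset(variables, m, rng):
    """Choose m distinct variables to consider at a single branch."""
    return rng.sample(variables, m)

rng = random.Random(0)  # fixed seed so the sketch is reproducible
data = [f"row{i}" for i in range(8)]              # hypothetical data rows
variables = ["var_a", "var_b", "var_c", "var_d"]  # hypothetical variable names

sample = bootstrap_sample(data, rng)
subset = random_variable_subset(variables, 2, rng)
print(sample)
print(subset)
```

Calling `bootstrap_sample` once per tree gives each tree its own training set, and calling `random_variable_subset` again at every branch gives each split its own candidate variables.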
Once a random forest is constructed, it classifies a data item by the majority vote of all
the trees in the forest. The size of the majority also indicates how confident we can be
that the answer is correct.
Since a random forest consists of decision trees, it is a good choice for any problem where
a decision tree is a good choice. Since a random forest reduces the variance of a single
decision tree classifier, it usually outperforms a single decision tree.