Random Forest
Overview of random forest algorithm
As a general consideration, we begin by choosing the number of trees that we are going
to construct for the random forest. A random forest does not tend to overfit (unless the data is
very noisy), so choosing many decision trees will not decrease the accuracy of the
prediction. However, the more decision trees there are, the more computational power is required.
Also, increasing the number of decision trees in the forest dramatically does not
increase the accuracy of the classification by much. What matters is that we have sufficiently
many decision trees so that most of the data is used for the classification when it is chosen
randomly for the construction of the individual decision trees.
In practice, one can run the algorithm on a specific number of decision trees, then increase that
number and compare the classification results of the smaller and the bigger forest, as sketched below. If the
results are very similar, then there is no reason to increase the number of trees.
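The following is a minimal sketch of such a comparison. It uses scikit-learn's RandomForestClassifier and the Iris dataset rather than the implementation developed in this book, and the tree counts of 10 and 100 are arbitrary illustrative choices.

from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

for n_trees in (10, 100):
    # Train a forest with the given number of trees and measure test accuracy.
    forest = RandomForestClassifier(n_estimators=n_trees, random_state=0)
    forest.fit(X_train, y_train)
    print(n_trees, "trees -> accuracy", forest.score(X_test, y_test))

If the reported accuracies are very close, the smaller forest is sufficient for the task.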
To simplify demonstration, in this book, we will use a small number of decision trees in a
random forest.
Overview of random forest construction
We will describe how each tree is constructed in a random fashion. Given N training
features, we provide the data for the construction of each decision tree by selecting N
features randomly with replacement from the initial data for the random forest. This
process of selecting the data randomly with replacement for each tree is called bootstrap
aggregating, or tree bagging. The purpose of bootstrap aggregating is to reduce the variance
and bias in the results of the classification.
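The bootstrap sampling step can be sketched in a few lines of Python. The function name bootstrap_sample and the toy data are illustrative choices, not code from this book's implementation.

import random

def bootstrap_sample(data):
    # Draw len(data) items uniformly at random *with replacement*,
    # so some items may appear several times and others not at all.
    return [random.choice(data) for _ in range(len(data))]

# Each decision tree in the forest gets its own bootstrap sample.
training_data = [('sunny', 'warm', 'yes'), ('rainy', 'cold', 'no'),
                 ('sunny', 'cold', 'yes'), ('rainy', 'warm', 'no')]
print(bootstrap_sample(training_data))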
Say a feature has M variables that are used to classify the feature with a decision tree.
When we have to make a branching decision at a node, the ID3 algorithm chooses the
variable that results in the highest information gain. In a random decision tree, at
each node we consider only at most m variables (where m is at most M), sampled in a
random fashion without replacement from the given M variables, excluding the ones that
were already chosen. Then, out of these m variables, we choose the one that results in
the highest information gain.
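A minimal sketch of this per-node variable selection follows, assuming categorical data stored as dictionaries and a simple entropy-based information gain. The function names, the toy data, and the choice m=1 are illustrative assumptions rather than the book's implementation.

import math
import random
from collections import Counter

def entropy(rows, class_key='class'):
    counts = Counter(row[class_key] for row in rows)
    total = len(rows)
    return -sum((c / total) * math.log2(c / total) for c in counts.values())

def information_gain(rows, variable, class_key='class'):
    # Split rows by the values of the variable and measure the drop in entropy.
    groups = {}
    for row in rows:
        groups.setdefault(row[variable], []).append(row)
    remainder = sum(len(g) / len(rows) * entropy(g, class_key)
                    for g in groups.values())
    return entropy(rows, class_key) - remainder

def best_variable_at_node(rows, available_variables, m):
    # Sample at most m variables without replacement, then pick the one
    # with the highest information gain among the sampled candidates.
    candidates = random.sample(available_variables, min(m, len(available_variables)))
    return max(candidates, key=lambda v: information_gain(rows, v))

rows = [
    {'outlook': 'sunny', 'windy': 'no', 'class': 'play'},
    {'outlook': 'rainy', 'windy': 'yes', 'class': 'no_play'},
    {'outlook': 'sunny', 'windy': 'yes', 'class': 'play'},
    {'outlook': 'rainy', 'windy': 'no', 'class': 'no_play'},
]
print(best_variable_at_node(rows, ['outlook', 'windy'], m=1))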
The rest of the construction of a random decision tree is carried out just as it was for a
decision tree in the previous chapter.