Random Forest
├── [swimming_suit=None]
│   ├── [water_temperature=Cold]
│   │   └── [swim=No]
│   └── [water_temperature=Warm]
│       └── [swim=No]
└── [swimming_suit=Good]
    ├── [water_temperature=Cold]
    │   └── [swim=No]
    └── [water_temperature=Warm]
        └── [swim=Yes]
The total number of trees in the random forest=2.
The maximum number of variables considered at each node is m=3.
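Here m bounds how many randomly chosen variables a tree may consider when splitting at a node (the random subspace method). The following minimal sketch illustrates this step under that assumption; the helper choose_variables and its arguments are hypothetical names, not part of the book's implementation:
import random

# Hypothetical sketch: at each node, consider at most m randomly
# chosen variables out of those still available for splitting.
def choose_variables(available_variables, m):
    how_many = min(m, len(available_variables))
    return random.sample(available_variables, how_many)

# With m=3 but only two variables in the data set, both are considered:
print(choose_variables(['swimming_suit', 'water_temperature'], 3))
With only two variables in the data set, m=3 does not restrict anything here; it only matters for data sets with more variables than m.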
Classification with random forest
Because we use only a subset of the original data to construct a random decision tree, the tree may not have enough branches to classify every feature. In such a case, the tree does not return any class for the feature that should be classified. Therefore, during voting we only consider the trees that classify the feature into some specific class.
The feature we would like to classify is ['Good', 'Cold', '?']. A random decision tree votes for the class into which it classifies the given feature, using the same classification method as in the previous chapter on decision trees. Tree 0 votes for the class No, and Tree 1 votes for the class No. The class with the maximum number of votes is 'No'. Therefore, the constructed random forest classifies the feature ['Good', 'Cold', '?'] into the class
'No'.
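This voting step can be summarized in a short sketch. Here classify_with_tree is a stub standing in for the per-tree classification routine from the previous chapter, assumed to return None when a tree cannot classify the feature; all the names below are illustrative, not the book's code:
from collections import Counter

def classify_with_forest(random_forest, feature):
    # One vote per tree; trees that cannot classify the feature
    # are assumed to return None and are skipped.
    votes = [classify_with_tree(tree, feature) for tree in random_forest]
    votes = [vote for vote in votes if vote is not None]
    if not votes:
        return None
    # The class with the maximum number of votes wins.
    return Counter(votes).most_common(1)[0][0]

# Stub: each stub tree maps (swimming_suit, water_temperature) to a class.
def classify_with_tree(tree, feature):
    return tree.get(tuple(feature[:2]))

tree_0 = {('Good', 'Cold'): 'No', ('Good', 'Warm'): 'Yes'}
tree_1 = {('Good', 'Cold'): 'No'}
print(classify_with_forest([tree_0, tree_1], ['Good', 'Cold', '?']))  # No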
Implementation of the random forest algorithm
We implement the random forest algorithm using a modified version of the decision tree algorithm from the previous chapter. We also add an option to set a verbose mode within the program that describes the whole process of how the algorithm works on a specific input: how a random forest is constructed from its random decision trees, and how the constructed random forest is used to classify other features.
The implementation of the random forest reuses the decision tree construction from the previous chapter. The reader is encouraged to consult the function decision_tree.construct_general_tree there:
# source_code/4/random_forest.py
import math
import random