Random Forest
├── [swimming_suit=None]
│   ├── [water_temperature=Cold]
│   │   └── [swim=No]
│   └── [water_temperature=Warm]
│       └── [swim=No]
└── [swimming_suit=Good]
    ├── [water_temperature=Cold]
    │   └── [swim=No]
    └── [water_temperature=Warm]
        └── [swim=Yes]
The total number of trees in the random forest=2.
The maximum number of variables considered at each node is m=3.
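Here m bounds how many randomly chosen variables a tree may consider when splitting at a node (the random subspace method). The following minimal sketch illustrates this step under that assumption; the helper choose_variables and its arguments are hypothetical names, not part of the book's implementation:
import random

# Hypothetical sketch: at each node, consider at most m randomly
# chosen variables out of those still available for splitting.
def choose_variables(available_variables, m):
    how_many = min(m, len(available_variables))
    return random.sample(available_variables, how_many)

# With m=3 but only two variables in the data set, both are considered:
print(choose_variables(['swimming_suit', 'water_temperature'], 3))
With only two variables in the data set, m=3 does not restrict anything here; it only matters for data sets with more variables than m.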
Classification with random forest
Because we use only a subset of the original data to construct a random decision tree, the tree may not have enough branches to classify every feature. In such a case, the tree does not return any class for the feature that should be classified. Therefore, during voting we only consider the trees that classify the feature into some specific class.
The feature we would like to classify is ['Good', 'Cold', '?']. A random decision tree votes for the class into which it classifies the given feature, using the same classification method as in the previous chapter on decision trees. Tree 0 votes for the class No, and Tree 1 votes for the class No. The class with the maximum number of votes is 'No'. Therefore, the constructed random forest classifies the feature ['Good', 'Cold', '?'] into the class
'No'.
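This voting step can be summarized in a short sketch. Here classify_with_tree is a stub standing in for the per-tree classification routine from the previous chapter, assumed to return None when a tree cannot classify the feature; all the names below are illustrative, not the book's code:
from collections import Counter

def classify_with_forest(random_forest, feature):
    # One vote per tree; trees that cannot classify the feature
    # are assumed to return None and are skipped.
    votes = [classify_with_tree(tree, feature) for tree in random_forest]
    votes = [vote for vote in votes if vote is not None]
    if not votes:
        return None
    # The class with the maximum number of votes wins.
    return Counter(votes).most_common(1)[0][0]

# Stub: each stub tree maps (swimming_suit, water_temperature) to a class.
def classify_with_tree(tree, feature):
    return tree.get(tuple(feature[:2]))

tree_0 = {('Good', 'Cold'): 'No', ('Good', 'Warm'): 'Yes'}
tree_1 = {('Good', 'Cold'): 'No'}
print(classify_with_forest([tree_0, tree_1], ['Good', 'Cold', '?']))  # No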
Implementation of the random forest algorithm
We implement the random forest algorithm using a modified version of the decision tree algorithm from the previous chapter. We also add an option to set a verbose mode within the program that describes the whole process of how the algorithm works on a specific input: how a random forest is constructed from its random decision trees, and how the constructed random forest is used to classify other features.
The implementation of the random forest reuses the decision tree construction from the previous chapter. The reader is encouraged to consult the function decision_tree.construct_general_tree there:
# source_code/4/random_forest.py
import math
import random