Page 98 - Data Science Algorithms in a Week
Random Forest
"'. Thus the constructed random forest classifies the " +
"feature " + str(feature) + " into the class '" +
str(common.dic_key_max_count(classification)) + "'.\n")
# Program start
csv_file_name = sys.argv[1]
tree_count = int(sys.argv[2])
verbose = int(sys.argv[3])
(heading, complete_data, incomplete_data,
enquired_column) = common.csv_file_to_ordered_data(csv_file_name)
m = choose_m(verbose, len(heading))
random_forest = construct_random_forest(
verbose, heading, complete_data, enquired_column, m, tree_count)
display_forest(verbose, random_forest)
display_classification(verbose, random_forest, heading,
enquired_column, incomplete_data)
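The `common.dic_key_max_count` helper used in the listing above resolves the forest's majority vote: each tree classifies the feature, the votes are tallied in a dictionary, and the class with the most votes wins. A minimal sketch of such a helper follows; this is an assumption about its behavior rather than the book's exact source, and the vote counts in the usage line are made up for illustration:

```python
def dic_key_max_count(dictionary):
    # Return the key with the highest count; across a random forest
    # this is the class that received the most votes from the trees.
    return max(dictionary, key=dictionary.get)

# Hypothetical tally of votes from the trees in a forest:
votes = {"No": 1, "Yes": 2}
print(dic_key_max_count(votes))  # the majority class
```

Note that with tied counts `max` returns whichever tied key comes first in insertion order, so a real implementation may want an explicit tie-breaking rule.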
Input:
As the input file for the implemented algorithm, we provide the data from the Swim preference example.
# source_code/4/swim.csv
swimming_suit,water_temperature,swim
None,Cold,No
None,Warm,No
Small,Cold,No
Small,Warm,No
Good,Cold,No
Good,Warm,Yes
Good,Cold,?
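The listing obtains `heading`, `complete_data`, and `incomplete_data` from `common.csv_file_to_ordered_data`. A minimal sketch of that split is shown below, assuming the convention that a `?` marks the row to be classified; the function name and return shape here are simplifications of the book's `common` module, not its actual code:

```python
import csv
from io import StringIO

SWIM_CSV = """swimming_suit,water_temperature,swim
None,Cold,No
None,Warm,No
Small,Cold,No
Small,Warm,No
Good,Cold,No
Good,Warm,Yes
Good,Cold,?
"""

def csv_to_ordered_data(text):
    # Separate fully labeled rows from the row marked '?',
    # which is the one the forest will classify. The book's real
    # helper also returns the index of the enquired column.
    rows = list(csv.reader(StringIO(text)))
    heading, data = rows[0], rows[1:]
    complete = [row for row in data if "?" not in row]
    incomplete = [row for row in data if "?" in row]
    return heading, complete, incomplete

heading, complete, incomplete = csv_to_ordered_data(SWIM_CSV)
```

Here the six labeled rows become the training data for the trees, and the single `Good,Cold,?` row is the feature to classify.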
Output:
We run the following command on the command line to produce the output:
$ python random_forest.py swim.csv 2 3 > swim.out
Here 2 means that we would like to construct 2 decision trees, and 3 is the verbosity level of the program, which includes detailed explanations of the construction of the random forest, the classification of the feature, and the graph of the random forest. The last part, > swim.out, means that the output is written to the file swim.out. This file can be found in the chapter directory source_code/4. This output of the program was used above to write the analysis of the Swim preference problem.
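The `m` computed by `choose_m` in the listing above is the number of variables considered when building each tree. A common heuristic for random forests is to take roughly the square root of the number of candidate features; the sketch below illustrates that heuristic and is an assumption about what the book's `choose_m` computes, not its actual source:

```python
import math

def choose_m(verbose, heading_len):
    # The last column of the heading is the class to predict,
    # so the number of candidate features is heading_len - 1.
    # A standard random-forest heuristic: use about sqrt(features)
    # randomly chosen variables per tree.
    num_features = heading_len - 1
    m = int(math.ceil(math.sqrt(num_features)))
    if verbose:
        print("Using m =", m, "random variables per tree.")
    return m
```

For swim.csv the heading has 3 columns, so the forest would consider m = 2 of the 2 available features.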