Page 98 - Data Science Algorithms in a Week

Random Forest


                            "'. Thus the constructed random forest classifies the " +
                            "feature " + str(feature) + " into the class '" +
                            str(common.dic_key_max_count(classification)) + "'.\n")
                # Program start
                csv_file_name = sys.argv[1]
                tree_count = int(sys.argv[2])
                verbose = int(sys.argv[3])

                (heading, complete_data, incomplete_data,
                 enquired_column) = common.csv_file_to_ordered_data(csv_file_name)
                m = choose_m(verbose, len(heading))
                random_forest = construct_random_forest(
                    verbose, heading, complete_data, enquired_column, m, tree_count)
                display_forest(verbose, random_forest)
                display_classification(verbose, random_forest, heading,
                                       enquired_column, incomplete_data)
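The last step of the program reports the class that the forest picks for each incomplete feature: every tree votes, and the class with the most votes wins. The sketch below shows that majority vote. It assumes that `classification` is a dictionary mapping each class to its vote count, as the call `common.dic_key_max_count(classification)` suggests. `dic_key_max_count` here is a hypothetical stand-in for the helper in the book's `common` module, and `forest_vote` is a hypothetical name used only for illustration.

```python
from collections import Counter


def dic_key_max_count(counts):
    """Hypothetical stand-in for common.dic_key_max_count: return the key
    with the highest count; on a tie, the first key encountered wins."""
    best_key, best_count = None, -1
    for key, count in counts.items():
        if count > best_count:
            best_key, best_count = key, count
    return best_key


def forest_vote(tree_predictions):
    """Tally the class predicted by each tree and return the majority class."""
    classification = Counter(tree_predictions)  # e.g. {'No': 2, 'Yes': 1}
    return dic_key_max_count(classification)


if __name__ == "__main__":
    # Three trees: two vote 'No' and one votes 'Yes', so the forest says 'No'.
    print(forest_vote(["No", "Yes", "No"]))  # No
```

With an even number of trees, as in the two-tree run below, a tie is possible; this sketch resolves it by insertion order, which may differ from the book's helper.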
            Input:

            As the input file for the implemented algorithm, we provide the data from the Swim
            preference example.

                # source_code/4/swim.csv
                swimming_suit,water_temperature,swim
                None,Cold,No
                None,Warm,No
                Small,Cold,No
                Small,Warm,No
                Good,Cold,No
                Good,Warm,Yes
                Good,Cold,?
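In this file, the rows whose swim value is known are the training data, and the row marked with ? is the feature to be classified. The sketch below splits the CSV in that way using only the standard csv module. It assumes, as the Swim preference file suggests, that the enquired column is the last one and that unknown values are written as ?. The function `split_rows` is hypothetical and only approximates what `common.csv_file_to_ordered_data` returns (heading, complete data, incomplete data, enquired column).

```python
import csv
import io

SWIM_CSV = """swimming_suit,water_temperature,swim
None,Cold,No
None,Warm,No
Small,Cold,No
Small,Warm,No
Good,Cold,No
Good,Warm,Yes
Good,Cold,?
"""


def split_rows(csv_text, unknown="?"):
    """Hypothetical helper: separate labelled rows from rows to classify."""
    reader = csv.reader(io.StringIO(csv_text))
    heading = next(reader)
    enquired_column = len(heading) - 1  # assumption: the class is the last column
    complete, incomplete = [], []
    for row in reader:
        if not row:  # skip blank lines
            continue
        if row[enquired_column] == unknown:
            incomplete.append(row)
        else:
            complete.append(row)
    return heading, complete, incomplete, enquired_column


if __name__ == "__main__":
    heading, complete, incomplete, enquired_column = split_rows(SWIM_CSV)
    print(len(complete), len(incomplete))  # 6 1
```

Note that None in this file is an ordinary string value of the swimming_suit attribute, not Python's None.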
            Output:

            We type the following command in the command line to get the output:

                $ python random_forest.py swim.csv 2 3 > swim.out
            The argument 2 means that we want to construct two decision trees, and 3 is the verbosity
            level of the program; at this level, the output includes detailed explanations of the
            construction of the random forest, the classification of the feature, and the graph of the
            random forest. The final part, > swim.out, redirects the output to the file swim.out. This
            file can be found in the chapter directory source_code/4. This output of the program was
            used above to write the analysis of the Swim preference problem.
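The program reads its three positional arguments directly from sys.argv and converts the last two with int(). The sketch below wraps that step in a hypothetical `parse_arguments` function that also rejects a wrong argument count, a tree count below one, or a negative verbosity. These checks are an assumption added for illustration; the original program performs no validation. The > swim.out redirection is handled by the shell and never reaches the program.

```python
def parse_arguments(argv):
    """Hypothetical helper for
    python random_forest.py <csv_file> <tree_count> <verbosity>
    Returns (csv_file_name, tree_count, verbose)."""
    if len(argv) != 4:
        raise SystemExit(
            "usage: python random_forest.py <csv_file> <tree_count> <verbosity>")
    csv_file_name = argv[1]
    tree_count = int(argv[2])  # number of decision trees to construct
    verbose = int(argv[3])     # verbosity level, e.g. 3 for detailed output
    if tree_count < 1:
        raise ValueError("tree_count must be at least 1")
    if verbose < 0:
        raise ValueError("verbosity must be non-negative")
    return csv_file_name, tree_count, verbose


if __name__ == "__main__":
    # Mirrors: $ python random_forest.py swim.csv 2 3
    print(parse_arguments(["random_forest.py", "swim.csv", "2", "3"]))
```

In the real program, the list passed in would be sys.argv itself.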




