
88                              Fred K. Gruber

                       i.e., one particular model will always have the same performance throughout the run of
                       the genetic algorithm. At the same time, since we are doing 30 replications, each with a
                       different random split, we can get a good idea of the average performance as a function
                       of the generation for each of the variations of the genetic algorithm. Figure 11
                       summarizes this process in an activity diagram.
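The averaging over replications can be sketched as follows. This is a minimal illustration, not the chapter's code: the function name `average_by_generation`, the layout of `runs` (one list of per-generation performance values per replication), and the random placeholder data are all assumptions made for the example.

```python
import random
from statistics import mean

def average_by_generation(runs):
    """Average a performance measure across replications, generation by generation.

    runs: list of replications; each replication is a list with one
    performance value per generation (hypothetical layout).
    """
    n_gen = len(runs[0])
    if any(len(r) != n_gen for r in runs):
        raise ValueError("all replications must cover the same number of generations")
    return [mean(r[g] for r in runs) for g in range(n_gen)]

# Placeholder data only: 30 replications x 20 generations of made-up values.
rng = random.Random(0)
runs = [[rng.random() for _ in range(20)] for _ in range(30)]
curve = average_by_generation(runs)  # one averaged value per generation
```

Each entry of `curve` corresponds to one generation, so the list can be plotted directly against the generation index to compare variations of the genetic algorithm.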
                          Table 3 lists the different combinations of parameters of the GA that were tested. It
                       was assumed that the effect of each parameter on performance is independent of the
                       others; therefore, not every combination of parameter values was tested.

                       Table 3. Parameters of the genetic algorithm used for testing the different variations

                        Parameter            Value
                        Population           10
                        Generations          20
                        Prob. of crossover   0.95
                        Prob. of mutation    0.05
                        Fitness function     10-fold cross-validation
                        Selection            2-Tournament selection
                        Crossover types      One point, two point, uniform, diagonal with 4 parents
                        Mutation type        Fixed rate, dynamic rate, self-adaptive rate, feedback
                        Other                Elitism, no elitism
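To make the baseline operators in Table 3 concrete, here is a minimal sketch of 2-tournament selection, one-point crossover, and fixed-rate mutation. It assumes a bit-string encoding and that higher fitness is better; neither is stated on this page, and the function names are hypothetical.

```python
import random

def tournament_select(population, fitness, rng, k=2):
    """Pick k distinct individuals at random and return the fittest of them."""
    contenders = rng.sample(range(len(population)), k)
    best = max(contenders, key=lambda i: fitness[i])
    return population[best]

def one_point_crossover(a, b, rng, p_cross=0.95):
    """With probability p_cross, swap the tails of two parents at a random cut."""
    if rng.random() >= p_cross or len(a) < 2:
        return a[:], b[:]          # no crossover: children are copies
    cut = rng.randrange(1, len(a)) # cut point strictly inside the string
    return a[:cut] + b[cut:], b[:cut] + a[cut:]

def mutate(bits, rng, p_mut=0.05):
    """Fixed-rate mutation: flip each bit independently with probability p_mut."""
    return [1 - b if rng.random() < p_mut else b for b in bits]

rng = random.Random(1)
parent_a, parent_b = [0] * 8, [1] * 8
child_a, child_b = one_point_crossover(parent_a, parent_b, rng, p_cross=1.0)
```

The defaults `p_cross=0.95` and `p_mut=0.05` mirror the values in Table 3; the other crossover and mutation types listed there are not covered by this sketch.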

                          After repeating the experiment 30 times, we calculated the average for each
                       generation. A subset of 215 points was used for the experiments. This subset was obtained
                       in a stratified manner (the proportion of individuals of class 1 to class -1 was kept equal
                       to that of the original dataset) from individual number 2. The number of points was
                       reduced to shorten the processing time.
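A stratified subset of this kind can be drawn roughly as shown below. This is a sketch under stated assumptions: the function name `stratified_subset` is hypothetical, the class sizes (600 and 400) are invented for the example, and the detail about starting from a particular individual is not modelled. Because each class quota is rounded separately, the quotas may not always sum exactly to the requested size.

```python
import random
from collections import defaultdict

def stratified_subset(labels, n, rng):
    """Return sorted indices of about n points, preserving class proportions."""
    by_class = defaultdict(list)
    for idx, y in enumerate(labels):
        by_class[y].append(idx)
    total = len(labels)
    chosen = []
    for idxs in by_class.values():
        quota = round(n * len(idxs) / total)  # this class's share of the subset
        chosen.extend(rng.sample(idxs, quota))
    return sorted(chosen)

# Made-up dataset: 600 points of class 1 and 400 of class -1.
labels = [1] * 600 + [-1] * 400
subset = stratified_subset(labels, 215, random.Random(0))
n_pos = sum(1 for i in subset if labels[i] == 1)
```

With these invented class sizes, the 215-point subset keeps the 60/40 class ratio of the full set.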
                          In most cases, we are interested in comparing the performance measures at the 20th
                       generation of the genetic algorithms using different parameters. This comparison is made
                       using several statistical tests, such as the two-sample t test and best of k systems (Law
                       and Kelton, 2000).
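As an illustration of a two-sample comparison, the sketch below computes Welch's t statistic and its approximate degrees of freedom from two samples of final-generation performance values. The text does not say which variant of the two-sample t test was used, so the choice of Welch's unequal-variance form is an assumption, as are the function name and the toy data.

```python
from math import sqrt
from statistics import mean, variance

def welch_t(x, y):
    """Welch's two-sample t statistic and Welch-Satterthwaite degrees of freedom."""
    nx, ny = len(x), len(y)
    vx, vy = variance(x) / nx, variance(y) / ny  # squared standard errors
    t = (mean(x) - mean(y)) / sqrt(vx + vy)
    df = (vx + vy) ** 2 / (vx ** 2 / (nx - 1) + vy ** 2 / (ny - 1))
    return t, df

# Toy data standing in for the performance of two GA variants.
t_stat, dof = welch_t([1, 2, 3, 4, 5], [2, 3, 4, 5, 6])
```

The statistic would then be compared against a t distribution with `dof` degrees of freedom to obtain a p-value; that step is left out to keep the example free of external dependencies.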

                       Effect of the Elitist Strategy

                          Figure 12 shows the effect of elitism when the genetic algorithm uses a one-point
                       crossover with crossover rate of 0.95 and simple mutation with mutation rate of 0.05.
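One common form of elitism is sketched below: the best individual of the current generation is copied over the worst individual of the new generation, so the best fitness found never decreases. The text on this page does not define which elitist strategy was used, so this variant and the name `apply_elitism` are assumptions; higher fitness is taken to be better.

```python
def apply_elitism(parents, parent_fitness, offspring, offspring_fitness):
    """Return copies of the offspring lists with the best parent
    replacing the worst offspring."""
    best = max(range(len(parents)), key=lambda i: parent_fitness[i])
    worst = min(range(len(offspring)), key=lambda i: offspring_fitness[i])
    new_pop = list(offspring)
    new_fit = list(offspring_fitness)
    new_pop[worst] = parents[best]
    new_fit[worst] = parent_fitness[best]
    return new_pop, new_fit

# Toy example with made-up individuals and fitness values.
pop, fit = apply_elitism(["a", "b"], [0.9, 0.4], ["c", "d"], [0.5, 0.3])
```

Without elitism, the new generation would simply be the offspring, and the best solution found so far could be lost to crossover or mutation.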