Page 104 - Data Science Algorithms in a Week


88 Fred K. Gruber

i.e., one particular model will always have the same performance throughout the run of
the genetic algorithm. At the same time, since we are doing 30 replications (each with a
different random split), we can get a good idea of the average performance as a function
of the generation for each of the variations of the genetic algorithm. Figure 11
summarizes this process in an activity diagram.
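
The averaging over replications described above can be sketched as follows. This is only an illustration: the function name `average_by_generation` and the numbers are assumptions made for the example, not values or code from the study.

```python
from statistics import mean

def average_by_generation(runs):
    """Average performance per generation across replications.

    runs: list of replications, each a list with one performance value
    per generation (hypothetical layout, assumed for this sketch).
    """
    n_gen = len(runs[0])
    if any(len(r) != n_gen for r in runs):
        raise ValueError("all replications must cover the same number of generations")
    # zip(*runs) groups the values of each generation across all replications.
    return [mean(values) for values in zip(*runs)]

# Illustrative data only: 3 replications x 4 generations of made-up error rates.
runs = [
    [0.30, 0.25, 0.20, 0.20],
    [0.40, 0.35, 0.30, 0.25],
    [0.20, 0.15, 0.10, 0.15],
]
curve = average_by_generation(runs)
```

In the setting of the text, `runs` would hold 30 replications of 20 generations each, and `curve` would be the averaged line plotted for one variation of the genetic algorithm.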
Table 3 lists the different combinations of parameters of the GA that were tested. It
was assumed that the effect of each parameter on performance is independent of the others;
therefore, not every combination of parameter values was tested.

Table 3. Parameters of the genetic algorithm used for testing the different variations

Parameter            Value
-------------------  -----------------------------------------------------------
Population           10
Generations          20
Prob. of crossover   0.95
Prob. of mutation    0.05
Fitness function     10-fold cross-validation
Selection            2-tournament selection
Crossover types      One point, two point, uniform, diagonal with 4 parents
Mutation type        Fixed rate, dynamic rate, self-adaptive rate, feedback
Other                Elitism, no elitism
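
As an illustration of the selection scheme listed in Table 3, here is a minimal sketch of 2-tournament selection. It assumes that higher fitness is better and that the two contenders are drawn without replacement; the text specifies neither, and the function name and data are hypothetical.

```python
import random

def tournament_select(population, fitness, k=2, rng=random):
    """Return the fittest of k individuals drawn at random.

    Assumptions of this sketch: higher fitness is better, and the
    contenders are sampled without replacement.
    """
    contenders = rng.sample(range(len(population)), k)
    best = max(contenders, key=lambda i: fitness[i])
    return population[best]

# Illustrative population of 10 individuals (the size used in Table 3)
# with made-up fitness values.
population = [f"model_{i}" for i in range(10)]
fitness = [0.71, 0.80, 0.65, 0.90, 0.77, 0.60, 0.85, 0.73, 0.68, 0.79]
parent = tournament_select(population, fitness, k=2, rng=random.Random(0))
```

If the fitness is an error estimate to be minimized instead, `max` would be replaced by `min`.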

After repeating the experiment 30 times, we calculated the average for each
generation. A subset of 215 points was used for the experiments. This subset was obtained
in a stratified manner (the proportion of individuals of class 1 to class -1 was kept equal
to that of the original dataset) from individual number 2. The number of points was
reduced to shorten the processing time.
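
A stratified subset of this kind could be drawn roughly as sketched below. The function `stratified_subset`, the seed, and the class counts are assumptions for illustration; the actual class proportions of the dataset are not given here.

```python
import random
from collections import defaultdict

def stratified_subset(labels, n_points, seed=0):
    """Return indices of about n_points samples, preserving class proportions."""
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for idx, label in enumerate(labels):
        by_class[label].append(idx)
    total = len(labels)
    chosen = []
    for label, indices in by_class.items():
        # Round each class's share; the total may differ from n_points
        # by a point or two when the shares are not whole numbers.
        quota = round(n_points * len(indices) / total)
        chosen.extend(rng.sample(indices, quota))
    return sorted(chosen)

# Illustrative labels only: 600 points of class 1 and 400 of class -1.
labels = [1] * 600 + [-1] * 400
subset = stratified_subset(labels, n_points=215)
```

With these made-up proportions the subset contains 129 points of class 1 and 86 of class -1, the same 60/40 ratio as the full label list.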
In most cases, we are interested in comparing the performance measures at the 20th
generation of the genetic algorithms using different parameters. This comparison is made
using several statistical tests, such as the two-sample t-test and best of k systems (Law and Kelton,
2000).
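
To make the comparison concrete, the sketch below computes a two-sample t statistic for the final-generation results of two variations. It uses Welch's unequal-variance form, which is an assumption: the text does not say which form of the t-test was applied. The sample values are made up.

```python
from math import sqrt
from statistics import mean, variance

def welch_t(a, b):
    """Return (t statistic, approximate degrees of freedom) for two
    independent samples, using Welch's unequal-variance form."""
    va = variance(a) / len(a)  # squared standard error of mean(a)
    vb = variance(b) / len(b)  # squared standard error of mean(b)
    t = (mean(a) - mean(b)) / sqrt(va + vb)
    # Welch-Satterthwaite approximation of the degrees of freedom.
    df = (va + vb) ** 2 / (va ** 2 / (len(a) - 1) + vb ** 2 / (len(b) - 1))
    return t, df

# Illustrative samples only; in the study each sample would hold the 30
# replications' performance at the 20th generation for one variation.
variant_a = [1, 2, 3, 4, 5]
variant_b = [2, 3, 4, 5, 6]
t_stat, dof = welch_t(variant_a, variant_b)
```

The statistic would then be compared with a critical value of the t distribution at the chosen significance level to decide whether the two variations differ.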

Effect of the Elitist Strategy

Figure 12 shows the effect of elitism when the genetic algorithm uses a one-point
crossover with a crossover rate of 0.95 and simple mutation with a mutation rate of 0.05.
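
For readers unfamiliar with the term, elitism can be sketched as below: the best individual of the current population is carried unchanged into the next one. How the elite is inserted (here it displaces the last offspring so that the population size stays fixed) is an assumption of this sketch, as are the function name and the assumption that higher fitness is better.

```python
def next_generation(population, fitness, offspring, elitism=True):
    """Form the next population from the offspring.

    With elitism, the best current individual survives unchanged.
    Assumption of this sketch: higher fitness is better.
    """
    if not elitism:
        return list(offspring)
    best = max(range(len(population)), key=lambda i: fitness[i])
    # Keep the population size fixed: the elite replaces the last offspring.
    return [population[best]] + list(offspring[:-1])

# Illustrative individuals and made-up fitness values.
population = ["a", "b", "c"]
fitness = [0.1, 0.9, 0.5]
offspring = ["x", "y", "z"]
with_elite = next_generation(population, fitness, offspring, elitism=True)
without_elite = next_generation(population, fitness, offspring, elitism=False)
```

With elitism the best fitness in the population cannot get worse from one generation to the next, which is the property the comparison in this section examines.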
