Page 110 - Data Science Algorithms in a Week


94 Fred K. Gruber

Final Variation of the GA

Based on the results of the previous experiments, we selected the parameters shown
in Table 5.

Table 5. Parameters in the final genetic algorithm

Parameter             Value
Population size       10
Generations           20
Prob. of crossover    0.95
Prob. of mutation     0.05
Fitness function      10-fold crossvalidation
Selection             2-tournament selection
Crossover type        Diagonal with 4 parents
Mutation type         Fixed rate
Others                Elitist strategy
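
The selection and crossover operators named in Table 5 can be illustrated with a minimal sketch. It assumes a chromosome stored as a Python list and a fitness where higher is better; neither detail is stated on this page, and the function names are hypothetical. Diagonal crossover is shown in its usual form: n parents share n-1 cut points, and each child collects its segments from the parents in rotating order.

```python
import random


def tournament_select(population, fitness, rng, k=2):
    """Draw k distinct individuals at random and return the fittest one.

    Assumption: higher fitness is better.
    """
    contenders = rng.sample(range(len(population)), k)
    best = max(contenders, key=lambda i: fitness[i])
    return population[best]


def diagonal_crossover(parents, rng):
    """Diagonal crossover: n parents, n-1 shared cut points, n children.

    Child c takes segment s from parent (c + s) % n.
    Assumption: all parents are lists of equal length >= n.
    """
    n = len(parents)
    length = len(parents[0])
    cuts = sorted(rng.sample(range(1, length), n - 1))
    bounds = [0] + cuts + [length]
    children = []
    for c in range(n):
        child = []
        for s in range(n):
            donor = parents[(c + s) % n]
            child.extend(donor[bounds[s]:bounds[s + 1]])
        children.append(child)
    return children
```

With 4 parents, every gene position of every parent ends up in exactly one of the 4 children, so no genetic material is lost or duplicated by the operator itself.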

The activity diagram of the final genetic algorithm is shown in Figure 18. The most
important difference between this final model and the one used in the previous section is
related to the random split of the data. Instead of using only one split of the data for the
complete run of the GA, every time the fitness of the population is calculated, we use a
different random split (see Figure 19).
As a result, all individuals in a particular generation are measured under the same
conditions. Using only one random split throughout the whole run of the GA carries the
danger that the generalization error estimate for one particular model may be lower than
for other models because of the particular random split and not because the model is really
better in general. Using a different random split before calculating the fitness of every
individual carries the same danger: an apparent difference in performance may be due to
the particular random split and not to the different values of the parameters.
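
The resplitting scheme described above can be sketched as follows. This is only an illustration under stated assumptions: `make_folds`, `evaluate_generation`, and the `cv_score` callback are hypothetical names, and the actual training and error computation are left to the caller. The point shown is that the folds are drawn once per generation and then shared by every individual in that generation.

```python
import random


def make_folds(n_samples, k, rng):
    """Shuffle the sample indices and deal them into k folds."""
    indices = list(range(n_samples))
    rng.shuffle(indices)
    return [indices[i::k] for i in range(k)]


def evaluate_generation(population, cv_score, n_samples, rng, k=10):
    """Score every individual of one generation on the same random split.

    cv_score(individual, folds) is a hypothetical callback that runs
    k-fold crossvalidation for one individual on the given folds.
    """
    folds = make_folds(n_samples, k, rng)  # drawn once per generation
    return [cv_score(individual, folds) for individual in population]
```

Calling `evaluate_generation` once per generation with the same `rng` gives a new split each time, while all individuals scored within one call see identical folds.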
While repeating the estimate several times and averaging would probably
improve it, the increase in computational requirements makes this approach
prohibitive. For example, with 10 individuals and 10-fold crossvalidation,
we would have to do 100 trainings per generation. If, in addition, we repeat every estimate
10 times to get an average, we would have to do 1000 trainings per generation. Clearly, for
real-world problems this is not a good solution.
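
The cost argument is simple multiplication, which the short sketch below makes explicit. The helper name is hypothetical, and the count is an upper bound that assumes every individual is retrained in every generation (no reuse of results for individuals carried over by elitism).

```python
def trainings_per_generation(population_size, folds, repeats=1):
    """Number of model trainings needed to score one generation."""
    return population_size * folds * repeats


single = trainings_per_generation(10, 10)        # 100
averaged = trainings_per_generation(10, 10, 10)  # 1000

# Over the 20 generations listed in Table 5:
print(single * 20)    # 2000
print(averaged * 20)  # 20000
```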
Using the same random split for all individuals within a generation has an interesting
analogy with natural evolution. In nature, the environment (represented by a fitness
function in GAs) is likely to vary with time; however, at any particular time all
individuals are competing under the same conditions.
