Page 113 - Data Science Algorithms in a Week

Evolutionary Optimization of Support Vector Machines …          97

                          The best models found by the genetic algorithm have the parameters shown in Table 6.

                                      Table 6. Best model found by the genetic algorithm

                        Dataset       Kernel width   C            Degree        p             r
                        Ind7          451.637        959.289      2             0.682536      1
                        Ind10         214.603        677.992      2             0.00968948    1
                        Ind100        479.011        456.25       2             0.428016      1

                          Interestingly, for 2 datasets (Ind7 and Ind100) the chosen kernel was a mixture of
                       the Gaussian and polynomial kernels.
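
                          A mixture of the two kernels can be sketched as below. This is only an
                       illustration: the exact mixture formula is not given on this page, so the convex
                       combination, the direction of the weight p (p = 1 meaning purely Gaussian), the
                       form of the polynomial kernel, and the names `width`, `gaussian_kernel`,
                       `polynomial_kernel` and `mixture_kernel` are all assumptions.

```python
import math


def gaussian_kernel(x, y, width):
    """Gaussian (RBF) kernel; `width` is an assumed name for the kernel width."""
    sq_dist = sum((a - b) ** 2 for a, b in zip(x, y))
    return math.exp(-sq_dist / (2.0 * width ** 2))


def polynomial_kernel(x, y, degree, r):
    """Polynomial kernel in the common form (x . y + r) ** degree (assumed form)."""
    dot = sum(a * b for a, b in zip(x, y))
    return (dot + r) ** degree


def mixture_kernel(x, y, width, degree, r, p):
    """Convex combination of both kernels (assumed convention: p weights the Gaussian part)."""
    if not 0.0 <= p <= 1.0:
        raise ValueError("p must lie in [0, 1]")
    return p * gaussian_kernel(x, y, width) + (1.0 - p) * polynomial_kernel(x, y, degree, r)


if __name__ == "__main__":
    # Arbitrary illustrative points, not data from the case study.
    u, v = [1.0, 2.0], [0.5, 1.5]
    print(mixture_kernel(u, v, width=1.0, degree=2, r=1.0, p=0.5))
```

                          For 0 <= p <= 1 a convex combination of two valid kernels is itself a valid
                       kernel, which is why a single weight is enough to let a search move smoothly
                       between the two kernel families.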
                          For the conventional method, the kernel was arbitrarily set to Gaussian and the penalty
                       value C was set to 50, while the kernel width was varied over 0.1, 0.5, 1, 10, and 50. The
                       average generalization error over the 50 replications for the 3 individuals from the case
                       study is shown in Table 7 and Table 8, and Tufte's boxplots (Tufte, 1983), which compare the
                       percentage of misclassification, are shown in Figure 20-Figure 22.

                             Table 7. Performance of models created using the conventional method

                        Kernel width ( )   Ind7               Ind10               Ind100

                        0.1                23.9168             24.3358             24.1783
                        0.5                30.5086             29.8396             30.4063
                        1                  29.0546             28.4365             29.2966
                        10                 30.3981             46.2980             38.2692
                        50                 30.3981             46.2980             38.2692

                               Table 8. Performance of model created using the genetic algorithm

                                           Ind7                Ind10               Ind100
                        GA                 22.0025             21.8491             21.9937

                          The results of a paired t-test comparing the best model obtained with the conventional
                       method against the model constructed by the genetic algorithm show that the difference in
                       performance is statistically significant at the 95% confidence level.
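
                          The mechanics of a paired t-test can be illustrated with the averages from Table 7
                       and Table 8. This sketch pairs the three datasets, which is an assumption made only
                       for illustration: the page does not state how the authors formed their pairs (for
                       example over the 50 replications), so the statistic below is not their reported
                       result. The names `paired_t`, `conventional` and `ga` are hypothetical.

```python
import math

# Average generalization error (%) per kernel width, copied from Table 7.
conventional = {
    "Ind7": {0.1: 23.9168, 0.5: 30.5086, 1: 29.0546, 10: 30.3981, 50: 30.3981},
    "Ind10": {0.1: 24.3358, 0.5: 29.8396, 1: 28.4365, 10: 46.2980, 50: 46.2980},
    "Ind100": {0.1: 24.1783, 0.5: 30.4063, 1: 29.2966, 10: 38.2692, 50: 38.2692},
}
# Average generalization error (%) of the genetic-algorithm model, copied from Table 8.
ga = {"Ind7": 22.0025, "Ind10": 21.8491, "Ind100": 21.9937}


def paired_t(a, b):
    """Return the paired t statistic and degrees of freedom for equal-length samples."""
    diffs = [x - y for x, y in zip(a, b)]
    n = len(diffs)
    mean = sum(diffs) / n
    var = sum((d - mean) ** 2 for d in diffs) / (n - 1)  # sample variance
    return mean / math.sqrt(var / n), n - 1


datasets = sorted(ga)
# Best conventional model per dataset = lowest error over the tested kernel widths.
best_conv = [min(conventional[d].values()) for d in datasets]
ga_err = [ga[d] for d in datasets]

t_stat, dof = paired_t(best_conv, ga_err)
T_CRIT_95_DF2 = 4.303  # two-sided 5% critical value of Student's t with 2 degrees of freedom

print(f"t = {t_stat:.2f} with {dof} degrees of freedom")
print("significant at the 95% level:", abs(t_stat) > T_CRIT_95_DF2)
```

                          In every dataset the lowest conventional error occurs at kernel width 0.1, so the
                       comparison is between that setting and the genetic-algorithm model.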
                          These experiments show that genetic algorithms are an effective way to find a
                       good set of parameters for support vector machines. This method will become
                       particularly important as more complex kernels with more parameters are designed.
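
                          The general idea of a genetic search over kernel parameters can be sketched as
                       follows. This is a minimal illustration under stated assumptions, not the authors'
                       algorithm: the operators (tournament selection, arithmetic crossover, Gaussian
                       mutation, elitism), the search bounds, and the function `toy_error`, which stands
                       in for a cross-validated SVM error, are all hypothetical.

```python
import random

# Hypothetical search bounds for two parameters: penalty C and kernel width.
BOUNDS = [(0.0, 1000.0), (0.01, 50.0)]


def toy_error(params):
    """Hypothetical stand-in for a cross-validated SVM error (arbitrary smooth bowl)."""
    c, width = params
    return ((c - 300.0) / 1000.0) ** 2 + ((width - 5.0) / 50.0) ** 2


def tournament(pop, fitness, rng, k=3):
    """Pick the fittest (lowest error) of k randomly drawn individuals."""
    contenders = rng.sample(range(len(pop)), k)
    return pop[min(contenders, key=lambda i: fitness[i])]


def evolve(error_fn, generations=40, pop_size=30, mutation_rate=0.2, seed=0):
    """Minimize error_fn; return the best individual and the best error per generation."""
    rng = random.Random(seed)
    pop = [[rng.uniform(lo, hi) for lo, hi in BOUNDS] for _ in range(pop_size)]
    history = []
    for _ in range(generations):
        fitness = [error_fn(ind) for ind in pop]
        best_idx = min(range(pop_size), key=lambda i: fitness[i])
        history.append(fitness[best_idx])
        next_pop = [pop[best_idx][:]]  # elitism: the best individual always survives
        while len(next_pop) < pop_size:
            a = tournament(pop, fitness, rng)
            b = tournament(pop, fitness, rng)
            w = rng.random()  # arithmetic crossover weight
            child = [w * x + (1.0 - w) * y for x, y in zip(a, b)]
            for i, (lo, hi) in enumerate(BOUNDS):
                if rng.random() < mutation_rate:
                    # Gaussian mutation, clipped back into the search bounds.
                    child[i] = max(lo, min(hi, child[i] + rng.gauss(0.0, 0.05 * (hi - lo))))
            next_pop.append(child)
        pop = next_pop
    best = min(pop, key=error_fn)
    history.append(error_fn(best))
    return best, history


if __name__ == "__main__":
    best, history = evolve(toy_error)
    print("best parameters:", best)
    print("error, first vs last generation:", history[0], history[-1])
```

                          In a real setting `toy_error` would be replaced by a routine that trains a support
                       vector machine with the candidate parameters and returns its estimated
                       generalization error, and further genes would encode the remaining kernel
                       parameters.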
                          Additional experiments including a comparison with neural networks can be found in
                       Gruber (2004).