
                       Other Implementations

Several Python and R implementations of GAs are available; we list a few of them here.
In Python, the package DEAP (Distributed Evolutionary Algorithms in Python; Fortin et al., 2012) provides an extensive toolbox for genetic algorithms that allows rapid prototyping and testing of most of the ideas presented here. It also supports parallelization and other evolutionary approaches such as genetic programming and evolution strategies.
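As a brief illustration of DEAP's toolbox style, the sketch below evolves bit strings toward the classic OneMax objective (maximize the number of ones). The fitness function, operator choices, and parameter values are illustrative, not taken from this chapter.

    import random
    from deap import base, creator, tools, algorithms

    # Maximize a single objective: the number of ones in the bit string (OneMax).
    creator.create("FitnessMax", base.Fitness, weights=(1.0,))
    creator.create("Individual", list, fitness=creator.FitnessMax)

    toolbox = base.Toolbox()
    toolbox.register("attr_bit", random.randint, 0, 1)
    toolbox.register("individual", tools.initRepeat, creator.Individual,
                     toolbox.attr_bit, n=30)
    toolbox.register("population", tools.initRepeat, list, toolbox.individual)

    def eval_onemax(individual):
        # DEAP expects fitness values as a tuple, even for a single objective.
        return (sum(individual),)

    toolbox.register("evaluate", eval_onemax)
    toolbox.register("mate", tools.cxTwoPoint)               # two-point crossover
    toolbox.register("mutate", tools.mutFlipBit, indpb=0.05) # bit-flip mutation
    toolbox.register("select", tools.selTournament, tournsize=3)

    if __name__ == "__main__":
        pop = toolbox.population(n=50)
        # Illustrative crossover/mutation probabilities and generation count.
        pop, _ = algorithms.eaSimple(pop, toolbox, cxpb=0.5, mutpb=0.2,
                                     ngen=40, verbose=False)
        best = tools.selBest(pop, k=1)[0]
        print(best.fitness.values)

Swapping in a different representation, fitness function, or operator only requires re-registering the corresponding entry in the toolbox, which is what makes this style convenient for quick experimentation.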
Pyevolve is another Python package for genetic algorithms that implements many of the representations and operators of classical GAs.
In R, the GA package provides a general implementation of genetic algorithms that handles both discrete and continuous cases as well as constrained optimization problems. It is also possible to create hybrid genetic algorithms that incorporate efficient local search, and to parallelize the search either on a single machine with multiple cores or across multiple machines.
There are also more specialized genetic algorithm implementations in R for very specific applications. The “caret” package (Kuhn, 2008) provides a genetic algorithm tailored towards supervised feature selection. The R package “gaucho” (Murison and Wardell, 2014) uses a GA for analyzing tumor heterogeneity from sequencing data, and “galgo” (Trevino and Falciani, 2006) uses GAs for variable selection in very large datasets such as genomic datasets.


                                                         RESULTS

In this section, we compare the performance of the proposed algorithm in Figure 18 with that of several SVMs whose kernels and parameters were selected arbitrarily.
The experiments were performed on selected individuals from the previously mentioned case study. The individuals were selected as those with the worst performance reported in Rabelo (2000). All 648 data points were used in the experiments.
The generalization performance of the model constructed by the GA was then compared to that of a model constructed by arbitrarily selecting the kernel and the kernel parameters. This method of selecting the model will be referred to from now on as the conventional way. In order to compare the different models, 10-fold cross-validation was repeated 50 times using the same stream of random numbers. This is akin to the common random numbers technique (Law and Kelton, 2000) for variance reduction. Additionally, the best model from the conventional method was compared with the model created by the GA using a paired t-test to determine whether the difference was statistically significant.
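A minimal sketch of this evaluation protocol, assuming scikit-learn and SciPy, is given below. It uses a synthetic dataset as a stand-in for the 648-point case study (which is not reproduced here) and illustrative kernel parameters for both the conventional and the GA-selected SVM, so it demonstrates the procedure rather than the chapter's actual experiment.

    import numpy as np
    from scipy.stats import ttest_rel
    from sklearn.datasets import make_classification
    from sklearn.model_selection import RepeatedStratifiedKFold, cross_val_score
    from sklearn.svm import SVC

    # Stand-in data: the chapter's 648-point case study is not reproduced here.
    X, y = make_classification(n_samples=648, n_features=20, random_state=0)

    # The same folds (same random-number stream) are reused for both models,
    # analogous to the common random numbers variance-reduction technique.
    cv = RepeatedStratifiedKFold(n_splits=10, n_repeats=50, random_state=42)

    # "Conventional" model: arbitrarily chosen kernel and parameters (illustrative).
    conventional = SVC(kernel="rbf", C=1.0, gamma=0.1)

    # Stand-in for the GA-selected model; these values are placeholders,
    # not the kernel and parameters found in the chapter.
    ga_selected = SVC(kernel="rbf", C=10.0, gamma=0.01)

    scores_conv = cross_val_score(conventional, X, y, cv=cv, scoring="accuracy")
    scores_ga = cross_val_score(ga_selected, X, y, cv=cv, scoring="accuracy")

    # Paired t-test over the matched cross-validation scores.
    t_stat, p_value = ttest_rel(scores_ga, scores_conv)
    print(f"mean conventional = {scores_conv.mean():.3f}, "
          f"mean GA = {scores_ga.mean():.3f}, p = {p_value:.4f}")

Because both models are scored on exactly the same folds generated from one random seed, the t-test is applied to matched pairs of scores, which is what makes the common-random-numbers idea effective at reducing the variance of the comparison.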