
                       Other Implementations

Several Python and R implementations of GAs are available; we list a few of them here.
In Python, the package DEAP (Distributed Evolutionary Algorithms in Python; Fortin et al., 2012) provides an extensive toolbox for genetic algorithms that allows rapid prototyping and testing of most of the ideas presented here. It also supports parallelization and other evolutionary approaches such as genetic programming and evolution strategies.
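As a brief illustration of DEAP's toolbox style, the sketch below evolves bit strings toward the classic OneMax objective (maximize the number of ones). The fitness function, operator choices, and parameter values are illustrative, not taken from this chapter.

    import random
    from deap import base, creator, tools, algorithms

    # Maximize a single objective: the number of ones in the bit string (OneMax).
    creator.create("FitnessMax", base.Fitness, weights=(1.0,))
    creator.create("Individual", list, fitness=creator.FitnessMax)

    toolbox = base.Toolbox()
    toolbox.register("attr_bit", random.randint, 0, 1)
    toolbox.register("individual", tools.initRepeat, creator.Individual,
                     toolbox.attr_bit, n=30)
    toolbox.register("population", tools.initRepeat, list, toolbox.individual)

    def eval_onemax(individual):
        # DEAP expects fitness values as a tuple, even for a single objective.
        return (sum(individual),)

    toolbox.register("evaluate", eval_onemax)
    toolbox.register("mate", tools.cxTwoPoint)               # two-point crossover
    toolbox.register("mutate", tools.mutFlipBit, indpb=0.05) # bit-flip mutation
    toolbox.register("select", tools.selTournament, tournsize=3)

    if __name__ == "__main__":
        pop = toolbox.population(n=50)
        # Illustrative crossover/mutation probabilities and generation count.
        pop, _ = algorithms.eaSimple(pop, toolbox, cxpb=0.5, mutpb=0.2,
                                     ngen=40, verbose=False)
        best = tools.selBest(pop, k=1)[0]
        print(best.fitness.values)

Swapping in a different representation, fitness function, or operator only requires re-registering the corresponding entry in the toolbox, which is what makes this style convenient for quick experimentation.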
Pyevolve is another Python package for genetic algorithms that implements many of the representations and operators of classical GAs.
In R, the GA package provides a general implementation of genetic algorithms that handles both discrete and continuous cases as well as constrained optimization problems. It is also possible to create hybrid genetic algorithms that incorporate efficient local search, and to parallelize the search either on a single machine with multiple cores or across multiple machines.
There are also more specialized genetic algorithm implementations in R for very specific applications. The “caret” package (Kuhn, 2008) provides a genetic algorithm tailored towards supervised feature selection. The R package “gaucho” (Murison and Wardell, 2014) uses a GA for analyzing tumor heterogeneity from sequencing data, and “galgo” (Trevino and Falciani, 2006) uses GAs for variable selection in very large datasets such as genomic datasets.


                                                         RESULTS

In this section, we compare the performance of the proposed algorithm in Figure 18 with that of several SVMs whose kernels and parameters were selected arbitrarily.
The experiments were performed on selected individuals from the previously mentioned case study. The individuals were selected as those with the worst performance reported in Rabelo (2000). All 648 data points were used in the experiments.
The generalization performance of the model constructed by the GA was then compared to that of a model constructed by arbitrarily selecting the kernel and the kernel parameters. This method of selecting the model will be referred to from now on as the conventional way. In order to compare the different models, 10-fold cross-validation was repeated 50 times using the same stream of random numbers. This is akin to the common random numbers technique (Law and Kelton, 2000) for variance reduction. Additionally, the best model from the conventional method was compared with the model created by the GA using a paired t-test to determine whether the difference was statistically significant.
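A minimal sketch of this evaluation protocol, assuming scikit-learn and SciPy, is given below. It uses a synthetic dataset as a stand-in for the 648-point case study (which is not reproduced here) and illustrative kernel parameters for both the conventional and the GA-selected SVM, so it demonstrates the procedure rather than the chapter's actual experiment.

    import numpy as np
    from scipy.stats import ttest_rel
    from sklearn.datasets import make_classification
    from sklearn.model_selection import RepeatedStratifiedKFold, cross_val_score
    from sklearn.svm import SVC

    # Stand-in data: the chapter's 648-point case study is not reproduced here.
    X, y = make_classification(n_samples=648, n_features=20, random_state=0)

    # The same folds (same random-number stream) are reused for both models,
    # analogous to the common random numbers variance-reduction technique.
    cv = RepeatedStratifiedKFold(n_splits=10, n_repeats=50, random_state=42)

    # "Conventional" model: arbitrarily chosen kernel and parameters (illustrative).
    conventional = SVC(kernel="rbf", C=1.0, gamma=0.1)

    # Stand-in for the GA-selected model; these values are placeholders,
    # not the kernel and parameters found in the chapter.
    ga_selected = SVC(kernel="rbf", C=10.0, gamma=0.01)

    scores_conv = cross_val_score(conventional, X, y, cv=cv, scoring="accuracy")
    scores_ga = cross_val_score(ga_selected, X, y, cv=cv, scoring="accuracy")

    # Paired t-test over the matched cross-validation scores.
    t_stat, p_value = ttest_rel(scores_ga, scores_conv)
    print(f"mean conventional = {scores_conv.mean():.3f}, "
          f"mean GA = {scores_ga.mean():.3f}, p = {p_value:.4f}")

Because both models are scored on exactly the same folds generated from one random seed, the t-test is applied to matched pairs of scores, which is what makes the common-random-numbers idea effective at reducing the variance of the comparison.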