the different classifiers that are obtained for different values of the parameters. As
indicated previously, several methods attempt to estimate the generalization error of a
classifier. Contrary to other applications of GAs, the objective function in this problem is
a random variable with an associated variance, and it is computationally expensive to
evaluate since it involves training a learning algorithm. To decide which method to use,
we ran several experiments to find the estimator with the lowest variance. The results
are summarized in Table 2.
The hold out technique had the highest standard deviation. Stratifying the method,
i.e., keeping the same ratio between classes in the training and testing sets, slightly
reduced the standard deviation. All crossvalidation estimates had a significantly lower
standard deviation than the hold out technique.
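As a minimal sketch of the stratification idea (scikit-learn is assumed here; the dataset
and classifier are placeholders rather than the ones used in our experiments):

    # Plain vs. stratified hold out in scikit-learn (illustrative only).
    from sklearn.datasets import make_classification
    from sklearn.model_selection import train_test_split
    from sklearn.svm import SVC

    X, y = make_classification(n_samples=500, weights=[0.7, 0.3],
                               random_state=0)

    # Plain hold out: class ratios in the splits may drift from the full set.
    X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.3,
                                              random_state=0)

    # Stratified hold out: class ratios are preserved in both splits,
    # which tends to reduce the variance of the error estimate.
    X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.3,
                                              stratify=y, random_state=0)

    error = 1.0 - SVC(C=1.0).fit(X_tr, y_tr).score(X_te, y_te)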
Table 2. Mean and standard deviation of different types of generalization error estimates

Technique                                      Mean (%)   Standard Deviation (%)
10-fold Stratified Modified Crossvalidation    86.830     0.461
Modified Crossvalidation                       86.791     0.463
Stratified Crossvalidation                     86.681     0.486
Crossvalidation                                86.617     0.496
5-fold Stratified Modified Crossvalidation     86.847     0.540
5-fold Stratified Crossvalidation              86.567     0.609
5-fold Crossvalidation                         86.540     0.629
Stratified hold out                            86.215     1.809
Hold out                                       86.241     1.977
Since there is no statistically significant difference in standard deviation among the
different crossvalidation techniques, we use one of the most common: 10-fold
crossvalidation.
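As a rough sketch of how this estimator might serve as the GA's objective function
(scikit-learn assumed; the SVC parameters C and gamma stand in for whatever the
chromosome encodes):

    # 10-fold stratified crossvalidation accuracy as a (noisy) fitness value;
    # the generalization error estimate is 1 minus this quantity.
    import numpy as np
    from sklearn.model_selection import StratifiedKFold, cross_val_score
    from sklearn.svm import SVC

    def fitness(C, gamma, X, y, seed=0):
        # The estimate depends on the random fold assignment, which is
        # exactly why the estimator's variance matters for the GA.
        cv = StratifiedKFold(n_splits=10, shuffle=True, random_state=seed)
        return cross_val_score(SVC(C=C, gamma=gamma), X, y, cv=cv).mean()

    # Repeating the estimate with different fold assignments exposes its
    # standard deviation, as in Table 2:
    #   estimates = [fitness(1.0, 0.1, X, y, seed=s) for s in range(30)]
    #   np.mean(estimates), np.std(estimates)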
We also considered an approximation of the leave-one-out estimator proposed in
Joachims (1999), but we found that the estimated error diverged from the
crossvalidation estimates for large values of the parameter C. This behaviour was also
observed in the work of Duan et al. (2003).
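For reference, Joachims' approximation (often called the ξα-estimator) needs only a
single training run: it counts the training points whose dual variables and slacks make
them potential leave-one-out errors. Roughly, as a sketch:

    import numpy as np

    def xi_alpha_error(alpha, xi, R2):
        # alpha: dual coefficients of the trained SVM (one per training point)
        # xi:    slack variables at the solution
        # R2:    upper bound on the kernel radius term; assumed to be
        #        supplied by the caller (e.g. 1.0 for an RBF kernel)
        d = np.sum(2.0 * alpha * R2 + xi >= 1.0)
        return d / len(alpha)

Because it avoids the k training runs of crossvalidation, it is attractive inside a GA
loop; the drawback, as noted above, is its divergence from the crossvalidation
estimates for large C.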
Crossover, Selection, and Mutation
Several crossover operators were tested: one-point, two-point, uniform, and
multiparent diagonal; a sketch of each is given below.
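A compact sketch of these operators on list-encoded chromosomes (pure Python; the
representation is an assumption, since the actual encoding is described elsewhere in
the chapter):

    import random

    def one_point(p1, p2):
        # Swap tails after a single random cut point.
        c = random.randrange(1, len(p1))
        return p1[:c] + p2[c:], p2[:c] + p1[c:]

    def two_point(p1, p2):
        # Swap the middle segment between two random cut points.
        a, b = sorted(random.sample(range(1, len(p1)), 2))
        return p1[:a] + p2[a:b] + p1[b:], p2[:a] + p1[a:b] + p2[b:]

    def uniform(p1, p2):
        # Each gene is inherited from either parent with probability 1/2.
        pairs = [(g1, g2) if random.random() < 0.5 else (g2, g1)
                 for g1, g2 in zip(p1, p2)]
        return [a for a, _ in pairs], [b for _, b in pairs]

    def diagonal(parents):
        # Multiparent diagonal crossover: k parents and k - 1 cut points
        # yield k children, each assembled by taking successive segments
        # "diagonally" across the parents.
        k, n = len(parents), len(parents[0])
        cuts = [0] + sorted(random.sample(range(1, n), k - 1)) + [n]
        return [sum((parents[(i + j) % k][cuts[j]:cuts[j + 1]]
                     for j in range(k)), [])
                for i in range(k)]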