Page 137 - Understanding Machine Learning

11.2 Validation  119


              with a rough grid of values for the parameter(s) and plot the corresponding model-
              selection curve. On the basis of this curve we zoom in to the correct regime and
              employ a finer grid to search over. It is important to verify that we are in the
              relevant regime. For example, in the polynomial fitting problem described earlier, if
              we start searching degrees from the set of values {1, 10, 20} and do not employ a
              finer grid based on the resulting curve, we will end up with a rather poor model.
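The two-pass search above can be sketched as follows. This is a minimal illustration, not the book's code: the synthetic cubic data, the train/validation split, and the helper `validation_error` are all assumptions made for the example; the loss used is mean squared error.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical 1-D regression data: a cubic target with additive noise.
x = rng.uniform(-1, 1, 200)
y = x**3 - x + 0.1 * rng.standard_normal(200)

# Hold out part of the data as a validation set.
x_tr, y_tr = x[:150], y[:150]
x_va, y_va = x[150:], y[150:]

def validation_error(degree):
    """Fit a polynomial of the given degree on the training set and
    return its mean squared error on the validation set."""
    coeffs = np.polyfit(x_tr, y_tr, degree)
    return np.mean((np.polyval(coeffs, x_va) - y_va) ** 2)

# Pass 1: the rough grid from the text, {1, 10, 20}.
rough = {d: validation_error(d) for d in [1, 10, 20]}
best_rough = min(rough, key=rough.get)

# Pass 2: a finer grid around the best rough value.
fine = {d: validation_error(d)
        for d in range(max(1, best_rough - 5), best_rough + 6)}
best = min(fine, key=fine.get)
```

Plotting the validation errors of the rough grid against the degree gives the model-selection curve the text refers to; the finer grid is then placed around its minimum.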



              11.2.4 k-Fold Cross Validation
              The validation procedure described so far assumes that data is plentiful and that we
              have the ability to sample a fresh validation set. But in some applications, data is
              scarce and we do not want to “waste” data on validation. The k-fold cross validation
              technique is designed to give an accurate estimate of the true error without wasting
              too much data.
                 In k-fold cross validation the original training set is partitioned into k subsets
              (folds) of size m/k (for simplicity, assume that m/k is an integer). For each fold, the
              algorithm is trained on the union of the other folds and then the error of its output
              is estimated using the fold. Finally, the average of all these errors is the estimate of
              the true error. The special case k = m, where m is the number of examples, is called
              leave-one-out (LOO).
                 k-Fold cross validation is often used for model selection (or parameter tuning),
              and once the best parameter is chosen, the algorithm is retrained using this param-
              eter on the entire training set. A pseudocode of k-fold cross validation for model
              selection is given in the following. The procedure receives as input a training set,
              S, a set of possible parameter values, Θ, an integer, k, representing the number of
              folds, and a learning algorithm, A, which receives as input a training set as well as a
              parameter θ ∈ Θ. It outputs the best parameter as well as the hypothesis trained by
              this parameter on the entire training set.

                             k-Fold Cross Validation for Model Selection
                       input:
                          training set S = (x_1, y_1), ..., (x_m, y_m)
                          set of parameter values Θ
                          learning algorithm A
                          integer k
                       partition S into S_1, S_2, ..., S_k
                       foreach θ ∈ Θ
                          for i = 1 ... k
                             h_{i,θ} = A(S \ S_i ; θ)
                          error(θ) = (1/k) Σ_{i=1}^{k} L_{S_i}(h_{i,θ})
                       output
                          θ* = argmin_θ [error(θ)]
                          h_{θ*} = A(S ; θ*)
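The pseudocode above can be translated into a short, self-contained sketch. This is an illustration under stated assumptions, not the book's implementation: the function name `kfold_select`, the interface of the learning algorithm `A` (taking a training set and a parameter θ, returning a predictor), and the choice of mean squared error as the loss L are all assumptions of the example.

```python
import numpy as np

def kfold_select(S_x, S_y, thetas, A, k):
    """k-fold cross validation for model selection.  A(train_x, train_y,
    theta) must return a predictor h; the loss here is mean squared
    error (an assumption -- substitute the loss for your task)."""
    m = len(S_x)
    folds = np.array_split(np.arange(m), k)          # S_1, ..., S_k
    errors = {}
    for theta in thetas:
        fold_errs = []
        for idx in folds:
            mask = np.ones(m, dtype=bool)
            mask[idx] = False
            h = A(S_x[mask], S_y[mask], theta)       # train on S \ S_i
            pred = h(S_x[idx])
            fold_errs.append(np.mean((pred - S_y[idx]) ** 2))  # L_{S_i}(h_{i,theta})
        errors[theta] = np.mean(fold_errs)           # average error over folds
    best = min(errors, key=errors.get)               # theta* = argmin error(theta)
    return best, A(S_x, S_y, best)                   # retrain on the entire S

# Usage sketch: selecting a polynomial degree with a hypothetical algorithm A.
def A(train_x, train_y, degree):
    coeffs = np.polyfit(train_x, train_y, degree)
    return lambda x: np.polyval(coeffs, x)

rng = np.random.default_rng(1)
x = rng.uniform(-1, 1, 100)
y = x**2 + 0.05 * rng.standard_normal(100)
best_degree, h = kfold_select(x, y, [1, 2, 3, 4, 5], A, k=5)
```

Note that, as in the pseudocode, the selected parameter is used to retrain on all of S before the final hypothesis is returned.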

                 The cross validation method often works very well in practice. However, it might
              sometimes fail, as the artificial example given in Exercise 11.1 shows. Rigorously