Page 137 - Understanding Machine Learning
with a rough grid of values for the parameter(s) and plot the corresponding model-
selection curve. On the basis of the curve we will zoom in to the correct regime
and employ a finer grid to search over. It is important to verify that we are in the
relevant regime. For example, in the polynomial fitting problem described, if we
start searching degrees from the set of values {1,10,20} and do not employ a finer
grid based on the resulting curve, we will end up with a rather poor model.
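The coarse-to-fine grid search described above can be sketched in Python. Everything here is an illustrative assumption, not from the text: the synthetic cubic data, the hold-out split, and the `validation_error` helper are made up for the sketch, and the grids mirror the {1, 10, 20} example.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic data: a cubic target with noise (illustrative assumption).
x = rng.uniform(-1, 1, size=200)
y = x**3 - 0.5 * x + 0.1 * rng.normal(size=x.size)

# Hold out a validation set.
x_train, y_train = x[:150], y[:150]
x_val, y_val = x[150:], y[150:]

def validation_error(degree):
    """Fit a polynomial of the given degree on the training set and
    return its mean squared error on the validation set."""
    coeffs = np.polyfit(x_train, y_train, degree)
    return np.mean((np.polyval(coeffs, x_val) - y_val) ** 2)

# Rough grid first, to locate the relevant regime ...
coarse = {d: validation_error(d) for d in (1, 10, 20)}
best_coarse = min(coarse, key=coarse.get)

# ... then a finer grid around the most promising coarse value.
fine = {d: validation_error(d)
        for d in range(max(1, best_coarse - 9), best_coarse + 10)}
best_fine = min(fine, key=fine.get)
```

Since the fine grid contains the best coarse value, the refined choice can only match or improve the coarse one on the validation set.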
11.2.4 k-Fold Cross Validation
The validation procedure described so far assumes that data is plentiful and that we
have the ability to sample a fresh validation set. But in some applications, data is
scarce and we do not want to “waste” data on validation. The k-fold cross validation
technique is designed to give an accurate estimate of the true error without wasting
too much data.
In k-fold cross validation the original training set is partitioned into k subsets
(folds) of size m/k (for simplicity, assume that m/k is an integer). For each fold, the
algorithm is trained on the union of the other folds and then the error of its output
is estimated using the fold. Finally, the average of all these errors is the estimate of
the true error. The special case k = m, where m is the number of examples, is called
leave-one-out (LOO).
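As a concrete illustration of the partitioning step (the numbers m = 6, k = 3 are an assumption for the example, not from the text):

```python
import numpy as np

def fold_indices(m, k):
    """Split example indices 0..m-1 into k equal folds.
    For simplicity, assumes k divides m, as in the text."""
    assert m % k == 0
    return np.arange(m).reshape(k, m // k)

# With m = 6 examples and k = 3 folds, each fold holds m/k = 2 examples.
folds = fold_indices(6, 3)

# The special case k = m gives one example per fold: leave-one-out.
loo = fold_indices(6, 6)
```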
k-Fold cross validation is often used for model selection (or parameter tuning),
and once the best parameter is chosen, the algorithm is retrained using this
parameter on the entire training set. A pseudocode of k-fold cross validation for
model selection is given in the following. The procedure receives as input a
training set, S, a set of possible parameter values, Θ, an integer, k, representing
the number of folds, and a learning algorithm, A, which receives as input a training
set as well as a parameter θ ∈ Θ. It outputs the best parameter as well as the
hypothesis trained by this parameter on the entire training set.
k-Fold Cross Validation for Model Selection

input:
  training set S = (x_1, y_1), ..., (x_m, y_m)
  set of parameter values Θ
  learning algorithm A
  integer k
partition S into S_1, S_2, ..., S_k
foreach θ ∈ Θ
  for i = 1 ... k
    h_{i,θ} = A(S \ S_i ; θ)
  error(θ) = (1/k) Σ_{i=1}^{k} L_{S_i}(h_{i,θ})
output
  θ* = argmin_θ [error(θ)]
  h_{θ*} = A(S ; θ*)
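A minimal Python rendering of this pseudocode, under the stated simplifying assumption that k divides m. The function and argument names (`kfold_model_select`, `A`, `loss`) are illustrative, and the usage at the end is a toy sketch, not an example from the book.

```python
def kfold_model_select(S, thetas, k, A, loss):
    """k-fold cross validation for model selection. `S` is a list of
    (x, y) pairs, `thetas` the candidate parameter values, `A(train, theta)`
    returns a hypothesis h (a callable x -> prediction), and `loss(h, fold)`
    is the average loss of h on a list of examples."""
    m = len(S)
    assert m % k == 0, "for simplicity, assume k divides m"
    folds = [S[i * (m // k):(i + 1) * (m // k)] for i in range(k)]
    error = {}
    for theta in thetas:
        fold_errors = []
        for i in range(k):
            # Train on the union of the other folds ...
            train = [ex for j, f in enumerate(folds) if j != i for ex in f]
            h = A(train, theta)
            # ... and estimate the error on the held-out fold.
            fold_errors.append(loss(h, folds[i]))
        error[theta] = sum(fold_errors) / k
    theta_star = min(error, key=error.get)
    # Retrain with the best parameter on the entire training set.
    return theta_star, A(S, theta_star)

# Toy usage: pick the best constant predictor θ under squared loss.
S = [(0.0, 3.0)] * 6
A = lambda train, theta: (lambda x: theta)  # ignores the data; illustrative only
loss = lambda h, fold: sum((h(x) - y) ** 2 for x, y in fold) / len(fold)
theta_star, h_star = kfold_model_select(S, [0.0, 3.0, 5.0], k=3, A=A, loss=loss)
```

Note that the final hypothesis is trained on all of S, matching the last line of the pseudocode, while the fold errors are used only to pick θ*.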
The cross validation method often works very well in practice. However, it might
sometimes fail, as the artificial example given in Exercise 11.1 shows. Rigorously