
Model Selection and Validation

[Figure: two panels plotting error against training set size m; each panel shows a training-error curve and a validation-error curve.]

Figure 11.1. Examples of learning curves. Left: this learning curve corresponds to the scenario in which the number of examples is always smaller than the VC-dimension of the class. Right: this learning curve corresponds to the scenario in which the approximation error is zero and the number of examples is larger than the VC-dimension of the class.

Next consider the case in which $L_S(h_S)$ is small. As we argued before, this does not necessarily imply that the approximation error is small. Indeed, consider two scenarios, in both of which we are trying to learn a hypothesis class of VC-dimension $d$ using the ERM learning rule. In the first scenario, we have a training set of $m < d$ examples and the approximation error of the class is high. In the second scenario, we have a training set of $m > 2d$ examples and the approximation error of the class is zero. In both cases $L_S(h_S) = 0$. How can we distinguish between the two cases?
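To make the indistinguishability concrete, here is a minimal numpy simulation (an illustrative stand-in, not the book's construction): a 1-nearest-neighbor "memorizer" plays the role of an ERM rule that fits its training set perfectly, so the training error is zero in both scenarios even though the true errors differ sharply.

```python
import numpy as np

rng = np.random.default_rng(0)

def nearest_neighbor_predict(X_train, y_train, X):
    """1-NN 'memorizer': always fits its own training set perfectly."""
    dists = np.linalg.norm(X[:, None, :] - X_train[None, :, :], axis=2)
    return y_train[np.argmin(dists, axis=1)]

# Scenario 1: labels are pure noise, so the approximation error is high.
X1 = rng.normal(size=(30, 5))
y1 = rng.integers(0, 2, size=30)            # random labels
X1_val = rng.normal(size=(1000, 5))
y1_val = rng.integers(0, 2, size=1000)

# Scenario 2: labels follow a fixed linear rule, so a consistent
# hypothesis exists and the approximation error is zero.
w = rng.normal(size=5)
X2 = rng.normal(size=(200, 5))
y2 = (X2 @ w > 0).astype(int)
X2_val = rng.normal(size=(1000, 5))
y2_val = (X2_val @ w > 0).astype(int)

for Xtr, ytr, Xv, yv in [(X1, y1, X1_val, y1_val), (X2, y2, X2_val, y2_val)]:
    train_err = np.mean(nearest_neighbor_predict(Xtr, ytr, Xtr) != ytr)
    val_err = np.mean(nearest_neighbor_predict(Xtr, ytr, Xv) != yv)
    print(f"train error = {train_err:.2f}, validation error = {val_err:.2f}")
```

In both runs the training error is exactly zero; only the validation error reveals that the first scenario learned nothing.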

                 Learning Curves
One possible way to distinguish between the two cases is by plotting learning curves. To produce a learning curve we train the algorithm on prefixes of the data of increasing sizes. For example, we can first train the algorithm on the first 10% of the examples, then on 20% of them, and so on. For each prefix we calculate the training error (on the prefix on which the algorithm was trained) and the validation error (on a predefined validation set). Such learning curves can help us distinguish between the two aforementioned scenarios. In the first scenario we expect the validation error to be approximately $1/2$ for all prefixes, as we did not really learn anything. In the second scenario the validation error will start out roughly constant but should then begin to decrease (it must start decreasing once the training set size is larger than the VC-dimension). An illustration of the two cases is given in Figure 11.1.
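The prefix procedure above translates directly into code. The sketch below assumes a scikit-learn-style classifier (LogisticRegression is an arbitrary choice) and a predefined held-out validation set; the 10%, 20%, ... fractions follow the text.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

def learning_curve(X_train, y_train, X_val, y_val, fractions):
    """Train on growing prefixes of the data; record train and validation error."""
    train_errs, val_errs = [], []
    for frac in fractions:
        m = max(2, int(frac * len(X_train)))  # prefix size; assumes each prefix
                                              # contains examples of both classes
        clf = LogisticRegression().fit(X_train[:m], y_train[:m])
        train_errs.append(np.mean(clf.predict(X_train[:m]) != y_train[:m]))
        val_errs.append(np.mean(clf.predict(X_val) != y_val))
    return train_errs, val_errs

# Usage: errors for prefixes of 10%, 20%, ..., 100% of the training data.
# train_errs, val_errs = learning_curve(X, y, X_val, y_val,
#                                       fractions=np.linspace(0.1, 1.0, 10))
```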
In general, as long as the approximation error is greater than zero we expect the training error to grow with the sample size, as a larger number of data points makes it harder to provide an explanation for all of them. On the other hand, the validation error tends to decrease as the sample size increases. If the VC-dimension is finite, then as the sample size goes to infinity the validation and training errors converge to the approximation error. Therefore, by extrapolating the training and validation curves we can try to guess the value of the approximation error, or at least to get a rough estimate of an interval in which the approximation error resides.
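As a crude numeric reading of this extrapolation idea, one can take the last measured pair of errors as the interval: the training error approaches the approximation error from below and the validation error from above. The helper below is a hypothetical heuristic, not a guarantee at any finite sample size.

```python
def approximation_error_interval(train_errs, val_errs):
    """Crude [low, high] bracket for the approximation error, read off the
    tails of the learning curves (a heuristic, valid only for large samples)."""
    return train_errs[-1], val_errs[-1]
```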
Getting back to the problem of finding the best remedy for the failure of our algorithm, if we observe that $L_S(h_S)$ is small while the validation error is large, then in any case we know that the size of our training set is not sufficient for learning the class $\mathcal{H}$. We can then plot a learning curve. If we see that the validation error is starting to decrease, then the best solution is to increase the number of examples (if we can afford to enlarge the data). Another reasonable solution is to decrease the complexity of the hypothesis class.
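The diagnostic logic of this paragraph can be summarized in a short, hypothetical helper; the name suggest_remedy and the threshold tol are assumptions for illustration.

```python
def suggest_remedy(val_errs, tol=0.01):
    """Crude reading of the validation curve's tail from a learning-curve plot."""
    if val_errs[-1] < val_errs[0] - tol:  # validation error has started to decrease
        return "increase the number of examples"
    return "decrease the complexity of the hypothesis class"
```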