
Efficiently Learnable in the Realizable Case

Consider implementing the ERM rule in the realizable case. That is, we are given a training set S = (x_1, y_1), ..., (x_m, y_m) of examples, such that there exists an axis aligned rectangle, h ∈ H_n, for which h(x_i) = y_i for all i. Our goal is to find such an axis aligned rectangle with zero training error, namely, a rectangle that is consistent with all the labels in S.

We show that this can be done in time O(nm). Indeed, for each i ∈ [n], set a_i = min{x_i : (x, 1) ∈ S} and b_i = max{x_i : (x, 1) ∈ S}. In words, we take a_i to be the minimal value of the i'th coordinate of a positive example in S and b_i to be the maximal value of the i'th coordinate of a positive example in S. It is easy to verify that the resulting rectangle has zero training error: it contains every positive example by construction, and it is contained in the consistent rectangle assumed to exist, so it labels no negative example positively. The runtime of finding each a_i and b_i is O(m); hence, the total runtime of this procedure is O(nm).
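As a concrete illustration, here is a minimal sketch of this procedure in Python with NumPy (the function names are ours, and it assumes the training set contains at least one positive example):

    import numpy as np

    def fit_rectangle(X, y):
        """Tightest axis-aligned rectangle around the positive examples.

        X is an (m, n) array of points and y an (m,) array of 0/1 labels.
        Each coordinate-wise min/max over the positives costs O(m),
        so the total runtime is O(nm).
        """
        pos = X[y == 1]          # positive examples (assumed nonempty)
        a = pos.min(axis=0)      # a_i: minimal i'th coordinate of a positive
        b = pos.max(axis=0)      # b_i: maximal i'th coordinate of a positive
        return a, b

    def rect_predict(a, b, X):
        """Predict 1 iff a point lies in [a_1, b_1] x ... x [a_n, b_n]."""
        return np.all((X >= a) & (X <= b), axis=1).astype(int)

In the realizable case, rect_predict(a, b, X) reproduces the training labels y exactly.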



Not Efficiently Learnable in the Agnostic Case

In the agnostic case, we do not assume that some hypothesis h perfectly predicts the labels of all the examples in the training set. Our goal is therefore to find h that minimizes the number of examples for which y_i ≠ h(x_i). It turns out that for many common hypothesis classes, including the class of axis aligned rectangles we consider here, solving the ERM problem in the agnostic setting is NP-hard (and, in most cases, it is even NP-hard to find some h ∈ H whose error is no more than some constant c > 1 times that of the empirical risk minimizer in H). That is, unless P = NP, there is no algorithm whose running time is polynomial in m and n that is guaranteed to find an ERM hypothesis for these problems (Ben-David, Eiron & Long 2003).
On the other hand, it is worthwhile noticing that, if we fix one specific hypothesis class, say, axis aligned rectangles in some fixed dimension, n, then there exist efficient learning algorithms for this class. In other words, there are successful agnostic PAC learners that run in time polynomial in 1/ε and 1/δ (but their dependence on the dimension n is not polynomial).
To see this, recall the implementation of the ERM rule we presented for the realizable case, from which it follows that an axis aligned rectangle is determined by at most 2n examples. Therefore, given a training set of size m, we can perform an exhaustive search over all subsets of the training set of size at most 2n, construct a rectangle from each such subset, and then pick the rectangle with the minimal training error. This procedure is guaranteed to find an ERM hypothesis, and the runtime of the procedure is m^{O(n)}. It follows that if n is fixed, the runtime is polynomial in the sample size. This does not contradict the aforementioned hardness result, since there we argued that, unless P = NP, one cannot have an algorithm whose dependence on the dimension n is polynomial as well.
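As a concrete illustration, here is a minimal sketch of this exhaustive search in Python with NumPy (the function name is ours; we also include the empty, all-negative rectangle as a baseline so that the search always returns a true empirical risk minimizer):

    from itertools import combinations
    import numpy as np

    def erm_rectangle_agnostic(X, y):
        """Exhaustive agnostic ERM for axis-aligned rectangles.

        Fits the tightest rectangle around every subset of at most 2n
        training points and keeps the hypothesis with the fewest
        training mistakes. With m points in n dimensions the runtime
        is m^{O(n)}: polynomial in m for fixed n, exponential in n.
        """
        m, n = X.shape
        best_err = int(np.sum(y == 1))  # empty rectangle errs on every positive
        best = None                     # None encodes the all-negative hypothesis
        for k in range(1, 2 * n + 1):
            for idx in combinations(range(m), k):
                a = X[list(idx)].min(axis=0)  # candidate lower bounds
                b = X[list(idx)].max(axis=0)  # candidate upper bounds
                pred = np.all((X >= a) & (X <= b), axis=1).astype(int)
                err = int(np.sum(pred != y))
                if err < best_err:
                    best_err, best = err, (a, b)
        return best, best_err

The nested loops enumerate O(m^{2n}) subsets and evaluate each candidate in O(nm) time, which matches the m^{O(n)} bound above.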



8.2.3 Boolean Conjunctions

A Boolean conjunction is a mapping from X = {0,1}^n to Y = {0,1} that can be expressed as a proposition formula of the form x_{i_1} ∧ ... ∧ x_{i_k} ∧ ¬x_{j_1} ∧ ... ∧ ¬x_{j_r}, for some indices i_1, ..., i_k, j_1, ..., j_r ∈ [n]. The function that such a proposition formula