Page 39 - Understanding Machine Learning
P. 39

2.4 Exercises  21


                                                          R *
                                   -                    +
                                          +           R(S)
                                                   +          -
                                               +


                                                     +
                                            R 1
              Figure 2.2. Axis aligned rectangles.
                 Formally, given real numbers a 1 ≤ b 1 ,a 2 ≤ b 2 , define the classifier h (a 1 ,b 1 ,a 2 ,b 2 ) by

                                              1if a 1 ≤ x 1 ≤ b 1 and a 2 ≤ x 2 ≤ b 2
                            h (a 1 ,b 1 ,a 2 ,b 2 ) (x 1 ,x 2 ) =         .      (2.10)
                                              0otherwise
                 The class of all axis aligned rectangles in the plane is defined as
                                    2
                                  H   ={h (a 1 ,b 1 ,a 2 ,b 2 ) : a 1 ≤ b 1 , and a 2 ≤ b 2 }.
                                    rec
                 Note that this is an infinite size hypothesis class. Throughout this exercise we rely
                 on the realizability assumption.
                  1. Let A be the algorithm that returns the smallest rectangle enclosing all positive
                    examples in the training set. Show that A is an ERM.
                                                           4log(4/δ)
                  2. Show that if A receives a training set of size ≥  then, with probability of

                    at least 1− δ it returns a hypothesis with error of at most 
.
                                                                       ∗
                    Hint: Fix some distribution D over X,let R = R(a ,b ,a ,b )bethe rectan-
                                                                  ∗
                                                               ∗
                                                         ∗
                                                                     ∗
                                                               1  1  2  2
                    gle that generates the labels, and let f be the corresponding hypothesis. Let
                         ∗
                    a 1 ≥ a be a number such that the probability mass (with respect to D)of the
                         1
                    rectangle R 1 = R(a ,a 1 ,a ,b ) is exactly 
/4. Similarly, let b 1 ,a 2 ,b 2 be numbers
                                            ∗
                                         ∗
                                    ∗
                                    1    2  2
                                                                         ∗  ∗  ∗
                    such that the probability masses of the rectangles R 2 = R(b 1 ,b ,a ,b ), R 3 =
                                                                         1  2  2
                                                  ∗
                       ∗
                    R(a ,b ,a ,a 2 ), R 4 = R(a ,b ,b 2 ,b ) are all exactly 
/4. Let R(S)bethe
                          ∗
                                             ∗
                             ∗
                                          ∗
                       1  1  2            1  1    2
                    rectangle returned by A. See illustration in Figure 2.2.
                                       ∗
                       Show that R(S) ⊆ R .
                       Show that if S contains (positive) examples in all of the rectangles
                       R 1 , R 2 , R 3 , R 4 , then the hypothesis returned by A haserrorof at most 
.
                       For each i ∈{1,...,4}, upper bound the probability that S does not contain
                       an example from R i .
                       Use the union bound to conclude the argument.
                                                                               d
                  3. Repeat the previous question for the class of axis aligned rectangles in R .
                  4. Show that the runtime of applying the algorithm A mentioned earlier is polyno-
                    mial in d,1/
, and in log(1/δ).
   34   35   36   37   38   39   40   41   42   43   44