Page 37 - Understanding Machine Learning

2.3 Empirical Risk Minimization with Inductive Bias  19


              only happen if our sample is in the set of misleading samples, M. Formally, we have
              shown that

                                      {S|_x : L_(D,f)(h_S) > ε} ⊆ M.
              Note that we can rewrite M as

                                      M = ∪_{h∈H_B} {S|_x : L_S(h) = 0}.             (2.5)
              Hence,

                  D^m({S|_x : L_(D,f)(h_S) > ε}) ≤ D^m(M) = D^m(∪_{h∈H_B} {S|_x : L_S(h) = 0}).  (2.6)
                 Next, we upper bound the right-hand side of the preceding equation using the
              union bound – a basic property of probabilities.
              Lemma 2.2 (Union Bound).  For any two sets A, B and a distribution D we have

                                      D(A ∪ B) ≤ D(A) + D(B).
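
              By induction, the two-set lemma extends to any finite union, D(∪_i A_i) ≤ Σ_i D(A_i),
              which is the form applied to the union over H_B below. As a minimal numerical sketch
              of the lemma (the distribution D and the events A, B here are illustrative choices,
              not from the text):

```python
# Toy distribution over the sample space {0, 1, 2, 3} (illustrative).
D = {0: 0.1, 1: 0.2, 2: 0.3, 3: 0.4}

def prob(event):
    """D(event) for a subset of the sample space."""
    return sum(D[x] for x in event)

A = {0, 1, 2}
B = {1, 2, 3}

# D(A ∪ B) <= D(A) + D(B): the overlap {1, 2} is counted once on the
# left but twice on the right, which is exactly the slack in the bound.
assert prob(A | B) <= prob(A) + prob(B)
```

              Here D(A ∪ B) = 1 while D(A) + D(B) = 1.5; the bound is loose precisely by the
              probability of the overlap.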

                 Applying the union bound to the right-hand side of Equation (2.6) yields

                  D^m({S|_x : L_(D,f)(h_S) > ε}) ≤ Σ_{h∈H_B} D^m({S|_x : L_S(h) = 0}).  (2.7)
              Next, let us bound each summand of the right-hand side of the preceding inequality.
              Fix some “bad” hypothesis h ∈ H_B. The event L_S(h) = 0 is equivalent to the event
              ∀i, h(x_i) = f(x_i). Since the examples in the training set are sampled i.i.d. we get that

                   D^m({S|_x : L_S(h) = 0}) = D^m({S|_x : ∀i, h(x_i) = f(x_i)})

                                            = Π_{i=1}^m D({x_i : h(x_i) = f(x_i)}).  (2.8)
                 For each individual sampling of an element of the training set we have

                               D({x_i : h(x_i) = y_i}) = 1 − L_(D,f)(h) ≤ 1 − ε,

              where the last inequality follows from the fact that h ∈ H_B. Combining the previous
              equation with Equation (2.8) and using the inequality 1 − ε ≤ e^(−ε) we obtain that for
              every h ∈ H_B,

                                 D^m({S|_x : L_S(h) = 0}) ≤ (1 − ε)^m ≤ e^(−εm).     (2.9)
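
              The inequality 1 − ε ≤ e^(−ε) holds for every real ε, since e^(−ε) is convex and
              1 − ε is its tangent line at 0. A quick numerical check over a grid of values (the
              grid and the exponent m here are arbitrary illustrative choices):

```python
import math

# Verify 1 - eps <= exp(-eps), and hence (1 - eps)**m <= exp(-eps * m),
# on a grid of epsilon values; the grid and m are illustrative choices.
m = 50
for eps in [0.01, 0.1, 0.3, 0.5, 0.9]:
    assert 1 - eps <= math.exp(-eps)
    assert (1 - eps) ** m <= math.exp(-eps * m)
```

              Raising both sides of 1 − ε ≤ e^(−ε) to the power m (both sides are nonnegative for
              ε ≤ 1) is exactly the step that turns (2.8) into (2.9).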
              Combining this equation with Equation (2.7) we conclude that
                   D^m({S|_x : L_(D,f)(h_S) > ε}) ≤ |H_B| e^(−εm) ≤ |H| e^(−εm).
              A graphical illustration which explains how we used the union bound is given in
              Figure 2.1.

              Corollary 2.3. Let H be a finite hypothesis class. Let δ ∈ (0,1) and ε > 0 and let m be
              an integer that satisfies

                                           m ≥ log(|H|/δ) / ε.
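
              In code, the sample-size requirement of Corollary 2.3 is a one-liner. The sketch
              below (the function name and the concrete numbers |H| = 1000, ε = 0.1, δ = 0.05 are
              illustrative assumptions) computes the smallest such integer m and checks that it
              drives the failure-probability bound |H| e^(−εm) below δ:

```python
import math

def sample_complexity(h_size, eps, delta):
    """Smallest integer m with m >= log(h_size / delta) / eps."""
    return math.ceil(math.log(h_size / delta) / eps)

# Illustrative numbers: 1000 hypotheses, accuracy 0.1, confidence 0.05.
m = sample_complexity(1000, 0.1, 0.05)
print(m)  # → 100

# With this m, the bound |H| * exp(-eps * m) from the derivation above
# is at most delta, as Corollary 2.3 requires.
assert 1000 * math.exp(-0.1 * m) <= 0.05
```

              Note that m grows only logarithmically in |H| and 1/δ, but linearly in 1/ε.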