Page 51 - Understanding Machine Learning

4.2 Finite Classes Are Agnostic PAC Learnable


Writing

$$\{S : \exists h \in \mathcal{H}, |L_S(h) - L_{\mathcal{D}}(h)| > \epsilon\} = \bigcup_{h \in \mathcal{H}} \{S : |L_S(h) - L_{\mathcal{D}}(h)| > \epsilon\},$$

and applying the union bound (Lemma 2.2) we obtain

$$\mathcal{D}^m(\{S : \exists h \in \mathcal{H}, |L_S(h) - L_{\mathcal{D}}(h)| > \epsilon\}) \;\le\; \sum_{h \in \mathcal{H}} \mathcal{D}^m(\{S : |L_S(h) - L_{\mathcal{D}}(h)| > \epsilon\}). \tag{4.1}$$
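The union bound step can be checked numerically on a toy example (everything below, including the event probabilities, is an invented illustration, not part of the text): the probability that at least one of several events occurs never exceeds the sum of the individual probabilities.

```python
import random

# Toy Monte Carlo check of the union bound (Lemma 2.2):
# P(A_1 union ... union A_k) <= sum_i P(A_i).
rng = random.Random(0)
trials = 100_000

# Each "event" fires independently with its own probability (illustrative values).
probs = [0.05, 0.10, 0.02, 0.20, 0.08]

union_hits = 0
individual_hits = [0] * len(probs)
for _ in range(trials):
    fired = [rng.random() < p for p in probs]
    union_hits += any(fired)          # did at least one event occur?
    for i, f in enumerate(fired):
        individual_hits[i] += f       # per-event occurrence counts

p_union = union_hits / trials
sum_of_probs = sum(h / trials for h in individual_hits)
# p_union is at most sum_of_probs, as the union bound guarantees.
```

The bound is loose exactly when the events overlap a lot, which is why, in the derivation above, each summand must be driven down individually.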
Our second step will be to argue that each summand on the right-hand side of this inequality is small enough (for a sufficiently large $m$). That is, we will show that for any fixed hypothesis $h$ (chosen in advance, prior to the sampling of the training set), the gap between the true and empirical risks, $|L_S(h) - L_{\mathcal{D}}(h)|$, is likely to be small.
Recall that $L_{\mathcal{D}}(h) = \mathbb{E}_{z \sim \mathcal{D}}[\ell(h,z)]$ and that $L_S(h) = \frac{1}{m}\sum_{i=1}^m \ell(h,z_i)$. Since each $z_i$ is sampled i.i.d. from $\mathcal{D}$, the expected value of the random variable $\ell(h,z_i)$ is $L_{\mathcal{D}}(h)$. By the linearity of expectation, it follows that $L_{\mathcal{D}}(h)$ is also the expected value of $L_S(h)$. Hence, the quantity $|L_{\mathcal{D}}(h) - L_S(h)|$ is the deviation of the random variable $L_S(h)$ from its expectation. We therefore need to show that the measure of $L_S(h)$ is concentrated around its expected value.
A basic statistical fact, the law of large numbers, states that when $m$ goes to infinity, empirical averages converge to their true expectation. This applies to $L_S(h)$, since it is the empirical average of $m$ i.i.d. random variables. However, the law of large numbers is only an asymptotic result, so it provides no information about the gap between the empirically estimated error and its true value for any given, finite, sample size.
                 Instead, we will use a measure concentration inequality due to Hoeffding, which
              quantifies the gap between empirical averages and their expected value.
Lemma 4.5 (Hoeffding's Inequality). Let $\theta_1, \ldots, \theta_m$ be a sequence of i.i.d. random variables and assume that for all $i$, $\mathbb{E}[\theta_i] = \mu$ and $\mathbb{P}[a \le \theta_i \le b] = 1$. Then, for any $\epsilon > 0$,

$$\mathbb{P}\left[\,\left|\frac{1}{m}\sum_{i=1}^m \theta_i - \mu\right| > \epsilon\,\right] \;\le\; 2\exp\left(-2m\epsilon^2/(b-a)^2\right).$$
                 The proof can be found in Appendix B.
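The lemma can also be checked empirically with a small Monte Carlo experiment (a sketch, not part of the text; the Bernoulli distribution, the parameter choices, and the function names are all illustrative assumptions): draw many samples of size $m$, estimate the probability that the empirical mean deviates from $\mu$ by more than $\epsilon$, and compare with the Hoeffding bound.

```python
import math
import random

def hoeffding_bound(m, eps, a=0.0, b=1.0):
    # Right-hand side of Lemma 4.5: 2 * exp(-2 m eps^2 / (b - a)^2).
    return 2.0 * math.exp(-2.0 * m * eps ** 2 / (b - a) ** 2)

def deviation_probability(m, eps, trials=20_000, p=0.5, seed=0):
    # Monte Carlo estimate of P[ |(1/m) sum theta_i - mu| > eps ]
    # for theta_i ~ Bernoulli(p), so mu = p and theta_i in [0, 1].
    rng = random.Random(seed)
    hits = 0
    for _ in range(trials):
        mean = sum(1 for _ in range(m) if rng.random() < p) / m
        if abs(mean - p) > eps:
            hits += 1
    return hits / trials

m, eps = 100, 0.1
estimate = deviation_probability(m, eps)
bound = hoeffding_bound(m, eps)
# The estimated deviation probability stays below the Hoeffding bound.
```

Note that the bound is distribution-free: it depends only on $m$, $\epsilon$, and the range $[a,b]$, not on the particular distribution of the $\theta_i$, which is exactly why it can be applied uniformly to every hypothesis in the next step.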
Getting back to our problem, let $\theta_i$ be the random variable $\ell(h,z_i)$. Since $h$ is fixed and $z_1, \ldots, z_m$ are sampled i.i.d., it follows that $\theta_1, \ldots, \theta_m$ are also i.i.d. random variables. Furthermore, $L_S(h) = \frac{1}{m}\sum_{i=1}^m \theta_i$ and $L_{\mathcal{D}}(h) = \mu$. Let us further assume that the range of $\ell$ is $[0,1]$ and therefore $\theta_i \in [0,1]$. We therefore obtain that

$$\mathcal{D}^m(\{S : |L_S(h) - L_{\mathcal{D}}(h)| > \epsilon\}) = \mathbb{P}\left[\,\left|\frac{1}{m}\sum_{i=1}^m \theta_i - \mu\right| > \epsilon\,\right] \;\le\; 2\exp\left(-2m\epsilon^2\right). \tag{4.2}$$
              Combining this with Equation (4.1) yields


$$\mathcal{D}^m(\{S : \exists h \in \mathcal{H}, |L_S(h) - L_{\mathcal{D}}(h)| > \epsilon\}) \;\le\; \sum_{h \in \mathcal{H}} 2\exp\left(-2m\epsilon^2\right) = 2|\mathcal{H}|\exp\left(-2m\epsilon^2\right).$$
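To make the final bound concrete, one can ask how large $m$ must be so that the failure probability $2|\mathcal{H}|\exp(-2m\epsilon^2)$ drops below a target $\delta$; solving for $m$ gives $m \ge \log(2|\mathcal{H}|/\delta)/(2\epsilon^2)$. A minimal sketch of this rearrangement (the function name and the example numbers are my own, not from the text):

```python
import math

def sample_complexity(eps, delta, hypothesis_count):
    # Smallest integer m satisfying 2 * |H| * exp(-2 m eps^2) <= delta,
    # obtained by solving the final bound above for m.
    return math.ceil(math.log(2 * hypothesis_count / delta) / (2 * eps ** 2))

# Illustration with |H| = 1000, eps = 0.1, delta = 0.05:
m = sample_complexity(0.1, 0.05, 1000)
# With this m, the probability that some h in H has
# |L_S(h) - L_D(h)| > eps is at most delta.
```

Note the logarithmic dependence on $|\mathcal{H}|$: doubling the size of the hypothesis class adds only a constant number of extra examples.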