Page 365 - Understanding Machine Learning
P. 365

28.3 The Upper Bound for the Realizable Case  347

                               1
              Finally, Let   =  (L D (A(S)) − min h∈H L D (h)) and note that   ∈ [0,1] (see
                               ρ
              Equation (28.5)). Therefore, using Lemma B.1, we get that


                         P[L D (A(S)) − min L D (h) >
] = P  >  ≥ E[ ] −
                                      h∈H                   ρ           ρ
                                                      1
                                                    ≥  −   .
                                                      4  ρ
              Choosing ρ = 8
 we conclude that if m <  8d  , then with probability of at least 1/8we
                                                 
 2
              will have L D (A(S)) − min h∈H L D (h) ≥ 
.



              28.3 THE UPPER BOUND FOR THE REALIZABLE CASE

              Here we prove that there exists C such that H is PAC learnable with sample
              complexity
                                               d ln(1/
) + ln(1/δ)
                                   m H (
,δ) ≤ C               .

              We do so by showing that for m ≥ C  d ln(1/
)+ln(1/δ) , H is learnable using the ERM

              rule. We prove this claim on the basis of the notion of 
-nets.

              Definition 28.2 (
-net). Let X be a domain. S ⊂ X is an 
-net for H ⊂ 2 X  with
              respect to a distribution D over X if

                                   ∀h ∈ H : D(h) ≥ 
 ⇒ h ∩ S  =∅.

              Theorem 28.3. Let H ⊂ 2 with VCdim(H) = d.Fix 
 ∈ (0,1), δ ∈ (0,1/4) and let
                                    X

                                      8         16e         2
                                  m ≥    2d log      + log      .
                                      
          
          δ
                                                                  m
              Then, with probability of at least 1 − δ over a choice of S ∼ D we have that S is an
              
-net for H.

              Proof. Let
                            B ={S ⊂ X : |S|= m, ∃h ∈ H,D(h) ≥ 
,h ∩ S =∅}
              be the set of sets which are not 
-nets. We need to bound P[S ∈ B]. Define


                 B ={(S,T ) ⊂ X : |S|=|T |= m, ∃h ∈ H,D(h) ≥ 
,h ∩ S =∅, |T ∩ h| >  
m  }.
                                                                               2

              Claim 1
              P[S ∈ B] ≤ 2P[(S,T ) ∈ B ].

              Proof of Claim 1:Since S and T are chosen independently we can write



                     P[(S,T ) ∈ B ] =  E    1 [(S,T)∈B ] = E   E   1 [(S,T)∈B ]    .

                                          2m            S∼D m  T ∼D m
                                    (S,T)∼D
   360   361   362   363   364   365   366   367   368   369   370