Page 367 - Understanding Machine Learning

28.3 The Upper Bound for the Realizable Case  349


therefore obtain that
\[
\mathbb{P}[A \in B'] \;\le\; \mathop{\mathbb{E}}_{A \sim D^{2m}} \mathop{\mathbb{E}}_{j \sim J} \sum_{h \in H_A} \mathbf{1}_{[|h \cap A_j| = 0]}\, \mathbf{1}_{[|h \cap A| \ge \alpha]}
\;=\; \mathop{\mathbb{E}}_{A \sim D^{2m}} \sum_{h \in H_A} \mathbf{1}_{[|h \cap A| \ge \alpha]} \mathop{\mathbb{E}}_{j \sim J} \mathbf{1}_{[|h \cap A_j| = 0]}.
\]
Now, fix some A s.t. |h ∩ A| ≥ α. Then, \(\mathbb{E}_{j \sim J}\, \mathbf{1}_{[|h \cap A_j| = 0]}\) is the probability that when
choosing m balls from a bag with at least α red balls, we will never choose a red ball.
This probability is at most
\[
(1 - \alpha/(2m))^m = (1 - \epsilon/4)^m \le e^{-\epsilon m/4}.
\]
We therefore get that
\[
\mathbb{P}[A \in B'] \;\le\; \mathop{\mathbb{E}}_{A \sim D^{2m}} \sum_{h \in H_A} e^{-\epsilon m/4} \;=\; e^{-\epsilon m/4} \mathop{\mathbb{E}}_{A \sim D^{2m}} |H_A|.
\]
              Using the definition of the growth function we conclude the proof of Claim 2.
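The ball-drawing step in Claim 2 rests only on the elementary inequality \(1 - x \le e^{-x}\). A minimal numerical spot check of that step (ours, not from the text), using α = εm/2 so that α/(2m) = ε/4 as in the displayed equality; the function name `no_red_ball_bound` is our own:

```python
import math

def no_red_ball_bound(eps, m):
    """Bound (1 - alpha/(2m))^m on never drawing a red ball in m draws,
    where alpha = eps*m/2 is a lower bound on the number of red balls."""
    alpha = eps * m / 2
    return (1 - alpha / (2 * m)) ** m

# The proof's step: (1 - eps/4)^m <= e^{-eps*m/4}, since 1 - x <= e^{-x}.
for eps in (0.01, 0.1, 0.5):
    for m in (10, 100, 1000):
        p = no_red_ball_bound(eps, m)
        assert abs(p - (1 - eps / 4) ** m) < 1e-9   # alpha/(2m) = eps/4
        assert p <= math.exp(-eps * m / 4)
```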
Completing the Proof: By Sauer's lemma we know that \(\tau_H(2m) \le (2em/d)^d\).
Combining this with the two claims we obtain that
\[
\mathbb{P}[S \in B] \le 2\,(2em/d)^d\, e^{-\epsilon m/4}.
\]
We would like the right-hand side of the inequality to be at most δ; that is,
\[
2\,(2em/d)^d\, e^{-\epsilon m/4} \le \delta.
\]
Rearranging, we obtain the requirement
\[
m \ge \frac{4}{\epsilon}\Bigl(d \log(2em/d) + \log(2/\delta)\Bigr) = \frac{4d}{\epsilon}\log(m) + \frac{4}{\epsilon}\Bigl(d \log(2e/d) + \log(2/\delta)\Bigr).
\]

Using Lemma A.2, a sufficient condition for the preceding to hold is that
\[
m \ge \frac{16d}{\epsilon}\log\Bigl(\frac{8d}{\epsilon}\Bigr) + \frac{8}{\epsilon}\Bigl(d \log(2e/d) + \log(2/\delta)\Bigr).
\]

A sufficient condition for this is that
\[
\begin{aligned}
m &\ge \frac{16d}{\epsilon}\log\Bigl(\frac{8d}{\epsilon}\Bigr) + \frac{16}{\epsilon}\Bigl(d \log(2e/d) + \tfrac{1}{2}\log(2/\delta)\Bigr) \\
  &= \frac{16d}{\epsilon}\log\Bigl(\frac{8d}{\epsilon}\cdot\frac{2e}{d}\Bigr) + \frac{8}{\epsilon}\log(2/\delta) \\
  &= \frac{8}{\epsilon}\Bigl(2d \log\Bigl(\frac{16e}{\epsilon}\Bigr) + \log\Bigl(\frac{2}{\delta}\Bigr)\Bigr),
\end{aligned}
\]
and this concludes our proof.
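As a numerical sanity check on the final bound (our own illustration; the names `sample_complexity` and `failure_bound` are not from the book), one can verify that m = (8/ε)(2d log(16e/ε) + log(2/δ)) indeed drives 2(2em/d)^d e^{−εm/4} below δ for representative parameter values:

```python
import math

def sample_complexity(d, eps, delta):
    """Final sufficient sample size: m = (8/eps)(2d log(16e/eps) + log(2/delta))."""
    return (8 / eps) * (2 * d * math.log(16 * math.e / eps) + math.log(2 / delta))

def failure_bound(m, d, eps):
    """Right-hand side 2 (2em/d)^d e^{-eps*m/4} that we want to be at most delta."""
    return 2 * (2 * math.e * m / d) ** d * math.exp(-eps * m / 4)

# Spot-check the chain of sufficient conditions for a few settings.
for d, eps, delta in [(3, 0.1, 0.05), (10, 0.05, 0.01), (20, 0.1, 0.05)]:
    m = sample_complexity(d, eps, delta)
    assert failure_bound(m, d, eps) <= delta
```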

28.3.1 From ε-Nets to PAC Learnability
Theorem 28.4. Let H be a hypothesis class over X with VCdim(H) = d. Let D be a
distribution over X and let c ∈ H be a target hypothesis. Fix ε, δ ∈ (0,1) and let m be
as defined in Theorem 28.3. Then, with probability of at least 1 − δ over a choice of m