Proof of the Fundamental Theorem of Learning Theory

Denote A = {(1[h(x₁) ≠ y₁], …, 1[h(xₘ) ≠ yₘ]) : h ∈ H}. This clearly implies that

\[ |A| \le \left(\frac{em}{d}\right)^d. \]
                 Combining this with Lemma 26.8 we obtain the following bound on the Rademacher
                 complexity:

\[ R(A) \le \sqrt{\frac{2d\log(em/d)}{m}}. \]
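To spell out this step (a sketch, assuming Lemma 26.8 is Massart's lemma in the form R(A) ≤ maxₐ∈A ‖a‖ √(2 log|A|)/m): every vector in A lies in {0,1}ᵐ, so its norm is at most √m, giving

\[ R(A) \le \sqrt{m} \cdot \frac{\sqrt{2\log|A|}}{m} = \sqrt{\frac{2\log|A|}{m}} \le \sqrt{\frac{2\log\bigl((em/d)^d\bigr)}{m}} = \sqrt{\frac{2d\log(em/d)}{m}}. \]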
                 Using Theorem 26.5 we obtain that with probability of at least 1−δ, for every h ∈ H
                 we have that

\[ L_{\mathcal{D}}(h) - L_S(h) \le \sqrt{\frac{8d\log(em/d)}{m}} + \sqrt{\frac{2\log(2/\delta)}{m}}. \]
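The constant 8 arises from the factor of 2 multiplying the Rademacher complexity in Theorem 26.5 (assuming, as in Chapter 26, that the theorem bounds the gap by twice the Rademacher complexity plus a concentration term):

\[ 2R(A) \le 2\sqrt{\frac{2d\log(em/d)}{m}} = \sqrt{\frac{8d\log(em/d)}{m}}. \]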
                 Repeating the previous argument for minus the zero-one loss and applying the
                 union bound we obtain that with probability of at least 1 − δ, for every h ∈ H it
                 holds that

\[ |L_{\mathcal{D}}(h) - L_S(h)| \le \sqrt{\frac{8d\log(em/d)}{m}} + \sqrt{\frac{2\log(4/\delta)}{m}} \le 2\sqrt{\frac{8d\log(em/d) + 2\log(4/\delta)}{m}}, \]

where the last inequality uses √a + √b ≤ 2√(a + b).
To ensure that this is smaller than ε we need

\[ m \ge \frac{4}{\epsilon^2}\bigl(8d\log(m) + 8d\log(e/d) + 2\log(4/\delta)\bigr). \]
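This requirement is just the previous bound squared, with the logarithm split; a short check:

\[ 2\sqrt{\frac{8d\log(em/d) + 2\log(4/\delta)}{m}} \le \epsilon \iff m \ge \frac{4}{\epsilon^2}\bigl(8d\log(em/d) + 2\log(4/\delta)\bigr), \]

and log(em/d) = log(m) + log(e/d) gives the form above.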
                 Using Lemma A.2, a sufficient condition for the inequality to hold is that

\[ m \ge 4 \cdot \frac{32d}{\epsilon^2}\,\log\!\left(\frac{64d}{\epsilon^2}\right) + \frac{8}{\epsilon^2}\bigl(8d\log(e/d) + 2\log(4/\delta)\bigr). \]
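To see how Lemma A.2 yields this (a sketch, assuming the lemma states that for a ≥ 1 and b > 0, x ≥ 4a log(2a) + 2b implies x ≥ a log(x) + b), rewrite the requirement as m ≥ a log(m) + b with

\[ a = \frac{32d}{\epsilon^2}, \qquad b = \frac{4}{\epsilon^2}\bigl(8d\log(e/d) + 2\log(4/\delta)\bigr); \]

then 4a log(2a) + 2b is exactly the right-hand side of the displayed condition.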
                 28.2 THE LOWER BOUND FOR THE AGNOSTIC CASE

Here, we prove that there exists a constant C such that the sample complexity of agnostically PAC learning H satisfies

\[ m_H(\epsilon, \delta) \ge C\,\frac{d + \ln(1/\delta)}{\epsilon^2}. \]
We will prove the lower bound in two parts. First, we will show that m(ε,δ) ≥ 0.5 log(1/(4δ))/ε², and second we will show that for every δ ≤ 1/8 we have that m(ε,δ) ≥ 8d/ε². These two bounds will conclude the proof.
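To see why the two parts suffice (a brief note, filling in the standard combining step): if both m ≥ a and m ≥ b are necessary, then so is m ≥ max(a,b) ≥ (a+b)/2, hence

\[ m(\epsilon,\delta) \ge \frac{1}{2}\left(\frac{0.5\log(1/(4\delta))}{\epsilon^2} + \frac{8d}{\epsilon^2}\right) \ge C\,\frac{d + \ln(1/\delta)}{\epsilon^2} \]

for an appropriate constant C (over the relevant range of δ).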
28.2.1 Showing That m(ε,δ) ≥ 0.5 log(1/(4δ))/ε²
We first show that for any ε < 1/√2 and any δ ∈ (0,1), we have that m(ε,δ) ≥ 0.5 log(1/(4δ))/ε². To do so, we show that for m ≤ 0.5 log(1/(4δ))/ε², H is not learnable.
Choose one example that is shattered by H. That is, let c be an example such that there are h₊, h₋ ∈ H for which h₊(c) = 1 and h₋(c) = −1. Define two distributions,