
28 Proof of the Fundamental Theorem of Learning Theory

In this chapter we prove Theorem 6.8 from Chapter 6. We remind the reader of the
conditions of the theorem, which will hold throughout this chapter: H is a hypothesis
class of functions from a domain X to {0, 1}, the loss function is the 0−1 loss, and
VCdim(H) = d < ∞.

We shall prove the upper bound for both the realizable and agnostic cases and
shall prove the lower bound for the agnostic case. The lower bound for the realizable
case is left as an exercise.

28.1 THE UPPER BOUND FOR THE AGNOSTIC CASE

For the upper bound we need to prove that there exists C such that H is agnostic
PAC learnable with sample complexity

    m_H(ε, δ) ≤ C · (d + ln(1/δ)) / ε².
We will prove the slightly looser bound:

    m_H(ε, δ) ≤ C · (d log(d/ε) + ln(1/δ)) / ε².        (28.1)
The tighter bound in the theorem statement requires a more involved proof, in which
a more careful analysis of the Rademacher complexity, using a technique called
“chaining,” is needed. This is beyond the scope of this book.
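To get a feel for the gap between the two bounds, here is a minimal numerical sketch (assumptions: the constant C is taken to be 1, and the values of d, δ, and ε are arbitrary, none of which come from the text). The ratio of the right-hand sides grows only like the extra log(d/ε) factor on the d term:

```python
import math

# Minimal sketch (assumption: C = 1; d, delta, eps chosen arbitrarily).
# The looser bound (28.1) exceeds the tight bound only by the log(d/eps)
# factor multiplying d.
d, delta = 10, 0.05
for eps in (0.1, 0.01, 0.001):
    tight = (d + math.log(1 / delta)) / eps**2
    loose = (d * math.log(d / eps) + math.log(1 / delta)) / eps**2
    print(f"eps={eps}: tight={tight:.3g}, loose={loose:.3g}, "
          f"ratio={loose / tight:.2f}")
```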
To prove Equation (28.1), it suffices to show that applying the ERM with a sample
size

    m ≥ 4 · (32d/ε²) · log(64d/ε²) + (8/ε²) · (8d log(e/d) + 2 log(4/δ))

yields an (ε, δ)-learner for H. We prove this result on the basis of Theorem 26.5.
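As a sanity check on this expression, the following sketch evaluates the right-hand side numerically (the function name and the particular values of d, ε, and δ are ours, purely for illustration). Note that log(e/d) is negative for d ≥ 3, so that summand can only reduce the bound:

```python
import math

def sample_size_bound(d, eps, delta):
    """Right-hand side of the sample-size condition above:
    4*(32d/eps^2)*log(64d/eps^2) + (8/eps^2)*(8d*log(e/d) + 2*log(4/delta)).
    """
    term1 = 4 * (32 * d / eps**2) * math.log(64 * d / eps**2)
    term2 = (8 / eps**2) * (8 * d * math.log(math.e / d)
                            + 2 * math.log(4 / delta))
    return term1 + term2

# Illustrative values (not from the text): d = 10, eps = 0.1, delta = 0.05.
print(math.ceil(sample_size_bound(10, 0.1, 0.05)))
```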
Let (x_1, y_1), ..., (x_m, y_m) be a classification training set. Recall that the
Sauer-Shelah lemma tells us that if VCdim(H) = d then

    |{(h(x_1), ..., h(x_m)) : h ∈ H}| ≤ (em/d)^d.
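To illustrate how restrictive this polynomial bound is, the sketch below (values arbitrary, chosen by us) compares the binomial-sum form of Sauer's lemma from Chapter 6, the sum over i from 0 to d of C(m, i), with the (em/d)^d bound and with the unrestricted count 2^m:

```python
import math

def binom_sum(m, d):
    """Binomial-sum form of Sauer's lemma: a class of VC dimension d has
    at most sum_{i=0}^{d} C(m, i) distinct behaviors on m points."""
    return sum(math.comb(m, i) for i in range(d + 1))

# Illustrative values (arbitrary): for m >= d, (em/d)^d dominates the
# binomial sum, and both are tiny compared with the 2^m unrestricted count.
d = 5
for m in (20, 50, 100):
    print(m, binom_sum(m, d), round((math.e * m / d) ** d), 2 ** m)
```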