Page 241 - Understanding Machine Learning

19.2 Analysis


Combining the preceding two equations we get
\[
\mathop{\mathbb{E}}_{S}\Bigl[\sum_{i:C_i\cap S=\emptyset}\mathbb{P}[C_i]\Bigr]
\;\le\; \sum_{i=1}^{r}\mathbb{P}[C_i]\,e^{-\mathbb{P}[C_i]\,m}
\;\le\; r\,\max_{i}\mathbb{P}[C_i]\,e^{-\mathbb{P}[C_i]\,m}.
\]
Finally, by standard calculus, \(\max_{a} a\,e^{-ma} \le \frac{1}{me}\), and this concludes the proof.
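The calculus fact used in the last step can be verified directly: for \(m > 0\), the function \(f(a) = a\,e^{-ma}\) satisfies
\[
f'(a) \;=\; e^{-ma}\,(1 - ma) \;=\; 0 \quad\Longleftrightarrow\quad a = \frac{1}{m},
\]
and \(f'\) changes sign from positive to negative at \(a = 1/m\), so \(f\) attains its maximum there, with value \(f(1/m) = \frac{1}{m}\,e^{-1} = \frac{1}{me}\).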
                 Equipped with the preceding lemmas we are now ready to state and prove the
              main result of this section – an upper bound on the expected error of the 1-NN
              learning rule.
Theorem 19.3. Let \(\mathcal{X} = [0,1]^d\), \(\mathcal{Y} = \{0,1\}\), and let \(\mathcal{D}\) be a distribution over \(\mathcal{X}\times\mathcal{Y}\) for which the conditional probability function, \(\eta\), is a \(c\)-Lipschitz function. Let \(h_S\) denote the result of applying the 1-NN rule to a sample \(S \sim \mathcal{D}^m\). Then,
\[
\mathop{\mathbb{E}}_{S\sim\mathcal{D}^m}\bigl[L_{\mathcal{D}}(h_S)\bigr] \;\le\; 2\,L_{\mathcal{D}}(h^{\star}) \;+\; 4\,c\,\sqrt{d}\,m^{-\frac{1}{d+1}}.
\]
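The 1-NN rule whose risk the theorem bounds simply copies the label of the nearest sample point. A minimal brute-force sketch, assuming NumPy (the function name `nn_predict` is illustrative, not from the text):

```python
import numpy as np

def nn_predict(X_train, y_train, x):
    """1-NN rule: return the label of the training point nearest to x.

    The index computed below plays the role of pi_1(x) in the text:
    the position of x's nearest neighbor in the sample S.
    """
    i = int(np.argmin(np.linalg.norm(X_train - x, axis=1)))
    return y_train[i]

# Tiny example on X = [0, 1]^2 with two labeled points.
X_train = np.array([[0.1, 0.1], [0.9, 0.9]])
y_train = np.array([0, 1])
print(nn_predict(X_train, y_train, np.array([0.2, 0.0])))  # nearest point is [0.1, 0.1]
```

In practice a space-partitioning structure would replace the linear scan, but the brute-force version is exactly the rule analyzed here.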
                                                              d
              Proof. Fix some 
 = 1/T, for some integer T,let r = T and let C 1 ,...,C r be the
                                                                                    d
              cover of the set X using boxes of length 
: Namely, for every (α 1 ,...,α d ) ∈ [T] ,
              there exists a set C i of the form {x : ∀ j,x j ∈ [(α j − 1)/T ,α j /T ]}. An illustration for
              d = 2, T = 5 and the set corresponding to α = (2,4) is given in the following.

[Figure: the unit square \([0,1]^2\) partitioned into a \(5\times 5\) grid of boxes of side \(1/5\); the shaded box corresponds to \(\alpha = (2,4)\). Both axes run from 0 to 1.]
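The grid cover in the proof can be checked numerically. A minimal sketch, assuming NumPy; the helper `box_index` is hypothetical, not from the text — it recovers the index \(\alpha\) of the box containing a point:

```python
import numpy as np

def box_index(x, T):
    """Return alpha in [T]^d such that x_j lies in [(alpha_j - 1)/T, alpha_j/T].

    Boundary points belong to two adjacent boxes; we assign them to the
    lower-index one (except at 0, which belongs to box 1).
    """
    return tuple(min(int(np.ceil(xj * T)), T) if xj > 0 else 1 for xj in x)

d, T = 2, 5
eps = 1.0 / T
r = T ** d  # number of boxes in the cover: r = (1/eps)^d = 25

# Two points in the box alpha = (2, 4), i.e. [0.2, 0.4] x [0.6, 0.8].
x1 = np.array([0.25, 0.65])
x2 = np.array([0.35, 0.75])
assert box_index(x1, T) == box_index(x2, T) == (2, 4)
# Points in the same box are at distance at most sqrt(d) * eps.
assert np.linalg.norm(x1 - x2) <= np.sqrt(d) * eps
```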

For each \(\mathbf{x},\mathbf{x}'\) in the same box we have \(\|\mathbf{x}-\mathbf{x}'\| \le \sqrt{d}\,\epsilon\). Otherwise, \(\|\mathbf{x}-\mathbf{x}'\| \le \sqrt{d}\). Therefore,
\[
\mathop{\mathbb{E}}_{\mathbf{x},S}\bigl[\|\mathbf{x}-\mathbf{x}_{\pi_1(\mathbf{x})}\|\bigr]
\;\le\;
\mathop{\mathbb{E}}_{S}\Bigl[\,\mathbb{P}\Bigl[\bigcup_{i:C_i\cap S=\emptyset} C_i\Bigr]\sqrt{d}
\;+\; \mathbb{P}\Bigl[\bigcup_{i:C_i\cap S\neq\emptyset} C_i\Bigr]\,\epsilon\sqrt{d}\,\Bigr],
\]

and by combining Lemma 19.2 with the trivial bound \(\mathbb{P}\bigl[\bigcup_{i:C_i\cap S\neq\emptyset} C_i\bigr] \le 1\) we get that
\[
\mathop{\mathbb{E}}_{\mathbf{x},S}\bigl[\|\mathbf{x}-\mathbf{x}_{\pi_1(\mathbf{x})}\|\bigr] \;\le\; \sqrt{d}\,\Bigl(\frac{r}{m\,e} + \epsilon\Bigr).
\]
Since the number of boxes is \(r = (1/\epsilon)^d\) we get that
\[
\mathop{\mathbb{E}}_{S,\mathbf{x}}\bigl[\|\mathbf{x}-\mathbf{x}_{\pi_1(\mathbf{x})}\|\bigr] \;\le\; \sqrt{d}\,\Bigl(\frac{2\,\epsilon^{-d}}{m\,e} + \epsilon\Bigr).
\]
Combining the preceding with Lemma 19.1 we obtain that
\[
\mathop{\mathbb{E}}_{S}\bigl[L_{\mathcal{D}}(h_S)\bigr] \;\le\; 2\,L_{\mathcal{D}}(h^{\star}) \;+\; c\,\sqrt{d}\,\Bigl(\frac{2\,\epsilon^{-d}}{m\,e} + \epsilon\Bigr).
\]
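Rearranging the bound of Theorem 19.3 makes the curse of dimensionality concrete: to push the excess term \(4c\sqrt{d}\,m^{-1/(d+1)}\) below a target \(\epsilon\), one needs \(m \ge (4c\sqrt{d}/\epsilon)^{d+1}\) examples, which grows exponentially in \(d\). A quick numeric sketch (the values \(c = 1\) and target \(0.1\) are arbitrary illustrative choices):

```python
import math

def samples_needed(d, c=1.0, target=0.1):
    """Smallest m (up to rounding) with 4*c*sqrt(d) * m**(-1/(d+1)) <= target,
    i.e. m >= (4*c*sqrt(d)/target)**(d+1)."""
    return math.ceil((4 * c * math.sqrt(d) / target) ** (d + 1))

for d in (1, 2, 5, 10):
    print(d, samples_needed(d))  # the required sample size explodes with d
```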