Page 244 - Understanding Machine Learning
P. 244

Nearest Neighbor
           226
                                                                m

                 the result of applying the k-NN rule to a sample S ∼ D ,where k ≥ 10.Let h be the
                 Bayes optimal hypothesis. Then,

                                               8              √        −1/(d+1)
                             E[L D (h S )] ≤ 1 +  L D (h ) + 6c d + k m       .
                             S                 k
                 19.1 Prove the following lemma.
                      Lemma 19.6.  Let C 1 ,...,C r be a collection of subsets of some domain set, X.Let S
                      be a sequence of m points sampled i.i.d. according to some probability distribution,
                      D over X. Then, for every k ≥ 2,
                                                           
                                                                 2rk
                                            E         P[C i ]   ≤  .
                                           S∼D m                 m
                                                i:|C i ∩S|<k
                      Hints:
                        Show that
                                                   
                                                         r

                                     E        P[C i ]  =  P[C i ]P[|C i ∩ S| < k].
                                      S                         S
                                        i:|C i ∩S|<k     i=1
                        Fix some i and suppose that k < P[C i ]m/2. Use Chernoff’s bound to show that
                                                                       −P[C i ]m/8
                                   P[|C i ∩ S| < k] ≤ P[|C i ∩ S| < P[C i ]m/2] ≤ e  .
                                   S             S
                        Use the inequality max a ae −ma  ≤  1  to show that for such i we have
                                                    me
                                                                         8
                                                               −P[C i ]m/8
                                      P[C i ]P[|C i ∩ S| < k] ≤ P[C i ]e  ≤  .
                                           S                             me
                        Conclude the proof by using the fact that for the case k ≥ P[C i ]m/2 we clearly
                         have:
                                                                     2k
                                           P[C i ]P[|C i ∩ S| < k] ≤ P[C i ] ≤  .
                                                S                    m
                 19.2 We use the notation y ∼ p as a shorthand for “y is a Bernoulli random variable
                      with expected value p.” Prove the following lemma:
                      Lemma 19.7.  Let k ≥ 10 and let Z 1 ,..., Z k be independent Bernoulli random
                                                        1             1    k
                      variables with P[Z i = 1] = p i . Denote p =  p i and p =  Z i . Show that
                                                        k  i          k  i=1

                                                             8
                                   E   P [y  = 1 [p >1/2] ] ≤ 1+  P [y  = 1 [p>1/2] ].

                                 Z 1 ,...,Z k y∼p            k  y∼p
                      Hints:
                      W.l.o.g. assume that p ≤ 1/2. Then, P y∼p [y  = 1 [p>1/2] ] = p.Let y = 1 [p >1/2] .


                        Show that
                                       E   P [y  = y ]− p =  P  [p > 1/2](1− 2p).


                                     Z 1 ,...,Z k y∼p    Z 1 ,...,Z k
                        Use Chernoff’s bound (Lemma B.3) to show that

                                                          −kph  1  −1

                                              P[p > 1/2] ≤ e   2p   ,
                         where
                                             h(a) = (1+ a)log(1+ a) − a.
   239   240   241   242   243   244   245   246   247   248   249