the result of applying the $k$-NN rule to a sample $S \sim \mathcal{D}^m$, where $k \ge 10$. Let $h^\star$ be the Bayes optimal hypothesis. Then,
$$
\mathop{\mathbb{E}}_{S}\big[L_{\mathcal{D}}(h_S)\big] \;\le\; \Big(1 + \sqrt{8/k}\Big)\, L_{\mathcal{D}}(h^\star) \;+\; \big(6c\sqrt{d} + k\big)\, m^{-1/(d+1)}.
$$
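For intuition about the predictor $h_S$ appearing in the bound, here is a minimal sketch of a $k$-NN majority-vote rule in Python/NumPy. The function name `knn_predict` and the choice of Euclidean distance are illustrative assumptions, not part of the theorem's statement.

```python
import numpy as np

def knn_predict(X_train, y_train, x, k):
    """Illustrative k-NN majority-vote rule (assumes Euclidean distance).

    X_train: (m, d) array of training points in [0, 1]^d
    y_train: (m,) array of labels in {0, 1}
    x:       (d,) query point
    k:       number of neighbors (the theorem takes k >= 10)
    """
    # Distances from the query point to every training point.
    dists = np.linalg.norm(X_train - x, axis=1)
    # Indices of the k nearest training points.
    nearest = np.argsort(dists)[:k]
    # Majority label among the k nearest neighbors (ties resolved as 0).
    return int(y_train[nearest].mean() > 0.5)
```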
19.1 Prove the following lemma.

Lemma 19.6. Let $C_1, \ldots, C_r$ be a collection of subsets of some domain set, $\mathcal{X}$. Let $S$ be a sequence of $m$ points sampled i.i.d. according to some probability distribution, $\mathcal{D}$, over $\mathcal{X}$. Then, for every $k \ge 2$,
$$
\mathop{\mathbb{E}}_{S \sim \mathcal{D}^m}\Bigg[\;\sum_{i \,:\, |C_i \cap S| < k} \mathbb{P}[C_i]\Bigg] \;\le\; \frac{2rk}{m}.
$$
Hints:
- Show that
$$
\mathop{\mathbb{E}}_{S}\Bigg[\;\sum_{i \,:\, |C_i \cap S| < k} \mathbb{P}[C_i]\Bigg] \;=\; \sum_{i=1}^{r} \mathbb{P}[C_i]\; \mathbb{P}_{S}\big[\,|C_i \cap S| < k\,\big].
$$
- Fix some $i$ and suppose that $k < \mathbb{P}[C_i]\, m/2$. Use Chernoff's bound to show that
$$
\mathbb{P}_{S}\big[\,|C_i \cap S| < k\,\big] \;\le\; \mathbb{P}_{S}\big[\,|C_i \cap S| < \mathbb{P}[C_i]\, m/2\,\big] \;\le\; e^{-\mathbb{P}[C_i]\, m/8}.
$$
- Use the inequality $\max_a a\, e^{-ma} \le \frac{1}{me}$ to show that for such $i$ we have
$$
\mathbb{P}[C_i]\; \mathbb{P}_{S}\big[\,|C_i \cap S| < k\,\big] \;\le\; \mathbb{P}[C_i]\, e^{-\mathbb{P}[C_i]\, m/8} \;\le\; \frac{8}{me}.
$$
- Conclude the proof by using the fact that for the case $k \ge \mathbb{P}[C_i]\, m/2$ we clearly have
$$
\mathbb{P}[C_i]\; \mathbb{P}_{S}\big[\,|C_i \cap S| < k\,\big] \;\le\; \mathbb{P}[C_i] \;\le\; \frac{2k}{m}.
$$
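As an optional plausibility check of Lemma 19.6 (not part of the exercise), the left-hand side can be estimated by simulation. The setup below, with $\mathcal{X} = [0,1)$ partitioned into $r$ equal cells under the uniform distribution, is purely an illustrative assumption.

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative setup: X = [0, 1) split into r equal cells C_1, ..., C_r,
# D the uniform distribution, so P[C_i] = 1/r for every i.
r, m, k, trials = 20, 200, 5, 2000

vals = []
for _ in range(trials):
    S = rng.random(m)                                        # m i.i.d. samples from D
    counts = np.bincount((S * r).astype(int), minlength=r)   # |C_i ∩ S| for each cell
    # Sum of P[C_i] over the cells hit fewer than k times.
    vals.append(np.sum(counts < k) / r)

print("empirical estimate of E[...]:", np.mean(vals))  # should not exceed 2rk/m
print("bound 2rk/m:                 ", 2 * r * k / m)
```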
19.2 We use the notation y ∼ p as a shorthand for “y is a Bernoulli random variable
with expected value p.” Prove the following lemma:
Lemma 19.7. Let $k \ge 10$ and let $Z_1, \ldots, Z_k$ be independent Bernoulli random variables with $\mathbb{P}[Z_i = 1] = p_i$. Denote $p = \frac{1}{k}\sum_i p_i$ and $p' = \frac{1}{k}\sum_{i=1}^{k} Z_i$. Show that
$$
\mathop{\mathbb{E}}_{Z_1,\ldots,Z_k}\; \mathbb{P}_{y \sim p}\big[\,y \ne \mathbb{1}_{[p' > 1/2]}\,\big] \;\le\; \Big(1 + \sqrt{8/k}\Big)\; \mathbb{P}_{y \sim p}\big[\,y \ne \mathbb{1}_{[p > 1/2]}\,\big].
$$
Hints:
- W.l.o.g. assume that $p \le 1/2$. Then, $\mathbb{P}_{y \sim p}\big[y \ne \mathbb{1}_{[p > 1/2]}\big] = p$. Let $y' = \mathbb{1}_{[p' > 1/2]}$. Show that
$$
\mathop{\mathbb{E}}_{Z_1,\ldots,Z_k}\; \mathbb{P}_{y \sim p}\big[\,y \ne y'\,\big] - p \;=\; \mathbb{P}_{Z_1,\ldots,Z_k}\big[\,p' > 1/2\,\big]\,(1 - 2p).
$$
- Use Chernoff's bound (Lemma B.3) to show that
$$
\mathbb{P}\big[\,p' > 1/2\,\big] \;\le\; e^{-k\, p\, h\!\left(\frac{1}{2p} - 1\right)},
$$
where $h(a) = (1+a)\log(1+a) - a$.
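As a quick numerical sanity check of the hinted tail bound (again, not part of the exercise), the sketch below compares the empirical probability $\mathbb{P}[p' > 1/2]$ with $e^{-kp\,h(1/(2p)-1)}$ in the special case $p_1 = \cdots = p_k = p$; all concrete parameter values are assumptions chosen for illustration.

```python
import numpy as np

rng = np.random.default_rng(1)

# Special case p_1 = ... = p_k = p <= 1/2, so k*p' is Binomial(k, p).
k, p, trials = 10, 0.3, 200_000

Z = rng.binomial(1, p, size=(trials, k))
p_hat = Z.mean(axis=1)                      # realizations of p'
empirical = np.mean(p_hat > 0.5)            # empirical P[p' > 1/2]

h = lambda a: (1 + a) * np.log(1 + a) - a   # h(a) = (1+a)log(1+a) - a
bound = np.exp(-k * p * h(1 / (2 * p) - 1))

print("empirical P[p' > 1/2]:", empirical)  # expected to lie below the bound
print("Chernoff bound:       ", bound)
```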