the result of applying the $k$-NN rule to a sample $S \sim \mathcal{D}^m$, where $k \ge 10$. Let $h^\star$ be the Bayes optimal hypothesis. Then,
$$
\mathop{\mathbb{E}}_{S}\big[L_{\mathcal{D}}(h_S)\big] \;\le\; \Big(1 + \sqrt{8/k}\Big)\, L_{\mathcal{D}}(h^\star) \;+\; \big(6c\sqrt{d} + k\big)\, m^{-1/(d+1)}.
$$
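For intuition about the predictor $h_S$ appearing in the bound, here is a minimal sketch of a $k$-NN majority-vote rule in Python/NumPy. The function name `knn_predict` and the choice of Euclidean distance are illustrative assumptions, not part of the theorem's statement.

```python
import numpy as np

def knn_predict(X_train, y_train, x, k):
    """Illustrative k-NN majority-vote rule (assumes Euclidean distance).

    X_train: (m, d) array of training points in [0, 1]^d
    y_train: (m,) array of labels in {0, 1}
    x:       (d,) query point
    k:       number of neighbors (the theorem takes k >= 10)
    """
    # Distances from the query point to every training point.
    dists = np.linalg.norm(X_train - x, axis=1)
    # Indices of the k nearest training points.
    nearest = np.argsort(dists)[:k]
    # Majority label among the k nearest neighbors (ties resolved as 0).
    return int(y_train[nearest].mean() > 0.5)
```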
19.1 Prove the following lemma.

Lemma 19.6. Let $C_1, \ldots, C_r$ be a collection of subsets of some domain set, $\mathcal{X}$. Let $S$ be a sequence of $m$ points sampled i.i.d. according to some probability distribution, $\mathcal{D}$, over $\mathcal{X}$. Then, for every $k \ge 2$,
$$
\mathop{\mathbb{E}}_{S \sim \mathcal{D}^m}\Bigg[\;\sum_{i \,:\, |C_i \cap S| < k} \mathbb{P}[C_i]\Bigg] \;\le\; \frac{2rk}{m}.
$$
Hints:
- Show that
$$
\mathop{\mathbb{E}}_{S}\Bigg[\;\sum_{i \,:\, |C_i \cap S| < k} \mathbb{P}[C_i]\Bigg] \;=\; \sum_{i=1}^{r} \mathbb{P}[C_i]\; \mathbb{P}_{S}\big[\,|C_i \cap S| < k\,\big].
$$
- Fix some $i$ and suppose that $k < \mathbb{P}[C_i]\, m/2$. Use Chernoff's bound to show that
$$
\mathbb{P}_{S}\big[\,|C_i \cap S| < k\,\big] \;\le\; \mathbb{P}_{S}\big[\,|C_i \cap S| < \mathbb{P}[C_i]\, m/2\,\big] \;\le\; e^{-\mathbb{P}[C_i]\, m/8}.
$$
- Use the inequality $\max_a a\, e^{-ma} \le \frac{1}{me}$ to show that for such $i$ we have
$$
\mathbb{P}[C_i]\; \mathbb{P}_{S}\big[\,|C_i \cap S| < k\,\big] \;\le\; \mathbb{P}[C_i]\, e^{-\mathbb{P}[C_i]\, m/8} \;\le\; \frac{8}{me}.
$$
- Conclude the proof by using the fact that for the case $k \ge \mathbb{P}[C_i]\, m/2$ we clearly have
$$
\mathbb{P}[C_i]\; \mathbb{P}_{S}\big[\,|C_i \cap S| < k\,\big] \;\le\; \mathbb{P}[C_i] \;\le\; \frac{2k}{m}.
$$
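As an optional plausibility check of Lemma 19.6 (not part of the exercise), the left-hand side can be estimated by simulation. The setup below, with $\mathcal{X} = [0,1)$ partitioned into $r$ equal cells under the uniform distribution, is purely an illustrative assumption.

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative setup: X = [0, 1) split into r equal cells C_1, ..., C_r,
# D the uniform distribution, so P[C_i] = 1/r for every i.
r, m, k, trials = 20, 200, 5, 2000

vals = []
for _ in range(trials):
    S = rng.random(m)                                        # m i.i.d. samples from D
    counts = np.bincount((S * r).astype(int), minlength=r)   # |C_i ∩ S| for each cell
    # Sum of P[C_i] over the cells hit fewer than k times.
    vals.append(np.sum(counts < k) / r)

print("empirical estimate of E[...]:", np.mean(vals))  # should not exceed 2rk/m
print("bound 2rk/m:                 ", 2 * r * k / m)
```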
19.2 We use the notation y ∼ p as a shorthand for “y is a Bernoulli random variable
with expected value p.” Prove the following lemma:
Lemma 19.7. Let $k \ge 10$ and let $Z_1, \ldots, Z_k$ be independent Bernoulli random variables with $\mathbb{P}[Z_i = 1] = p_i$. Denote $p = \frac{1}{k}\sum_i p_i$ and $p' = \frac{1}{k}\sum_{i=1}^{k} Z_i$. Show that
$$
\mathop{\mathbb{E}}_{Z_1,\ldots,Z_k}\; \mathbb{P}_{y \sim p}\big[\,y \ne \mathbb{1}_{[p' > 1/2]}\,\big] \;\le\; \Big(1 + \sqrt{8/k}\Big)\; \mathbb{P}_{y \sim p}\big[\,y \ne \mathbb{1}_{[p > 1/2]}\,\big].
$$
Hints:
- W.l.o.g. assume that $p \le 1/2$. Then, $\mathbb{P}_{y \sim p}\big[y \ne \mathbb{1}_{[p > 1/2]}\big] = p$. Let $y' = \mathbb{1}_{[p' > 1/2]}$. Show that
$$
\mathop{\mathbb{E}}_{Z_1,\ldots,Z_k}\; \mathbb{P}_{y \sim p}\big[\,y \ne y'\,\big] - p \;=\; \mathbb{P}_{Z_1,\ldots,Z_k}\big[\,p' > 1/2\,\big]\,(1 - 2p).
$$
- Use Chernoff's bound (Lemma B.3) to show that
$$
\mathbb{P}\big[\,p' > 1/2\,\big] \;\le\; e^{-k\, p\, h\!\left(\frac{1}{2p} - 1\right)},
$$
where $h(a) = (1+a)\log(1+a) - a$.
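As a quick numerical sanity check of the hinted tail bound (again, not part of the exercise), the sketch below compares the empirical probability $\mathbb{P}[p' > 1/2]$ with $e^{-kp\,h(1/(2p)-1)}$ in the special case $p_1 = \cdots = p_k = p$; all concrete parameter values are assumptions chosen for illustration.

```python
import numpy as np

rng = np.random.default_rng(1)

# Special case p_1 = ... = p_k = p <= 1/2, so k*p' is Binomial(k, p).
k, p, trials = 10, 0.3, 200_000

Z = rng.binomial(1, p, size=(trials, k))
p_hat = Z.mean(axis=1)                      # realizations of p'
empirical = np.mean(p_hat > 0.5)            # empirical P[p' > 1/2]

h = lambda a: (1 + a) * np.log(1 + a) - a   # h(a) = (1+a)log(1+a) - a
bound = np.exp(-k * p * h(1 / (2 * p) - 1))

print("empirical P[p' > 1/2]:", empirical)  # expected to lie below the bound
print("Chernoff bound:       ", bound)
```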