Combining the preceding two equations we get
\[
\mathbb{E}_{S}\left[\,\sum_{i:\, C_i \cap S = \emptyset} P[C_i]\,\right]
\;\le\; \sum_{i=1}^{r} P[C_i]\, e^{-P[C_i]\, m}
\;\le\; r \max_{i} P[C_i]\, e^{-P[C_i]\, m}.
\]
Finally, by standard calculus, $\max_{a} a e^{-ma} \le \frac{1}{me}$, and this concludes the proof.
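To spell out the calculus step: for $a \ge 0$ the function $a \mapsto a e^{-ma}$ has derivative $(1 - ma)e^{-ma}$, which vanishes at $a = 1/m$, where the maximum is attained. Hence
\[
\max_{a \ge 0} a e^{-ma} \;=\; \frac{1}{m}\, e^{-m \cdot \frac{1}{m}} \;=\; \frac{1}{me}.
\]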
Equipped with the preceding lemmas we are now ready to state and prove the
main result of this section – an upper bound on the expected error of the 1-NN
learning rule.
Theorem 19.3. Let $\mathcal{X} = [0,1]^d$, $\mathcal{Y} = \{0,1\}$, and let $\mathcal{D}$ be a distribution over $\mathcal{X} \times \mathcal{Y}$ for which the conditional probability function, $\eta$, is a $c$-Lipschitz function. Let $h_S$ denote the result of applying the 1-NN rule to a sample $S \sim \mathcal{D}^m$. Then,
\[
\mathbb{E}_{S \sim \mathcal{D}^m}\left[L_{\mathcal{D}}(h_S)\right] \;\le\; 2\, L_{\mathcal{D}}(h^{\star}) \;+\; 4\, c\, \sqrt{d}\; m^{-\frac{1}{d+1}}.
\]
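One way to read the bound (a remark, not part of the statement): to push the additive term below a target accuracy $\gamma > 0$ (an auxiliary symbol, not notation from the text), the sample size must grow exponentially with the dimension, since
\[
4\, c\, \sqrt{d}\; m^{-\frac{1}{d+1}} \le \gamma
\quad\Longleftrightarrow\quad
m \ge \left(\frac{4\, c\, \sqrt{d}}{\gamma}\right)^{d+1}.
\]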
Proof. Fix some $\epsilon = 1/T$, for some integer $T$, let $r = T^d$, and let $C_1, \ldots, C_r$ be the cover of the set $\mathcal{X}$ using boxes of length $\epsilon$: namely, for every $(\alpha_1, \ldots, \alpha_d) \in [T]^d$, there exists a set $C_i$ of the form $\{x : \forall j,\; x_j \in [(\alpha_j - 1)/T,\; \alpha_j / T]\}$. An illustration for $d = 2$, $T = 5$, and the set corresponding to $\alpha = (2,4)$ is given in the following.
[Figure: the unit square $[0,1]^2$ partitioned into a $5 \times 5$ grid of boxes of side $1/5$, with the box corresponding to $\alpha = (2,4)$ shaded.]
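Spelling out that example: substituting $\alpha = (2,4)$ and $T = 5$ into the definition of the cover gives the box
\[
C_i \;=\; \left\{ x \in [0,1]^2 \;:\; x_1 \in \left[\tfrac{1}{5}, \tfrac{2}{5}\right],\; x_2 \in \left[\tfrac{3}{5}, \tfrac{4}{5}\right] \right\}.
\]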
For each $x, x'$ in the same box we have $\|x - x'\| \le \sqrt{d}\,\epsilon$. Otherwise, $\|x - x'\| \le \sqrt{d}$.
Therefore,
\[
\mathbb{E}_{x,S}\left[\|x - x_{\pi_1(x)}\|\right]
\;\le\;
\mathbb{E}_{S}\left[\,\sum_{i:\, C_i \cap S \neq \emptyset} P[C_i]\, \sqrt{d}\,\epsilon
\;+\; \sum_{i:\, C_i \cap S = \emptyset} P[C_i]\, \sqrt{d}\,\right],
\]
and by combining Lemma 19.2 with the trivial bound $\sum_{i:\, C_i \cap S \neq \emptyset} P[C_i] \le 1$ we get that
\[
\mathbb{E}_{x,S}\left[\|x - x_{\pi_1(x)}\|\right] \;\le\; \sqrt{d}\left(\epsilon + \frac{r}{me}\right).
\]
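In more detail, splitting the expectation by linearity and applying the two bounds term by term gives
\[
\mathbb{E}_{S}\left[\,\sum_{i:\, C_i \cap S \neq \emptyset} P[C_i]\,\right] \sqrt{d}\,\epsilon
\;+\;
\mathbb{E}_{S}\left[\,\sum_{i:\, C_i \cap S = \emptyset} P[C_i]\,\right] \sqrt{d}
\;\le\;
1 \cdot \sqrt{d}\,\epsilon \;+\; \frac{r}{me}\,\sqrt{d}.
\]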
Since the number of boxes is $r = (1/\epsilon)^d$ we get that
\[
\mathbb{E}_{S,x}\left[\|x - x_{\pi_1(x)}\|\right] \;\le\; \sqrt{d}\left(\frac{2\,\epsilon^{-d}}{me} + \epsilon\right).
\]
Combining the preceding with Lemma 19.1 we obtain that
\[
\mathbb{E}_{S}\left[L_{\mathcal{D}}(h_S)\right] \;\le\; 2\, L_{\mathcal{D}}(h^{\star}) \;+\; c\,\sqrt{d}\left(\frac{2\,\epsilon^{-d}}{me} + \epsilon\right).
\]
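The last bound still depends on $\epsilon$, whose choice is made in the continuation of the proof. As a rough sketch of how the stated rate emerges (ignoring the requirement that $1/\epsilon$ be an integer, which the full argument has to handle), taking $\epsilon = m^{-1/(d+1)}$ gives
\[
\frac{2\,\epsilon^{-d}}{me} + \epsilon
\;=\; \frac{2\, m^{d/(d+1)}}{me} + m^{-\frac{1}{d+1}}
\;=\; \left(\frac{2}{e} + 1\right) m^{-\frac{1}{d+1}}
\;\le\; 4\, m^{-\frac{1}{d+1}},
\]
and substituting this into the last inequality recovers the bound $\mathbb{E}_{S}[L_{\mathcal{D}}(h_S)] \le 2\, L_{\mathcal{D}}(h^{\star}) + 4\, c\, \sqrt{d}\, m^{-1/(d+1)}$ stated in Theorem 19.3.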