Page 363 - Understanding Machine Learning

P. 363

28.2 The Lower Bound for the Agnostic Case 345

Next, ﬁx some learning algorithm A. As in the proof of the No-Free-Lunch
theorem, we have that

max E L D b (A(S)) − min L D b (h) (28.3)
d m
D b :b∈{±1} S∼D b h∈H

≥ E E L D b (A(S)) − min L D b (h) (28.4)
d m
D b :b∼U({±1} ) S∼D b h∈H

|{i ∈ [d]: A(S)(c i ) = b i |
= E E ρ · (28.5)
d
D b :b∼U({±1} ) S∼D m d
b
d
ρ
= E E 1 [A(S)(c i ) =b i ] , (28.6)
d D b :b∼U({±1} ) S∼D m
d
i=1 b
where the ﬁrst equality follows from Equation (28.2). In addition, using the deﬁni-
m
tion of D b , to sample S ∼ D b we can ﬁrst sample ( j 1 ,..., j m ) ∼ U([d]) ,set x r = c j i ,
] = (1 + ρ)/2. Let us simplify the notation
and ﬁnally sample y r such that P[y r = b j i
and use y ∼ b to denote sampling according to P[y = b] = (1 + ρ)/2. Therefore, the
right-hand side of Equation (28.6) equals
d
ρ
E E E 1 [A(S)(c i ) =b i ] . (28.7)
d j∼U([d]) b∼U({±1} ) ∀r,y r ∼b jr
d
m
i=1
We now proceed in two steps. First, we show that among all learning algorithms,
A, the one which minimizes Equation (28.7) (and hence also Equation (28.4)) is the
Maximum-Likelihood learning rule, denoted A ML . Formally, for each i, A ML (S)(c i )
is the majority vote among the set {y r : r ∈ [m],x r = c i }. Second, we lower bound
Equation (28.7)for A ML .
Lemma 28.1. Among all algorithms, Equation (28.4) is minimized for A being the
Maximum-Likelihood algorithm, A ML , deﬁned as

∀i, A ML (S)(c i ) = sign y r .
r:x r =c i
m
m
Proof. Fix some j ∈ [d] . Note that given j and y ∈{±1} , the training set S is fully
determined. Therefore, we can write A( j, y) instead of A(S). Let us also ﬁx i ∈ [d].
m
Denote b ¬i the sequence (b 1 ,...,b i−1 ,b i+1 ,...,b m ). Also, for any y ∈{±1} ,let y I
denote the elements of y corresponding to indices for which j r = i and let y ¬I be
the rest of the elements of y. We have
E E 1 [A(S)(c i ) =b i ]
d
b∼U({±1} ) ∀r,y r ∼b jr
1 ¬i
= E P[y|b ,b i ]1 [A( j,y)(c i ) =b i ]
2 b ∼U({±1} d−1 )
¬i
b i ∈{±1} y
 
¬I ¬i 1 I
= E P[y |b ]  P[y |b i ]1 [A( j,y)(c i ) =b i ]  .
¬i d−1 ) 2
b ∼U({±1} ¬I I
y y b i ∈{±1}

358 359 360 361 362 363 364 365 366 367 368