Page 345 - Understanding Machine Learning
Taking expectation over $S$ on both sides we obtain
$$
\mathbb{E}_{S} \sup_{f \in F} \bigl(L_{\mathcal{D}}(f) - L_S(f)\bigr)
\;\le\; \mathbb{E}_{S,S'} \sup_{f \in F} \bigl(L_{S'}(f) - L_S(f)\bigr)
= \frac{1}{m}\, \mathbb{E}_{S,S'} \sup_{f \in F} \sum_{i=1}^{m} \bigl(f(z'_i) - f(z_i)\bigr). \tag{26.6}
$$
Next, we note that for each $j$, $z_j$ and $z'_j$ are i.i.d. variables. Therefore, we can replace them without affecting the expectation:
$$
\mathbb{E}_{S,S'} \sup_{f \in F} \Bigl[\bigl(f(z'_j) - f(z_j)\bigr) + \sum_{i \ne j} \bigl(f(z'_i) - f(z_i)\bigr)\Bigr]
= \mathbb{E}_{S,S'} \sup_{f \in F} \Bigl[\bigl(f(z_j) - f(z'_j)\bigr) + \sum_{i \ne j} \bigl(f(z'_i) - f(z_i)\bigr)\Bigr]. \tag{26.7}
$$
Let $\sigma_j$ be a random variable such that $\mathbb{P}[\sigma_j = 1] = \mathbb{P}[\sigma_j = -1] = 1/2$. From Equation (26.7) we obtain that
$$
\begin{aligned}
\mathbb{E}_{S,S',\sigma_j} \sup_{f \in F} \Bigl[\sigma_j\bigl(f(z'_j) - f(z_j)\bigr) + \sum_{i \ne j} \bigl(f(z'_i) - f(z_i)\bigr)\Bigr]
&= \tfrac{1}{2}\,\bigl(\text{l.h.s. of Equation (26.7)}\bigr) + \tfrac{1}{2}\,\bigl(\text{r.h.s. of Equation (26.7)}\bigr) \\
&= \mathbb{E}_{S,S'} \sup_{f \in F} \Bigl[\bigl(f(z'_j) - f(z_j)\bigr) + \sum_{i \ne j} \bigl(f(z'_i) - f(z_i)\bigr)\Bigr]. 
\end{aligned} \tag{26.8}
$$
Repeating this for all $j$ we obtain that
$$
\mathbb{E}_{S,S'} \sup_{f \in F} \sum_{i=1}^{m} \bigl(f(z'_i) - f(z_i)\bigr)
= \mathbb{E}_{S,S',\sigma} \sup_{f \in F} \sum_{i=1}^{m} \sigma_i\bigl(f(z'_i) - f(z_i)\bigr). \tag{26.9}
$$
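The identity in Equation (26.9) can be checked numerically on a toy finite problem. The sketch below uses an assumed setup (a two-point sample space with the uniform distribution, $m = 3$, and a small hypothetical function class $F$); it enumerates every pair of samples $(S, S')$ and every sign vector $\sigma$ exactly, so the two expectations can be compared without Monte Carlo error:

```python
from itertools import product

# Hypothetical finite setup (illustration only): sample space Z = {0, 1}
# with the uniform distribution, m = 3 draws, and a small class F of
# real-valued functions on Z.
Z = [0, 1]
m = 3
F = [lambda z: float(z),     # f(z) = z
     lambda z: 1.0 - z,      # f(z) = 1 - z
     lambda z: 0.5]          # constant function

def sup_gap(S, Sp, signs):
    """sup over f in F of sum_i signs[i] * (f(z'_i) - f(z_i))."""
    return max(sum(s * (f(zp) - f(z)) for s, z, zp in zip(signs, S, Sp))
               for f in F)

pairs = list(product(product(Z, repeat=m), repeat=2))   # all (S, S')
signs_all = list(product([1, -1], repeat=m))            # all sigma

# Left side of (26.9): expectation over (S, S'), no signs.
lhs = sum(sup_gap(S, Sp, (1,) * m) for S, Sp in pairs) / len(pairs)

# Right side of (26.9): additionally average over all sign vectors.
rhs = sum(sup_gap(S, Sp, sigma)
          for S, Sp in pairs
          for sigma in signs_all) / (len(pairs) * len(signs_all))

print(abs(lhs - rhs) < 1e-12)  # the two expectations coincide
```

Because each $\sigma_i = -1$ merely swaps the roles of $z_i$ and $z'_i$, which are exchangeable, the sign average leaves the expectation unchanged, exactly as the swap-and-average argument above asserts.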
Finally,
$$
\sup_{f \in F} \sum_{i} \sigma_i\bigl(f(z'_i) - f(z_i)\bigr)
\;\le\; \sup_{f \in F} \sum_{i} \sigma_i f(z'_i) + \sup_{f \in F} \sum_{i} \bigl(-\sigma_i f(z_i)\bigr),
$$
and since the probability of $\sigma$ is the same as the probability of $-\sigma$, the right-hand side of Equation (26.9) can be bounded by
$$
\mathbb{E}_{S,S',\sigma} \Bigl[\sup_{f \in F} \sum_{i} \sigma_i f(z'_i) + \sup_{f \in F} \sum_{i} \sigma_i f(z_i)\Bigr]
= m\, \mathbb{E}_{S'}[R(F \circ S')] + m\, \mathbb{E}_{S}[R(F \circ S)] = 2m\, \mathbb{E}_{S}[R(F \circ S)].
$$
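The quantity $R(F \circ S)$ in the final bound is the empirical Rademacher complexity, $\mathbb{E}_{\sigma}\bigl[\frac{1}{m} \sup_{f \in F} \sum_i \sigma_i f(z_i)\bigr]$. For a small sample it can be computed exactly by enumerating all $2^m$ sign vectors; the class of threshold functions and the sample below are hypothetical, chosen only to make the computation concrete:

```python
from itertools import product

def rademacher_complexity(F, S):
    """Empirical Rademacher complexity of F on sample S, computed
    exactly by enumerating all 2^m Rademacher sign vectors."""
    m = len(S)
    total = 0.0
    for sigma in product([1, -1], repeat=m):
        # sup over f in F of the signed sum for this sign vector
        total += max(sum(s * f(z) for s, z in zip(sigma, S)) for f in F)
    return total / (2 ** m * m)

# Hypothetical example: two threshold functions on a real-valued sample.
F = [lambda z: 1.0 if z > 0 else 0.0,
     lambda z: 1.0 if z > 1 else 0.0]
S = [-1.0, 0.5, 2.0]
print(rademacher_complexity(F, S))
```

A richer class attains larger signed sums for more sign patterns, so this number grows with the capacity of $F$, which is exactly why it controls the gap between $L_{\mathcal{D}}$ and $L_S$ in the bound above. (Exact enumeration is only feasible for small $m$; in practice the expectation over $\sigma$ is estimated by sampling.)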
The lemma immediately yields that, in expectation, the ERM rule finds a hypothesis that is close to the optimal hypothesis in $\mathcal{H}$.