Page 345 - Understanding Machine Learning

P. 345

26.1 The Rademacher Complexity 327

Taking expectation over S on both sides we obtain

E sup (L D ( f ) − L S ( f )) ≤ E sup (L S ( f ) − L S ( f ))

S S,S
f ∈F f ∈F
m
1

= E sup ( f (z ) − f (z i )) . (26.6)
i
m S,S
f ∈F
i=1

Next, we note that for each j, z j and z are i.i.d. variables. Therefore, we can replace
j
them without affecting the expectation:
  

E  sup  ( f (z ) − f (z j )) + ( f (z ) − f (z i )) 
j
i
S,S
f ∈F
i = j
  

= E  sup  ( f (z j ) − f (z )) + ( f (z ) − f (z i ))  . (26.7)
i
j
S,S
f ∈F
i = j
Let σ j be a random variable such that P[σ j = 1] = P[σ j =−1] = 1/2. From
Equation (26.7) we obtain that
  

E  sup  σ j ( f (z ) − f (z j )) + ( f (z ) − f (z i )) 
j
i

S,S ,σ j
f ∈F
i = j
1 1
= (l.h.s. of Equation (26.7)) + (r.h.s. of Equation (26.7))
2 2
  

= E  sup  ( f (z ) − f (z j )) + ( f (z ) − f (z i ))  . (26.8)
j
i
S,S
f ∈F
i = j
Repeating this for all j we obtain that

m m

E sup ( f (z ) − f (z i )) = E sup σ i ( f (z ) − f (z i )) . (26.9)

i
i
S,S S,S ,σ

f ∈F f ∈F
i=1 i=1
Finally,

sup σ i ( f (z ) − f (z i )) ≤ sup σ i f (z ) + sup −σ i f (z i )

i i
f ∈F f ∈F f ∈F
i i i
and since the probability of σ is the same as the probability of −σ, the right-hand
side of Equation (26.9) can be bounded by

E sup σ i f (z ) + sup σ i f (z i )

S,S ,σ
f ∈F f ∈F
i i

= m E[R(F ◦ S )] + m E[R(F ◦ S)] = 2m E[R(F ◦ S)].
S S S
The lemma immediately yields that, in expectation, the ERM rule ﬁnds a
hypothesis which is close to the optimal hypothesis in H.

340 341 342 343 344 345 346 347 348 349 350