It follows (using Theorem 13.2) that
\[
\mathbb{E}_{S\sim\mathcal{D}^m}\big[L_{\mathcal{D}}(A(S)) - L_S(A(S))\big] \;\le\; \frac{2\rho^2}{\lambda m}.
\]
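For concreteness, here is an illustrative instance (an example added for intuition, not part of the derivation): for the hinge loss $\ell(\mathbf{w},(\mathbf{x},y)) = \max\{0,\, 1 - y\langle \mathbf{w},\mathbf{x}\rangle\}$ over instances satisfying $\|\mathbf{x}\| \le R$, the loss is $R$-Lipschitz with respect to $\mathbf{w}$, so the bound specializes to
\[
\mathbb{E}_{S\sim\mathcal{D}^m}\big[L_{\mathcal{D}}(A(S)) - L_S(A(S))\big] \;\le\; \frac{2R^2}{\lambda m}.
\]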
13.3.2 Smooth and Nonnegative Loss
If the loss is $\beta$-smooth and nonnegative then it is also self-bounded (see Section 12.1):
\[
\|\nabla f(\mathbf{w})\|^2 \;\le\; 2\beta\, f(\mathbf{w}). \tag{13.12}
\]
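As a quick sanity check of Equation (13.12), consider the squared loss on a fixed example $(\mathbf{x},y)$ (a worked example added for illustration): $f(\mathbf{w}) = (\langle\mathbf{w},\mathbf{x}\rangle - y)^2$ is $\beta$-smooth with $\beta = 2\|\mathbf{x}\|^2$, and
\[
\|\nabla f(\mathbf{w})\|^2 = \big\|2(\langle\mathbf{w},\mathbf{x}\rangle - y)\,\mathbf{x}\big\|^2 = 4\|\mathbf{x}\|^2\,(\langle\mathbf{w},\mathbf{x}\rangle - y)^2 = 2\beta\, f(\mathbf{w}),
\]
so (13.12) holds with equality in this case.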
We further assume that $\lambda \ge \frac{2\beta}{m}$, or, in other words, that $\beta \le \lambda m/2$. By the smoothness assumption we have that
\[
\ell(A(S^{(i)}), z_i) - \ell(A(S), z_i) \;\le\; \big\langle \nabla\ell(A(S), z_i),\, A(S^{(i)}) - A(S) \big\rangle + \frac{\beta}{2}\,\|A(S^{(i)}) - A(S)\|^2. \tag{13.13}
\]
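This is just the definition of $\beta$-smoothness from Section 12.1,
\[
f(\mathbf{v}) \;\le\; f(\mathbf{w}) + \langle\nabla f(\mathbf{w}),\, \mathbf{v} - \mathbf{w}\rangle + \frac{\beta}{2}\,\|\mathbf{v} - \mathbf{w}\|^2,
\]
instantiated with $f(\cdot) = \ell(\cdot, z_i)$, $\mathbf{v} = A(S^{(i)})$, and $\mathbf{w} = A(S)$.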
Using the Cauchy-Schwartz inequality and Equation (12.6) we further obtain that
\begin{align*}
\ell(A(S^{(i)}), z_i) - \ell(A(S), z_i)
&\le \|\nabla\ell(A(S), z_i)\|\,\|A(S^{(i)}) - A(S)\| + \frac{\beta}{2}\,\|A(S^{(i)}) - A(S)\|^2 \\
&\le \sqrt{2\beta\,\ell(A(S), z_i)}\;\|A(S^{(i)}) - A(S)\| + \frac{\beta}{2}\,\|A(S^{(i)}) - A(S)\|^2. \tag{13.14}
\end{align*}
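The second inequality applies self-boundedness in the form $\|\nabla\ell(A(S), z_i)\| \le \sqrt{2\beta\,\ell(A(S), z_i)}$, that is, the square root of both sides of Equation (13.12).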
By a symmetric argument it holds that
\[
\ell(A(S), z') - \ell(A(S^{(i)}), z') \;\le\; \sqrt{2\beta\,\ell(A(S^{(i)}), z')}\;\|A(S^{(i)}) - A(S)\| + \frac{\beta}{2}\,\|A(S^{(i)}) - A(S)\|^2.
\]
Plugging these inequalities into Equation (13.10) and rearranging terms we obtain that
\[
\|A(S^{(i)}) - A(S)\| \;\le\; \frac{\sqrt{2\beta}}{\lambda m - \beta}\left(\sqrt{\ell(A(S), z_i)} + \sqrt{\ell(A(S^{(i)}), z')}\right).
\]
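To spell out the rearrangement (a sketch, assuming Equation (13.10) takes the form $\lambda\,\|A(S^{(i)}) - A(S)\|^2 \le \frac{1}{m}\big(\ell(A(S^{(i)}),z_i) - \ell(A(S),z_i) + \ell(A(S),z') - \ell(A(S^{(i)}),z')\big)$): write $d = \|A(S^{(i)}) - A(S)\|$ and sum Equation (13.14) with its symmetric counterpart to get
\[
\lambda\, d^2 \;\le\; \frac{\sqrt{2\beta}}{m}\Big(\sqrt{\ell(A(S),z_i)} + \sqrt{\ell(A(S^{(i)}),z')}\Big)\, d + \frac{\beta}{m}\, d^2 .
\]
For $d > 0$, dividing by $d$ and moving $\frac{\beta}{m}\,d$ to the left-hand side gives $\big(\lambda - \frac{\beta}{m}\big)\, d \le \frac{\sqrt{2\beta}}{m}\big(\sqrt{\ell(A(S),z_i)} + \sqrt{\ell(A(S^{(i)}),z')}\big)$, which is the displayed bound (for $d = 0$ the bound is trivial).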
Combining the preceding with the assumption $\beta \le \lambda m/2$ yields
\[
\|A(S^{(i)}) - A(S)\| \;\le\; \frac{\sqrt{8\beta}}{\lambda m}\left(\sqrt{\ell(A(S), z_i)} + \sqrt{\ell(A(S^{(i)}), z')}\right).
\]