It follows (using Theorem 13.2) that
\[
\mathbb{E}_{S\sim\mathcal{D}^m}\big[L_{\mathcal{D}}(A(S)) - L_S(A(S))\big] \;\le\; \frac{2\rho^2}{\lambda m}.
\]
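For concreteness, here is an illustrative instance (an example added for intuition, not part of the derivation): for the hinge loss $\ell(\mathbf{w},(\mathbf{x},y)) = \max\{0,\, 1 - y\langle \mathbf{w},\mathbf{x}\rangle\}$ over instances satisfying $\|\mathbf{x}\| \le R$, the loss is $R$-Lipschitz with respect to $\mathbf{w}$, so the bound specializes to
\[
\mathbb{E}_{S\sim\mathcal{D}^m}\big[L_{\mathcal{D}}(A(S)) - L_S(A(S))\big] \;\le\; \frac{2R^2}{\lambda m}.
\]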
13.3.2 Smooth and Nonnegative Loss
If the loss is $\beta$-smooth and nonnegative then it is also self-bounded (see Section 12.1):
\[
\|\nabla f(\mathbf{w})\|^2 \;\le\; 2\beta\, f(\mathbf{w}). \tag{13.12}
\]
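As a quick sanity check of Equation (13.12), consider the squared loss on a fixed example $(\mathbf{x},y)$ (a worked example added for illustration): $f(\mathbf{w}) = (\langle\mathbf{w},\mathbf{x}\rangle - y)^2$ is $\beta$-smooth with $\beta = 2\|\mathbf{x}\|^2$, and
\[
\|\nabla f(\mathbf{w})\|^2 = \big\|2(\langle\mathbf{w},\mathbf{x}\rangle - y)\,\mathbf{x}\big\|^2 = 4\|\mathbf{x}\|^2\,(\langle\mathbf{w},\mathbf{x}\rangle - y)^2 = 2\beta\, f(\mathbf{w}),
\]
so (13.12) holds with equality in this case.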
We further assume that $\lambda \ge \frac{2\beta}{m}$, or, in other words, that $\beta \le \lambda m/2$. By the smoothness assumption we have that
\[
\ell(A(S^{(i)}), z_i) - \ell(A(S), z_i) \;\le\; \big\langle \nabla\ell(A(S), z_i),\, A(S^{(i)}) - A(S) \big\rangle + \frac{\beta}{2}\,\|A(S^{(i)}) - A(S)\|^2. \tag{13.13}
\]
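This is just the definition of $\beta$-smoothness from Section 12.1,
\[
f(\mathbf{v}) \;\le\; f(\mathbf{w}) + \langle\nabla f(\mathbf{w}),\, \mathbf{v} - \mathbf{w}\rangle + \frac{\beta}{2}\,\|\mathbf{v} - \mathbf{w}\|^2,
\]
instantiated with $f(\cdot) = \ell(\cdot, z_i)$, $\mathbf{v} = A(S^{(i)})$, and $\mathbf{w} = A(S)$.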
Using the Cauchy-Schwartz inequality and Equation (12.6) we further obtain that
\begin{align*}
\ell(A(S^{(i)}), z_i) - \ell(A(S), z_i)
&\le \|\nabla\ell(A(S), z_i)\|\,\|A(S^{(i)}) - A(S)\| + \frac{\beta}{2}\,\|A(S^{(i)}) - A(S)\|^2 \\
&\le \sqrt{2\beta\,\ell(A(S), z_i)}\;\|A(S^{(i)}) - A(S)\| + \frac{\beta}{2}\,\|A(S^{(i)}) - A(S)\|^2. \tag{13.14}
\end{align*}
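The second inequality applies self-boundedness in the form $\|\nabla\ell(A(S), z_i)\| \le \sqrt{2\beta\,\ell(A(S), z_i)}$, that is, the square root of both sides of Equation (13.12).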
By a symmetric argument it holds that
\[
\ell(A(S), z') - \ell(A(S^{(i)}), z') \;\le\; \sqrt{2\beta\,\ell(A(S^{(i)}), z')}\;\|A(S^{(i)}) - A(S)\| + \frac{\beta}{2}\,\|A(S^{(i)}) - A(S)\|^2.
\]
Plugging these inequalities into Equation (13.10) and rearranging terms we obtain that
\[
\|A(S^{(i)}) - A(S)\| \;\le\; \frac{\sqrt{2\beta}}{\lambda m - \beta}\left(\sqrt{\ell(A(S), z_i)} + \sqrt{\ell(A(S^{(i)}), z')}\right).
\]
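To spell out the rearrangement (a sketch, assuming Equation (13.10) takes the form $\lambda\,\|A(S^{(i)}) - A(S)\|^2 \le \frac{1}{m}\big(\ell(A(S^{(i)}),z_i) - \ell(A(S),z_i) + \ell(A(S),z') - \ell(A(S^{(i)}),z')\big)$): write $d = \|A(S^{(i)}) - A(S)\|$ and sum Equation (13.14) with its symmetric counterpart to get
\[
\lambda\, d^2 \;\le\; \frac{\sqrt{2\beta}}{m}\Big(\sqrt{\ell(A(S),z_i)} + \sqrt{\ell(A(S^{(i)}),z')}\Big)\, d + \frac{\beta}{m}\, d^2 .
\]
For $d > 0$, dividing by $d$ and moving $\frac{\beta}{m}\,d$ to the left-hand side gives $\big(\lambda - \frac{\beta}{m}\big)\, d \le \frac{\sqrt{2\beta}}{m}\big(\sqrt{\ell(A(S),z_i)} + \sqrt{\ell(A(S^{(i)}),z')}\big)$, which is the displayed bound (for $d = 0$ the bound is trivial).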
Combining the preceding with the assumption $\beta \le \lambda m/2$ yields
\[
\|A(S^{(i)}) - A(S)\| \;\le\; \frac{\sqrt{8\beta}}{\lambda m}\left(\sqrt{\ell(A(S), z_i)} + \sqrt{\ell(A(S^{(i)}), z')}\right).
\]