13.3 Tikhonov Regularization as a Stabilizer
[Figure: illustration of strong convexity. At the point αw + (1−α)u, the chord between (w, f(w)) and (u, f(u)) lies above the graph of f by at least (λ/2)α(1−α)||u − w||².]
The following lemma implies that the objective of RLM is (2λ)-strongly convex.
In addition, it underscores an important property of strong convexity.
Lemma 13.5.
1. The function f(w) = λ||w||² is (2λ)-strongly convex.
2. If f is λ-strongly convex and g is convex, then f + g is λ-strongly convex.
3. If f is λ-strongly convex and u is a minimizer of f, then, for any w,

f(w) − f(u) ≥ (λ/2)||w − u||².
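Before turning to the proof, it is instructive to check part 1 against the definition of strong convexity: f is λ-strongly convex if for every w, u and α ∈ (0, 1),

f(αw + (1−α)u) ≤ αf(w) + (1−α)f(u) − (λ/2)α(1−α)||w − u||².

For f(w) = λ||w||², expanding the squared norms gives the identity

λ||αw + (1−α)u||² = αλ||w||² + (1−α)λ||u||² − λα(1−α)||w − u||²,

which is exactly this requirement with parameter 2λ (and in fact holds with equality). Likewise, for part 2, adding the convexity inequality for g to the strong convexity inequality for f yields strong convexity of f + g with the same parameter λ.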
Proof. The first two points follow directly from the definition. To prove the last point, we divide the definition of strong convexity by α and rearrange terms to get that

(f(u + α(w − u)) − f(u)) / α ≤ f(w) − f(u) − (λ/2)(1 − α)||w − u||².
Taking the limit α → 0, the right-hand side converges to f(w) − f(u) − (λ/2)||w − u||². On the other hand, the left-hand side becomes the derivative of the function g(α) = f(u + α(w − u)) at α = 0. Since u is a minimizer of f, it follows that α = 0 is a minimizer of g, and therefore this limit is nonnegative. Combining the two observations yields f(w) − f(u) − (λ/2)||w − u||² ≥ 0, which concludes our proof.
We now turn to prove that RLM is stable. Let S = (z_1, ..., z_m) be a training set, let z′ be an additional example, and let S^(i) = (z_1, ..., z_{i−1}, z′, z_{i+1}, ..., z_m). Let A be the RLM rule, namely,

A(S) = argmin_w ( L_S(w) + λ||w||² ).
Denote f_S(w) = L_S(w) + λ||w||²; on the basis of Lemma 13.5 we know that f_S is (2λ)-strongly convex. Relying on part 3 of the lemma, it follows that for any v,

f_S(v) − f_S(A(S)) ≥ λ||v − A(S)||².    (13.7)
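As a concrete sanity check of (13.7), the following minimal sketch (not from the text; it assumes NumPy and takes L_S to be the empirical squared loss, so that A(S) has a closed form) verifies the inequality at random points v:

import numpy as np

rng = np.random.default_rng(0)
m, d, lam = 50, 5, 0.1  # illustrative sample size, dimension, and lambda

# Training set S: examples z_i = (x_i, y_i) with loss (<w, x_i> - y_i)^2.
X = rng.normal(size=(m, d))
y = X @ rng.normal(size=d) + 0.1 * rng.normal(size=m)

def f_S(w):
    # f_S(w) = L_S(w) + lam * ||w||^2
    return np.mean((X @ w - y) ** 2) + lam * (w @ w)

# Setting the gradient of f_S to zero gives the closed-form minimizer:
# (X^T X / m + lam * I) w = X^T y / m.
A_S = np.linalg.solve(X.T @ X / m + lam * np.eye(d), X.T @ y / m)

# Check (13.7): f_S(v) - f_S(A(S)) >= lam * ||v - A(S)||^2 for every v.
for _ in range(1000):
    v = A_S + rng.normal(size=d)
    gap = f_S(v) - f_S(A_S)
    assert gap + 1e-9 >= lam * np.sum((v - A_S) ** 2)

For a general convex loss, the closed form would be replaced by a numerical minimizer of f_S; the inequality itself relies only on the (2λ)-strong convexity of f_S.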