
13.3 Tikhonov Regularization as a Stabilizer
[Figure: illustration of strong convexity. At the point $\alpha\mathbf{w} + (1-\alpha)\mathbf{u}$, the graph of $f$ lies below the chord connecting $(\mathbf{w}, f(\mathbf{w}))$ and $(\mathbf{u}, f(\mathbf{u}))$ by at least $\frac{\lambda}{2}\,\alpha(1-\alpha)\,\|\mathbf{u} - \mathbf{w}\|^2$.]
                 The following lemma implies that the objective of RLM is (2λ)-strongly convex.
              In addition, it underscores an important property of strong convexity.

Lemma 13.5.

  1. The function $f(\mathbf{w}) = \lambda\,\|\mathbf{w}\|^2$ is $2\lambda$-strongly convex.
  2. If $f$ is $\lambda$-strongly convex and $g$ is convex, then $f + g$ is $\lambda$-strongly convex.
  3. If $f$ is $\lambda$-strongly convex and $\mathbf{u}$ is a minimizer of $f$, then, for any $\mathbf{w}$,

$$ f(\mathbf{w}) - f(\mathbf{u}) \;\ge\; \frac{\lambda}{2}\,\|\mathbf{w} - \mathbf{u}\|^2 . $$
Proof. The first two points follow directly from the definition. To prove the last
point, we divide the definition of strong convexity by α and rearrange terms to get
that

$$ \frac{f(\mathbf{u} + \alpha(\mathbf{w} - \mathbf{u})) - f(\mathbf{u})}{\alpha} \;\le\; f(\mathbf{w}) - f(\mathbf{u}) - \frac{\lambda}{2}\,(1-\alpha)\,\|\mathbf{w} - \mathbf{u}\|^2 . $$

Taking the limit α → 0, we obtain that the right-hand side converges to $f(\mathbf{w}) - f(\mathbf{u}) - \frac{\lambda}{2}\|\mathbf{w} - \mathbf{u}\|^2$. On the other hand, the left-hand side becomes the derivative of the function $g(\alpha) = f(\mathbf{u} + \alpha(\mathbf{w} - \mathbf{u}))$ at α = 0. Since u is a minimizer of f, it follows that α = 0 is a minimizer of g, and therefore the left-hand side of the preceding inequality goes to zero in the limit α → 0, which concludes our proof.
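As a worked check of part 1, which the proof leaves implicit, the following identity (obtained by expanding the squared Euclidean norm in terms of $\|\mathbf{w}\|^2$, $\|\mathbf{u}\|^2$, and $\langle\mathbf{w},\mathbf{u}\rangle$) shows that $f(\mathbf{w}) = \lambda\|\mathbf{w}\|^2$ satisfies the strong-convexity inequality with parameter $2\lambda$, in fact with equality:

$$ \lambda\,\|\alpha\mathbf{w} + (1-\alpha)\mathbf{u}\|^2 \;=\; \alpha\,\lambda\|\mathbf{w}\|^2 + (1-\alpha)\,\lambda\|\mathbf{u}\|^2 - \frac{2\lambda}{2}\,\alpha(1-\alpha)\,\|\mathbf{w} - \mathbf{u}\|^2 . $$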


   We now turn to prove that RLM is stable. Let $S = (z_1, \ldots, z_m)$ be a training set, let $z'$ be an additional example, and let $S^{(i)} = (z_1, \ldots, z_{i-1}, z', z_{i+1}, \ldots, z_m)$. Let $A$ be the RLM rule, namely,

$$ A(S) = \operatorname*{argmin}_{\mathbf{w}} \left( L_S(\mathbf{w}) + \lambda\,\|\mathbf{w}\|^2 \right) . $$

Denote $f_S(\mathbf{w}) = L_S(\mathbf{w}) + \lambda\,\|\mathbf{w}\|^2$, and on the basis of Lemma 13.5 we know that $f_S$ is $(2\lambda)$-strongly convex. Relying on part 3 of the lemma, it follows that for any $\mathbf{v}$,

$$ f_S(\mathbf{v}) - f_S(A(S)) \;\ge\; \lambda\,\|\mathbf{v} - A(S)\|^2 . \tag{13.7} $$
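
   As an illustration (not part of the book's argument), the following minimal Python sketch instantiates the RLM rule for the squared loss, where it reduces to ridge regression, and numerically checks inequality (13.7) as well as the effect of replacing one example. The synthetic data, the regularization parameter lam, and the helper names rlm and f_S are illustrative assumptions, not notation from the text.

   import numpy as np

   rng = np.random.default_rng(0)
   m, d, lam = 50, 5, 0.1

   # Synthetic training set S = (z_1, ..., z_m) with z_i = (x_i, y_i).
   X = rng.normal(size=(m, d))
   y = X @ rng.normal(size=d) + 0.1 * rng.normal(size=m)

   def rlm(X, y, lam):
       # A(S) = argmin_w  L_S(w) + lam * ||w||^2  for the squared loss
       # L_S(w) = (1/m) * sum_i (<w, x_i> - y_i)^2; closed-form ridge solution.
       m, d = X.shape
       return np.linalg.solve(X.T @ X / m + lam * np.eye(d), X.T @ y / m)

   def f_S(w, X, y, lam):
       # The (2*lam)-strongly convex objective f_S(w) = L_S(w) + lam * ||w||^2.
       return np.mean((X @ w - y) ** 2) + lam * np.dot(w, w)

   w_star = rlm(X, y, lam)            # A(S), the minimizer of f_S
   v = w_star + rng.normal(size=d)    # an arbitrary point v

   # Inequality (13.7): f_S(v) - f_S(A(S)) >= lam * ||v - A(S)||^2.
   lhs = f_S(v, X, y, lam) - f_S(w_star, X, y, lam)
   rhs = lam * np.dot(v - w_star, v - w_star)
   print(lhs >= rhs, lhs, rhs)

   # S^(i): replace the i-th example by a fresh example z' = (x', y').
   i = 0
   X_i, y_i = X.copy(), y.copy()
   X_i[i] = rng.normal(size=d)
   y_i[i] = X_i[i] @ rng.normal(size=d)
   w_i = rlm(X_i, y_i, lam)             # A(S^(i))
   print(np.linalg.norm(w_star - w_i))  # ||A(S) - A(S^(i))||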