Proof. The first claim follows by
$$\begin{aligned}
g(\alpha u + (1-\alpha)v) &= \max_i f_i(\alpha u + (1-\alpha)v) \\
&\le \max_i \left[\alpha f_i(u) + (1-\alpha) f_i(v)\right] \\
&\le \alpha \max_i f_i(u) + (1-\alpha) \max_i f_i(v) \\
&= \alpha\, g(u) + (1-\alpha)\, g(v).
\end{aligned}$$
For the second claim,

$$\begin{aligned}
g(\alpha u + (1-\alpha)v) &= \sum_i w_i f_i(\alpha u + (1-\alpha)v) \\
&\le \sum_i w_i \left[\alpha f_i(u) + (1-\alpha) f_i(v)\right] \\
&= \alpha \sum_i w_i f_i(u) + (1-\alpha) \sum_i w_i f_i(v) \\
&= \alpha\, g(u) + (1-\alpha)\, g(v).
\end{aligned}$$
Example 12.3. The function $g(x) = |x|$ is convex. To see this, note that $g(x) = \max\{x, -x\}$ and that both of the functions $f_1(x) = x$ and $f_2(x) = -x$ are convex.
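As a quick sanity check, the following is a minimal Python sketch (my own construction, not part of the book) that spot-checks the convexity inequality $g(\alpha u + (1-\alpha)v) \le \alpha g(u) + (1-\alpha) g(v)$ on random triples, both for a pointwise maximum (using $f_1, f_2$ from Example 12.3, so that $g(x) = |x|$) and for an arbitrary nonnegatively weighted sum of convex functions:

```python
# Numeric spot-check of the two constructions just proved convex:
# a pointwise maximum and a nonnegatively weighted sum of convex functions.
# The particular functions and weights below are arbitrary illustrative choices.
import numpy as np

rng = np.random.default_rng(0)

# Convex building blocks; the first two are f1, f2 from Example 12.3.
fs = [lambda x: x, lambda x: -x, lambda x: x**2, lambda x: np.exp(x)]
ws = [0.5, 0.5, 1.0, 2.0]  # arbitrary nonnegative weights

g_max = lambda x: max(f(x) for f in fs[:2])             # = |x|
g_sum = lambda x: sum(w * f(x) for w, f in zip(ws, fs))

for _ in range(1000):
    u, v = rng.uniform(-3, 3, size=2)
    a = rng.uniform()                                   # alpha in [0, 1]
    mid = a * u + (1 - a) * v
    for g in (g_max, g_sum):
        # Convexity inequality, with a tiny tolerance for floating point.
        assert g(mid) <= a * g(u) + (1 - a) * g(v) + 1e-9
print("convexity inequality held on all sampled (u, v, alpha) triples")
```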
12.1.2 Lipschitzness
The definition of Lipschitzness that follows is with respect to the Euclidean norm over $\mathbb{R}^d$. However, it is possible to define Lipschitzness with respect to any norm.

Definition 12.6 (Lipschitzness). Let $C \subset \mathbb{R}^d$. A function $f : \mathbb{R}^d \to \mathbb{R}^k$ is $\rho$-Lipschitz over $C$ if for every $w_1, w_2 \in C$ we have that $\|f(w_1) - f(w_2)\| \le \rho\, \|w_1 - w_2\|$.
Intuitively, a Lipschitz function cannot change too fast. Note that if f : R → R is
differentiable, then by the mean value theorem we have
$$f(w_1) - f(w_2) = f'(u)\,(w_1 - w_2),$$
where u is some point between w 1 and w 2 . It follows that if the derivative of f is
everywhere bounded (in absolute value) by ρ, then the function is ρ-Lipschitz.
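To make the mean value theorem argument concrete, here is a small Python sketch (an illustration of mine, not from the book; the test functions and sampling range are arbitrary choices) that estimates a Lipschitz constant empirically via difference quotients. For functions whose derivative is bounded in absolute value by 1, every sampled quotient stays below 1:

```python
# Empirical illustration: if |f'| <= rho everywhere, then by the mean value
# theorem every difference quotient |f(x1)-f(x2)| / |x1-x2| is at most rho.
import numpy as np

rng = np.random.default_rng(1)

def max_difference_quotient(f, n_pairs=100_000, lo=-10.0, hi=10.0):
    """Largest sampled difference quotient of f over [lo, hi]."""
    x1 = rng.uniform(lo, hi, n_pairs)
    x2 = rng.uniform(lo, hi, n_pairs)
    mask = x1 != x2  # avoid division by zero on (measure-zero) ties
    return np.max(np.abs(f(x1[mask]) - f(x2[mask])) / np.abs(x1[mask] - x2[mask]))

# |sin'(x)| = |cos(x)| <= 1 and tanh'(x) = 1 - tanh(x)^2 <= 1,
# so both functions are 1-Lipschitz and the estimates stay below 1.
for name, f in [("sin", np.sin), ("tanh", np.tanh)]:
    print(f"{name}: max difference quotient = {max_difference_quotient(f):.4f}")
```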
Example 12.4.
The function $f(x) = |x|$ is 1-Lipschitz over $\mathbb{R}$. This follows from the triangle inequality: for every $x_1, x_2$,

$$|x_1| - |x_2| = |x_1 - x_2 + x_2| - |x_2| \le |x_1 - x_2| + |x_2| - |x_2| = |x_1 - x_2|.$$

Since this holds for both orderings $(x_1, x_2)$ and $(x_2, x_1)$, we obtain that $\big||x_1| - |x_2|\big| \le |x_1 - x_2|$.
The function $f(x) = \log(1 + \exp(x))$ is 1-Lipschitz over $\mathbb{R}$. To see this, observe that

$$|f'(x)| = \frac{\exp(x)}{1 + \exp(x)} = \frac{1}{\exp(-x) + 1} \le 1.$$
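Both bounds in Example 12.4 are easy to verify numerically. The following minimal Python check (mine, not the book's) samples random pairs and confirms $|f(x_1) - f(x_2)| \le |x_1 - x_2|$ for $f(x) = |x|$ and for $f(x) = \log(1 + \exp(x))$:

```python
# Numeric check of the 1-Lipschitz bounds from Example 12.4 on random pairs.
import numpy as np

rng = np.random.default_rng(2)
x1 = rng.uniform(-20, 20, 100_000)
x2 = rng.uniform(-20, 20, 100_000)

softplus = lambda x: np.log1p(np.exp(x))  # log(1 + exp(x)), numerically stable

for f in (np.abs, softplus):
    # 1-Lipschitz: |f(x1) - f(x2)| <= |x1 - x2| (tiny float tolerance added).
    assert np.all(np.abs(f(x1) - f(x2)) <= np.abs(x1 - x2) + 1e-12)
print("both functions satisfied the 1-Lipschitz bound on all sampled pairs")
```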