Proof. The first claim follows by
$$\begin{aligned}
g(\alpha u + (1-\alpha)v) &= \max_i f_i(\alpha u + (1-\alpha)v) \\
&\le \max_i \left[\alpha f_i(u) + (1-\alpha) f_i(v)\right] \\
&\le \alpha \max_i f_i(u) + (1-\alpha) \max_i f_i(v) \\
&= \alpha\, g(u) + (1-\alpha)\, g(v).
\end{aligned}$$
For the second claim,

$$\begin{aligned}
g(\alpha u + (1-\alpha)v) &= \sum_i w_i f_i(\alpha u + (1-\alpha)v) \\
&\le \sum_i w_i \left[\alpha f_i(u) + (1-\alpha) f_i(v)\right] \\
&= \alpha \sum_i w_i f_i(u) + (1-\alpha) \sum_i w_i f_i(v) \\
&= \alpha\, g(u) + (1-\alpha)\, g(v).
\end{aligned}$$
Example 12.3. The function $g(x) = |x|$ is convex. To see this, note that $g(x) = \max\{x, -x\}$ and that both of the functions $f_1(x) = x$ and $f_2(x) = -x$ are convex.
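As a quick sanity check, the following is a minimal Python sketch (my own construction, not part of the book) that spot-checks the convexity inequality $g(\alpha u + (1-\alpha)v) \le \alpha g(u) + (1-\alpha) g(v)$ on random triples, both for a pointwise maximum (using $f_1, f_2$ from Example 12.3, so that $g(x) = |x|$) and for an arbitrary nonnegatively weighted sum of convex functions:

```python
# Numeric spot-check of the two constructions just proved convex:
# a pointwise maximum and a nonnegatively weighted sum of convex functions.
# The particular functions and weights below are arbitrary illustrative choices.
import numpy as np

rng = np.random.default_rng(0)

# Convex building blocks; the first two are f1, f2 from Example 12.3.
fs = [lambda x: x, lambda x: -x, lambda x: x**2, lambda x: np.exp(x)]
ws = [0.5, 0.5, 1.0, 2.0]  # arbitrary nonnegative weights

g_max = lambda x: max(f(x) for f in fs[:2])             # = |x|
g_sum = lambda x: sum(w * f(x) for w, f in zip(ws, fs))

for _ in range(1000):
    u, v = rng.uniform(-3, 3, size=2)
    a = rng.uniform()                                   # alpha in [0, 1]
    mid = a * u + (1 - a) * v
    for g in (g_max, g_sum):
        # Convexity inequality, with a tiny tolerance for floating point.
        assert g(mid) <= a * g(u) + (1 - a) * g(v) + 1e-9
print("convexity inequality held on all sampled (u, v, alpha) triples")
```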
12.1.2 Lipschitzness
The definition of Lipschitzness that follows is with respect to the Euclidean norm over $\mathbb{R}^d$. However, it is possible to define Lipschitzness with respect to any norm.

Definition 12.6 (Lipschitzness). Let $C \subset \mathbb{R}^d$. A function $f : \mathbb{R}^d \to \mathbb{R}^k$ is $\rho$-Lipschitz over $C$ if for every $w_1, w_2 \in C$ we have that $\|f(w_1) - f(w_2)\| \le \rho\, \|w_1 - w_2\|$.
Intuitively, a Lipschitz function cannot change too fast. Note that if f : R → R is
differentiable, then by the mean value theorem we have
$$f(w_1) - f(w_2) = f'(u)\,(w_1 - w_2),$$
where u is some point between w 1 and w 2 . It follows that if the derivative of f is
everywhere bounded (in absolute value) by ρ, then the function is ρ-Lipschitz.
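To make the mean value theorem argument concrete, here is a small Python sketch (an illustration of mine, not from the book; the test functions and sampling range are arbitrary choices) that estimates a Lipschitz constant empirically via difference quotients. For functions whose derivative is bounded in absolute value by 1, every sampled quotient stays below 1:

```python
# Empirical illustration: if |f'| <= rho everywhere, then by the mean value
# theorem every difference quotient |f(x1)-f(x2)| / |x1-x2| is at most rho.
import numpy as np

rng = np.random.default_rng(1)

def max_difference_quotient(f, n_pairs=100_000, lo=-10.0, hi=10.0):
    """Largest sampled difference quotient of f over [lo, hi]."""
    x1 = rng.uniform(lo, hi, n_pairs)
    x2 = rng.uniform(lo, hi, n_pairs)
    mask = x1 != x2  # avoid division by zero on (measure-zero) ties
    return np.max(np.abs(f(x1[mask]) - f(x2[mask])) / np.abs(x1[mask] - x2[mask]))

# |sin'(x)| = |cos(x)| <= 1 and tanh'(x) = 1 - tanh(x)^2 <= 1,
# so both functions are 1-Lipschitz and the estimates stay below 1.
for name, f in [("sin", np.sin), ("tanh", np.tanh)]:
    print(f"{name}: max difference quotient = {max_difference_quotient(f):.4f}")
```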
Example 12.4.
The function $f(x) = |x|$ is 1-Lipschitz over $\mathbb{R}$. This follows from the triangle inequality: for every $x_1, x_2$,

$$|x_1| - |x_2| = |x_1 - x_2 + x_2| - |x_2| \le |x_1 - x_2| + |x_2| - |x_2| = |x_1 - x_2|.$$

Since this holds for both orderings $(x_1, x_2)$ and $(x_2, x_1)$, we obtain that $\big||x_1| - |x_2|\big| \le |x_1 - x_2|$.
The function $f(x) = \log(1 + \exp(x))$ is 1-Lipschitz over $\mathbb{R}$. To see this, observe that

$$|f'(x)| = \frac{\exp(x)}{1 + \exp(x)} = \frac{1}{\exp(-x) + 1} \le 1.$$
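Both bounds in Example 12.4 are easy to verify numerically. The following minimal Python check (mine, not the book's) samples random pairs and confirms $|f(x_1) - f(x_2)| \le |x_1 - x_2|$ for $f(x) = |x|$ and for $f(x) = \log(1 + \exp(x))$:

```python
# Numeric check of the 1-Lipschitz bounds from Example 12.4 on random pairs.
import numpy as np

rng = np.random.default_rng(2)
x1 = rng.uniform(-20, 20, 100_000)
x2 = rng.uniform(-20, 20, 100_000)

softplus = lambda x: np.log1p(np.exp(x))  # log(1 + exp(x)), numerically stable

for f in (np.abs, softplus):
    # 1-Lipschitz: |f(x1) - f(x2)| <= |x1 - x2| (tiny float tolerance added).
    assert np.all(np.abs(f(x1) - f(x2)) <= np.abs(x1 - x2) + 1e-12)
print("both functions satisfied the 1-Lipschitz bound on all sampled pairs")
```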