Page 146 - Understanding Machine Learning

Convex Learning Problems

Proof. The first claim follows by
\begin{align*}
g(\alpha u + (1-\alpha)v) &= \max_i f_i(\alpha u + (1-\alpha)v) \\
&\le \max_i \big[\alpha f_i(u) + (1-\alpha) f_i(v)\big] \\
&\le \alpha \max_i f_i(u) + (1-\alpha)\max_i f_i(v) \\
&= \alpha g(u) + (1-\alpha) g(v).
\end{align*}

For the second claim,
\begin{align*}
g(\alpha u + (1-\alpha)v) &= \sum_i w_i f_i(\alpha u + (1-\alpha)v) \\
&\le \sum_i w_i \big[\alpha f_i(u) + (1-\alpha) f_i(v)\big] \\
&= \alpha \sum_i w_i f_i(u) + (1-\alpha)\sum_i w_i f_i(v) \\
&= \alpha g(u) + (1-\alpha) g(v).
\end{align*}



Example 12.3. The function $g(x) = |x|$ is convex. To see this, note that $g(x) = \max\{x, -x\}$ and that both of the functions $f_1(x) = x$ and $f_2(x) = -x$ are convex.
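The claims above can be checked numerically. The sketch below (not from the text; the names `g_max` and `g_sum` and the weights are illustrative) samples random points and verifies the convexity inequality for both the pointwise maximum and a nonnegative weighted sum of the two convex functions from Example 12.3.

```python
import random

# Two convex component functions from Example 12.3: f1(x) = x, f2(x) = -x.
f = [lambda x: x, lambda x: -x]

def g_max(x):
    # g(x) = max_i f_i(x), which here equals |x|.
    return max(fi(x) for fi in f)

def g_sum(x, w=(0.3, 0.7)):
    # Weighted sum with nonnegative (illustrative) weights w_i.
    return sum(wi * fi(x) for wi, fi in zip(w, f))

random.seed(0)
for _ in range(1000):
    u, v = random.uniform(-5, 5), random.uniform(-5, 5)
    a = random.random()
    mid = a * u + (1 - a) * v
    # Convexity: g(a*u + (1-a)*v) <= a*g(u) + (1-a)*g(v).
    assert g_max(mid) <= a * g_max(u) + (1 - a) * g_max(v) + 1e-12
    assert g_sum(mid) <= a * g_sum(u) + (1 - a) * g_sum(v) + 1e-12
print("convexity inequality holds at all sampled points")
```

A random check like this cannot prove convexity, but a single violated sample would disprove it, which makes it a useful sanity test.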


                 12.1.2 Lipschitzness
The definition of Lipschitzness that follows is with respect to the Euclidean norm over $\mathbb{R}^d$. However, it is possible to define Lipschitzness with respect to any norm.

Definition 12.6 (Lipschitzness). Let $C \subset \mathbb{R}^d$. A function $f : \mathbb{R}^d \to \mathbb{R}^k$ is $\rho$-Lipschitz over $C$ if for every $\mathbf{w}_1, \mathbf{w}_2 \in C$ we have that $\|f(\mathbf{w}_1) - f(\mathbf{w}_2)\| \le \rho\,\|\mathbf{w}_1 - \mathbf{w}_2\|$.
Intuitively, a Lipschitz function cannot change too fast. Note that if $f : \mathbb{R} \to \mathbb{R}$ is differentiable, then by the mean value theorem we have
$$ f(w_1) - f(w_2) = f'(u)(w_1 - w_2), $$
where $u$ is some point between $w_1$ and $w_2$. It follows that if the derivative of $f$ is everywhere bounded (in absolute value) by $\rho$, then the function is $\rho$-Lipschitz.
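As a numerical illustration of this mean-value-theorem argument (my own example, not from the text): $\sin(x)$ has $|\sin'(x)| = |\cos(x)| \le 1$ everywhere, so it should be 1-Lipschitz. The sketch below checks the Lipschitz inequality on random pairs of points.

```python
import math
import random

# sin has derivative cos, bounded in absolute value by rho = 1,
# so by the mean value theorem it is 1-Lipschitz over R.
rho = 1.0
random.seed(1)
for _ in range(1000):
    w1, w2 = random.uniform(-10, 10), random.uniform(-10, 10)
    # Check |f(w1) - f(w2)| <= rho * |w1 - w2|.
    assert abs(math.sin(w1) - math.sin(w2)) <= rho * abs(w1 - w2) + 1e-12
print("1-Lipschitz bound verified at all sampled pairs")
```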

Example 12.4.
- The function $f(x) = |x|$ is 1-Lipschitz over $\mathbb{R}$. This follows from the triangle inequality: for every $x_1, x_2$,
$$ |x_1| - |x_2| = |x_1 - x_2 + x_2| - |x_2| \le |x_1 - x_2| + |x_2| - |x_2| = |x_1 - x_2|. $$
Since this holds for both orderings $(x_1, x_2)$ and $(x_2, x_1)$, we obtain that $\big||x_1| - |x_2|\big| \le |x_1 - x_2|$.
- The function $f(x) = \log(1 + \exp(x))$ is 1-Lipschitz over $\mathbb{R}$. To see this, observe that
$$ |f'(x)| = \left|\frac{\exp(x)}{1 + \exp(x)}\right| = \frac{1}{\exp(-x) + 1} \le 1. $$
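The bound above can be confirmed numerically. This sketch (an illustration, not part of the text) evaluates $f(x) = \log(1+\exp(x))$ and its derivative at random points, checking that the derivative stays in $(0,1)$ and that the 1-Lipschitz inequality holds.

```python
import math
import random

def f(x):
    # f(x) = log(1 + exp(x)), sometimes called the softplus function.
    return math.log1p(math.exp(x))

def fprime(x):
    # f'(x) = exp(x) / (1 + exp(x)) = 1 / (exp(-x) + 1), always in (0, 1).
    return 1.0 / (math.exp(-x) + 1.0)

random.seed(2)
for _ in range(1000):
    x1, x2 = random.uniform(-20, 20), random.uniform(-20, 20)
    assert 0.0 < fprime(x1) < 1.0                       # derivative bounded by 1
    assert abs(f(x1) - f(x2)) <= abs(x1 - x2) + 1e-12   # 1-Lipschitz inequality
print("log(1 + exp(x)) is 1-Lipschitz at all sampled pairs")
```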