
13.3 Tikhonov Regularization as a Stabilizer
[Figure: illustration of strong convexity. At the point $\alpha\mathbf{w} + (1-\alpha)\mathbf{u}$, the graph of $f$ lies below the chord connecting $(\mathbf{w}, f(\mathbf{w}))$ and $(\mathbf{u}, f(\mathbf{u}))$ by at least $\frac{\lambda}{2}\,\alpha(1-\alpha)\,\|\mathbf{u} - \mathbf{w}\|^2$.]
                 The following lemma implies that the objective of RLM is (2λ)-strongly convex.
              In addition, it underscores an important property of strong convexity.

Lemma 13.5.

  1. The function $f(\mathbf{w}) = \lambda\,\|\mathbf{w}\|^2$ is $2\lambda$-strongly convex.
  2. If $f$ is $\lambda$-strongly convex and $g$ is convex, then $f + g$ is $\lambda$-strongly convex.
  3. If $f$ is $\lambda$-strongly convex and $\mathbf{u}$ is a minimizer of $f$, then, for any $\mathbf{w}$,

$$ f(\mathbf{w}) - f(\mathbf{u}) \;\ge\; \frac{\lambda}{2}\,\|\mathbf{w} - \mathbf{u}\|^2 . $$
Proof. The first two points follow directly from the definition. To prove the last
point, we divide the definition of strong convexity by α and rearrange terms to get
that

$$ \frac{f(\mathbf{u} + \alpha(\mathbf{w} - \mathbf{u})) - f(\mathbf{u})}{\alpha} \;\le\; f(\mathbf{w}) - f(\mathbf{u}) - \frac{\lambda}{2}\,(1-\alpha)\,\|\mathbf{w} - \mathbf{u}\|^2 . $$

Taking the limit α → 0, we obtain that the right-hand side converges to $f(\mathbf{w}) - f(\mathbf{u}) - \frac{\lambda}{2}\|\mathbf{w} - \mathbf{u}\|^2$. On the other hand, the left-hand side becomes the derivative of the function $g(\alpha) = f(\mathbf{u} + \alpha(\mathbf{w} - \mathbf{u}))$ at α = 0. Since u is a minimizer of f, it follows that α = 0 is a minimizer of g, and therefore the left-hand side of the preceding inequality goes to zero in the limit α → 0, which concludes our proof.
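As a worked check of part 1, which the proof leaves implicit, the following identity (obtained by expanding the squared Euclidean norm in terms of $\|\mathbf{w}\|^2$, $\|\mathbf{u}\|^2$, and $\langle\mathbf{w},\mathbf{u}\rangle$) shows that $f(\mathbf{w}) = \lambda\|\mathbf{w}\|^2$ satisfies the strong-convexity inequality with parameter $2\lambda$, in fact with equality:

$$ \lambda\,\|\alpha\mathbf{w} + (1-\alpha)\mathbf{u}\|^2 \;=\; \alpha\,\lambda\|\mathbf{w}\|^2 + (1-\alpha)\,\lambda\|\mathbf{u}\|^2 - \frac{2\lambda}{2}\,\alpha(1-\alpha)\,\|\mathbf{w} - \mathbf{u}\|^2 . $$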


   We now turn to prove that RLM is stable. Let $S = (z_1, \ldots, z_m)$ be a training set, let $z'$ be an additional example, and let $S^{(i)} = (z_1, \ldots, z_{i-1}, z', z_{i+1}, \ldots, z_m)$. Let $A$ be the RLM rule, namely,

$$ A(S) = \operatorname*{argmin}_{\mathbf{w}} \left( L_S(\mathbf{w}) + \lambda\,\|\mathbf{w}\|^2 \right) . $$

Denote $f_S(\mathbf{w}) = L_S(\mathbf{w}) + \lambda\,\|\mathbf{w}\|^2$, and on the basis of Lemma 13.5 we know that $f_S$ is $(2\lambda)$-strongly convex. Relying on part 3 of the lemma, it follows that for any $\mathbf{v}$,

$$ f_S(\mathbf{v}) - f_S(A(S)) \;\ge\; \lambda\,\|\mathbf{v} - A(S)\|^2 . \tag{13.7} $$
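
   As an illustration (not part of the book's argument), the following minimal Python sketch instantiates the RLM rule for the squared loss, where it reduces to ridge regression, and numerically checks inequality (13.7) as well as the effect of replacing one example. The synthetic data, the regularization parameter lam, and the helper names rlm and f_S are illustrative assumptions, not notation from the text.

   import numpy as np

   rng = np.random.default_rng(0)
   m, d, lam = 50, 5, 0.1

   # Synthetic training set S = (z_1, ..., z_m) with z_i = (x_i, y_i).
   X = rng.normal(size=(m, d))
   y = X @ rng.normal(size=d) + 0.1 * rng.normal(size=m)

   def rlm(X, y, lam):
       # A(S) = argmin_w  L_S(w) + lam * ||w||^2  for the squared loss
       # L_S(w) = (1/m) * sum_i (<w, x_i> - y_i)^2; closed-form ridge solution.
       m, d = X.shape
       return np.linalg.solve(X.T @ X / m + lam * np.eye(d), X.T @ y / m)

   def f_S(w, X, y, lam):
       # The (2*lam)-strongly convex objective f_S(w) = L_S(w) + lam * ||w||^2.
       return np.mean((X @ w - y) ** 2) + lam * np.dot(w, w)

   w_star = rlm(X, y, lam)            # A(S), the minimizer of f_S
   v = w_star + rng.normal(size=d)    # an arbitrary point v

   # Inequality (13.7): f_S(v) - f_S(A(S)) >= lam * ||v - A(S)||^2.
   lhs = f_S(v, X, y, lam) - f_S(w_star, X, y, lam)
   rhs = lam * np.dot(v - w_star, v - w_star)
   print(lhs >= rhs, lhs, rhs)

   # S^(i): replace the i-th example by a fresh example z' = (x', y').
   i = 0
   X_i, y_i = X.copy(), y.copy()
   X_i[i] = rng.normal(size=d)
   y_i[i] = X_i[i] @ rng.normal(size=d)
   w_i = rlm(X_i, y_i, lam)             # A(S^(i))
   print(np.linalg.norm(w_star - w_i))  # ||A(S) - A(S^(i))||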