Page 161 - Understanding Machine Learning
P. 161

13.3 Tikhonov Regularization as a Stabilizer  143


              It follows (using Theorem 13.2)that

                                                             2ρ 2
                                   E [L D (A(S)) − L S (A(S))] ≤  .
                                     m                       λm
                                  S∼D

              13.3.2 Smooth and Nonnegative Loss

              If the loss is β-smooth and nonnegative then it is also self-bounded (see
              Section 12.1):

                                                2
                                         ∇ f (w)  ≤ 2β f (w).                  (13.12)

                                      2β
              We further assume that λ ≥  , or, in other words, that β ≤ λm/2. By the smoothness
                                      m
              assumption we have that
                                                                   β
                    (i)                                (i)               (i)       2
                (A(S ),z i ) −  (A(S),z i ) ≤ø∇  (A(S),z i ), A(S ) − A(S) +   A(S ) − A(S)  .
                                                                   2
                                                                               (13.13)

              Using the Cauchy-Schwartz inequality and Equation (12.6) we further obtain that

                     (i)
                 (A(S ),z i ) −  (A(S),z i )
                                                           β
                                               (i)               (i)       2
                           ≤√∇ (A(S),z i )  A(S ) − A(S) +   A(S ) − A(S)
                                                           2
                                               (i)          β     (i)      2
                           ≤   2βØ (A(S),z i ) A(S ) − A(S) +   A(S ) − A(S)  .  (13.14)
                                                            2
              By a symmetric argument it holds that

                                    (i)

                    (A(S),z ) −  (A(S ),z )

                                                               β
                                        (i)       (i)               (i)       2
                             ≤   2βØ (A(S ),z ) A(S ) − A(S) +   A(S ) − A(S)  .
                                                               2
              Plugging these inequalities into Equation (13.10) and rearranging terms we obtain
              that

                                         √
                                           2β
                          (i)                                        (i)

                       A(S ) − A(S) ≤               (A(S),z i ) +   (A(S ),z ) .
                                       (λm − β)
              Combining the preceding with the assumption β ≤ λm/2 yields
                                         √
                                           8β
                            (i)                                    (i)

                         A(S ) − A(S) ≤           (A(S),z i ) +   (A(S ),z ) .
                                         λm
   156   157   158   159   160   161   162   163   164   165   166