Page 351 - Understanding Machine Learning



Finally, since the variables $\sigma_1,\ldots,\sigma_m$ are independent, we have that $\mathbb{E}[\sigma_i \sigma_j] = \mathbb{E}[\sigma_i]\,\mathbb{E}[\sigma_j] = 0$ whenever $i \neq j$, while $\mathbb{E}[\sigma_i^2] = 1$. Therefore,
$$
\begin{aligned}
\mathbb{E}_{\sigma}\left[\Bigl\|\sum_{i=1}^m \sigma_i \mathbf{x}_i\Bigr\|_2^2\right]
&= \mathbb{E}_{\sigma}\left[\sum_{i,j} \sigma_i \sigma_j \langle \mathbf{x}_i, \mathbf{x}_j \rangle\right] \\
&= \sum_{i \neq j} \langle \mathbf{x}_i, \mathbf{x}_j \rangle\, \mathbb{E}_{\sigma}[\sigma_i \sigma_j]
 + \sum_{i=1}^m \langle \mathbf{x}_i, \mathbf{x}_i \rangle\, \mathbb{E}_{\sigma}\bigl[\sigma_i^2\bigr] \\
&= \sum_{i=1}^m \|\mathbf{x}_i\|_2^2 \;\le\; m \max_i \|\mathbf{x}_i\|_2^2 .
\end{aligned}
$$
Combining this with Equation (26.15) and Equation (26.16) we conclude our proof.
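As a sanity check, the identity $\mathbb{E}_\sigma\bigl[\|\sum_i \sigma_i \mathbf{x}_i\|_2^2\bigr] = \sum_i \|\mathbf{x}_i\|_2^2$ can be verified numerically by averaging over random sign vectors. The following sketch uses illustrative data (the matrix `X` and all dimensions are assumptions, not from the text):

```python
import numpy as np

# Monte Carlo check of E_sigma ||sum_i sigma_i x_i||_2^2 = sum_i ||x_i||_2^2.
# X and the dimensions below are illustrative, not taken from the text.
rng = np.random.default_rng(0)
m, n = 5, 3
X = rng.normal(size=(m, n))            # rows are x_1, ..., x_m

trials = 200_000
sigma = rng.choice([-1.0, 1.0], size=(trials, m))  # i.i.d. Rademacher signs
sums = sigma @ X                        # each row is sum_i sigma_i x_i
empirical = np.mean(np.sum(sums**2, axis=1))       # estimate of the expectation

exact = np.sum(X**2)                    # sum_i ||x_i||_2^2
print(empirical, exact)                 # the two values agree up to sampling noise
```

The cross terms $\langle \mathbf{x}_i, \mathbf{x}_j \rangle$ for $i \neq j$ average out over the random signs, which is exactly why only the diagonal terms survive in the derivation above.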
Next we bound the Rademacher complexity of $H_1 \circ S$.

Lemma 26.11. Let $S = (\mathbf{x}_1,\ldots,\mathbf{x}_m)$ be vectors in $\mathbb{R}^n$. Then,
$$
R(H_1 \circ S) \;\le\; \max_i \|\mathbf{x}_i\|_\infty \sqrt{\frac{2\log(2n)}{m}} .
$$
Proof. Using Hölder's inequality we know that for any vectors $\mathbf{w},\mathbf{v}$ we have $\langle \mathbf{w},\mathbf{v}\rangle \le \|\mathbf{w}\|_1 \|\mathbf{v}\|_\infty$. Therefore,
$$
\begin{aligned}
m\,R(H_1 \circ S) &= \mathbb{E}_{\sigma}\left[\sup_{\mathbf{a} \in H_1 \circ S} \sum_{i=1}^m \sigma_i a_i\right] \\
&= \mathbb{E}_{\sigma}\left[\sup_{\mathbf{w} : \|\mathbf{w}\|_1 \le 1} \sum_{i=1}^m \sigma_i \langle \mathbf{w}, \mathbf{x}_i \rangle\right] \\
&= \mathbb{E}_{\sigma}\left[\sup_{\mathbf{w} : \|\mathbf{w}\|_1 \le 1} \Bigl\langle \mathbf{w}, \sum_{i=1}^m \sigma_i \mathbf{x}_i \Bigr\rangle\right] \\
&\le \mathbb{E}_{\sigma}\left[\Bigl\|\sum_{i=1}^m \sigma_i \mathbf{x}_i\Bigr\|_\infty\right]. \qquad (26.17)
\end{aligned}
$$
For each $j \in [n]$, let $\mathbf{v}_j = (x_{1,j},\ldots,x_{m,j}) \in \mathbb{R}^m$. Note that $\|\mathbf{v}_j\|_2 \le \sqrt{m}\,\max_i \|\mathbf{x}_i\|_\infty$. Let $V = \{\mathbf{v}_1,\ldots,\mathbf{v}_n, -\mathbf{v}_1,\ldots,-\mathbf{v}_n\}$. The right-hand side of Equation (26.17) is $m\,R(V)$. Using Massart's lemma (Lemma 26.8) we have that
$$
R(V) \;\le\; \max_i \|\mathbf{x}_i\|_\infty \sqrt{2\log(2n)/m} ,
$$
which concludes our proof.
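The bound of Lemma 26.11 can also be probed empirically: estimate $R(H_1 \circ S) = \frac{1}{m}\,\mathbb{E}_\sigma\bigl[\|\sum_i \sigma_i \mathbf{x}_i\|_\infty\bigr]$ by Monte Carlo and compare it with $\max_i \|\mathbf{x}_i\|_\infty \sqrt{2\log(2n)/m}$. A minimal sketch, with illustrative data not taken from the text:

```python
import numpy as np

# Estimate R(H_1 ∘ S) = (1/m) E_sigma ||sum_i sigma_i x_i||_inf and compare
# with the bound of Lemma 26.11. X and the dimensions are illustrative.
rng = np.random.default_rng(1)
m, n = 50, 10
X = rng.uniform(-1.0, 1.0, size=(m, n))   # rows are x_1, ..., x_m

trials = 20_000
sigma = rng.choice([-1.0, 1.0], size=(trials, m))  # i.i.d. Rademacher signs
# For each sign vector, the sup over ||w||_1 <= 1 equals the inf-norm of sum_i sigma_i x_i.
rademacher = np.mean(np.max(np.abs(sigma @ X), axis=1)) / m

bound = np.max(np.abs(X)) * np.sqrt(2 * np.log(2 * n) / m)
print(rademacher, bound)                  # the estimate stays below the bound
```

Note that the estimate uses the equality established in the proof: for a fixed sign vector, the supremum over the $\ell_1$ ball is attained at a signed standard basis vector, so it equals the $\ell_\infty$ norm of $\sum_i \sigma_i \mathbf{x}_i$.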
26.3 GENERALIZATION BOUNDS FOR SVM
In this section we use Rademacher complexity to derive generalization bounds for generalized linear predictors with a Euclidean norm constraint. We will show how this leads to generalization bounds for hard-SVM and soft-SVM.