Page 350 - Understanding Machine Learning
P. 350

Rademacher Complexities
           332


                 supremum is not affected by replacing a and a . Therefore,
                                                          m        m
                                   1

                         mR(A 1 ) ≤    E    sup   a 1 − a +  σ i a i +  σ i a   i  .  (26.13)
                                                       1
                                   2 σ 2 ,...,σ m  a,a ∈A

                                                          i=2     i=2
                 But, using the same equalities as in Equation (26.12), it is easy to see that the right-
                 hand side of Equation (26.13) exactly equals mR(A), which concludes our proof.

                 26.2 RADEMACHER COMPLEXITY OF LINEAR CLASSES

                 In this section we analyze the Rademacher complexity of linear classes. To simplify
                 the derivation we first define the following two classes:


                        H 1 ={x  → w,x  :  w  1 ≤ 1},  H 2 ={x  → w,x  :  w  2 ≤ 1}.  (26.14)
                    The following lemma bounds the Rademacher complexity of H 2 . We allow the
                 x i to be vectors in any Hilbert space (even infinite dimensional), and the bound
                 does not depend on the dimensionality of the Hilbert space. This property becomes
                 useful when analyzing kernel methods.

                 Lemma 26.10. Let S = (x 1 ,...,x m ) be vectors in a Hilbert space. Define: H 2 ◦ S =
                 {( w,x 1  ,..., w,x m  ):  w  2 ≤ 1}. Then,

                                                      max i  x i   2
                                           R(H 2 ◦ S) ≤  √      .
                                                           m
                 Proof. Using Cauchy-Schwartz inequality we know that for any vectors w,v we
                 have  w,v ≤  w  v . Therefore,

                                                          m


                                     mR(H 2 ◦ S) = E  sup    σ i a i
                                                 σ
                                                    a∈H 2 ◦S
                                                          i=1

                                                           m

                                               = E   sup     σ i  w,x i
                                                 σ
                                                    w: w ≤1
                                                          i=1

                                                              m

                                               = E   sup  w,    σ i x i
                                                 σ
                                                    w: w ≤1
                                                             i=1

                                                      m

                                               ≤ E      σ i x i   2 .              (26.15)
                                                 σ
                                                     i=1
                 Next, using Jensen’s inequality we have that
                                                         
                                                      1/2                1/2
                           0       0         0      0 2            0      0 2
                           0 m     0         0 m    0              0 m    0

                           0       0       0       0             0      0
                                                         ≤                 . (26.16)
                        E 0    σ i x i0  = E 0  σ i x i0       E 0    σ i x i0
                                        σ
                         σ 0       0         0      0           σ 0       0
                             i=1    2         i=1     2             i=1    2
   345   346   347   348   349   350   351   352   353   354   355