Page 348 - Understanding Machine Learning
P. 348

Rademacher Complexities
           330
                                                  m               N     ( j)         ( j)

                 Lemma 26.7. Let A be a subset of R and let A ={  j=1  α j a  : N ∈ N,∀ j,a  ∈

                  A,α j ≥ 0, α  1 = 1}. Then, R(A ) = R(A).
                 Proof. The main idea follows from the fact that for any vector v we have
                                                  N

                                           sup      α j v j = maxv j .
                                         α≥0: α  1 =1  j=1   j
                 Therefore,

                                                             m    N
                                                                       ( j)

                                 mR(A ) = E    sup     sup     σ i  α j a i
                                           σ
                                            α≥0: α  1 =1 a (1) ,...,a (N)  i=1  j=1
                                                     N        m
                                                                   ( j)
                                         = E   sup      α j sup  σ i a i
                                           σ
                                            α≥0: α  1 =1  a ( j)
                                                     j=1     i=1
                                                m

                                         = Esup   σ i a i
                                           σ
                                            a∈A
                                                i=1
                                         = mR(A),
                 and we conclude our proof.
                    The next lemma, due to Massart, states that the Rademacher complexity of a
                 finite set grows logarithmically with the size of the set.
                                                                                       m
                 Lemma 26.8 (Massart Lemma). Let A ={a 1 ,...,a N } be a finite set of vectors in R .
                           1    N
                                  a
                 Define ¯ a =   i=1 i . Then,
                           N

                                                           2log(N)
                                       R(A) ≤ max a − ¯ a          .
                                               a∈A           m
                 Proof. On the basis of Lemma 26.6, we can assume without loss of generality that

                  ¯ a = 0.Let λ> 0and let A ={λa 1 ,...,λa N }. We upper bound the Rademacher
                 complexity as follows:

                                                                   σ,a
                              mR(A ) = E max σ,a    = E log maxe
                                       σ  a∈A          σ      a∈A


                                     ≤ E log     e   σ,a
                                       σ
                                              a∈A


                                     ≤ log E     e   σ,a   // Jensen’s inequality
                                           σ
                                              a∈A

                                               m

                                     = log        E[e σ i a i  ] ,
                                                  σ i
                                           a∈A i=1

                 where the last equality occurs because the Rademacher variables are independent.
                 Next, using Lemma A.6 we have that for all a i ∈ R,
                                           exp(a i ) + exp( − a i )  2
                                   Ee σ i a i  =             ≤ exp(a /2),
                                                                    i
                                   σ i             2
   343   344   345   346   347   348   349   350   351   352   353