Page 349 - Understanding Machine Learning
P. 349

26.1 The Rademacher Complexity  331


              and therefore

                                      m        2
                                              a i                    2

                     mR(A ) ≤ log       exp        = log     exp  a  /2
                                              2
                                  a∈A i=1                 a∈A



                                                 2
                                                                          2

                            ≤ log |A |maxexp  a  /2    = log(|A |) + max( a  /2).

                                      a∈A                          a∈A
                          1
              Since R(A) = R(A ) we obtain from the equation that

                          λ
                                                  2
                                                              2
                                        log(|A|) + λ max a∈A ( a  /2)
                                 R(A) ≤                           .
                                                   λm

                                              2
              Setting λ =  2log(|A|)/max a∈A  a  and rearranging terms we conclude our
              proof.
                 The following lemma shows that composing A with a Lipschitz function does not
              blow up the Rademacher complexity. The proof is due to Kakade and Tewari.
              Lemma 26.9 (Contraction Lemma). For each i ∈ [m],let φ i : R → R be a ρ-Lipschitz
                                                                                  m
              function; namely, for all α,β ∈ R we have |φ i (α) − φ i (β)|≤ ρ |α − β|.For a ∈ R let
              φ(a) denote the vector (φ 1 (a 1 ),...,φ m (y m )).Let φ ◦ A ={φ(a): a ∈ A}. Then,
                                          R(φ ◦ A) ≤ ρ R(A).
              Proof. For simplicity, we prove the lemma for the case ρ = 1. The case ρ  =
                                            1
              1 will follow by defining φ =   φ and then using Lemma 26.6.Let A i =
                                            ρ
              {(a 1 ,...,a i−1 ,φ i (a i ),a i+1 ,...,a m ): a ∈ A}. Clearly, it suffices to prove that for any
              set A and all i we have R(A i ) ≤ R(A). Without loss of generality we will prove the
              latter claim for i = 1 and to simplify notation we omit the subscript from φ 1 . We have
                                   m


                  mR(A 1 ) = E sup   σ i a i
                            σ
                              a∈A 1 i=1

                                           m

                          = E supσ 1 φ(a 1 ) +  σ i a i
                            σ
                               a∈A
                                           i=2
                                                 m                      m
                            1
                          =     E   sup φ(a 1 ) +  σ i a i  + sup −φ(a 1 ) +  σ i a i
                            2 σ 2 ,...,σ m  a∈A           a∈A
                                                i=2                    i=2
                                                         m        m
                            1

                          =     E    sup  φ(a 1 ) − φ(a ) +  σ i a i +  σ i a   i
                                                    1
                            2 σ 2 ,...,σ m  a,a ∈A

                                                        i=2      i=2
                                                    m        m
                            1

                          ≤    E     sup  |a 1 − a |+  σ i a i +  σ i a i    ,  (26.12)
                                                1
                            2 σ 2 ,...,σ m
                                    a,a ∈A          i=2     i=2
              where in the last inequality we used the assumption that φ is Lipschitz. Next, we
              note that the absolute value on |a 1 − a | in the preceding expression can be omitted

                                              1

              since both a and a are from the same set A and the rest of the expression in the
   344   345   346   347   348   349   350   351   352   353   354