Page 357 - Understanding Machine Learning
P. 357

27.2 From Covering to Rademacher Complexity via Chaining  339


              We can now write
                                  1
                                         ∗
                           R(A) =   E σ,a
                                  m
                                                     M
                                  1           (M)          (k)  (k−1)
                                          ∗
                               =    E  σ,a − b    +     σ,b  − b
                                  m
                                                    k=1

                                                       M
                                  1  
          (M)        1
                                            ∗
                               ≤    E  σ  a − b     +       E sup σ,a  .
                                 m                        m
                                                       k=1     a∈ ˆ B k
                         √             (M)      −M                             c  −M
                                   ∗
              Since  σ =   m and  a − b    ≤ c2    , the first summand is at most √ 2  .
                                                                                m
              Additionally, by Massart lemma,

                                                      2
                  1                −k   2log(N(c2 −k , A) )   −k  log(N(c2 −k , A))
                    E sup σ,a ≤ 3c2                      = 6c2                  .
                  m                           m                        m
                      a∈ ˆ B k
              Therefore,
                                               M
                                     c2 −M  6c     −k
                              R(A) ≤ √    +       2    log(N(c2 −k , A)).
                                       m    m
                                               k=1
                 As a corollary we obtain the following:
              Lemma 27.5. Assume that there are α,β > 0 such that for any k ≥ 1 we have

                                       log(N(c2 −k , A)) ≤ α + βk.
              Then,
                                                6c
                                         R(A) ≤   (α + 2β).
                                                m
              Proof. The bound follows from Lemma 27.4 by taking M →∞ and noting that
                    −k              −k
                ∞  2  = 1and   ∞  k2   = 2.
                k=1            k=1
                                                                                m
              Example 27.2. Consider a set A which lies in a d dimensional subspace of R and
                                                                   √   d
                                                                  2c d
              such that c = max a∈A  a . We have shown that N(r, A) ≤   . Therefore, for
                                                                   r
              any k,


                                                             √
                                 log(N(c2 −k  , A)) ≤  d log 2 k+1  d

                                                         √     √
                                               ≤   d log(2 d) +  kd

                                                         √     √
                                               ≤   d log(2 d) +  dk.
              Hence Lemma 27.5 yields




                                6c          √      √          c  d log(d)
                        R(A) ≤        d log(2 d) + 2 d  = O               .
                                m                                  m
   352   353   354   355   356   357   358   359   360   361   362