Page 345 - Understanding Machine Learning
P. 345

26.1 The Rademacher Complexity  327


              Taking expectation over S on both sides we obtain

                   E   sup (L D ( f ) − L S ( f )) ≤ E  sup (L S ( f ) − L S ( f ))

                   S                          S,S
                       f ∈F                        f ∈F
                                                          m
                                              1

                                           =     E   sup    ( f (z ) − f (z i )) .  (26.6)
                                                                i
                                              m S,S
                                                     f ∈F
                                                         i=1

              Next, we note that for each j, z j and z are i.i.d. variables. Therefore, we can replace
                                              j
              them without affecting the expectation:
                                                                 



                         E    sup   ( f (z ) − f (z j )) +  ( f (z ) − f (z i )) 
                                        j
                                                          i
                         S,S
                              f ∈F
                                                   i = j
                                                                     



                           = E    sup   ( f (z j ) − f (z )) +  ( f (z ) − f (z i ))  .  (26.7)
                                                              i
                                                   j
                             S,S
                                  f ∈F
                                                       i = j
              Let σ j be a random variable such that P[σ j = 1] = P[σ j =−1] = 1/2. From
              Equation (26.7) we obtain that
                                                                   



                         E    sup   σ j ( f (z ) − f (z j )) +  ( f (z ) − f (z i )) 
                                          j
                                                            i

                       S,S ,σ j
                              f ∈F
                                                     i = j
                           1                         1
                         = (l.h.s. of Equation (26.7)) + (r.h.s. of Equation (26.7))
                           2                         2
                                                                   



                         = E    sup   ( f (z ) − f (z j )) +  ( f (z ) − f (z i ))   .  (26.8)
                                          j
                                                             i
                           S,S
                                f ∈F
                                                      i = j
              Repeating this for all j we obtain that

                           m                             m


                   E   sup    ( f (z ) − f (z i ))  = E  sup  σ i ( f (z ) − f (z i )) .  (26.9)

                                                                 i
                                 i
                  S,S                         S,S ,σ

                       f ∈F                          f ∈F
                           i=1                          i=1
              Finally,


                       sup    σ i ( f (z ) − f (z i )) ≤ sup  σ i f (z ) + sup  −σ i f (z i )

                                   i                      i
                        f ∈F                   f ∈F           f ∈F
                            i                      i              i
              and since the probability of σ is the same as the probability of −σ, the right-hand
              side of Equation (26.9) can be bounded by


                          E    sup   σ i f (z ) + sup  σ i f (z i )

                                          i

                         S,S ,σ
                               f ∈F           f ∈F
                                   i              i

                             = m E[R(F ◦ S )] + m E[R(F ◦ S)] = 2m E[R(F ◦ S)].
                                 S              S               S
                 The lemma immediately yields that, in expectation, the ERM rule finds a
              hypothesis which is close to the optimal hypothesis in H.
   340   341   342   343   344   345   346   347   348   349   350