Page 344 - Understanding Machine Learning
P. 344

Rademacher Complexities
           326

                 refer to S 1 as a validation set and to S 2 as a training set. We can then estimate
                 the representativeness of S by

                                                           ( f )).                  (26.2)
                                            sup (L S 1  ( f ) − L S 2
                                            f ∈F
                                                                                 m
                 This can be written more compactly by defining σ = (σ 1 ,...,σ m ) ∈{±1} to be a
                 vector such that S 1 ={z i : σ i = 1} and S 2 ={z i : σ i =−1}. Then, if we further assume
                 that |S 1 |=|S 2 | then Equation (26.2) can be rewritten as

                                                     m
                                             2
                                                sup    σ i f (z i ).                (26.3)
                                             m f ∈F
                                                    i=1
                 The Rademacher complexity measure captures this idea by considering the expec-
                 tation of the term appearing in Equation 26.3 with respect to a random choice of σ.
                 Formally, let F ◦ S be the set of all possible evaluations a function f ∈ F can achieve
                 on a sample S, namely,

                                      F ◦ S ={( f (z 1 ),..., f (z m )) : f ∈ F}.

                                                                                        1
                 Let the variables in σ be distributed i.i.d. according to P[σ i = 1] = P[σ i =−1] = .
                                                                                        2
                 Then, the Rademacher complexity of F with respect to S is defined as follows:
                                                              m
                                           def 1
                                   R(F ◦ S) =      E     sup    σ i f (z i ) .      (26.4)
                                                      m
                                              m σ∼{±1}
                                                         f ∈F
                                                             i=1
                                                        m
                 More generally, given a set of vectors, A ⊂ R , we define
                                                          m
                                              def 1
                                         R(A) =     E sup    σ i a i .              (26.5)
                                                 m σ  a∈A
                                                          i=1
                    The following lemma bounds the expected value of the representativeness of S
                 by twice the expected Rademacher complexity.
                 Lemma 26.2.
                                      E [ Rep (F, S)] ≤ 2 E R(F ◦ S).
                                        m     D              m
                                    S∼D                   S∼D

                 Proof. Let S ={z ,...,z } be another i.i.d. sample. Clearly, for all f ∈ F, L D ( f ) =
                                 1     m
                        (
                 E S [L S f )]. Therefore, for every f ∈ F we have

                           L D ( f ) − L S ( f ) = E[L S ( f )] − L S ( f ) = E[L S ( f ) − L S ( f )].


                                            S                  S
                 Taking supremum over f ∈ F of both sides, and using the fact that the supremum
                 of expectation is smaller than expectation of the supremum we obtain
                                                              (
                              sup (L D ( f ) − L S ( f )) = sup E[L S f ) − L S ( f )]
                                                     f ∈F S
                               f ∈F

                                                  ≤ E   sup (L S ( f ) − L S ( f )) .

                                                     S     f ∈F
   339   340   341   342   343   344   345   346   347   348   349