Page 343 - Understanding Machine Learning
P. 343

26





              Rademacher Complexities















              In Chapter 4 we have shown that uniform convergence is a sufficient condition for
              learnability. In this chapter we study the Rademacher complexity, which measures
              the rate of uniform convergence. We will provide generalization bounds based on
              this measure.


              26.1 THE RADEMACHER COMPLEXITY
              Recall the definition of an 
-representative sample from Chapter 4, repeated here
              for convenience.
              Definition 26.1 (
-Representative Sample).  A training set   S  is  called
              
-representative (w.r.t. domain Z, hypothesis class H, loss function  , and distri-
              bution D)if
                                       sup|L D (h) − L S (h)|≤ 
.
                                       h∈H
                 We have shown that if S is an 
/2 representative sample then the ERM rule is
              
-consistent, namely, L D (ERM H (S)) ≤ min h∈H L D (h) + 
.
                 To simplify our notation, let us denote
                                     def    def
                                   F =   ◦ H ={z  →  (h,z): h ∈ H},
              and given f ∈ F, we define
                                                             m
                                                          1
                               L D ( f ) = E [ f (z)],  L S ( f ) =  f (z i ).
                                       z∼D                m
                                                            i=1
              We define the representativeness of S with respect to F as the largest gap between
              the true error of a function f and its empirical error, namely,
                                            def
                                 Rep (F, S) = sup (L D ( f ) − L S ( f )).      (26.1)
                                     D
                                                f ∈F
                 Now, suppose we would like to estimate the representativeness of S using the
              sample S only. One simple idea is to split S into two disjoint sets, S = S 1 ∪ S 2 ;



                                                                                        325
   338   339   340   341   342   343   344   345   346   347   348