Page 70 - Understanding Machine Learning
P. 70

The VC-Dimension
           52

                    To bound the left-hand side of Equation (6.4) we first note that for every h ∈ H,



                                            m

                 we can rewrite L D (h) = E S ∼D [L S  (h)], where S = z ,...,z is an additional i.i.d.
                                                                      m
                                                                1
                 sample. Therefore,



                           E   sup|L D (h) − L S (h)| = E  sup  
  E  L S (h) − L S (h) .



                          S∼D m                     S∼D  m  h∈H S ∼D  m
                               h∈H
                 A generalization of the triangle inequality yields


                                   E [L S                E |L S

                                
         (h) − L S (h)] ≤      (h) − L S (h)|,
                                
    m              
      m
                                 S ∼D                  S ∼D
                 and the fact that supermum of expectation is smaller than expectation of supremum
                 yields
                              sup  E |L S (h) − L S (h)|≤  E  sup|L S (h) − L S (h)|.


                                     m                    m
                              h∈H S ∼D                S ∼D h∈H
                 Formally, the previous two inequalities follow from Jensen’s inequality. Combining
                 all we obtain

                      E   sup |L D (h) − L S (h)| ≤  E  sup |L S   (h) − L S (h)|

                     S∼D m                     S,S ∼D m
                          h∈H                         h∈H

                                                            
 m



                                             =   E    sup  1 
 
  ( (h,z ) −  (h,z i ))
 .  (6.5)
                                                                     i
                                                    m
                                               S,S ∼D
                                                      h∈H m
                                                             i=1
                 The expectation on the right-hand side is over a choice of two i.i.d. samples S =
                 z 1 ,...,z m and S = z ,...,z .Since all of these 2m vectors are chosen i.i.d., nothing



                                   1     m
                 will change if we replace the name of the random vector z i with the name of the


                 random vector z . If we do it, instead of the term ( (h,z )− (h,z i )) in Equation (6.5)
                               i                                i
                                                                                     m

                 we will have the term −( (h,z ) −  (h,z i )). It follows that for every σ ∈{±1} we
                                            i
                 have that Equation (6.5) equals


                                                
 m


                                      E    sup  1 
 
  σ i ( (h,z ) −  (h,z i ))
                                                           i
                                         m
                                   S,S ∼D  h∈H m
                                                 i=1
                                               m
                 Since this holds for every σ ∈{±1} , it also holds if we sample each component of σ
                 uniformly at random from the uniform distribution over {±1}, denoted U ± .Hence,
                 Equation (6.5) also equals

                                                  
 m


                                  E     E    sup  1 
 
  σ i ( (h,z ) −  (h,z i ))
 ,

                                                             i
                                 σ∼U  m  S,S ∼D m

                                    ±        h∈H m
                                                   i=1
                 and by the linearity of expectation it also equals

                                                  
 m

                                   E    E    sup  1 
 
  σ i ( (h,z ) −  (h,z i ))
 .


                                                             i
                                      m

                                 S,S ∼D σ∼U m
                                          ±  h∈H m 
 i=1
   65   66   67   68   69   70   71   72   73   74   75