Page 393 - Understanding Machine Learning
P. 393

B.4 Hoeffding’s Inequality  375


              and



                          −tZ
                                   −t
                       E[e   ] = E[e  i  Z i ] = E  e  −tZ i
                                               i

                              =   E[e −tZ i ]                 by independence
                                 i
                                          −t
                              =    1 + p i (e  − 1)
                                 i
                                      −t
                              ≤   e p i (e  −1)               using 1 + x ≤ e x
                                i
                              = e (e −t −1)p .
              Setting t =−log(1 − δ) yields
                                                   e −δp      −ph(−δ)
                              P[Z < (1 − δ)p] ≤             = e     .
                                               e (1−δ)log(1−δ) p
              It is easy to verify that h( − δ) ≥ h(δ) and hence
              Lemma B.5. Using the notation of Lemma B.3 we also have

                                                                    2
                                                                   δ
                            P[Z < (1 − δ)p] ≤ e −ph(−δ)  ≤ e −ph(δ)  ≤ e −p  2+2δ/3  .

              B.4 HOEFFDING’S INEQUALITY
              Lemma B.6 (Hoeffding’s Inequality). Let Z 1 ,..., Z m be a sequence of i.i.d. random
                                 1    m
                                                        ¯
                             ¯
              variables and let Z =    Z i . Assume that E[Z] = µ and P[a ≤ Z i ≤ b] = 1 for
                                 m  i=1
              every i. Then, for any 
> 0

                                 m


                              
 1        
                   2       2
                               m
                           P 
      Z i − µ
 >
  ≤ 2 exp −2m 
 /(b − a)  .

                                 i=1
                                                   1
                                               ¯
              Proof. Denote X i = Z i − E[Z i ]and X =  X i . Using the monotonicity of the
                                                   m  i
              exponent function and Markov’s inequality, we have that for every λ> 0and 
> 0,
                                              λ ¯ X  λ
  −λ
   λ ¯ X
                                   ¯
                                 P[X ≥ 
] = P[e  ≥ e ] ≤ e  E[e  ].
              Using the independence assumption we also have


                                E[e λ ¯ X ] = E  e λX i /m  =  E[e λX i /m ].
                                            i           i
              By Hoeffding’s lemma (Lemma B.7 later), for every i we have
                                                     2
                                                    λ (b−a) 2
                                         E[e λX i /m ] ≤ e  8m 2  .
              Therefore,
                                                  2   2
                                                 λ (b−a)       2  2
                                                              λ (b−a)
                                          −λ
                               P[X ≥ 
] ≤ e     e  8m 2  = e −λ
+  8m  .
                                  ¯
                                              i
   388   389   390   391   392   393   394   395   396   397   398