

Proof. Let Y = 1 − Z. Then Y is a nonnegative random variable with
E[Y] = 1 − E[Z] = 1 − µ. Applying Markov's inequality on Y we obtain

\[
P[Z \le 1-a] = P[1-Z \ge a] = P[Y \ge a] \le \frac{E[Y]}{a} = \frac{1-\mu}{a}.
\]

Therefore,

\[
P[Z > 1-a] \ge 1 - \frac{1-\mu}{a} = \frac{a+\mu-1}{a}.
\]
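The lemma can be probed with a quick Monte Carlo estimate. The sketch below is illustrative only: the choice Z ~ Beta(2, 2), a distribution on [0, 1] with mean 0.5, and the value a = 0.75 are assumptions made here, not taken from the text.

import numpy as np

# Hypothetical check of the lemma: Z ~ Beta(2, 2) lies in [0, 1] with mean 0.5.
rng = np.random.default_rng(0)
z = rng.beta(2, 2, size=1_000_000)

mu = 0.5  # E[Z] for Beta(2, 2)
a = 0.75  # any a with a + mu - 1 > 0 makes the lower bound nontrivial

empirical = np.mean(z > 1 - a)  # Monte Carlo estimate of P[Z > 1 - a]
bound = (a + mu - 1) / a        # lower bound from the lemma

print(f"P[Z > {1 - a}] ~= {empirical:.4f} >= {bound:.4f}")

Here the empirical probability (about 0.84 for this distribution) comfortably exceeds the guaranteed 1/3, as one would expect from a bound that must hold for every distribution on [0, 1].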




              B.2 CHEBYSHEV’S INEQUALITY
Applying Markov's inequality on the random variable (Z − E[Z])^2 we obtain
Chebyshev's inequality:

\[
\forall a > 0, \qquad P\left[|Z - E[Z]| \ge a\right] = P\left[(Z - E[Z])^2 \ge a^2\right] \le \frac{Var[Z]}{a^2}, \tag{B.4}
\]

where Var[Z] = E[(Z − E[Z])^2] is the variance of Z.
Consider the random variable $\frac{1}{m}\sum_{i=1}^{m} Z_i$. Since Z_1, ..., Z_m are i.i.d. it is easy to verify that

\[
Var\left[\frac{1}{m}\sum_{i=1}^{m} Z_i\right] = \frac{Var[Z_1]}{m}.
\]
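For completeness, the verification is a one-line computation: variance scales quadratically under multiplication by a constant, and the variance of a sum of independent random variables is the sum of their variances, so

\[
Var\left[\frac{1}{m}\sum_{i=1}^{m} Z_i\right] = \frac{1}{m^2}\, Var\left[\sum_{i=1}^{m} Z_i\right] = \frac{1}{m^2}\sum_{i=1}^{m} Var[Z_i] = \frac{Var[Z_1]}{m}.
\]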
              Applying Chebyshev’s inequality, we obtain the following:
Lemma B.2. Let Z_1, ..., Z_m be a sequence of i.i.d. random variables and assume that E[Z_1] = µ and Var[Z_1] ≤ 1. Then, for any δ ∈ (0,1), with probability of at least 1 − δ we have

\[
\left|\frac{1}{m}\sum_{i=1}^{m} Z_i - \mu\right| \le \sqrt{\frac{1}{\delta m}}.
\]
Proof. Applying Chebyshev's inequality we obtain that for all a > 0,

\[
P\left[\left|\frac{1}{m}\sum_{i=1}^{m} Z_i - \mu\right| > a\right] \le \frac{Var[Z_1]}{m a^2} \le \frac{1}{m a^2}.
\]

The proof follows by denoting the right-hand side δ and solving for a.
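Lemma B.2 can likewise be illustrated numerically. The following is a minimal sketch, assuming Z_i ~ Bernoulli(0.5), so that E[Z_1] = 0.5 and Var[Z_1] = 0.25 ≤ 1 (the distribution and the parameters m, δ are illustrative choices, not from the text); it estimates how often the deviation stays within √(1/(δm)):

import numpy as np

# Illustrative check of Lemma B.2 with (assumed) Bernoulli(0.5) variables.
rng = np.random.default_rng(0)
mu, m, delta, trials = 0.5, 100, 0.1, 100_000

samples = rng.binomial(1, mu, size=(trials, m))  # trials x m i.i.d. draws
deviations = np.abs(samples.mean(axis=1) - mu)   # |(1/m) sum_i Z_i - mu|
bound = np.sqrt(1 / (delta * m))                 # sqrt(1/(delta m))

coverage = np.mean(deviations <= bound)          # should be at least 1 - delta
print(f"bound = {bound:.3f}, empirical coverage = {coverage:.4f} (target >= {1 - delta})")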
The bound given previously on the deviation between the empirical average and the mean decreases only polynomially with m. It is possible to obtain a significantly faster decrease: in the sections that follow we derive bounds that decrease exponentially fast.
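To make the contrast concrete, one can tabulate the Chebyshev-based deviation √(1/(δm)) from Lemma B.2 against the deviation √(ln(2/δ)/(2m)) that Hoeffding's inequality yields for averages of [0, 1]-valued variables (derived later in this appendix). The snippet below is an illustrative comparison; the confidence parameter δ = 0.01 is an arbitrary choice.

import numpy as np

# Chebyshev-based vs. Hoeffding-based deviation bounds for the average of
# m i.i.d. [0, 1]-valued variables, each holding with probability 1 - delta.
delta = 0.01
for m in (100, 1_000, 10_000):
    chebyshev = np.sqrt(1 / (delta * m))              # from Lemma B.2
    hoeffding = np.sqrt(np.log(2 / delta) / (2 * m))  # solving 2 exp(-2 m eps^2) = delta
    print(f"m = {m:6d}:  Chebyshev {chebyshev:.3f}   Hoeffding {hoeffding:.3f}")

Already at m = 100 the exponential bound (about 0.163) is roughly six times tighter than the polynomial one (1.000).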
              B.3 CHERNOFF’S BOUNDS

Let Z_1, ..., Z_m be independent Bernoulli variables where for every i, P[Z_i = 1] = p_i and P[Z_i = 0] = 1 − p_i. Let $p = \sum_{i=1}^{m} p_i$ and let $Z = \sum_{i=1}^{m} Z_i$. Using the monotonicity