Page 392 - Understanding Machine Learning
P. 392

Measure Concentration
           374

                 of the exponent function and Markov’s inequality, we have that for every t > 0

                                                                      tZ
                                                                   E[e ]
                                                          t(1+δ)p
                                                     tZ
                                  P[Z > (1 + δ)p] = P[e  > e   ] ≤       .           (B.5)
                                                                  e (1+δ)tp
                 Next,


                               tZ
                                       t
                            E[e ] = E[e  i  Z i  ] = E  e tZ i
                                                  i

                                 =    E[e  tZ i  ]              by independence
                                    i
                                          t          0
                                 =      p i e + (1 − p i )e
                                    i
                                              t
                                 =     1 + p i (e − 1)
                                    i
                                          t
                                 ≤    e p i (e −1)              using 1 + x ≤ e x
                                    i
                                          t
                                       p i (e −1)
                                 = e  i
                                      t
                                 = e (e −1)p .
                 Combining the equation with Equation (B.5) and choosing t = log(1 + δ) we obtain


                 Lemma B.3. Let Z 1 ,..., Z m be independent Bernoulli variables where for every i,
                                                             m                m

                 P[Z i = 1] = p i and P[Z i = 0] = 1 − p i .Let p =  i=1  p i and let Z =  i=1  Z i . Then,
                 for any δ> 0,
                                                           −h(δ) p
                                          P[Z > (1 + δ)p] ≤ e   ,

                 where

                                         h(δ) = (1 + δ)log(1 + δ) − δ.

                                              2
                    Using the inequality h(a) ≥ a /(2 + 2a/3) we obtain

                 Lemma B.4. Using the notation of Lemma B.3 we also have

                                                           −p  δ 2
                                         P[Z > (1 + δ)p] ≤ e  2+2δ/3  .

                    For the other direction, we apply similar calculations:

                                                                           E[e −tZ ]
                                                           −tZ
                                                                 −t(1−δ)p
                    P[Z < (1 − δ)p] = P[ − Z > −(1 − δ)p] = P[e  > e   ] ≤        ,  (B.6)
                                                                          e −(1−δ)tp
   387   388   389   390   391   392   393   394   395   396   397