Page 394 - Understanding Machine Learning
P. 394

Measure Concentration
           376
                                      2
                 Setting λ = 4m
/(b − a) we obtain
                                                            2
                                                          2m
                                                        −
                                             P[X ≥ 
] ≤ e  (b−a) 2  .
                                                ¯
                                                              ¯                 ¯
                 Applying the same arguments on the variable −X we obtain that P[X ≤−
] ≤
                      2
                    2m
                  −
                 e  (b−a) 2  . The theorem follows by applying the union bound on the two cases.
                 Lemma B.7 (Hoeffding’s Lemma). Let X be a random variable that takes values in
                 the interval [a,b] and such that E[X] = 0. Then, for every λ> 0,
                                                        2
                                                       λ (b−a) 2
                                                 λX
                                              E[e  ] ≤ e  8  .
                 Proof. Since f (x) = e λx  is a convex function, we have that for every α ∈ (0,1), and
                  x ∈ [a,b],
                                          f (x) ≤ α f (a) + (1 − α) f (b).
                            b−x
                 Setting α =   ∈ [0,1] yields
                            b−a
                                               b − x  λa  x − a  λb
                                           λx
                                          e  ≤      e  +      e .
                                               b − a     b − a
                 Taking the expectation, we obtain that
                                    b − E[X]  λa  E[x] − a  λb  b   λa   a   λb
                              λX
                           E[e  ] ≤        e  +         e  =       e  −     e ,
                                     b − a        b − a       b − a     b − a
                                                                                  −a
                 whereweused thefact that E[X] = 0. Denote h = λ(b − a), p =        ,and
                                                                                 b−a
                                            h
                  L(h) =−hp + log(1 − p + pe ). Then, the expression on the right-hand side of
                 the equation can be rewritten as e L(h) . Therefore, to conclude our proof it suf-
                                         h  2
                 fices to show that L(h) ≤  . This follows from Taylor’s theorem using the facts
                                         8


                  L(0) = L (0) = 0and L (h) ≤ 1/4for all h.
                 B.5 BENNET’S AND BERNSTEIN’S INEQUALITIES
                 Bennet’s and Bernsein’s inequalities are similar to Chernoff’s bounds, but they
                 hold for any sequence of independent random variables. We state the inequalities
                 without proof, which can be found, for example, in Cesa-Bianchi and Lugosi (2006).
                 Lemma B.8 (Bennet’s Inequality). Let Z 1 ,..., Z m be independent random variables
                 with zero mean, and assume that Z i ≤ 1 with probability 1.Let
                                                     m
                                                   1
                                               2           2
                                              σ ≥      E[Z ].
                                                           i
                                                  m
                                                     i=1
                 Then for all 
> 0,

                                            m
                                                           2
                                                        −mσ h
                                        P     Z i >
 ≤ e      mσ  2  .
                                           i=1
                 where
                                         h(a) = (1 + a)log(1 + a) − a.
   389   390   391   392   393   394   395   396   397   398   399