Page 396 - Understanding Machine Learning
P. 396

Measure Concentration
           378

                 Solving for t yields
                                   2
                                  t /2
                                          = log(1/δ)
                              mL D (h) + t/3
                                         log(1/δ)
                                    2
                                ⇒ t /2 −         t − log(1/δ)mL D (h) = 0
                                            3
                                                *
                                                     2
                                       log(1/δ)    log (1/δ)
                                ⇒ t =          +           + 2log(1/δ)mL D (h)
                                          3           3 2
                                   log(1/δ)
                                ≤ 2        +   2log(1/δ)mL D (h)
                                      3
                 Since  1     α i = L S (h) − L D (h), it follows that with probability of at least 1 − δ,
                       m   i

                                                log(1/δ)    2log(1/δ) L D (h)
                                L S (h) − L D (h) ≤ 2   +                 ,
                                                  3m              m
                 which proves the first inequality. The second part of the lemma follows in a
                 similar way.


                 B.6 SLUD’S INEQUALITY
                                                               m

                 Let X bea(m, p) binomial variable. That is, X =  Z i , where each Z i is 1 with
                                                               i=1
                 probability p and 0 with probability 1− p. Assume that p = (1−
)/2. Slud’s inequal-
                 ity (Slud 1977) tells us that P[X ≥ m/2] is lower bounded by the probability that

                                                                          2
                                                                   2
                 a normal variable will be greater than or equal to  m
 /(1 − 
 ). The following
                 lemma follows by standard tail bounds for the normal distribution.
                 Lemma B.11. Let X be a (m, p) binomial variable and assume that p = (1 − 
)/2.
                 Then,

                                            1
                                                                       2
                                                                2
                               P[X ≥ m/2] ≥    1 −  1 − exp( − m
 /(1 − 
 )) .
                                            2
                                            2
                 B.7 CONCENTRATION OF χ VARIABLES
                 Let X 1 ,..., X k be k independent normally distributed random variables. That is,
                                                                                    2
                                                                          2
                 for all i, X i ∼ N(0,1). The distribution of the random variable X is called χ (chi
                                                                          i
                                                                   2
                                                                                     2
                                                                           2
                 square) and the distribution of the random variable Z = X +···+X is called χ (chi
                                                                                    k
                                                                           k
                                                                   1
                                                           2
                 square with k degrees of freedom). Clearly, E[X ] = 1and E[Z] = k. The following
                                                           i
                                   2
                 lemma states that X is concentrated around its mean.
                                   k
                                        2
                 Lemma B.12. Let Z ∼ χ . Then, for all 
> 0 we have
                                       k
                                                             2
                                          P[Z ≤ (1 − 
)k] ≤ e −
 k/6 ,
                 and for all 
 ∈ (0,3) we have
                                                             2
                                          P[Z ≥ (1 + 
)k] ≤ e −
 k/6 .
                 Finally, for all 
 ∈ (0,3),
                                                                   2
                                    P[(1 − 
)k ≤ Z ≤ (1 + 
)k] ≥ 1 − 2e −
 k/6 .
   391   392   393   394   395   396   397   398   399   400   401