Page 395 - Understanding Machine Learning
P. 395

B.5 Bennet’s and Bernstein’s Inequalities  377

                                            2
                 By using the inequality h(a) ≥ a /(2+2a/3) it is possible to derive the following:

              Lemma B.9 (Bernstein’s Inequality). Let Z 1 ,..., Z m be i.i.d. random variables with
              a zero mean. If for all i, P(|Z i | < M) = 1, then for all t > 0:




                                   m                      2
                                                         t /2
                               P     Z i > t ≤ exp −               .
                                                          2
                                                       E Z + Mt/3
                                  i=1                     j
              B.5.1 Application

              Bernstein’s inequality can be used to interpolate between the rate 1/
 we derived
                                                                          2
              for PAC learning in the realizable case (in Chapter 2)and therate1/
 we derived
              for the unrealizable case (in Chapter 4).

              Lemma B.10. Let   : H × Z → [0,1] be a loss function. Let D be an arbitrary
              distribution over Z. Fix some h. Then, for any δ ∈ (0,1) we have



                                                2L D (h)log(1/δ)  2log(1/δ)
                     1.   P    L S (h) ≥ L D (h) +            +            ≤ δ
                         S∼D m                       3m             m


                                                2L S (h)log(1/δ)  4log(1/δ)
                     2.   P    L D (h) ≥ L S (h) +            +           ≤ δ
                            m
                         S∼D                          m            m
              Proof. Define random variables α 1 ,...,α m s.t. α i =  (h,z i ) − L D (h). Note that
              E[α i ] = 0and that

                               2
                                           2
                            E[α ] = E[ (h,z i ) ] − 2L D (h)E[ (h,z i )] + L D (h) 2
                               i
                                           2
                                 = E[ (h,z i ) ] − L D (h) 2
                                           2
                                 ≤ E[ (h,z i ) ]
                                 ≤ E[ (h,z i )] = L D (h),

                                                                                   2
              where in the last inequality we used the fact that  (h,z i ) ∈ [0,1] and thus  (h,z i ) ≤
               (h,z i ). Applying Bernsein’s inequality over the α i ’s yields



                                 m                     2
                                                      t /2
                              P     α i > t ≤ exp −
                                                        2
                                                     Eα + t/3
                                 i=1                    j
                                                        2
                                                       t /2      def
                                          ≤ exp −                = δ.
                                                   mL D (h) + t/3
   390   391   392   393   394   395   396   397   398   399   400