Page 390 - Understanding Machine Learning
P. 390

Appendix B




                 Measure Concentration
















                 Let Z 1 ,..., Z m be an i.i.d. sequence of random variables and let µ be their mean.
                 The strong law of large numbers states that when m tends to infinity, the empirical
                 average,  1    m  Z i , converges to the expected value µ, with probability 1. Measure
                         m   i=1
                 concentration inequalities quantify the deviation of the empirical average from the
                 expectation when m is finite.


                 B.1 MARKOV’S INEQUALITY
                 We start with an inequality which is called Markov’s inequality. Let Z be a
                 nonnegative random variable. The expectation of Z canbe writtenasfollows:
                                                    ∞
                                                  F
                                           E[Z] =     P[Z ≥ x]dx.                    (B.1)
                                                   x=0
                 Since P[Z ≥ x] is monotonically nonincreasing we obtain
                                       a               a
                                     F               F
                       ∀a ≥ 0, E[Z] ≥    P[Z ≥ x]dx ≥     P[Z ≥ a]dx = a P[Z ≥ a].   (B.2)
                                      x=0              x=0
                 Rearranging the inequality yields Markov’s inequality:
                                                            E[Z]
                                         ∀a ≥ 0, P[Z ≥ a] ≤      .                   (B.3)
                                                              a
                    For random variables that take value in [0,1], we can derive from Markov’s
                 inequality the following.

                 Lemma B.1. Let Z be a random variable that takes values in [0,1]. Assume that
                 E[Z] = µ. Then, for any a ∈ (0,1),

                                                       µ − (1 − a)
                                          P[Z > 1 − a] ≥         .
                                                           a
                 This also implies that for every a ∈ (0,1),
                                                    µ − a
                                          P[Z > a] ≥      ≥ µ − a.
                                                     1 − a



           372
   385   386   387   388   389   390   391   392   393   394   395