Measure Concentration
By the monotonicity of the exponent function and Markov's inequality, we have that for every $t > 0$,
\[
P[Z > (1+\delta)p] \;=\; P\!\left[e^{tZ} > e^{t(1+\delta)p}\right] \;\le\; \frac{E\!\left[e^{tZ}\right]}{e^{t(1+\delta)p}}. \tag{B.5}
\]
Next,
\[
\begin{aligned}
E\!\left[e^{tZ}\right] &= E\!\left[e^{t\sum_i Z_i}\right] = E\!\left[\prod_i e^{tZ_i}\right] \\
&= \prod_i E\!\left[e^{tZ_i}\right] && \text{by independence} \\
&= \prod_i \left(p_i e^{t} + (1-p_i)e^{0}\right) \\
&= \prod_i \left(1 + p_i(e^{t}-1)\right) \\
&\le \prod_i e^{p_i(e^{t}-1)} && \text{using } 1+x \le e^{x} \\
&= e^{\sum_i p_i(e^{t}-1)} \\
&= e^{(e^{t}-1)p}.
\end{aligned}
\]
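As a quick numerical sanity check of this chain, the following Python sketch (with arbitrarily chosen values for the $p_i$ and $t$, none of which come from the text) computes $E[e^{tZ}]$ exactly via the product formula and compares it with the bound $e^{(e^t-1)p}$:

```python
import math

# Arbitrary example values (not from the text): success probabilities p_i and t > 0.
ps = [0.1, 0.3, 0.25, 0.05, 0.4]
t = 0.7
p = sum(ps)

# Exact MGF of Z = sum_i Z_i for independent Bernoullis:
# E[e^{tZ}] = prod_i (1 + p_i (e^t - 1)).
mgf = math.prod(1 + p_i * (math.exp(t) - 1) for p_i in ps)

# Upper bound obtained from 1 + x <= e^x: e^{(e^t - 1) p}.
bound = math.exp((math.exp(t) - 1) * p)

print(f"E[e^(tZ)] = {mgf:.6f} <= {bound:.6f}")
assert mgf <= bound
```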
Combining the above with Equation (B.5) we obtain, for every $t > 0$,
\[
P[Z > (1+\delta)p] \;\le\; e^{(e^{t}-1)p \,-\, t(1+\delta)p} ,
\]
and choosing $t = \log(1+\delta)$ makes the exponent $\big(\delta - (1+\delta)\log(1+\delta)\big)p = -h(\delta)\,p$, which yields the following lemma.
Lemma B.3. Let $Z_1,\ldots,Z_m$ be independent Bernoulli variables where for every $i$, $P[Z_i = 1] = p_i$ and $P[Z_i = 0] = 1 - p_i$. Let $p = \sum_{i=1}^{m} p_i$ and let $Z = \sum_{i=1}^{m} Z_i$. Then, for any $\delta > 0$,
\[
P[Z > (1+\delta)p] \;\le\; e^{-h(\delta)\,p},
\]
where
\[
h(\delta) = (1+\delta)\log(1+\delta) - \delta.
\]
Using the inequality $h(a) \ge \frac{a^2}{2 + 2a/3}$ we obtain:
Lemma B.4. Using the notation of Lemma B.3 we also have
\[
P[Z > (1+\delta)p] \;\le\; e^{-\frac{p\,\delta^2}{2 + 2\delta/3}}.
\]
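A minimal Monte Carlo sketch (all parameters chosen arbitrarily for illustration) comparing the empirical upper tail of $Z$ with the bounds of Lemmas B.3 and B.4:

```python
import math
import random

random.seed(0)

# Arbitrary example parameters (not from the text).
ps = [0.2] * 50          # the p_i values; here Z ~ Binomial(50, 0.2)
p = sum(ps)              # p = 10
delta = 0.5
trials = 100_000

# Empirical estimate of P[Z > (1 + delta) p].
threshold = (1 + delta) * p
hits = sum(
    sum(random.random() < p_i for p_i in ps) > threshold
    for _ in range(trials)
)
empirical = hits / trials

h = (1 + delta) * math.log(1 + delta) - delta        # h(delta) of Lemma B.3
b3 = math.exp(-h * p)                                # Lemma B.3 bound
b4 = math.exp(-p * delta**2 / (2 + 2 * delta / 3))   # Lemma B.4 bound

# Lemma B.3 is the tighter of the two, since h(delta) >= delta^2/(2 + 2*delta/3).
print(f"empirical {empirical:.4f} <= B.3 bound {b3:.4f} <= B.4 bound {b4:.4f}")
```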
For the other direction, we apply similar calculations: for every $t > 0$,
\[
P[Z < (1-\delta)p] \;=\; P[-Z > -(1-\delta)p] \;=\; P\!\left[e^{-tZ} > e^{-t(1-\delta)p}\right] \;\le\; \frac{E\!\left[e^{-tZ}\right]}{e^{-t(1-\delta)p}}, \tag{B.6}
\]
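The derivation continues from here in the same manner as before. Equation (B.6) itself can be checked numerically; the sketch below (arbitrary parameters, and one arbitrary choice of $t > 0$, since the inequality holds for every such $t$) compares the empirical lower tail with the right-hand side of (B.6):

```python
import math
import random

random.seed(1)

# Arbitrary example parameters (not from the text).
ps = [0.2] * 50
p = sum(ps)
delta = 0.5
t = 0.5                  # any t > 0 makes (B.6) valid
trials = 100_000

# Empirical estimate of P[Z < (1 - delta) p].
threshold = (1 - delta) * p
hits = sum(
    sum(random.random() < p_i for p_i in ps) < threshold
    for _ in range(trials)
)
empirical = hits / trials

# Exact E[e^{-tZ}] = prod_i (1 + p_i (e^{-t} - 1)), then the (B.6) bound.
mgf = math.prod(1 + p_i * (math.exp(-t) - 1) for p_i in ps)
bound = mgf / math.exp(-t * (1 - delta) * p)

print(f"empirical {empirical:.4f} <= (B.6) bound {bound:.4f}")
```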