Page 392 - Understanding Machine Learning

P. 392

Measure Concentration
374

of the exponent function and Markov’s inequality, we have that for every t > 0

tZ
E[e ]
t(1+δ)p
tZ
P[Z > (1 + δ)p] = P[e > e ] ≤ . (B.5)
e (1+δ)tp
Next,

tZ
t
E[e ] = E[e i Z i ] = E e tZ i
i

= E[e tZ i ] by independence
i
t 0
= p i e + (1 − p i )e
i
t
= 1 + p i (e − 1)
i
t
≤ e p i (e −1) using 1 + x ≤ e x
i
t
p i (e −1)
= e i
t
= e (e −1)p .
Combining the equation with Equation (B.5) and choosing t = log(1 + δ) we obtain

Lemma B.3. Let Z 1 ,..., Z m be independent Bernoulli variables where for every i,
m m

P[Z i = 1] = p i and P[Z i = 0] = 1 − p i .Let p = i=1 p i and let Z = i=1 Z i . Then,
for any δ> 0,
−h(δ) p
P[Z > (1 + δ)p] ≤ e ,

where

h(δ) = (1 + δ)log(1 + δ) − δ.

2
Using the inequality h(a) ≥ a /(2 + 2a/3) we obtain

Lemma B.4. Using the notation of Lemma B.3 we also have

−p δ 2
P[Z > (1 + δ)p] ≤ e 2+2δ/3 .

For the other direction, we apply similar calculations:

E[e −tZ ]
−tZ
−t(1−δ)p
P[Z < (1 − δ)p] = P[ − Z > −(1 − δ)p] = P[e > e ] ≤ , (B.6)
e −(1−δ)tp

387 388 389 390 391 392 393 394 395 396 397