and
\begin{align*}
E[e^{-tZ}] = E\left[e^{-t\sum_i Z_i}\right] &= E\left[\prod_i e^{-tZ_i}\right] \\
&= \prod_i E[e^{-tZ_i}] && \text{by independence} \\
&= \prod_i \left(1 + p_i\,(e^{-t}-1)\right) \\
&\le \prod_i e^{p_i(e^{-t}-1)} && \text{using } 1+x \le e^x \\
&= e^{(e^{-t}-1)p}.
\end{align*}
Setting $t = -\log(1-\delta)$ yields
\[
P[Z < (1-\delta)p] \;\le\; \frac{e^{-\delta p}}{e^{(1-\delta)\log(1-\delta)\,p}} \;=\; e^{-p\,h(-\delta)}.
\]
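To make the substitution explicit (with $h(a) = (1+a)\log(1+a) - a$ as in the upper-tail bound), note that
\[
t = -\log(1-\delta) \;\Rightarrow\; e^{-t} - 1 = -\delta, \qquad e^{t(1-\delta)p} = e^{-(1-\delta)\log(1-\delta)\,p},
\]
so the bound becomes $e^{-\delta p - (1-\delta)\log(1-\delta)\,p} = e^{-p\,h(-\delta)}$.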
It is easy to verify that $h(-\delta) \ge h(\delta)$ and hence

Lemma B.5. Using the notation of Lemma B.3 we also have
\[
P[Z < (1-\delta)p] \;\le\; e^{-p\,h(-\delta)} \;\le\; e^{-p\,h(\delta)} \;\le\; e^{-p\,\frac{\delta^2}{2+2\delta/3}}.
\]
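As a numerical sanity check of Lemma B.5, the following minimal simulation sketch (in Python; the number of variables, the values of $p_i$, and $\delta$ are illustrative assumptions, not from the text) compares the empirical lower tail with the bound $e^{-p\,h(-\delta)}$:

\begin{verbatim}
import numpy as np

# Minimal simulation check of Lemma B.5 (illustrative parameters).
# Z is a sum of independent Bernoulli(p_i) variables; the lemma bounds
# the lower tail P[Z < (1 - delta) p] by exp(-p * h(-delta)).
rng = np.random.default_rng(0)

p_i = np.full(100, 0.3)   # assumed: 100 Bernoulli variables with p_i = 0.3
p = p_i.sum()             # p = sum_i p_i
delta = 0.2

def h(a):
    # h(a) = (1 + a) log(1 + a) - a
    return (1 + a) * np.log(1 + a) - a

trials = 200_000
Z = (rng.random((trials, p_i.size)) < p_i).sum(axis=1)
empirical = np.mean(Z < (1 - delta) * p)
bound = np.exp(-p * h(-delta))
print(empirical, bound)   # the bound should dominate the empirical tail
\end{verbatim}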
B.4 HOEFFDING’S INEQUALITY
Lemma B.6 (Hoeffding’s Inequality). Let $Z_1,\ldots,Z_m$ be a sequence of i.i.d. random variables and let $\bar{Z} = \frac{1}{m}\sum_{i=1}^m Z_i$. Assume that $E[\bar{Z}] = \mu$ and $P[a \le Z_i \le b] = 1$ for every $i$. Then, for any $\epsilon > 0$,
\[
P\left[\,\left|\frac{1}{m}\sum_{i=1}^m Z_i - \mu\right| > \epsilon\,\right] \;\le\; 2\exp\left(-2m\epsilon^2/(b-a)^2\right).
\]
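Before the proof, a minimal simulation sketch of the inequality (in Python; the choice of uniform $Z_i$ on $[a,b]$ and the values of $m$ and $\epsilon$ are illustrative assumptions):

\begin{verbatim}
import numpy as np

# Minimal simulation check of Hoeffding's inequality (illustrative setup).
rng = np.random.default_rng(0)

m, trials = 50, 100_000
a, b, eps = 0.0, 1.0, 0.1

Z = rng.uniform(a, b, size=(trials, m))   # Z_i ~ Uniform[a, b]
mu = (a + b) / 2                          # E[Z_i] for the uniform case
empirical = np.mean(np.abs(Z.mean(axis=1) - mu) > eps)
bound = 2 * np.exp(-2 * m * eps**2 / (b - a) ** 2)
print(empirical, bound)   # the bound should dominate the empirical probability
\end{verbatim}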
Proof. Denote $X_i = Z_i - E[Z_i]$ and $\bar{X} = \frac{1}{m}\sum_i X_i$. Using the monotonicity of the exponent function and Markov's inequality, we have that for every $\lambda > 0$ and $\epsilon > 0$,
\[
P[\bar{X} \ge \epsilon] = P[e^{\lambda\bar{X}} \ge e^{\lambda\epsilon}] \le e^{-\lambda\epsilon}\,E[e^{\lambda\bar{X}}].
\]
Using the independence assumption we also have
\[
E[e^{\lambda\bar{X}}] = E\left[\prod_i e^{\lambda X_i/m}\right] = \prod_i E[e^{\lambda X_i/m}].
\]
By Hoeffding's lemma (Lemma B.7 later), for every $i$ we have
\[
E[e^{\lambda X_i/m}] \le e^{\lambda^2(b-a)^2/(8m^2)}.
\]
Therefore,
\[
P[\bar{X} \ge \epsilon] \;\le\; e^{-\lambda\epsilon}\prod_i e^{\lambda^2(b-a)^2/(8m^2)} \;=\; e^{-\lambda\epsilon + \lambda^2(b-a)^2/(8m)}.
\]
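For reference, the optimization step that finishes the argument: the exponent $-\lambda\epsilon + \lambda^2(b-a)^2/(8m)$ is minimized at $\lambda = 4m\epsilon/(b-a)^2$, which gives
\[
P[\bar{X} \ge \epsilon] \;\le\; e^{-2m\epsilon^2/(b-a)^2},
\]
and applying the same argument to $-\bar{X}$ and taking a union bound yields the factor of $2$ in Lemma B.6.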