Setting $\lambda = 4m\epsilon/(b-a)^2$ we obtain
\[
\mathbb{P}[\bar{X} \ge \epsilon] \le e^{-2m\epsilon^2/(b-a)^2}.
\]
Applying the same arguments to the variable $-\bar{X}$ we obtain that $\mathbb{P}[\bar{X} \le -\epsilon] \le e^{-2m\epsilon^2/(b-a)^2}$. The theorem follows by applying the union bound on the two cases.
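For concreteness, the substitution can be checked directly. Assuming the preceding step established the exponential bound $\mathbb{P}[\bar{X} \ge \epsilon] \le e^{-\lambda\epsilon + \lambda^2(b-a)^2/(8m)}$ (the standard Chernoff step combined with Hoeffding's Lemma below; the exact form is inferred from context), plugging in $\lambda = 4m\epsilon/(b-a)^2$ gives
\[
-\lambda\epsilon + \frac{\lambda^2(b-a)^2}{8m}
= -\frac{4m\epsilon^2}{(b-a)^2} + \frac{16m^2\epsilon^2}{(b-a)^4}\cdot\frac{(b-a)^2}{8m}
= -\frac{2m\epsilon^2}{(b-a)^2},
\]
which is the claimed exponent.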
Lemma B.7 (Hoeffding's Lemma). Let $X$ be a random variable that takes values in the interval $[a,b]$ and such that $\mathbb{E}[X] = 0$. Then, for every $\lambda > 0$,
\[
\mathbb{E}[e^{\lambda X}] \le e^{\lambda^2(b-a)^2/8}.
\]
Proof. Since $f(x) = e^{\lambda x}$ is a convex function, we have that for every $\alpha \in (0,1)$ and $x \in [a,b]$,
\[
f(x) \le \alpha f(a) + (1-\alpha) f(b).
\]
Setting $\alpha = \frac{b-x}{b-a} \in [0,1]$ yields
\[
e^{\lambda x} \le \frac{b-x}{b-a}\,e^{\lambda a} + \frac{x-a}{b-a}\,e^{\lambda b}.
\]
Taking the expectation, we obtain that
\[
\mathbb{E}[e^{\lambda X}] \le \frac{b-\mathbb{E}[X]}{b-a}\,e^{\lambda a} + \frac{\mathbb{E}[X]-a}{b-a}\,e^{\lambda b} = \frac{b}{b-a}\,e^{\lambda a} - \frac{a}{b-a}\,e^{\lambda b},
\]
where we used the fact that $\mathbb{E}[X] = 0$. Denote $h = \lambda(b-a)$, $p = \frac{-a}{b-a}$, and
\[
L(h) = -hp + \log\left(1 - p + p e^{h}\right).
\]
Then, the expression on the right-hand side of the equation can be rewritten as $e^{L(h)}$. Therefore, to conclude our proof it suffices to show that $L(h) \le \frac{h^2}{8}$. This follows from Taylor's theorem using the facts $L(0) = L'(0) = 0$ and $L''(h) \le 1/4$ for all $h$.
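For completeness, here is the standard calculation behind those facts (note that $p = \frac{-a}{b-a} \in [0,1]$, since $\mathbb{E}[X] = 0$ forces $a \le 0 \le b$). Differentiating,
\[
L'(h) = -p + \frac{p e^{h}}{1 - p + p e^{h}}, \qquad \text{so} \quad L'(0) = -p + p = 0,
\]
and, writing $t = \frac{p e^{h}}{1 - p + p e^{h}} \in [0,1]$,
\[
L''(h) = \frac{p e^{h}(1-p)}{(1 - p + p e^{h})^2} = t(1-t) \le \frac{1}{4}.
\]
Taylor's theorem with the Lagrange remainder then gives $L(h) = L(0) + h L'(0) + \frac{h^2}{2} L''(\xi) \le \frac{h^2}{8}$ for some $\xi$ between $0$ and $h$.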
B.5 BENNET’S AND BERNSTEIN’S INEQUALITIES
Bennet's and Bernstein's inequalities are similar to Chernoff's bounds, but they hold for any sequence of independent random variables. We state the inequalities without proof; proofs can be found, for example, in Cesa-Bianchi and Lugosi (2006).
Lemma B.8 (Bennet's Inequality). Let $Z_1, \ldots, Z_m$ be independent random variables with zero mean, and assume that $Z_i \le 1$ with probability 1. Let
\[
\sigma^2 \ge \frac{1}{m}\sum_{i=1}^{m} \mathbb{E}[Z_i^2].
\]
Then for all $\epsilon > 0$,
\[
\mathbb{P}\left[\sum_{i=1}^{m} Z_i > \epsilon\right] \le e^{-m\sigma^2\, h\left(\frac{\epsilon}{m\sigma^2}\right)},
\]
where
\[
h(a) = (1+a)\log(1+a) - a.
\]
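As a quick numerical sanity check (not part of the original text), the following Python sketch compares Bennet's bound with an empirical tail estimate; the choice of distribution (uniform on $[-1,1]$) and the values of m, eps, and the number of trials are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative parameters (assumptions, not from the text):
# Z_i uniform on [-1, 1], so E[Z_i] = 0, Z_i <= 1 a.s., and E[Z_i^2] = 1/3.
m, trials, eps = 100, 100_000, 15.0
sigma2 = 1.0 / 3.0  # a valid choice: sigma^2 >= (1/m) * sum_i E[Z_i^2]

# Empirical estimate of P[sum_i Z_i > eps] over many independent runs.
Z = rng.uniform(-1.0, 1.0, size=(trials, m))
empirical = np.mean(Z.sum(axis=1) > eps)

def h(a):
    """h(a) = (1 + a) log(1 + a) - a, as in Lemma B.8."""
    return (1.0 + a) * np.log1p(a) - a

# Bennet's bound: exp(-m * sigma^2 * h(eps / (m * sigma^2))).
bennet = np.exp(-m * sigma2 * h(eps / (m * sigma2)))
print(f"empirical tail: {empirical:.2e}  Bennet bound: {bennet:.2e}")
```

The printed bound should always dominate the empirical frequency, though typically not tightly, which is the expected behavior of a worst-case tail inequality.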