Proof. Let Y = 1 − Z. Then Y is a nonnegative random variable with
E[Y] = 1 − E[Z] = 1 − µ. Applying Markov’s inequality on Y we obtain
\[
P[Z \le 1-a] = P[1 - Z \ge a] = P[Y \ge a] \le \frac{E[Y]}{a} = \frac{1-\mu}{a}.
\]
Therefore,
\[
P[Z > 1-a] \ge 1 - \frac{1-\mu}{a} = \frac{a+\mu-1}{a}.
\]
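As a quick numerical illustration of this lower bound (not part of the original text), the following sketch simulates a $[0,1]$-valued random variable and compares the empirical probability $P[Z > 1-a]$ with $(a+\mu-1)/a$. The Beta(8, 2) distribution, the value of $a$, and the sample size are arbitrary assumptions made only for the example.

```python
import numpy as np

# Sanity check of the lower bound P[Z > 1 - a] >= (a + mu - 1)/a
# for a [0,1]-valued random variable Z with mean mu.
# The Beta(8, 2) distribution is an arbitrary illustrative choice.
rng = np.random.default_rng(0)
z = rng.beta(8, 2, size=1_000_000)   # Z in [0,1], E[Z] = 0.8
mu = z.mean()
a = 0.5                              # interesting whenever a + mu - 1 > 0
empirical = np.mean(z > 1 - a)
lower_bound = (a + mu - 1) / a
print(f"P[Z > {1 - a}] ~ {empirical:.3f} >= bound {lower_bound:.3f}")
```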
B.2 CHEBYSHEV’S INEQUALITY
Applying Markov's inequality on the random variable $(Z - E[Z])^2$ we obtain
Chebyshev's inequality:
\[
\forall a > 0, \qquad P\big[|Z - E[Z]| \ge a\big] = P\big[(Z - E[Z])^2 \ge a^2\big] \le \frac{\mathrm{Var}[Z]}{a^2}, \tag{B.4}
\]
where $\mathrm{Var}[Z] = E[(Z - E[Z])^2]$ is the variance of $Z$.
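To see the inequality at work numerically (an illustration added here, not part of the original text), the sketch below compares the empirical tail probability $P[|Z - E[Z]| \ge a]$ with the bound $\mathrm{Var}[Z]/a^2$ for a simulated $Z$; the Exponential(1) distribution and the values of $a$ are arbitrary assumptions.

```python
import numpy as np

# Compare the empirical tail P[|Z - E[Z]| >= a] with the Chebyshev
# bound Var[Z]/a^2. The Exponential(1) distribution is an arbitrary choice.
rng = np.random.default_rng(1)
z = rng.exponential(scale=1.0, size=1_000_000)  # E[Z] = 1, Var[Z] = 1
mean, var = z.mean(), z.var()
for a in (1.0, 2.0, 3.0):
    tail = np.mean(np.abs(z - mean) >= a)
    print(f"a={a}: empirical tail {tail:.4f} <= Chebyshev bound {var / a**2:.4f}")
```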
Consider the random variable $\frac{1}{m}\sum_{i=1}^{m} Z_i$. Since $Z_1,\ldots,Z_m$ are i.i.d. it is easy to
verify that
\[
\mathrm{Var}\left[\frac{1}{m}\sum_{i=1}^{m} Z_i\right] = \frac{1}{m^2}\sum_{i=1}^{m} \mathrm{Var}[Z_i] = \frac{\mathrm{Var}[Z_1]}{m}.
\]
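This scaling of the variance can also be checked empirically; the short sketch below (not part of the original text) estimates the variance of the average of $m$ i.i.d. draws. The standard normal distribution and the value of $m$ are arbitrary assumptions.

```python
import numpy as np

# Empirical check that Var[(1/m) * sum Z_i] = Var[Z_1] / m for i.i.d. Z_i.
# The standard normal distribution and m = 50 are arbitrary choices.
rng = np.random.default_rng(3)
m, trials = 50, 200_000
samples = rng.standard_normal(size=(trials, m))   # Var[Z_1] = 1
var_of_mean = samples.mean(axis=1).var()
print(f"Var of the average ~ {var_of_mean:.5f}, Var[Z_1]/m = {1.0 / m:.5f}")
```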
Applying Chebyshev’s inequality, we obtain the following:
Lemma B.2. Let $Z_1,\ldots,Z_m$ be a sequence of i.i.d. random variables and assume
that $E[Z_1] = \mu$ and $\mathrm{Var}[Z_1] \le 1$. Then, for any $\delta \in (0,1)$, with probability of at least
$1 - \delta$ we have
\[
\left|\frac{1}{m}\sum_{i=1}^{m} Z_i - \mu\right| \le \sqrt{\frac{1}{\delta\, m}}.
\]
Proof. Applying Chebyshev's inequality we obtain that for all $a > 0$,
\[
P\left[\left|\frac{1}{m}\sum_{i=1}^{m} Z_i - \mu\right| > a\right] \le \frac{\mathrm{Var}[Z_1]}{m\, a^2} \le \frac{1}{m\, a^2}.
\]
The proof follows by denoting the right-hand side δ and solving for a.
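To make the rate of Lemma B.2 concrete, here is a small simulation sketch (added here, not part of the original text): it draws many independent samples of the empirical average of $m$ i.i.d. variables with variance at most 1 and measures how often the deviation from $\mu$ exceeds $\sqrt{1/(\delta m)}$. The uniform distribution and the particular values of $m$, $\delta$, and the number of trials are arbitrary assumptions.

```python
import numpy as np

# Monte Carlo check of Lemma B.2: with probability at least 1 - delta,
# |mean(Z_1..Z_m) - mu| <= sqrt(1/(delta*m)) whenever Var[Z_1] <= 1.
# Uniform[0,1] (Var = 1/12 <= 1) and the values of m, delta are arbitrary.
rng = np.random.default_rng(2)
m, delta, trials = 100, 0.05, 100_000
mu = 0.5
samples = rng.uniform(0.0, 1.0, size=(trials, m))
deviations = np.abs(samples.mean(axis=1) - mu)
bound = np.sqrt(1.0 / (delta * m))
violation_rate = np.mean(deviations > bound)
print(f"bound = {bound:.3f}, fraction of violations = {violation_rate:.5f} (<= {delta})")
```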
The deviation between the empirical average and the mean given previously
decreases polynomially with m. It is possible to obtain a significantly faster decrease.
In the sections that follow we derive bounds that decrease exponentially fast.
B.3 CHERNOFF’S BOUNDS
Let $Z_1,\ldots,Z_m$ be independent Bernoulli variables where for every $i$, $P[Z_i = 1] = p_i$
and $P[Z_i = 0] = 1 - p_i$. Let $p = \sum_{i=1}^{m} p_i$ and let $Z = \sum_{i=1}^{m} Z_i$. Using the monotonicity