

Proof. Let Y = 1 − Z. Then Y is a nonnegative random variable with
E[Y] = 1 − E[Z] = 1 − µ. Applying Markov's inequality on Y we obtain

\[
P[Z \le 1-a] = P[1-Z \ge a] = P[Y \ge a] \le \frac{E[Y]}{a} = \frac{1-\mu}{a}.
\]

Therefore,

\[
P[Z > 1-a] \ge 1 - \frac{1-\mu}{a} = \frac{a+\mu-1}{a}.
\]
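The lemma can be probed with a quick Monte Carlo estimate. The sketch below is illustrative only: the choice Z ~ Beta(2, 2), a distribution on [0, 1] with mean 0.5, and the value a = 0.75 are assumptions made here, not taken from the text.

import numpy as np

# Hypothetical check of the lemma: Z ~ Beta(2, 2) lies in [0, 1] with mean 0.5.
rng = np.random.default_rng(0)
z = rng.beta(2, 2, size=1_000_000)

mu = 0.5  # E[Z] for Beta(2, 2)
a = 0.75  # any a with a + mu - 1 > 0 makes the lower bound nontrivial

empirical = np.mean(z > 1 - a)  # Monte Carlo estimate of P[Z > 1 - a]
bound = (a + mu - 1) / a        # lower bound from the lemma

print(f"P[Z > {1 - a}] ~= {empirical:.4f} >= {bound:.4f}")

Here the empirical probability (about 0.84 for this distribution) comfortably exceeds the guaranteed 1/3, as one would expect from a bound that must hold for every distribution on [0, 1].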




              B.2 CHEBYSHEV’S INEQUALITY
Applying Markov's inequality on the random variable (Z − E[Z])^2 we obtain
Chebyshev's inequality:

\[
\forall a > 0, \qquad P\left[|Z - E[Z]| \ge a\right] = P\left[(Z - E[Z])^2 \ge a^2\right] \le \frac{Var[Z]}{a^2}, \tag{B.4}
\]

where Var[Z] = E[(Z − E[Z])^2] is the variance of Z.
Consider the random variable $\frac{1}{m}\sum_{i=1}^{m} Z_i$. Since Z_1, ..., Z_m are i.i.d. it is easy to verify that

\[
Var\left[\frac{1}{m}\sum_{i=1}^{m} Z_i\right] = \frac{Var[Z_1]}{m}.
\]
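For completeness, the verification is a one-line computation: variance scales quadratically under multiplication by a constant, and the variance of a sum of independent random variables is the sum of their variances, so

\[
Var\left[\frac{1}{m}\sum_{i=1}^{m} Z_i\right] = \frac{1}{m^2}\, Var\left[\sum_{i=1}^{m} Z_i\right] = \frac{1}{m^2}\sum_{i=1}^{m} Var[Z_i] = \frac{Var[Z_1]}{m}.
\]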
              Applying Chebyshev’s inequality, we obtain the following:
Lemma B.2. Let Z_1, ..., Z_m be a sequence of i.i.d. random variables and assume that E[Z_1] = µ and Var[Z_1] ≤ 1. Then, for any δ ∈ (0,1), with probability of at least 1 − δ we have

\[
\left|\frac{1}{m}\sum_{i=1}^{m} Z_i - \mu\right| \le \sqrt{\frac{1}{\delta m}}.
\]
Proof. Applying Chebyshev's inequality we obtain that for all a > 0,

\[
P\left[\left|\frac{1}{m}\sum_{i=1}^{m} Z_i - \mu\right| > a\right] \le \frac{Var[Z_1]}{m a^2} \le \frac{1}{m a^2}.
\]

The proof follows by denoting the right-hand side δ and solving for a.
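Lemma B.2 can likewise be illustrated numerically. The following is a minimal sketch, assuming Z_i ~ Bernoulli(0.5), so that E[Z_1] = 0.5 and Var[Z_1] = 0.25 ≤ 1 (the distribution and the parameters m, δ are illustrative choices, not from the text); it estimates how often the deviation stays within √(1/(δm)):

import numpy as np

# Illustrative check of Lemma B.2 with (assumed) Bernoulli(0.5) variables.
rng = np.random.default_rng(0)
mu, m, delta, trials = 0.5, 100, 0.1, 100_000

samples = rng.binomial(1, mu, size=(trials, m))  # trials x m i.i.d. draws
deviations = np.abs(samples.mean(axis=1) - mu)   # |(1/m) sum_i Z_i - mu|
bound = np.sqrt(1 / (delta * m))                 # sqrt(1/(delta m))

coverage = np.mean(deviations <= bound)          # should be at least 1 - delta
print(f"bound = {bound:.3f}, empirical coverage = {coverage:.4f} (target >= {1 - delta})")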
The bound given previously on the deviation between the empirical average and the mean decreases only polynomially with m. It is possible to obtain a significantly faster decrease: in the sections that follow we derive bounds that decrease exponentially fast.
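To make the contrast concrete, one can tabulate the Chebyshev-based deviation √(1/(δm)) from Lemma B.2 against the deviation √(ln(2/δ)/(2m)) that Hoeffding's inequality yields for averages of [0, 1]-valued variables (derived later in this appendix). The snippet below is an illustrative comparison; the confidence parameter δ = 0.01 is an arbitrary choice.

import numpy as np

# Chebyshev-based vs. Hoeffding-based deviation bounds for the average of
# m i.i.d. [0, 1]-valued variables, each holding with probability 1 - delta.
delta = 0.01
for m in (100, 1_000, 10_000):
    chebyshev = np.sqrt(1 / (delta * m))              # from Lemma B.2
    hoeffding = np.sqrt(np.log(2 / delta) / (2 * m))  # solving 2 exp(-2 m eps^2) = delta
    print(f"m = {m:6d}:  Chebyshev {chebyshev:.3f}   Hoeffding {hoeffding:.3f}")

Already at m = 100 the exponential bound (about 0.163) is roughly six times tighter than the polynomial one (1.000).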
              B.3 CHERNOFF’S BOUNDS

Let Z_1, ..., Z_m be independent Bernoulli variables where for every i, P[Z_i = 1] = p_i and P[Z_i = 0] = 1 − p_i. Let $p = \sum_{i=1}^{m} p_i$ and let $Z = \sum_{i=1}^{m} Z_i$. Using the monotonicity