Solving for $t$ yields

\[
\frac{t^2/2}{m L_{\mathcal{D}}(h) + t/3} = \log(1/\delta)
\]
\[
\Rightarrow \quad \frac{t^2}{2} - \frac{\log(1/\delta)}{3}\, t - \log(1/\delta)\, m L_{\mathcal{D}}(h) = 0
\]
\[
\Rightarrow \quad t = \frac{\log(1/\delta)}{3} + \sqrt{\frac{\log^2(1/\delta)}{3^2} + 2\log(1/\delta)\, m L_{\mathcal{D}}(h)}
\]
\[
\le \; \frac{2\log(1/\delta)}{3} + \sqrt{2\log(1/\delta)\, m L_{\mathcal{D}}(h)},
\]
where the last step uses $\sqrt{a+b} \le \sqrt{a} + \sqrt{b}$.
Since $\frac{1}{m}\sum_i \alpha_i = L_S(h) - L_{\mathcal{D}}(h)$, it follows that with probability of at least $1 - \delta$,

\[
L_S(h) - L_{\mathcal{D}}(h) \;\le\; \frac{2\log(1/\delta)}{3m} + \sqrt{\frac{2\log(1/\delta)\, L_{\mathcal{D}}(h)}{m}},
\]
which proves the first inequality. The second part of the lemma follows in a
similar way.
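As a sanity check on the first inequality, the following simulation sketch (not from the text; it assumes 0-1 losses drawn i.i.d. with mean $L_{\mathcal{D}}(h)$, and the values of $m$, $L_{\mathcal{D}}(h)$, and $\delta$ are illustrative) estimates how often $L_S(h) - L_{\mathcal{D}}(h)$ exceeds the bound; the violation rate should be at most $\delta$:

    import numpy as np

    # Sanity-check simulation for the first inequality of the lemma.
    # Parameters are illustrative choices, not taken from the text.
    rng = np.random.default_rng(0)
    m, true_loss, delta = 1000, 0.1, 0.05
    trials = 20000

    bound = 2 * np.log(1 / delta) / (3 * m) + np.sqrt(
        2 * np.log(1 / delta) * true_loss / m)

    # Each row is a sample S of m i.i.d. 0-1 losses with mean L_D(h).
    losses = rng.binomial(1, true_loss, size=(trials, m))
    empirical = losses.mean(axis=1)  # L_S(h) for each sample
    violation_rate = np.mean(empirical - true_loss > bound)

    print(f"bound = {bound:.4f}")
    print(f"violation rate = {violation_rate:.4f} (should be <= {delta})")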
B.6 SLUD’S INEQUALITY
Let $X$ be a $(m, p)$ binomial variable. That is, $X = \sum_{i=1}^{m} Z_i$, where each $Z_i$ is 1 with probability $p$ and 0 with probability $1-p$. Assume that $p = (1-\epsilon)/2$. Slud's inequality (Slud 1977) tells us that $\mathbb{P}[X \ge m/2]$ is lower bounded by the probability that a normal variable will be greater than or equal to $\sqrt{m\epsilon^2/(1-\epsilon^2)}$. The following lemma follows by standard tail bounds for the normal distribution.
Lemma B.11. Let $X$ be a $(m, p)$ binomial variable and assume that $p = (1-\epsilon)/2$. Then,

\[
\mathbb{P}[X \ge m/2] \;\ge\; \frac{1}{2}\left(1 - \sqrt{1 - \exp\left(-\frac{m\epsilon^2}{1-\epsilon^2}\right)}\right).
\]
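To see how tight the lemma is, the following sketch (my own illustration; the values of $m$ and $\epsilon$ are arbitrary, and it assumes SciPy is available) compares the exact binomial tail with the lower bound:

    import numpy as np
    from scipy.stats import binom

    # Compare the exact tail P[X >= m/2] for X ~ Binomial(m, (1 - eps)/2)
    # with the lower bound of Lemma B.11 (illustrative m and eps).
    for m, eps in [(50, 0.1), (200, 0.1), (200, 0.3)]:
        p = (1 - eps) / 2
        threshold = int(np.ceil(m / 2))
        exact = binom.sf(threshold - 1, m, p)  # P[X >= m/2]
        lower = 0.5 * (1 - np.sqrt(1 - np.exp(-m * eps**2 / (1 - eps**2))))
        print(f"m={m:4d} eps={eps}: exact={exact:.2e} >= bound={lower:.2e}")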
B.7 CONCENTRATION OF χ² VARIABLES
Let $X_1, \ldots, X_k$ be $k$ independent normally distributed random variables. That is, for all $i$, $X_i \sim N(0,1)$. The distribution of the random variable $X_i^2$ is called $\chi^2$ (chi square) and the distribution of the random variable $Z = X_1^2 + \cdots + X_k^2$ is called $\chi_k^2$ (chi square with $k$ degrees of freedom). Clearly, $\mathbb{E}[X_i^2] = 1$ and $\mathbb{E}[Z] = k$. The following lemma states that a $\chi_k^2$ variable is concentrated around its mean.
Lemma B.12. Let $Z \sim \chi_k^2$. Then, for all $\epsilon > 0$ we have

\[
\mathbb{P}[Z \le (1-\epsilon)k] \le e^{-\epsilon^2 k/6},
\]

and for all $\epsilon \in (0,3)$ we have

\[
\mathbb{P}[Z \ge (1+\epsilon)k] \le e^{-\epsilon^2 k/6}.
\]

Finally, for all $\epsilon \in (0,3)$,

\[
\mathbb{P}[(1-\epsilon)k \le Z \le (1+\epsilon)k] \ge 1 - 2e^{-\epsilon^2 k/6}.
\]