Page 367 - Understanding Machine Learning

28.3 The Upper Bound for the Realizable Case  349


therefore obtain that
\[
\mathbb{P}[A \in B'] \;\le\; \mathop{\mathbb{E}}_{A \sim D^{2m}} \mathop{\mathbb{E}}_{j \sim J} \sum_{h \in H_A} \mathbf{1}_{[|h \cap A_j| = 0]}\, \mathbf{1}_{[|h \cap A| \ge \alpha]}
\;=\; \mathop{\mathbb{E}}_{A \sim D^{2m}} \sum_{h \in H_A} \mathbf{1}_{[|h \cap A| \ge \alpha]} \mathop{\mathbb{E}}_{j \sim J} \mathbf{1}_{[|h \cap A_j| = 0]}.
\]
Now, fix some A s.t. |h ∩ A| ≥ α. Then, \(\mathbb{E}_{j \sim J}\, \mathbf{1}_{[|h \cap A_j| = 0]}\) is the probability that when
choosing m balls from a bag with at least α red balls, we will never choose a red ball.
This probability is at most
\[
(1 - \alpha/(2m))^m = (1 - \epsilon/4)^m \le e^{-\epsilon m/4}.
\]
We therefore get that
\[
\mathbb{P}[A \in B'] \;\le\; \mathop{\mathbb{E}}_{A \sim D^{2m}} \sum_{h \in H_A} e^{-\epsilon m/4} \;=\; e^{-\epsilon m/4} \mathop{\mathbb{E}}_{A \sim D^{2m}} |H_A|.
\]
              Using the definition of the growth function we conclude the proof of Claim 2.
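The ball-drawing step in Claim 2 rests only on the elementary inequality \(1 - x \le e^{-x}\). A minimal numerical spot check of that step (ours, not from the text), using α = εm/2 so that α/(2m) = ε/4 as in the displayed equality; the function name `no_red_ball_bound` is our own:

```python
import math

def no_red_ball_bound(eps, m):
    """Bound (1 - alpha/(2m))^m on never drawing a red ball in m draws,
    where alpha = eps*m/2 is a lower bound on the number of red balls."""
    alpha = eps * m / 2
    return (1 - alpha / (2 * m)) ** m

# The proof's step: (1 - eps/4)^m <= e^{-eps*m/4}, since 1 - x <= e^{-x}.
for eps in (0.01, 0.1, 0.5):
    for m in (10, 100, 1000):
        p = no_red_ball_bound(eps, m)
        assert abs(p - (1 - eps / 4) ** m) < 1e-9   # alpha/(2m) = eps/4
        assert p <= math.exp(-eps * m / 4)
```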
Completing the Proof: By Sauer's lemma we know that \(\tau_H(2m) \le (2em/d)^d\).
Combining this with the two claims we obtain that
\[
\mathbb{P}[S \in B] \le 2\,(2em/d)^d\, e^{-\epsilon m/4}.
\]
We would like the right-hand side of the inequality to be at most δ; that is,
\[
2\,(2em/d)^d\, e^{-\epsilon m/4} \le \delta.
\]
Rearranging, we obtain the requirement
\[
m \ge \frac{4}{\epsilon}\Bigl(d \log(2em/d) + \log(2/\delta)\Bigr) = \frac{4d}{\epsilon}\log(m) + \frac{4}{\epsilon}\Bigl(d \log(2e/d) + \log(2/\delta)\Bigr).
\]

Using Lemma A.2, a sufficient condition for the preceding to hold is that
\[
m \ge \frac{16d}{\epsilon}\log\Bigl(\frac{8d}{\epsilon}\Bigr) + \frac{8}{\epsilon}\Bigl(d \log(2e/d) + \log(2/\delta)\Bigr).
\]

A sufficient condition for this is that
\[
\begin{aligned}
m &\ge \frac{16d}{\epsilon}\log\Bigl(\frac{8d}{\epsilon}\Bigr) + \frac{16}{\epsilon}\Bigl(d \log(2e/d) + \tfrac{1}{2}\log(2/\delta)\Bigr) \\
  &= \frac{16d}{\epsilon}\log\Bigl(\frac{8d}{\epsilon}\cdot\frac{2e}{d}\Bigr) + \frac{8}{\epsilon}\log(2/\delta) \\
  &= \frac{8}{\epsilon}\Bigl(2d \log\Bigl(\frac{16e}{\epsilon}\Bigr) + \log\Bigl(\frac{2}{\delta}\Bigr)\Bigr),
\end{aligned}
\]
and this concludes our proof.
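As a numerical sanity check on the final bound (our own illustration; the names `sample_complexity` and `failure_bound` are not from the book), one can verify that m = (8/ε)(2d log(16e/ε) + log(2/δ)) indeed drives 2(2em/d)^d e^{−εm/4} below δ for representative parameter values:

```python
import math

def sample_complexity(d, eps, delta):
    """Final sufficient sample size: m = (8/eps)(2d log(16e/eps) + log(2/delta))."""
    return (8 / eps) * (2 * d * math.log(16 * math.e / eps) + math.log(2 / delta))

def failure_bound(m, d, eps):
    """Right-hand side 2 (2em/d)^d e^{-eps*m/4} that we want to be at most delta."""
    return 2 * (2 * math.e * m / d) ** d * math.exp(-eps * m / 4)

# Spot-check the chain of sufficient conditions for a few settings.
for d, eps, delta in [(3, 0.1, 0.05), (10, 0.05, 0.01), (20, 0.1, 0.05)]:
    m = sample_complexity(d, eps, delta)
    assert failure_bound(m, d, eps) <= delta
```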

28.3.1 From ε-Nets to PAC Learnability
Theorem 28.4. Let H be a hypothesis class over X with VCdim(H) = d. Let D be a
distribution over X and let c ∈ H be a target hypothesis. Fix ε, δ ∈ (0,1) and let m be
as defined in Theorem 28.3. Then, with probability of at least 1 − δ over a choice of m