therefore obtain that
$$\mathbb{P}[A \in B'] \;\le\; \mathbb{E}_{A \sim D^{2m}}\, \mathbb{E}_{j \sim J} \left[\, \sum_{h \in H_A} \mathbf{1}_{[|h \cap A_j| = 0]}\, \mathbf{1}_{[|h \cap A| \ge \alpha]} \right] \;=\; \mathbb{E}_{A \sim D^{2m}} \left[\, \sum_{h \in H_A} \mathbf{1}_{[|h \cap A| \ge \alpha]}\; \mathbb{E}_{j \sim J}\, \mathbf{1}_{[|h \cap A_j| = 0]} \right].$$
Now, fix some $A$ s.t. $|h \cap A| \ge \alpha$. Then, $\mathbb{E}_{j \sim J}\, \mathbf{1}_{[|h \cap A_j| = 0]}$ is the probability that when choosing $m$ balls from a bag with at least $\alpha$ red balls, we will never choose a red ball. This probability is at most
$$\left(1 - \frac{\alpha}{2m}\right)^{m} = (1 - \varepsilon/4)^{m} \le e^{-\varepsilon m/4}.$$
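For completeness: the equality uses $\alpha = \varepsilon m/2$ (so that $\alpha/(2m) = \varepsilon/4$), and the inequality is the standard bound $1 - x \le e^{-x}$:
$$\Big(1 - \frac{\varepsilon}{4}\Big)^{m} \le \big(e^{-\varepsilon/4}\big)^{m} = e^{-\varepsilon m/4}.$$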
We therefore get that
$$\mathbb{P}[A \in B'] \;\le\; \mathbb{E}_{A \sim D^{2m}} \left[\, \sum_{h \in H_A} e^{-\varepsilon m/4} \right] \;\le\; e^{-\varepsilon m/4}\; \mathbb{E}_{A \sim D^{2m}} |H_A|.$$
Using the definition of the growth function we conclude the proof of Claim 2.
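Indeed, by the definition of the growth function, $|H_A| \le \tau_H(2m)$ for every $A$ of size $2m$, so the last display yields the bound asserted in Claim 2:
$$\mathbb{P}_{A \sim D^{2m}}[A \in B'] \;\le\; \tau_H(2m)\, e^{-\varepsilon m/4}.$$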
Completing the Proof: By Sauer's lemma we know that $\tau_H(2m) \le (2em/d)^d$. Combining this with the two claims we obtain that
$$\mathbb{P}[S \in B] \;\le\; 2\,(2em/d)^d\, e^{-\varepsilon m/4}.$$
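Spelled out: Claim 1 supplies the factor 2 and the passage from $S \sim D^m$ to the double sample $A \sim D^{2m}$, Claim 2 supplies the growth-function bound, and Sauer's lemma supplies the polynomial estimate:
$$\mathbb{P}[S \in B] \;\le\; 2\,\mathbb{P}_{A \sim D^{2m}}[A \in B'] \;\le\; 2\,\tau_H(2m)\, e^{-\varepsilon m/4} \;\le\; 2\,(2em/d)^d\, e^{-\varepsilon m/4}.$$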
We would like the right-hand side of the inequality to be at most δ; that is,
$$2\,(2em/d)^d\, e^{-\varepsilon m/4} \;\le\; \delta.$$
Rearranging, we obtain the requirement
$$m \;\ge\; \frac{4}{\varepsilon}\Big(d \log(2em/d) + \log(2/\delta)\Big) \;=\; \frac{4d}{\varepsilon}\log(m) + \frac{4}{\varepsilon}\Big(d\log(2e/d) + \log(2/\delta)\Big).$$
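Both steps are elementary: taking logarithms of $2(2em/d)^d e^{-\varepsilon m/4} \le \delta$ gives the first inequality, and splitting $\log(2em/d) = \log(m) + \log(2e/d)$ gives the second equality:
$$\log 2 + d\log(2em/d) - \frac{\varepsilon m}{4} \le \log\delta \;\iff\; \frac{\varepsilon m}{4} \ge d\log(2em/d) + \log(2/\delta).$$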
Using Lemma A.2, a sufficient condition for the preceding to hold is that
$$m \;\ge\; \frac{16d}{\varepsilon}\log\!\Big(\frac{8d}{\varepsilon}\Big) + \frac{8}{\varepsilon}\Big(d\log(2e/d) + \log(2/\delta)\Big).$$
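Here Lemma A.2 (for $a \ge 1$ and $b > 0$, $x \ge 4a\log(2a) + 2b$ implies $x \ge a\log(x) + b$) is applied with
$$a = \frac{4d}{\varepsilon}, \qquad b = \frac{4}{\varepsilon}\Big(d\log(2e/d) + \log(2/\delta)\Big),$$
so that $4a\log(2a) + 2b$ is precisely the right-hand side of the preceding display.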
A sufficient condition for this is that
$$\begin{aligned} m \;&\ge\; \frac{16d}{\varepsilon}\log\!\Big(\frac{8d}{\varepsilon}\Big) + \frac{16}{\varepsilon}\Big(d\log(2e/d) + \tfrac{1}{2}\log(2/\delta)\Big) \\ &= \frac{16d}{\varepsilon}\log\!\Big(\frac{8d}{\varepsilon}\cdot\frac{2e}{d}\Big) + \frac{8}{\varepsilon}\log(2/\delta) \\ &= \frac{8}{\varepsilon}\Big(2d\log\!\Big(\frac{16e}{\varepsilon}\Big) + \log(2/\delta)\Big), \end{aligned}$$
and this concludes our proof.
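As a quick numerical sanity check (not from the book; the helper names below are ours), this short Python sketch evaluates the final sample-complexity bound and confirms that the resulting $m$ drives $2(2em/d)^d e^{-\varepsilon m/4}$ well below $\delta$:

```python
import math

def sample_complexity(eps, delta, d):
    """Final bound from the proof: m >= (8/eps) * (2d*log(16e/eps) + log(2/delta))."""
    return (8 / eps) * (2 * d * math.log(16 * math.e / eps) + math.log(2 / delta))

def failure_bound(m, eps, d):
    """Right-hand side of P[S in B] <= 2*(2em/d)^d * exp(-eps*m/4)."""
    return 2 * (2 * math.e * m / d) ** d * math.exp(-eps * m / 4)

# A few (eps, delta, d) settings; the bound is loose, so the failure
# probability it certifies is far smaller than the target delta.
for eps, delta, d in [(0.1, 0.05, 5), (0.05, 0.01, 10), (0.2, 0.1, 3)]:
    m = sample_complexity(eps, delta, d)
    print(f"eps={eps}, delta={delta}, d={d}: m >= {m:.0f}, "
          f"bound = {failure_bound(m, eps, d):.2e} (target delta = {delta})")
```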
28.3.1 From ε-Nets to PAC Learnability
Theorem 28.4. Let H be a hypothesis class over X with VCdim(H) = d. Let D be a distribution over X and let c ∈ H be a target hypothesis. Fix ε, δ ∈ (0,1) and let m be as defined in Theorem 28.3. Then, with probability of at least 1 − δ over a choice of m