Finally, let $\theta = \frac{1}{\rho}\left(L_{\mathcal{D}}(A(S)) - \min_{h\in\mathcal{H}} L_{\mathcal{D}}(h)\right)$ and note that $\theta \in [0,1]$ (see Equation (28.5)). Therefore, using Lemma B.1, we get that
\[
\mathbb{P}\!\left[L_{\mathcal{D}}(A(S)) - \min_{h\in\mathcal{H}} L_{\mathcal{D}}(h) > \epsilon\right] \;=\; \mathbb{P}\!\left[\theta > \frac{\epsilon}{\rho}\right] \;\ge\; \mathbb{E}[\theta] - \frac{\epsilon}{\rho} \;\ge\; \frac{1}{4} - \frac{\epsilon}{\rho}.
\]
Choosing $\rho = 8\epsilon$ we conclude that if $m < \frac{8d}{\epsilon^2}$, then with probability of at least $1/8$ we will have $L_{\mathcal{D}}(A(S)) - \min_{h\in\mathcal{H}} L_{\mathcal{D}}(h) \ge \epsilon$.
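The constant $1/8$ follows from a one-line substitution, spelled out here for convenience:
\[
\frac{1}{4} - \frac{\epsilon}{\rho} \;=\; \frac{1}{4} - \frac{\epsilon}{8\epsilon} \;=\; \frac{1}{4} - \frac{1}{8} \;=\; \frac{1}{8}.
\]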
28.3 THE UPPER BOUND FOR THE REALIZABLE CASE
Here we prove that there exists $C$ such that $\mathcal{H}$ is PAC learnable with sample complexity
\[
m_{\mathcal{H}}(\epsilon,\delta) \;\le\; C\,\frac{d\ln(1/\epsilon) + \ln(1/\delta)}{\epsilon}.
\]
We do so by showing that for $m \ge C\,\frac{d\ln(1/\epsilon)+\ln(1/\delta)}{\epsilon}$, $\mathcal{H}$ is learnable using the ERM rule. We prove this claim on the basis of the notion of $\epsilon$-nets.
Definition 28.2 ($\epsilon$-net). Let $\mathcal{X}$ be a domain. $S \subset \mathcal{X}$ is an $\epsilon$-net for $\mathcal{H} \subset 2^{\mathcal{X}}$ with respect to a distribution $\mathcal{D}$ over $\mathcal{X}$ if
\[
\forall h \in \mathcal{H}: \quad \mathcal{D}(h) \ge \epsilon \;\Rightarrow\; h \cap S \neq \emptyset.
\]
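As a concrete illustration (our example, not from the text): take $\mathcal{X} = [0,1]$, $\mathcal{D}$ uniform, and $\mathcal{H}$ the class of intervals. Here $S$ is an $\epsilon$-net exactly when every gap between consecutive sample points (including the two boundary segments) is shorter than $\epsilon$: an interval of mass at least $\epsilon$ is too long to fit inside any gap, so it must contain a sample point. A minimal sketch of this check in Python:

import random

def is_eps_net_for_intervals(sample, eps):
    """True iff `sample` is an eps-net for the class of intervals
    under the uniform distribution on [0, 1]."""
    pts = sorted(sample)
    # Gaps: before the first point, between neighbors, after the last point.
    gaps = [pts[0]] + [b - a for a, b in zip(pts, pts[1:])] + [1 - pts[-1]]
    # An open interval of mass >= eps avoids the sample iff some gap is >= eps.
    return max(gaps) < eps

random.seed(0)
S = [random.random() for _ in range(200)]
print(is_eps_net_for_intervals(S, eps=0.1))  # likely prints True for this sample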
Theorem 28.3. Let $\mathcal{H} \subset 2^{\mathcal{X}}$ with $\mathrm{VCdim}(\mathcal{H}) = d$. Fix $\epsilon \in (0,1)$, $\delta \in (0,1/4)$ and let
\[
m \;\ge\; \frac{8}{\epsilon}\left(2d\log\frac{16e}{\epsilon} + \log\frac{2}{\delta}\right).
\]
Then, with probability of at least $1-\delta$ over a choice of $S \sim \mathcal{D}^m$ we have that $S$ is an $\epsilon$-net for $\mathcal{H}$.
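To get a feel for the size this requires, here is a quick evaluation (a sketch of ours; we take $\log$ to be the natural logarithm, which only shifts the constant):

import math

def eps_net_sample_bound(d, eps, delta):
    """Sample size required by Theorem 28.3:
    (8/eps) * (2*d*log(16e/eps) + log(2/delta))."""
    return (8 / eps) * (2 * d * math.log(16 * math.e / eps)
                        + math.log(2 / delta))

# For d = 10, eps = 0.1, delta = 0.05 this gives roughly 10,000 samples.
print(eps_net_sample_bound(d=10, eps=0.1, delta=0.05))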
Proof. Let
\[
B = \left\{S \subset \mathcal{X} : |S| = m,\; \exists h \in \mathcal{H},\, \mathcal{D}(h) \ge \epsilon,\, h \cap S = \emptyset\right\}
\]
be the set of sets which are not $\epsilon$-nets. We need to bound $\mathbb{P}[S \in B]$. Define
\[
B' = \left\{(S,T) \subset \mathcal{X} : |S| = |T| = m,\; \exists h \in \mathcal{H},\, \mathcal{D}(h) \ge \epsilon,\, h \cap S = \emptyset,\, |T \cap h| > \frac{\epsilon m}{2}\right\}.
\]
Claim 1. $\mathbb{P}[S \in B] \le 2\,\mathbb{P}[(S,T) \in B']$.

Proof of Claim 1: Since $S$ and $T$ are chosen independently we can write
\[
\mathbb{P}[(S,T) \in B'] \;=\; \mathbb{E}_{(S,T)\sim\mathcal{D}^{2m}}\!\left[\mathbb{1}_{[(S,T)\in B']}\right] \;=\; \mathbb{E}_{S\sim\mathcal{D}^m}\!\left[\mathbb{E}_{T\sim\mathcal{D}^m}\!\left[\mathbb{1}_{[(S,T)\in B']}\right]\right].
\]