Finally, let $\theta = \frac{1}{\rho}\left(L_{\mathcal{D}}(A(S)) - \min_{h\in\mathcal{H}} L_{\mathcal{D}}(h)\right)$ and note that $\theta \in [0,1]$ (see Equation (28.5)). Therefore, using Lemma B.1, we get that
\[
\mathbb{P}\!\left[L_{\mathcal{D}}(A(S)) - \min_{h\in\mathcal{H}} L_{\mathcal{D}}(h) > \epsilon\right] \;=\; \mathbb{P}\!\left[\theta > \frac{\epsilon}{\rho}\right] \;\ge\; \mathbb{E}[\theta] - \frac{\epsilon}{\rho} \;\ge\; \frac{1}{4} - \frac{\epsilon}{\rho}.
\]
Choosing $\rho = 8\epsilon$ we conclude that if $m < \frac{8d}{\epsilon^2}$, then with probability of at least $1/8$ we will have $L_{\mathcal{D}}(A(S)) - \min_{h\in\mathcal{H}} L_{\mathcal{D}}(h) \ge \epsilon$.
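The constant $1/8$ follows from a one-line substitution, spelled out here for convenience:
\[
\frac{1}{4} - \frac{\epsilon}{\rho} \;=\; \frac{1}{4} - \frac{\epsilon}{8\epsilon} \;=\; \frac{1}{4} - \frac{1}{8} \;=\; \frac{1}{8}.
\]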
28.3 THE UPPER BOUND FOR THE REALIZABLE CASE
Here we prove that there exists $C$ such that $\mathcal{H}$ is PAC learnable with sample complexity
\[
m_{\mathcal{H}}(\epsilon,\delta) \;\le\; C\,\frac{d\ln(1/\epsilon) + \ln(1/\delta)}{\epsilon}.
\]
We do so by showing that for $m \ge C\,\frac{d\ln(1/\epsilon)+\ln(1/\delta)}{\epsilon}$, $\mathcal{H}$ is learnable using the ERM rule. We prove this claim on the basis of the notion of $\epsilon$-nets.
Definition 28.2 ($\epsilon$-net). Let $\mathcal{X}$ be a domain. $S \subset \mathcal{X}$ is an $\epsilon$-net for $\mathcal{H} \subset 2^{\mathcal{X}}$ with respect to a distribution $\mathcal{D}$ over $\mathcal{X}$ if
\[
\forall h \in \mathcal{H}: \quad \mathcal{D}(h) \ge \epsilon \;\Rightarrow\; h \cap S \neq \emptyset.
\]
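As a concrete illustration (our example, not from the text): take $\mathcal{X} = [0,1]$, $\mathcal{D}$ uniform, and $\mathcal{H}$ the class of intervals. Here $S$ is an $\epsilon$-net exactly when every gap between consecutive sample points (including the two boundary segments) is shorter than $\epsilon$: an interval of mass at least $\epsilon$ is too long to fit inside any gap, so it must contain a sample point. A minimal sketch of this check in Python:

import random

def is_eps_net_for_intervals(sample, eps):
    """True iff `sample` is an eps-net for the class of intervals
    under the uniform distribution on [0, 1]."""
    pts = sorted(sample)
    # Gaps: before the first point, between neighbors, after the last point.
    gaps = [pts[0]] + [b - a for a, b in zip(pts, pts[1:])] + [1 - pts[-1]]
    # An open interval of mass >= eps avoids the sample iff some gap is >= eps.
    return max(gaps) < eps

random.seed(0)
S = [random.random() for _ in range(200)]
print(is_eps_net_for_intervals(S, eps=0.1))  # likely prints True for this sample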
Theorem 28.3. Let $\mathcal{H} \subset 2^{\mathcal{X}}$ with $\mathrm{VCdim}(\mathcal{H}) = d$. Fix $\epsilon \in (0,1)$, $\delta \in (0,1/4)$ and let
\[
m \;\ge\; \frac{8}{\epsilon}\left(2d\log\frac{16e}{\epsilon} + \log\frac{2}{\delta}\right).
\]
Then, with probability of at least $1-\delta$ over a choice of $S \sim \mathcal{D}^m$ we have that $S$ is an $\epsilon$-net for $\mathcal{H}$.
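To get a feel for the size this requires, here is a quick evaluation (a sketch of ours; we take $\log$ to be the natural logarithm, which only shifts the constant):

import math

def eps_net_sample_bound(d, eps, delta):
    """Sample size required by Theorem 28.3:
    (8/eps) * (2*d*log(16e/eps) + log(2/delta))."""
    return (8 / eps) * (2 * d * math.log(16 * math.e / eps)
                        + math.log(2 / delta))

# For d = 10, eps = 0.1, delta = 0.05 this gives roughly 10,000 samples.
print(eps_net_sample_bound(d=10, eps=0.1, delta=0.05))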
Proof. Let
\[
B = \left\{S \subset \mathcal{X} : |S| = m,\; \exists h \in \mathcal{H},\, \mathcal{D}(h) \ge \epsilon,\, h \cap S = \emptyset\right\}
\]
be the set of sets which are not $\epsilon$-nets. We need to bound $\mathbb{P}[S \in B]$. Define
\[
B' = \left\{(S,T) \subset \mathcal{X} : |S| = |T| = m,\; \exists h \in \mathcal{H},\, \mathcal{D}(h) \ge \epsilon,\, h \cap S = \emptyset,\, |T \cap h| > \frac{\epsilon m}{2}\right\}.
\]
Claim 1. $\mathbb{P}[S \in B] \le 2\,\mathbb{P}[(S,T) \in B']$.

Proof of Claim 1: Since $S$ and $T$ are chosen independently we can write
\[
\mathbb{P}[(S,T) \in B'] \;=\; \mathbb{E}_{(S,T)\sim\mathcal{D}^{2m}}\!\left[\mathbb{1}_{[(S,T)\in B']}\right] \;=\; \mathbb{E}_{S\sim\mathcal{D}^m}\!\left[\mathbb{E}_{T\sim\mathcal{D}^m}\!\left[\mathbb{1}_{[(S,T)\in B']}\right]\right].
\]