Denote $A = \{(\mathbb{1}_{[h(x_1) \neq y_1]}, \ldots, \mathbb{1}_{[h(x_m) \neq y_m]}) : h \in H\}$. This clearly implies that
$$|A| \leq \left(\frac{em}{d}\right)^d.$$
Combining this with Lemma 26.8 we obtain the following bound on the Rademacher
complexity:
$$R(A) \leq \sqrt{\frac{2d\log(em/d)}{m}}.$$
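As a quick numerical illustration of these two bounds (not part of the original proof), the following minimal Python sketch evaluates the cardinality bound and the resulting Rademacher-complexity bound; the values of d and m are illustrative assumptions:

```python
import math

def cardinality_bound(d, m):
    """Bound (em/d)^d on the number of distinct behaviors |A|
    of a class with VC dimension d restricted to m points."""
    return (math.e * m / d) ** d

def rademacher_bound(d, m):
    """Bound sqrt(2 d log(em/d) / m) on R(A), obtained by combining
    the cardinality bound above with Massart's lemma (Lemma 26.8)."""
    return math.sqrt(2 * d * math.log(math.e * m / d) / m)

# Illustrative values of d and m (assumptions, not from the text).
for d, m in [(5, 100), (5, 1000), (10, 10000)]:
    print(f"d={d:2d}, m={m:6d}: |A| <= {cardinality_bound(d, m):.3e}, "
          f"R(A) <= {rademacher_bound(d, m):.4f}")
```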
Using Theorem 26.5 we obtain that with probability of at least 1−δ, for every h ∈ H
we have that
$$L_{\mathcal{D}}(h) - L_S(h) \leq \sqrt{\frac{8d\log(em/d)}{m}} + \sqrt{\frac{2\log(2/\delta)}{m}}.$$
Repeating the previous argument for minus the zero-one loss and applying the
union bound we obtain that with probability of at least 1 − δ, for every h ∈ H it
holds that
$$|L_{\mathcal{D}}(h) - L_S(h)| \leq \sqrt{\frac{8d\log(em/d)}{m}} + \sqrt{\frac{2\log(4/\delta)}{m}} \leq 2\sqrt{\frac{8d\log(em/d) + 2\log(4/\delta)}{m}}.$$
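To see how fast this uniform deviation bound shrinks with the sample size, here is a minimal sketch (again an illustration, not part of the proof) that evaluates the final two-sided bound; the chosen d, δ, and range of m are illustrative assumptions:

```python
import math

def deviation_bound(d, m, delta):
    """Two-sided uniform deviation bound
    2 * sqrt((8 d log(em/d) + 2 log(4/delta)) / m)."""
    return 2 * math.sqrt((8 * d * math.log(math.e * m / d)
                          + 2 * math.log(4 / delta)) / m)

# Illustrative values (assumptions, not from the text).
d, delta = 10, 0.05
for m in [100, 1_000, 10_000, 100_000]:
    print(f"m={m:6d}: |L_D(h) - L_S(h)| <= {deviation_bound(d, m, delta):.4f}")
```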
To ensure that this is smaller than ε we need
$$m \geq \frac{4}{\epsilon^2}\left(8d\log(m) + 8d\log(e/d) + 2\log(4/\delta)\right).$$
Using Lemma A.2, a sufficient condition for the inequality to hold is that
$$m \geq 4\cdot\frac{32d}{\epsilon^2}\log\!\left(\frac{64d}{\epsilon^2}\right) + \frac{8}{\epsilon^2}\left(8d\log(e/d) + 2\log(4/\delta)\right).$$
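As a sanity check on this closed-form sample size (an illustration only), the sketch below computes it for assumed parameter values and verifies that it satisfies the implicit condition m ≥ (4/ε²)(8d log(m) + 8d log(e/d) + 2 log(4/δ)) from which it was derived; the values of d, ε, δ are illustrative assumptions:

```python
import math

def sufficient_m(d, eps, delta):
    """Closed-form sufficient sample size
    4*(32d/eps^2)*log(64d/eps^2) + (8/eps^2)*(8d*log(e/d) + 2*log(4/delta))."""
    return (4 * (32 * d / eps**2) * math.log(64 * d / eps**2)
            + (8 / eps**2) * (8 * d * math.log(math.e / d)
                              + 2 * math.log(4 / delta)))

def implicit_rhs(d, m, eps, delta):
    """Right-hand side of the implicit condition
    m >= (4/eps^2)*(8d*log(m) + 8d*log(e/d) + 2*log(4/delta))."""
    return (4 / eps**2) * (8 * d * math.log(m)
                           + 8 * d * math.log(math.e / d)
                           + 2 * math.log(4 / delta))

# Illustrative values (assumptions, not from the text).
d, eps, delta = 10, 0.1, 0.05
m = sufficient_m(d, eps, delta)
print(f"sufficient m = {m:.0f}, implicit requirement = {implicit_rhs(d, m, eps, delta):.0f}, "
      f"satisfied: {m >= implicit_rhs(d, m, eps, delta)}")
```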
28.2 THE LOWER BOUND FOR THE AGNOSTIC CASE
Here, we prove that there exists C such that the sample complexity of agnostically PAC learning H satisfies
$$m_H(\epsilon,\delta) \geq C\,\frac{d + \ln(1/\delta)}{\epsilon^2}.$$
We will prove the lower bound in two parts. First, we will show that m(ε,δ) ≥ 0.5 log(1/(4δ))/ε², and second we will show that for every δ ≤ 1/8 we have that m(ε,δ) ≥ 8d/ε². These two bounds will conclude the proof.
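To illustrate how the two partial bounds fit together (an illustration only; the constant C is not derived here), the following sketch evaluates both bounds for assumed values of d, ε, and δ ≤ 1/8, and shows that their maximum, which the sample complexity must exceed, is at least half their sum, i.e., a quantity of order (d + log(1/δ))/ε²:

```python
import math

def lower_bound_part1(eps, delta):
    """First partial lower bound: 0.5 * log(1/(4*delta)) / eps^2."""
    return 0.5 * math.log(1 / (4 * delta)) / eps**2

def lower_bound_part2(d, eps):
    """Second partial lower bound (shown for delta <= 1/8): 8*d / eps^2."""
    return 8 * d / eps**2

# Illustrative values with delta <= 1/8 (assumptions, not from the text).
d, eps, delta = 10, 0.1, 0.05
m1 = lower_bound_part1(eps, delta)
m2 = lower_bound_part2(d, eps)
# The sample complexity must exceed both bounds, hence their maximum,
# which is at least half of their sum -- a quantity of order (d + log(1/delta)) / eps^2.
print(f"part 1: {m1:.1f}, part 2: {m2:.1f}, max: {max(m1, m2):.1f}, half-sum: {0.5 * (m1 + m2):.1f}")
```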
28.2.1 Showing That m(ε,δ) ≥ 0.5 log(1/(4δ))/ε²
We first show that for any ε < 1/√2 and any δ ∈ (0,1), we have that m(ε,δ) ≥ 0.5 log(1/(4δ))/ε². To do so, we show that for m ≤ 0.5 log(1/(4δ))/ε², H is not learnable.
Choose one example that is shattered by H. That is, let c be an example such that there are h+, h− ∈ H for which h+(c) = 1 and h−(c) = −1. Define two distributions,