28 Proof of the Fundamental Theorem of Learning Theory
In this chapter we prove Theorem 6.8 from Chapter 6. We remind the reader of the
conditions of the theorem, which will hold throughout this chapter: H is a hypothesis
class of functions from a domain X to {0,1}, the loss function is the 0-1 loss, and
VCdim(H) = d < ∞.
We shall prove the upper bound for both the realizable and agnostic cases and
shall prove the lower bound for the agnostic case. The lower bound for the realizable
case is left as an exercise.
28.1 THE UPPER BOUND FOR THE AGNOSTIC CASE
For the upper bound we need to prove that there exists C such that H is agnostic
PAC learnable with sample complexity
\[
  m_H(\epsilon,\delta) \;\le\; C\,\frac{d + \ln(1/\delta)}{\epsilon^2}.
\]
We will prove the slightly looser bound:
\[
  m_H(\epsilon,\delta) \;\le\; C\,\frac{d\log(d/\epsilon) + \ln(1/\delta)}{\epsilon^2}. \tag{28.1}
\]
The tighter bound in the theorem statement requires a more involved proof, in
which a more careful analysis of the Rademacher complexity, using a technique
called "chaining," is needed. This is beyond the scope of this book.
To prove Equation (28.1), it suffices to show that applying the ERM with a
sample size
\[
  m \;\ge\; 4\cdot\frac{32d}{\epsilon^2}\log\!\left(\frac{64d}{\epsilon^2}\right)
  + \frac{8}{\epsilon^2}\left(8d\log(e/d) + 2\log(4/\delta)\right)
\]
yields an $(\epsilon,\delta)$-learner for H. We prove this result on the basis of Theorem 26.5.
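The reduction from this explicit sample size to the form of Equation (28.1) is left implicit; a minimal sketch, assuming $\epsilon \le 1/2$ and $\delta \le 1/2$, is as follows. Expanding the constants, the bound above equals
\[
  \frac{128d}{\epsilon^2}\log\!\left(\frac{64d}{\epsilon^2}\right)
  + \frac{64d}{\epsilon^2}\log\!\left(\frac{e}{d}\right)
  + \frac{16}{\epsilon^2}\log\!\left(\frac{4}{\delta}\right)
  \;\le\; C\,\frac{d\log(d/\epsilon) + \ln(1/\delta)}{\epsilon^2},
\]
using $\log(64d/\epsilon^2) \le 8\log(d/\epsilon)$, $\log(e/d) \le 1 \le \log(d/\epsilon)/\log 2$, and $\log(4/\delta) \le 3\log(1/\delta)$, all of which hold for $d \ge 1$ in the stated range of $\epsilon$ and $\delta$. Hence a sample size of the form (28.1) suffices once the ERM guarantee is established.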
Let $(x_1, y_1),\dots,(x_m, y_m)$ be a classification training set. Recall that the Sauer-Shelah lemma tells us that if VCdim(H) = d then
\[
  \bigl|\{(h(x_1),\dots,h(x_m)) : h \in H\}\bigr| \;\le\; \left(\frac{em}{d}\right)^{d}.
\]
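As a quick illustration (not part of the original text), consider the class of threshold functions $H = \{x \mapsto \mathbb{1}[x \le a] : a \in \mathbb{R}\}$, for which $\mathrm{VCdim}(H) = 1$. On any $m$ distinct points $x_1 < \dots < x_m$, each threshold labels a prefix with 1 and the remaining suffix with 0, so the realizable labelings are exactly the $m+1$ prefix patterns, and indeed
\[
  \bigl|\{(h(x_1),\dots,h(x_m)) : h \in H\}\bigr| = m+1 \;\le\; em = \left(\frac{em}{d}\right)^{d} \quad (d=1),
\]
consistent with the lemma.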