28 Proof of the Fundamental Theorem of Learning Theory
In this chapter we prove Theorem 6.8 from Chapter 6. We remind the reader of the
conditions of the theorem, which will hold throughout this chapter: H is a hypothesis
class of functions from a domain X to {0,1}, the loss function is the 0-1 loss, and
VCdim(H) = d < ∞.
We shall prove the upper bound for both the realizable and agnostic cases and
shall prove the lower bound for the agnostic case. The lower bound for the realizable
case is left as an exercise.
28.1 THE UPPER BOUND FOR THE AGNOSTIC CASE
For the upper bound we need to prove that there exists C such that H is agnostic
PAC learnable with sample complexity
\[
  m_H(\epsilon,\delta) \;\le\; C\,\frac{d + \ln(1/\delta)}{\epsilon^2}.
\]
We will prove the slightly looser bound:
\[
  m_H(\epsilon,\delta) \;\le\; C\,\frac{d\log(d/\epsilon) + \ln(1/\delta)}{\epsilon^2}. \tag{28.1}
\]
The tighter bound in the theorem statement requires a more involved proof, in
which a more careful analysis of the Rademacher complexity, using a technique
called "chaining," is needed. This is beyond the scope of this book.
To prove Equation (28.1), it suffices to show that applying the ERM with a
sample size
\[
  m \;\ge\; 4\cdot\frac{32d}{\epsilon^2}\log\!\left(\frac{64d}{\epsilon^2}\right)
  + \frac{8}{\epsilon^2}\left(8d\log(e/d) + 2\log(4/\delta)\right)
\]
yields an $(\epsilon,\delta)$-learner for H. We prove this result on the basis of Theorem 26.5.
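The reduction from this explicit sample size to the form of Equation (28.1) is left implicit; a minimal sketch, assuming $\epsilon \le 1/2$ and $\delta \le 1/2$, is as follows. Expanding the constants, the bound above equals
\[
  \frac{128d}{\epsilon^2}\log\!\left(\frac{64d}{\epsilon^2}\right)
  + \frac{64d}{\epsilon^2}\log\!\left(\frac{e}{d}\right)
  + \frac{16}{\epsilon^2}\log\!\left(\frac{4}{\delta}\right)
  \;\le\; C\,\frac{d\log(d/\epsilon) + \ln(1/\delta)}{\epsilon^2},
\]
using $\log(64d/\epsilon^2) \le 8\log(d/\epsilon)$, $\log(e/d) \le 1 \le \log(d/\epsilon)/\log 2$, and $\log(4/\delta) \le 3\log(1/\delta)$, all of which hold for $d \ge 1$ in the stated range of $\epsilon$ and $\delta$. Hence a sample size of the form (28.1) suffices once the ERM guarantee is established.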
Let $(x_1, y_1),\dots,(x_m, y_m)$ be a classification training set. Recall that the Sauer-Shelah lemma tells us that if VCdim(H) = d then
\[
  \bigl|\{(h(x_1),\dots,h(x_m)) : h \in H\}\bigr| \;\le\; \left(\frac{em}{d}\right)^{d}.
\]
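As a quick illustration (not part of the original text), consider the class of threshold functions $H = \{x \mapsto \mathbb{1}[x \le a] : a \in \mathbb{R}\}$, for which $\mathrm{VCdim}(H) = 1$. On any $m$ distinct points $x_1 < \dots < x_m$, each threshold labels a prefix with 1 and the remaining suffix with 0, so the realizable labelings are exactly the $m+1$ prefix patterns, and indeed
\[
  \bigl|\{(h(x_1),\dots,h(x_m)) : h \in H\}\bigr| = m+1 \;\le\; em = \left(\frac{em}{d}\right)^{d} \quad (d=1),
\]
consistent with the lemma.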