

6.4 THE FUNDAMENTAL THEOREM OF PAC LEARNING
                 We have already shown that a class of infinite VC-dimension is not learnable. The
                 converse statement is also true, leading to the fundamental theorem of statistical
                 learning theory:





Theorem 6.7 (The Fundamental Theorem of Statistical Learning). Let H be a hypothesis class of functions from a domain X to {0,1} and let the loss function be the 0−1 loss. Then, the following are equivalent:
                    1.  H has the uniform convergence property.

                    2.  Any ERM rule is a successful agnostic PAC learner for H.
                    3.  H is agnostic PAC learnable.

                    4.  H is PAC learnable.

                    5.  Any ERM rule is a successful PAC learner for H.
                    6.  H has a finite VC-dimension.

                    The proof of the theorem is given in the next section.
                    Not only does the VC-dimension characterize PAC learnability; it even deter-
                 mines the sample complexity.
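
The equivalence with item 6 rests on the notions of shattering and VC-dimension introduced earlier. As a concrete illustration (not part of the book's text), the following sketch brute-forces the VC-dimension of a hypothesis class given as explicit functions over a small finite domain; the threshold class and all names in it are hypothetical choices made only for this example.

```python
from itertools import combinations

def is_shattered(points, hypotheses):
    """True iff `hypotheses` realizes every one of the 2^|points| labelings."""
    realized = {tuple(h(x) for x in points) for h in hypotheses}
    return len(realized) == 2 ** len(points)

def vc_dimension(domain, hypotheses):
    """Largest k such that some k-point subset of `domain` is shattered."""
    d = 0
    for k in range(1, len(domain) + 1):
        if any(is_shattered(c, hypotheses) for c in combinations(domain, k)):
            d = k
        else:
            break  # if no k-point set is shattered, no larger set can be
    return d

# Hypothetical example: threshold functions h_t(x) = 1[x >= t] on a small grid.
domain = [1, 2, 3, 4, 5]
thresholds = [lambda x, t=t: int(x >= t) for t in range(0, 7)]
print(vc_dimension(domain, thresholds))  # prints 1
```

Such exhaustive checking is feasible only for tiny domains, but it mirrors the definitions directly: a set is shattered when every labeling of it is realized by some hypothesis in the class.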





Theorem 6.8 (The Fundamental Theorem of Statistical Learning – Quantitative Version). Let H be a hypothesis class of functions from a domain X to {0,1} and let the loss function be the 0−1 loss. Assume that VCdim(H) = d < ∞. Then, there are absolute constants C_1, C_2 such that
   1. H has the uniform convergence property with sample complexity

          C_1 \frac{d + \log(1/\delta)}{\epsilon^2} \le m_{\mathcal{H}}^{\mathrm{UC}}(\epsilon, \delta) \le C_2 \frac{d + \log(1/\delta)}{\epsilon^2}
   2. H is agnostic PAC learnable with sample complexity

          C_1 \frac{d + \log(1/\delta)}{\epsilon^2} \le m_{\mathcal{H}}(\epsilon, \delta) \le C_2 \frac{d + \log(1/\delta)}{\epsilon^2}

   3. H is PAC learnable with sample complexity

          C_1 \frac{d + \log(1/\delta)}{\epsilon} \le m_{\mathcal{H}}(\epsilon, \delta) \le C_2 \frac{d \log(1/\epsilon) + \log(1/\delta)}{\epsilon}

                    The proof of this theorem is given in Chapter 28.
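
To get a feel for how these bounds scale, the following sketch (not from the book) evaluates the upper bounds in items 2 and 3 for concrete values of d, ε, and δ. The theorem only asserts that absolute constants C_1, C_2 exist, so the choice C_2 = 1 below is an arbitrary placeholder; only the growth rates in ε, δ, and d are meaningful.

```python
import math

def agnostic_upper_bound(d, eps, delta, C2=1.0):
    # item 2: m_H(eps, delta) <= C2 * (d + log(1/delta)) / eps^2
    return C2 * (d + math.log(1 / delta)) / eps ** 2

def realizable_upper_bound(d, eps, delta, C2=1.0):
    # item 3: m_H(eps, delta) <= C2 * (d * log(1/eps) + log(1/delta)) / eps
    return C2 * (d * math.log(1 / eps) + math.log(1 / delta)) / eps

d, eps, delta = 10, 0.1, 0.05
print(agnostic_upper_bound(d, eps, delta))    # ~1300: scales as 1/eps^2
print(realizable_upper_bound(d, eps, delta))  # ~260: scales as (1/eps) log(1/eps)
```

The comparison makes visible the gap between the agnostic rate of order 1/ε² and the realizable (PAC) rate of order (1/ε) log(1/ε).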
                 Remark 6.3.  We stated the fundamental theorem for binary classification tasks. A
                 similar result holds for some other learning problems such as regression with the
                 absolute loss or the squared loss. However, the theorem does not hold for all learn-
                 ing tasks. In particular, learnability is sometimes possible even though the uniform
                 convergence property does not hold (we will see an example in Chapter 13, Exercise
                 6.2). Furthermore, in some situations, the ERM rule fails but learnability is possible
                 with other learning rules.