Page 67 - Understanding Machine Learning
P. 67
6.5 Proof of Theorem 6.7 49
6.5 PROOF OF THEOREM 6.7
We have already seen that 1 → 2 in Chapter 4. The implications 2 → 3and 3 → 4
are trivial and so is 2 → 5. The implications 4 → 6and 5 → 6 follow from the No-
Free-Lunch theorem. The difficult part is to show that 6 → 1. The proof is based on
two main claims:
If VCdim(H) = d, then even though H might be infinite, when restricting it to
d
a finite set C ⊂ X, its “effective” size, |H C |, is only O(|C| ). That is, the size of
H C grows polynomially rather than exponentially with |C|. This claim is often
referred to as Sauer’s lemma, but it has also been stated and proved indepen-
dently by Shelah and by Perles. The formal statement is given in Section 6.5.1
later.
In Section 4 we have shown that finite hypothesis classes enjoy the uniform con-
vergence property. In Section 6.5.2 later we generalize this result and show
that uniform convergence holds whenever the hypothesis class has a “small
effective size.” By “small effective size” we mean classes for which |H C | grows
polynomially with |C|.
6.5.1 Sauer’s Lemma and the Growth Function
We defined the notion of shattering, by considering the restriction of H to a finite
set of instances. The growth function measures the maximal “effective” size of H on
aset of m examples. Formally:
Definition 6.9 (Growth Function). Let H be a hypothesis class. Then the growth
function of H, denoted τ H : N → N, is defined as
τ H (m) = max |H C |.
C⊂X :|C|=m
In words, τ H (m) is the number of different functions from a set C of size m to {0,1}
that can be obtained by restricting H to C.
m
Obviously, if VCdim(H) = d then for any m ≤ d we have τ H (m) = 2 .In such
cases, H induces all possible functions from C to {0,1}. The following beautiful
lemma, proposed independently by Sauer, Shelah, and Perles, shows that when m
becomes larger than the VC-dimension, the growth function increases polynomially
rather than exponentially with m.
Lemma 6.10 (Sauer-Shelah-Perles). Let H be a hypothesis class with VCdim(H) ≤
d m
d < ∞. Then, for all m, τ H (m) ≤ i=0 i . In particular, if m > d + 1 then τ H (m) ≤
d
(em/d) .
Proof of Sauer’s Lemma*
To prove the lemma it suffices to prove the following stronger claim: For any C =
{c 1 ,...,c m } we have
∀H, |H C |≤ |{B ⊆ C : H shatters B}|. (6.3)