Page 67 - Understanding Machine Learning

P. 67

6.5 Proof of Theorem 6.7 49

6.5 PROOF OF THEOREM 6.7
We have already seen that 1 → 2 in Chapter 4. The implications 2 → 3and 3 → 4
are trivial and so is 2 → 5. The implications 4 → 6and 5 → 6 follow from the No-
Free-Lunch theorem. The difﬁcult part is to show that 6 → 1. The proof is based on
two main claims:

If VCdim(H) = d, then even though H might be inﬁnite, when restricting it to
d
a ﬁnite set C ⊂ X, its “effective” size, |H C |, is only O(|C| ). That is, the size of
H C grows polynomially rather than exponentially with |C|. This claim is often
referred to as Sauer’s lemma, but it has also been stated and proved indepen-
dently by Shelah and by Perles. The formal statement is given in Section 6.5.1
later.
In Section 4 we have shown that ﬁnite hypothesis classes enjoy the uniform con-
vergence property. In Section 6.5.2 later we generalize this result and show
that uniform convergence holds whenever the hypothesis class has a “small
effective size.” By “small effective size” we mean classes for which |H C | grows
polynomially with |C|.

6.5.1 Sauer’s Lemma and the Growth Function

We deﬁned the notion of shattering, by considering the restriction of H to a ﬁnite
set of instances. The growth function measures the maximal “effective” size of H on
aset of m examples. Formally:
Deﬁnition 6.9 (Growth Function). Let H be a hypothesis class. Then the growth
function of H, denoted τ H : N → N, is deﬁned as

τ H (m) = max |H C |.
C⊂X :|C|=m

In words, τ H (m) is the number of different functions from a set C of size m to {0,1}
that can be obtained by restricting H to C.
m
Obviously, if VCdim(H) = d then for any m ≤ d we have τ H (m) = 2 .In such
cases, H induces all possible functions from C to {0,1}. The following beautiful
lemma, proposed independently by Sauer, Shelah, and Perles, shows that when m
becomes larger than the VC-dimension, the growth function increases polynomially
rather than exponentially with m.

Lemma 6.10 (Sauer-Shelah-Perles). Let H be a hypothesis class with VCdim(H) ≤
d m

d < ∞. Then, for all m, τ H (m) ≤ i=0 i . In particular, if m > d + 1 then τ H (m) ≤
d
(em/d) .
Proof of Sauer’s Lemma*
To prove the lemma it sufﬁces to prove the following stronger claim: For any C =
{c 1 ,...,c m } we have

∀H, |H C |≤ |{B ⊆ C : H shatters B}|. (6.3)

62 63 64 65 66 67 68 69 70 71 72