7.8 EXERCISES
7.1 Prove that for any finite class H, and any description language d : H → {0,1}*, the VC-dimension of H is at most 2 sup{|d(h)| : h ∈ H}, that is, twice the maximum description length of a predictor in H. Furthermore, if d is a prefix-free description then VCdim(H) ≤ sup{|d(h)| : h ∈ H}.
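The counting facts this exercise rests on can be checked numerically. Below is a minimal Python sketch; the toy class and its description strings are made-up illustrations (not from the text), and the sketch relies on the standard fact that VCdim(H) ≤ log2 |H| for any finite class.

    from math import log2

    # A made-up description language: predictor name -> binary string.
    d = {"h1": "0", "h2": "10", "h3": "110", "h4": "111"}

    B = max(len(s) for s in d.values())      # sup{|d(h)| : h in H}
    assert len(set(d.values())) == len(d)    # descriptions are distinct

    # There are at most 2^(B+1) - 1 binary strings of length <= B, so
    # |H| <= 2^(B+1) - 1 and VCdim(H) <= log2|H| < B + 1 <= 2B for B >= 1.
    print(len(d) <= 2 ** (B + 1) - 1)        # True
    print(log2(len(d)) <= 2 * B)             # True

    # Prefix-free: no description is a prefix of another.  Kraft's
    # inequality sum_h 2^(-|d(h)|) <= 1 then forces |H| <= 2^B.
    prefix_free = not any(
        a != b and b.startswith(a) for a in d.values() for b in d.values()
    )
    kraft = sum(2 ** -len(s) for s in d.values())
    print(prefix_free, kraft <= 1, log2(len(d)) <= B)   # True True True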
7.2 Let H = {h_n : n ∈ N} be an infinite countable hypothesis class for binary classification. Show that it is impossible to assign weights to the hypotheses in H such that
- H could be learned nonuniformly using these weights. That is, the weighting function w : H → [0,1] should satisfy the condition ∑_{h∈H} w(h) ≤ 1.
- The weights would be monotonically nondecreasing. That is, if i < j, then w(h_i) ≤ w(h_j).
7.3 Consider a hypothesis class H = ∪_{n=1}^∞ H_n, where for every n ∈ N, H_n is finite. Find a weighting function w : H → [0,1] such that ∑_{h∈H} w(h) ≤ 1 and so that for all h ∈ H, w(h) is determined by n(h) = min{n : h ∈ H_n} and by |H_{n(h)}|.
(*) Define such a function w when for all n, H_n is countable (possibly infinite).
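One natural candidate to experiment with here is w(h) = 2^(−n(h)) / |H_{n(h)}| (our guess for illustration, not necessarily the intended solution). The Python sketch below checks on a small made-up family that this candidate satisfies the normalization condition: since at most |H_n| hypotheses have n(h) = n, level n contributes at most 2^(−n) to the total.

    # Made-up finite classes H_1, H_2, H_3; H is their union.
    H = {1: {"a"}, 2: {"a", "b", "c"}, 3: {"a", "b", "c", "d", "e"}}

    def n_of(h):
        # n(h) = min{n : h in H_n}
        return min(n for n, Hn in H.items() if h in Hn)

    def w(h):
        # Candidate weighting: depends only on n(h) and |H_{n(h)}|.
        n = n_of(h)
        return 2.0 ** (-n) / len(H[n])

    union = set().union(*H.values())
    total = sum(w(h) for h in union)
    print(total, total <= 1.0)   # ~0.7167 True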
7.4 Let H be some hypothesis class. For any h ∈ H, let |h| denote the description length of h, according to some fixed description language. Consider the MDL learning paradigm in which the algorithm returns
    h_S ∈ argmin_{h∈H} [ L_S(h) + √((|h| + ln(2/δ)) / (2m)) ],
where S is a sample of size m. For any B > 0, let H_B = {h ∈ H : |h| ≤ B}, and define
    h*_B = argmin_{h∈H_B} L_D(h).
Prove a bound on L_D(h_S) − L_D(h*_B) in terms of B, the confidence parameter δ, and the size of the training set m.
Note: Such bounds are known as oracle inequalities in the literature: We wish to estimate how good we are compared to a reference classifier (or "oracle") h*_B.
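The MDL rule displayed in this exercise is directly implementable. The sketch below assumes 0-1 loss and a finite pool of candidates; mdl_select and its arguments are hypothetical names, with each candidate supplied as a (predictor, description length) pair.

    from math import log, sqrt

    def mdl_select(hypotheses, S, delta):
        # Return h_S in argmin_h [ L_S(h) + sqrt((|h| + ln(2/delta)) / (2m)) ].
        m = len(S)

        def empirical_risk(h):
            # 0-1 loss of predictor h on the labeled sample S.
            return sum(h(x) != y for x, y in S) / m

        def objective(item):
            h, desc_len = item
            return empirical_risk(h) + sqrt((desc_len + log(2 / delta)) / (2 * m))

        return min(hypotheses, key=objective)[0]

    # Example: two constant predictors, each with a 1-bit description.
    S = [(0, 1), (1, 1), (2, 0)]
    best = mdl_select([(lambda x: 1, 1), (lambda x: 0, 1)], S, delta=0.1)
    print(best(0))   # 1: the all-ones predictor has lower empirical risk

Because both description lengths are equal in this toy example, the rule reduces to empirical risk minimization; with unequal lengths it trades sample fit against description length, which is exactly the tension the oracle inequality quantifies.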
7.5 In this question we wish to show a No-Free-Lunch result for nonuniform learnability: namely, that, over any infinite domain, the class of all functions is not learnable even under the relaxed nonuniform variation of learning.
Recall that an algorithm A nonuniformly learns a hypothesis class H if there exists a function m_H^NUL : (0,1)^2 × H → N such that, for every ε, δ ∈ (0,1) and for every h ∈ H, if m ≥ m_H^NUL(ε, δ, h) then for every distribution D, with probability of at least 1 − δ over the choice of S ∼ D^m, it holds that
    L_D(A(S)) ≤ L_D(h) + ε.
If such an algorithm A exists then we say that H is nonuniformly learnable.
1. Let A be a nonuniform learner for a class H. For each n ∈ N define H_n = {h ∈ H : m_H^NUL(0.1, 0.1, h) ≤ n}. Prove that each such class H_n has a finite VC-dimension.
2. Prove that if a class H is nonuniformly learnable then there are classes H_n so that H = ∪_{n∈N} H_n and, for every n ∈ N, VCdim(H_n) is finite.
3. Let H be a class that shatters an infinite set. Then, for every sequence of classes (H_n : n ∈ N) such that H = ∪_{n∈N} H_n, there exists some n for which VCdim(H_n) = ∞.
Hint: Given a class H that shatters some infinite set K, and a sequence of classes (H_n : n ∈ N), each having a finite VC-dimension, start by defining subsets K_n ⊆ K such that, for all n, |K_n| > VCdim(H_n) and for any n ≠ m, K_n ∩ K_m = ∅. Now, pick