7.8 EXERCISES
7.1 Prove that for any finite class H, and any description language d : H → {0,1}*, the VC-dimension of H is at most 2 sup{|d(h)| : h ∈ H}, that is, twice the maximum description length of a predictor in H. Furthermore, if d is a prefix-free description then VCdim(H) ≤ sup{|d(h)| : h ∈ H}.
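The counting facts this exercise rests on can be checked numerically. Below is a minimal Python sketch; the toy class and its description strings are made-up illustrations (not from the text), and the sketch relies on the standard fact that VCdim(H) ≤ log2 |H| for any finite class.

    from math import log2

    # A made-up description language: predictor name -> binary string.
    d = {"h1": "0", "h2": "10", "h3": "110", "h4": "111"}

    B = max(len(s) for s in d.values())      # sup{|d(h)| : h in H}
    assert len(set(d.values())) == len(d)    # descriptions are distinct

    # There are at most 2^(B+1) - 1 binary strings of length <= B, so
    # |H| <= 2^(B+1) - 1 and VCdim(H) <= log2|H| < B + 1 <= 2B for B >= 1.
    print(len(d) <= 2 ** (B + 1) - 1)        # True
    print(log2(len(d)) <= 2 * B)             # True

    # Prefix-free: no description is a prefix of another.  Kraft's
    # inequality sum_h 2^(-|d(h)|) <= 1 then forces |H| <= 2^B.
    prefix_free = not any(
        a != b and b.startswith(a) for a in d.values() for b in d.values()
    )
    kraft = sum(2 ** -len(s) for s in d.values())
    print(prefix_free, kraft <= 1, log2(len(d)) <= B)   # True True True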
7.2 Let H = {h_n : n ∈ N} be an infinite countable hypothesis class for binary classification. Show that it is impossible to assign weights to the hypotheses in H such that
- H could be learned nonuniformly using these weights. That is, the weighting function w : H → [0,1] should satisfy the condition ∑_{h∈H} w(h) ≤ 1.
- The weights would be monotonically nondecreasing. That is, if i < j, then w(h_i) ≤ w(h_j).
7.3 Consider a hypothesis class H = ∪_{n=1}^∞ H_n, where for every n ∈ N, H_n is finite. Find a weighting function w : H → [0,1] such that ∑_{h∈H} w(h) ≤ 1 and so that for all h ∈ H, w(h) is determined by n(h) = min{n : h ∈ H_n} and by |H_{n(h)}|.
(*) Define such a function w when for all n, H_n is countable (possibly infinite).
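One natural candidate to experiment with here is w(h) = 2^(−n(h)) / |H_{n(h)}| (our guess for illustration, not necessarily the intended solution). The Python sketch below checks on a small made-up family that this candidate satisfies the normalization condition: since at most |H_n| hypotheses have n(h) = n, level n contributes at most 2^(−n) to the total.

    # Made-up finite classes H_1, H_2, H_3; H is their union.
    H = {1: {"a"}, 2: {"a", "b", "c"}, 3: {"a", "b", "c", "d", "e"}}

    def n_of(h):
        # n(h) = min{n : h in H_n}
        return min(n for n, Hn in H.items() if h in Hn)

    def w(h):
        # Candidate weighting: depends only on n(h) and |H_{n(h)}|.
        n = n_of(h)
        return 2.0 ** (-n) / len(H[n])

    union = set().union(*H.values())
    total = sum(w(h) for h in union)
    print(total, total <= 1.0)   # ~0.7167 True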
7.4 Let H be some hypothesis class. For any h ∈ H, let |h| denote the description length of h, according to some fixed description language. Consider the MDL learning paradigm in which the algorithm returns
    h_S ∈ argmin_{h∈H} [ L_S(h) + √((|h| + ln(2/δ)) / (2m)) ],
where S is a sample of size m. For any B > 0, let H_B = {h ∈ H : |h| ≤ B}, and define
    h*_B = argmin_{h∈H_B} L_D(h).
Prove a bound on L_D(h_S) − L_D(h*_B) in terms of B, the confidence parameter δ, and the size of the training set m.
Note: Such bounds are known as oracle inequalities in the literature: We wish to estimate how good we are compared to a reference classifier (or "oracle") h*_B.
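The MDL rule displayed in this exercise is directly implementable. The sketch below assumes 0-1 loss and a finite pool of candidates; mdl_select and its arguments are hypothetical names, with each candidate supplied as a (predictor, description length) pair.

    from math import log, sqrt

    def mdl_select(hypotheses, S, delta):
        # Return h_S in argmin_h [ L_S(h) + sqrt((|h| + ln(2/delta)) / (2m)) ].
        m = len(S)

        def empirical_risk(h):
            # 0-1 loss of predictor h on the labeled sample S.
            return sum(h(x) != y for x, y in S) / m

        def objective(item):
            h, desc_len = item
            return empirical_risk(h) + sqrt((desc_len + log(2 / delta)) / (2 * m))

        return min(hypotheses, key=objective)[0]

    # Example: two constant predictors, each with a 1-bit description.
    S = [(0, 1), (1, 1), (2, 0)]
    best = mdl_select([(lambda x: 1, 1), (lambda x: 0, 1)], S, delta=0.1)
    print(best(0))   # 1: the all-ones predictor has lower empirical risk

Because both description lengths are equal in this toy example, the rule reduces to empirical risk minimization; with unequal lengths it trades sample fit against description length, which is exactly the tension the oracle inequality quantifies.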
7.5 In this question we wish to show a No-Free-Lunch result for nonuniform learnability: namely, that, over any infinite domain, the class of all functions is not learnable even under the relaxed nonuniform variation of learning.
Recall that an algorithm A nonuniformly learns a hypothesis class H if there exists a function m_H^NUL : (0,1)^2 × H → N such that, for every ε, δ ∈ (0,1) and for every h ∈ H, if m ≥ m_H^NUL(ε, δ, h) then for every distribution D, with probability of at least 1 − δ over the choice of S ∼ D^m, it holds that
    L_D(A(S)) ≤ L_D(h) + ε.
If such an algorithm A exists then we say that H is nonuniformly learnable.
1. Let A be a nonuniform learner for a class H. For each n ∈ N define H_n = {h ∈ H : m_H^NUL(0.1, 0.1, h) ≤ n}. Prove that each such class H_n has a finite VC-dimension.
2. Prove that if a class H is nonuniformly learnable then there are classes H_n so that H = ∪_{n∈N} H_n and, for every n ∈ N, VCdim(H_n) is finite.
3. Let H be a class that shatters an infinite set. Then, for every sequence of classes (H_n : n ∈ N) such that H = ∪_{n∈N} H_n, there exists some n for which VCdim(H_n) = ∞.
Hint: Given a class H that shatters some infinite set K, and a sequence of classes (H_n : n ∈ N), each having a finite VC-dimension, start by defining subsets K_n ⊆ K such that, for all n, |K_n| > VCdim(H_n) and for any n ≠ m, K_n ∩ K_m = ∅. Now, pick