Page 74 - Understanding Machine Learning
P. 74
The VC-Dimension
56
Hint: Use Exercise 6.3 in Chapter 5.
2. Prove that for every H that is PAC learnable, VCdim(H) < ∞. (Note that this is
the implication 3 → 6 in Theorem 6.7.)
6.11 VC of union: Let H 1 ,...,H r be hypothesis classes over some fixed domain set X.
Let d = max i VCdim(H i ) and assume for simplicity that d ≥ 3.
1. Prove that
r
VCdim ∪ i=1 H i ≤ 4d log(2d) + 2log(r).
Hint: Takeaset of k examples and assume that they are shattered by the union
k
class. Therefore, the union class can produce all 2 possible labelings on these
examples. Use Sauer’s lemma to show that the union class cannot produce more
d
k
d
than rk labelings. Therefore, 2 rk . Now use Lemma A.2.
2. (*) Prove that for r = 2 it holds that
VCdim(H 1 ∪ H 2) ≤ 2d + 1.
6.12 Dudley classes: In this question we discuss an algebraic framework for defining con-
n
cept classes over R and show a connection between the VC dimension of such
n
classes and their algebraic properties. Given a function f : R → R we define the
corresponding function, POS( f )(x) = 1 [ f (x)>0] .For a class F of real valued func-
tions we define a corresponding class of functions POS(F) ={POS( f ): f ∈ F}.We
say that a family, F, of real valued functions is linearly closed if for all f ,g ∈ F and
r ∈ R,( f +rg)∈ F (where addition and scalar multiplication of functions are defined
n
point wise, namely, for all x ∈ R ,( f +rg)(x) = f (x) +rg(x)). Note that if a family
of functions is linearly closed then we can view it as a vector space over the reals.
def
n
For a function g : R → R and a family of functions F,let F + g ={ f + g : f ∈ F}.
Hypothesis classes that have a representation as POS(F + g) for some vector space
of functions F and some function g are called Dudley classes.
n
1. Show that for every g : R → R and every vector space of functions F as defined
earlier, VCdim(POS(F + g)) = VCdim(POS(F)).
2. (**) For every linearly closed family of real valued functions F,the VC-
dimension of the corresponding class POS(F) equals the linear dimension of
F (as a vector space). Hint: Let f 1 ,..., f d be a basis for the vector space F. Con-
d
n
sider the mapping x è ( f 1 (x),..., f d (x)) (from R to R ). Note that this mapping
n
induces a matching between functions over R of the form POS( f ) and homo-
d
geneous linear halfspaces in R (the VC-dimension of the class of homogeneous
linear halfspaces is analyzed in Chapter 9).
3. Show that each of the following classes can be represented as a Dudley class:
n
1. The class HS n of halfspaces over R (see Chapter 9).
n
2. The class HH S n of all homogeneous halfspaces over R (see Chapter 9).
d
3. The class B d of all functions defined by (open) balls in R . Use the Dudley
representation to figure out the VC-dimension of this class.
d
4. Let P denote the class of functions defined by polynomial inequalities of
n
degree ≤ d, namely,
d
P ={h p : p is a polynomial of degree ≤ d in the variables x 1 ,...,x n },
n
where for x = (x 1 ....,x n ), h p (x) = 1 [p(x)≥0] (the degree of a multivariable poly-
nomial is the maximal sum of variable exponents over all of its terms. For
3 2
2
example, the degree of p(x) = 3x x + 4x 3 x is 5).
1 2 7