Page 371 - Understanding Machine Learning

P. 371

29.3 C alculating the N atarajan Dimension 353

The proof of Natarajan’s lemma shares the same spirit of the proof of Sauer’s
lemma and is left as an exercise (see Exercise 29.3).

29.3 CALCULATING THE NATARAJAN DIMENSION

In this section we show how to calculate (or estimate) the Natarajan dimension of
several popular classes, some of which were studied in Chapter 17.As these cal-
culations indicate, the Natarajan dimension is often proportional to the number of
parameters required to deﬁne a hypothesis.

29.3.1 One-vs.-All Based Classes

In Chapter 17 we have seen two reductions of multiclass categorization to binary
classiﬁcation: One-vs.-All and All-Pairs. In this section we calculate the Natarajan
dimension of the One-vs.-All method.
Recall that in One-vs.-All we train, for each label, a binary classiﬁer that dis-
tinguishes between that label and the rest of the labels. This naturally suggests
X
considering multiclass hypothesis classes of the following form. Let H bin ⊂{0,1}
k
¯
¯
be a binary hypothesis class. For every h = (h 1 ,...,h k ) ∈ (H bin ) deﬁne T(h): X →
[k]by
T (h)(x) = argmaxh i (x).
¯
i∈[k]
If there are two labels that maximize h i (x), we choose the smaller one. Also, let
OvA,k k
¯
¯
H ={T(h): h ∈ (H bin ) }.
bin
OvA,k
What “should” be the Natarajan dimension of H ? Intuitively, to specify a
bin
hypothesis in H bin we need d = VCdim(H bin ) parameters. To specify a hypothe-
OvA,k
sis in H , we need to specify k hypotheses in H bin . Therefore, kd parameters
bin
should sufﬁce. The following lemma establishes this intuition.
Lemma 29.5. If d = VCdim(H bin ) then
OvA,k
Ndim(H ) ≤ 3kd log(kd).
bin
Proof. Let C ⊂ X be a shattered set. By the deﬁnition of shattering (for multiclass
hypotheses)

OvA,k

|C|

H bin
≥ 2 .
C
OvA,k
On the other hand, each hypothesis in H is determined by using k hypotheses
bin
from H bin . Therefore,

OvA,k k

OvA,k
dk
2 |C| ≤
H
≤|C| .

bin
C
The proof follows by taking the logarithm and applying Lemma A.1.

366 367 368 369 370 371 372 373 374 375 376