Page 371 - Understanding Machine Learning
P. 371
29.3 C alculating the N atarajan Dimension 353
The proof of Natarajan’s lemma shares the same spirit of the proof of Sauer’s
lemma and is left as an exercise (see Exercise 29.3).
29.3 CALCULATING THE NATARAJAN DIMENSION
In this section we show how to calculate (or estimate) the Natarajan dimension of
several popular classes, some of which were studied in Chapter 17.As these cal-
culations indicate, the Natarajan dimension is often proportional to the number of
parameters required to define a hypothesis.
29.3.1 One-vs.-All Based Classes
In Chapter 17 we have seen two reductions of multiclass categorization to binary
classification: One-vs.-All and All-Pairs. In this section we calculate the Natarajan
dimension of the One-vs.-All method.
Recall that in One-vs.-All we train, for each label, a binary classifier that dis-
tinguishes between that label and the rest of the labels. This naturally suggests
X
considering multiclass hypothesis classes of the following form. Let H bin ⊂{0,1}
k
¯
¯
be a binary hypothesis class. For every h = (h 1 ,...,h k ) ∈ (H bin ) define T(h): X →
[k]by
T (h)(x) = argmaxh i (x).
¯
i∈[k]
If there are two labels that maximize h i (x), we choose the smaller one. Also, let
OvA,k k
¯
¯
H ={T(h): h ∈ (H bin ) }.
bin
OvA,k
What “should” be the Natarajan dimension of H ? Intuitively, to specify a
bin
hypothesis in H bin we need d = VCdim(H bin ) parameters. To specify a hypothe-
OvA,k
sis in H , we need to specify k hypotheses in H bin . Therefore, kd parameters
bin
should suffice. The following lemma establishes this intuition.
Lemma 29.5. If d = VCdim(H bin ) then
OvA,k
Ndim(H ) ≤ 3kd log(kd).
bin
Proof. Let C ⊂ X be a shattered set. By the definition of shattering (for multiclass
hypotheses)
OvA,k
|C|
H bin
≥ 2 .
C
OvA,k
On the other hand, each hypothesis in H is determined by using k hypotheses
bin
from H bin . Therefore,
OvA,k k
H bin
≤|(H bin ) | .
C
C
d
By Sauer’s lemma, |(H bin ) |≤ |C| . We conclude that
C
OvA,k
dk
2 |C| ≤
H
≤|C| .
bin
C
The proof follows by taking the logarithm and applying Lemma A.1.