Page 371 - Understanding Machine Learning
P. 371

29.3  C alculating  the  N atarajan  Dimension  353


                 The proof of Natarajan’s lemma shares the same spirit of the proof of Sauer’s
              lemma and is left as an exercise (see Exercise 29.3).


              29.3 CALCULATING THE NATARAJAN DIMENSION

              In this section we show how to calculate (or estimate) the Natarajan dimension of
              several popular classes, some of which were studied in Chapter 17.As these cal-
              culations indicate, the Natarajan dimension is often proportional to the number of
              parameters required to define a hypothesis.


              29.3.1 One-vs.-All Based Classes

              In Chapter 17 we have seen two reductions of multiclass categorization to binary
              classification: One-vs.-All and All-Pairs. In this section we calculate the Natarajan
              dimension of the One-vs.-All method.
                 Recall that in One-vs.-All we train, for each label, a binary classifier that dis-
              tinguishes between that label and the rest of the labels. This naturally suggests
                                                                                    X
              considering multiclass hypothesis classes of the following form. Let H bin ⊂{0,1}
                                                                    k
                                                                              ¯
                                                ¯
              be a binary hypothesis class. For every h = (h 1 ,...,h k ) ∈ (H bin ) define T(h): X →
              [k]by
                                       T (h)(x) = argmaxh i (x).
                                          ¯
                                                  i∈[k]
              If there are two labels that maximize h i (x), we choose the smaller one. Also, let
                                      OvA,k                  k
                                                    ¯
                                                ¯
                                    H      ={T(h): h ∈ (H bin ) }.
                                      bin
                                                          OvA,k
              What “should” be the Natarajan dimension of H   ? Intuitively, to specify a
                                                          bin
              hypothesis in H bin we need d = VCdim(H bin ) parameters. To specify a hypothe-
                    OvA,k
              sis in H   , we need to specify k hypotheses in H bin . Therefore, kd parameters
                    bin
              should suffice. The following lemma establishes this intuition.
              Lemma 29.5. If d = VCdim(H bin ) then
                                            OvA,k
                                     Ndim(H      ) ≤ 3kd log(kd).
                                            bin
              Proof. Let C ⊂ X be a shattered set. By the definition of shattering (for multiclass
              hypotheses)

                                            OvA,k
                                        
          
    |C|
                                        
 H bin    
 ≥ 2  .
                                                  C
                                                 OvA,k
              On the other hand, each hypothesis in H  is determined by using k hypotheses
                                                 bin
              from H bin . Therefore,

                                         OvA,k              k

                                     
 H bin    
 ≤|(H bin ) | .
                                                          C
                                               C
                                           d
              By Sauer’s lemma, |(H bin ) |≤ |C| . We conclude that
                                    C

                                               OvA,k  
     dk
                                    2 |C|  ≤ 
 H      
 ≤|C| .

                                               bin
                                                    C
              The proof follows by taking the logarithm and applying Lemma A.1.
   366   367   368   369   370   371   372   373   374   375   376