
                    To  define  the  Natarajan  dimension,  we  first  generalize  the  definition  of
                 shattering.

Definition 29.1 (Shattering (Multiclass Version)). We say that a set C ⊂ X is shattered by H if there exist two functions f₀, f₁ : C → [k] such that

• For every x ∈ C, f₀(x) ≠ f₁(x).
• For every B ⊂ C, there exists a function h ∈ H such that

      ∀x ∈ B, h(x) = f₀(x)   and   ∀x ∈ C \ B, h(x) = f₁(x).




                 Definition 29.2  (Natarajan Dimension).  The Natarajan dimension of H,  denoted
                 Ndim(H), is the maximal size of a shattered set C ⊂ X.

                    It  is  not  hard  to  see  that  in  the  case  that  there  are  exactly  two  classes,
                 Ndim(H) = VCdim(H).  Therefore,  the Natarajan dimension  generalizes the VC
                 dimension. We next show that the Natarajan dimension allows us to generalize the
                 fundamental theorem of statistical learning from binary classification to multiclass
                 classification.
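To make Definitions 29.1 and 29.2 concrete, here is a minimal Python sketch (illustrative only; the helper names shatters and natarajan_dim are not from the text) that checks multiclass shattering by exhaustive search and computes the Natarajan dimension of a finite hypothesis class over a finite domain. Hypotheses are assumed to be given as dictionaries mapping domain points to labels in {0, ..., k−1}; the search is exponential, so it is only meant for tiny examples.

from itertools import combinations, product

def shatters(H, C, k):
    # Definition 29.1: C is shattered by H if there are f0, f1 : C -> [k] with
    # f0(x) != f1(x) for every x in C, such that for every B subset of C some
    # h in H satisfies h(x) = f0(x) on B and h(x) = f1(x) on C \ B.
    n = len(C)
    for f0 in product(range(k), repeat=n):
        for f1 in product(range(k), repeat=n):
            if any(a == b for a, b in zip(f0, f1)):
                continue  # f0 and f1 must disagree on every point of C
            witnessed = True
            for r in range(n + 1):
                for B in combinations(range(n), r):
                    inside = set(B)
                    target = [f0[i] if i in inside else f1[i] for i in range(n)]
                    if not any(all(h[C[i]] == target[i] for i in range(n)) for h in H):
                        witnessed = False
                        break
                if not witnessed:
                    break
            if witnessed:
                return True
    return False

def natarajan_dim(H, domain, k):
    # Brute force: the largest size of a subset of the domain shattered by H.
    best = 0
    for r in range(1, len(domain) + 1):
        if any(shatters(H, list(C), k) for C in combinations(domain, r)):
            best = r
    return best

# Sanity check for k = 2, where Ndim should coincide with VCdim: threshold
# functions on {0, 1, 2} (label 1 iff x >= t) have VC dimension 1.
domain = [0, 1, 2]
thresholds = [{x: int(x >= t) for x in domain} for t in range(4)]
print(natarajan_dim(thresholds, domain, 2))  # prints 1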
                 29.2  THE  MULTICLASS  FUNDAMENTAL  THEOREM
Theorem 29.3 (The Multiclass Fundamental Theorem). There exist absolute constants C₁, C₂ > 0 such that the following holds. For every hypothesis class H of functions from X to [k], such that the Natarajan dimension of H is d, we have

   1. H has the uniform convergence property with sample complexity

          C₁ (d + log(1/δ))/ε² ≤ m_H^UC(ε, δ) ≤ C₂ (d log(k) + log(1/δ))/ε².

   2. H is agnostic PAC learnable with sample complexity

          C₁ (d + log(1/δ))/ε² ≤ m_H(ε, δ) ≤ C₂ (d log(k) + log(1/δ))/ε².

   3. H is PAC learnable (assuming realizability) with sample complexity

          C₁ (d + log(1/δ))/ε ≤ m_H(ε, δ) ≤ C₂ (d log(kd/ε) + log(1/δ))/ε.
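For instance, specializing to k = 2 (where Ndim(H) = VCdim(H) = d, as noted above), the agnostic bound becomes

          C₁ (d + log(1/δ))/ε² ≤ m_H(ε, δ) ≤ C₂ (d log 2 + log(1/δ))/ε²,

which recovers the binary fundamental theorem of Chapter 28 up to the absolute constants.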

                 29.2.1  On  the  Proof  of  Theorem  29.3
                 The lower bounds in Theorem 29.3 can be deduced by a reduction from the binary
                 fundamental theorem (see Exercise 29.5).
The upper bounds in Theorem 29.3 can be proved along the same lines as the proof of the fundamental theorem for binary classification, given in Chapter 28
                 (see Exercise 29.4).  The sole ingredient of that proof that should be modified in
                 a nonstraightforward manner is Sauer’s lemma. It applies only to binary classes and
                 therefore must be replaced. An appropriate substitute is Natarajan’s lemma:
Lemma 29.4 (Natarajan). |H| ≤ |X|^Ndim(H) · k^(2·Ndim(H)).
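The key point, as in the binary case, is that when the lemma is applied to the restriction of H to a sample C of size m (whose Natarajan dimension is at most d = Ndim(H)), it bounds the number of distinct labelings of C induced by H by m^d · k^(2d), a quantity polynomial in m; this is exactly the growth property that Sauer's lemma supplies in the binary proof.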