of $\Delta$ is fixed, so we only need to maximize the expression

$$\max_{v \in \bar{Y}_{a,b}} \; \sum_{i=1}^{r} v_i \langle w, x_i \rangle .$$
Suppose the examples are sorted so that $\langle w, x_1 \rangle \ge \cdots \ge \langle w, x_r \rangle$. Then it is easy to verify that we would like to set $v_i$ to be positive for the smallest indices $i$: each coordinate contributes $v_i \langle w, x_i \rangle$ to the sum, so the $+1$ labels should go to the examples with the largest inner products. Doing this, under the constraint on $a, b$, amounts to setting $v_i = 1$ for the $a$ top-ranked positive examples and for the $b$ top-ranked negative examples. This yields the following procedure.

                       Solving Equation (17.14)

        input:
          $(x_1, \ldots, x_r)$, $(y_1, \ldots, y_r)$, $w$, $V$, $\Delta$
        assumptions:
          $\Delta$ is a function of $a, b, c, d$
          $V$ contains all vectors for which $f(a, b) = 1$ for some function $f$
        initialize:
          $P = |\{i : y_i = 1\}|$,  $N = |\{i : y_i = -1\}|$
          $\mu = (\langle w, x_1 \rangle, \ldots, \langle w, x_r \rangle)$,  $\alpha^* = -\infty$
          sort examples so that $\mu_1 \ge \mu_2 \ge \cdots \ge \mu_r$
          let $i_1, \ldots, i_P$ be the (sorted) indices of the positive examples
          let $j_1, \ldots, j_N$ be the (sorted) indices of the negative examples
        for $a = 0, 1, \ldots, P$
          $c = P - a$
          for $b = 0, 1, \ldots, N$ such that $f(a, b) = 1$
            $d = N - b$
            calculate $\Delta$ using $a, b, c, d$
            set $v_1, \ldots, v_r$ s.t. $v_{i_1} = \cdots = v_{i_a} = v_{j_1} = \cdots = v_{j_b} = 1$
              and the rest of the elements of $v$ equal $-1$
            set $\alpha = \Delta + \sum_{i=1}^{r} v_i \mu_i$
            if $\alpha \ge \alpha^*$
              $\alpha^* = \alpha$,  $v^* = v$
        output $v^*$
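
              To make the procedure concrete, the following is a minimal NumPy sketch of the loop above. The function name solve_eq_17_14 and the callables delta (standing for $\Delta$, assumed to be given directly as a function of the counts $a, b, c, d$) and feasible (standing for $f$) are our own naming for this illustration, not notation from the text.

    import numpy as np

    def solve_eq_17_14(X, y, w, delta, feasible):
        """Maximize  delta(a, b, c, d) + sum_i v_i * <w, x_i>  over v in V."""
        mu = X @ w                          # mu_i = <w, x_i>
        order = np.argsort(-mu)             # sort scores in decreasing order
        mu_s, y_s = mu[order], y[order]

        pos = np.flatnonzero(y_s == 1)      # sorted positions i_1, ..., i_P
        neg = np.flatnonzero(y_s == -1)     # sorted positions j_1, ..., j_N
        P, N = len(pos), len(neg)

        alpha_star, v_star = -np.inf, None
        for a in range(P + 1):
            c = P - a
            for b in range(N + 1):
                if not feasible(a, b):      # enforce f(a, b) = 1
                    continue
                d = N - b
                # +1 on the a top-ranked positives and b top-ranked negatives
                v = -np.ones_like(mu_s)
                v[pos[:a]] = 1.0
                v[neg[:b]] = 1.0
                alpha = delta(a, b, c, d) + v @ mu_s
                if alpha >= alpha_star:
                    alpha_star, v_star = alpha, v

        out = np.empty_like(v_star)         # undo the sort so the output
        out[order] = v_star                 # aligns with the input order
        return out

              The double loop visits at most $(P+1)(N+1)$ pairs $(a, b)$; the sketch rebuilds $v$ and recomputes the inner product in $O(r)$ time per pair, although with prefix sums of $\mu$ over the positive and negative lists (using $\sum_i v_i \mu_i = 2(S^+_a + S^-_b) - \sum_i \mu_i$) the per-pair work drops to $O(1)$. As a usage sketch, taking $\Delta$ to be one minus the $F_1$ score with no restriction on $V$ corresponds to delta = lambda a, b, c, d: 1 - 2*a / max(2*a + b + c, 1) (the max guards the degenerate all-negative case) and feasible = lambda a, b: True.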

              17.6 SUMMARY

              Many real-world supervised learning problems can be cast as learning a multiclass
              predictor. We started the chapter by introducing reductions from multiclass learning
              to binary learning. We then described and analyzed the family of linear predictors
              for multiclass learning, and showed how this family can be used even when the
              number of classes is extremely large, as long as the problem has adequate structure.
              Finally, we described ranking problems. In Chapter 29 we study
              the sample complexity of multiclass learning in more detail.