of $\Delta$ is fixed, so we only need to maximize the expression

$$\max_{v \in \bar{Y}_{a,b}} \; \sum_{i=1}^{r} v_i \langle w, x_i \rangle .$$
Suppose the examples are sorted so that $\langle w, x_1 \rangle \ge \cdots \ge \langle w, x_r \rangle$. Then it is easy to verify that we would like to set $v_i$ to be positive for the smallest indices $i$: each coordinate contributes $v_i \langle w, x_i \rangle$ to the sum, so the $+1$ labels should go to the examples with the largest inner products. Doing this, under the constraint on $a, b$, amounts to setting $v_i = 1$ for the $a$ top-ranked positive examples and for the $b$ top-ranked negative examples. This yields the following procedure.

                       Solving Equation (17.14)

        input:
          $(x_1, \ldots, x_r)$, $(y_1, \ldots, y_r)$, $w$, $V$, $\Delta$
        assumptions:
          $\Delta$ is a function of $a, b, c, d$
          $V$ contains all vectors for which $f(a, b) = 1$ for some function $f$
        initialize:
          $P = |\{i : y_i = 1\}|$,  $N = |\{i : y_i = -1\}|$
          $\mu = (\langle w, x_1 \rangle, \ldots, \langle w, x_r \rangle)$,  $\alpha^* = -\infty$
          sort examples so that $\mu_1 \ge \mu_2 \ge \cdots \ge \mu_r$
          let $i_1, \ldots, i_P$ be the (sorted) indices of the positive examples
          let $j_1, \ldots, j_N$ be the (sorted) indices of the negative examples
        for $a = 0, 1, \ldots, P$
          $c = P - a$
          for $b = 0, 1, \ldots, N$ such that $f(a, b) = 1$
            $d = N - b$
            calculate $\Delta$ using $a, b, c, d$
            set $v_1, \ldots, v_r$ s.t. $v_{i_1} = \cdots = v_{i_a} = v_{j_1} = \cdots = v_{j_b} = 1$
              and the rest of the elements of $v$ equal $-1$
            set $\alpha = \Delta + \sum_{i=1}^{r} v_i \mu_i$
            if $\alpha \ge \alpha^*$
              $\alpha^* = \alpha$,  $v^* = v$
        output $v^*$
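
              To make the procedure concrete, the following is a minimal NumPy sketch of the loop above. The function name solve_eq_17_14 and the callables delta (standing for $\Delta$, assumed to be given directly as a function of the counts $a, b, c, d$) and feasible (standing for $f$) are our own naming for this illustration, not notation from the text.

    import numpy as np

    def solve_eq_17_14(X, y, w, delta, feasible):
        """Maximize  delta(a, b, c, d) + sum_i v_i * <w, x_i>  over v in V."""
        mu = X @ w                          # mu_i = <w, x_i>
        order = np.argsort(-mu)             # sort scores in decreasing order
        mu_s, y_s = mu[order], y[order]

        pos = np.flatnonzero(y_s == 1)      # sorted positions i_1, ..., i_P
        neg = np.flatnonzero(y_s == -1)     # sorted positions j_1, ..., j_N
        P, N = len(pos), len(neg)

        alpha_star, v_star = -np.inf, None
        for a in range(P + 1):
            c = P - a
            for b in range(N + 1):
                if not feasible(a, b):      # enforce f(a, b) = 1
                    continue
                d = N - b
                # +1 on the a top-ranked positives and b top-ranked negatives
                v = -np.ones_like(mu_s)
                v[pos[:a]] = 1.0
                v[neg[:b]] = 1.0
                alpha = delta(a, b, c, d) + v @ mu_s
                if alpha >= alpha_star:
                    alpha_star, v_star = alpha, v

        out = np.empty_like(v_star)         # undo the sort so the output
        out[order] = v_star                 # aligns with the input order
        return out

              The double loop visits at most $(P+1)(N+1)$ pairs $(a, b)$; the sketch rebuilds $v$ and recomputes the inner product in $O(r)$ time per pair, although with prefix sums of $\mu$ over the positive and negative lists (using $\sum_i v_i \mu_i = 2(S^+_a + S^-_b) - \sum_i \mu_i$) the per-pair work drops to $O(1)$. As a usage sketch, taking $\Delta$ to be one minus the $F_1$ score with no restriction on $V$ corresponds to delta = lambda a, b, c, d: 1 - 2*a / max(2*a + b + c, 1) (the max guards the degenerate all-negative case) and feasible = lambda a, b: True.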

              17.6 SUMMARY

              Many real-world supervised learning problems can be cast as learning a multiclass
              predictor. We started the chapter by introducing reductions from multiclass learning
              to binary learning. We then described and analyzed the family of linear predictors
              for multiclass learning, and showed how this family can be used even when the
              number of classes is extremely large, as long as the problem has adequate structure.
              Finally, we described ranking problems. In Chapter 29 we study
              the sample complexity of multiclass learning in more detail.