

17.2 LINEAR MULTICLASS PREDICTORS
In light of the inadequacy of reduction methods, in this section we study a more direct approach for learning multiclass predictors. We describe the family of linear multiclass predictors. To motivate the construction of this family, recall that a linear predictor for binary classification (i.e., a halfspace) takes the form
$$h(\mathbf{x}) = \operatorname{sign}(\langle \mathbf{w}, \mathbf{x} \rangle).$$

An equivalent way to express the prediction is as follows:
$$h(\mathbf{x}) = \operatorname*{argmax}_{y \in \{\pm 1\}} \langle \mathbf{w}, y\mathbf{x} \rangle,$$

where $y\mathbf{x}$ is the vector obtained by multiplying each element of $\mathbf{x}$ by $y$.
This representation leads to a natural generalization of halfspaces to multiclass problems as follows. Let $\Psi : \mathcal{X} \times \mathcal{Y} \to \mathbb{R}^d$ be a class-sensitive feature mapping. That is, $\Psi$ takes as input a pair $(\mathbf{x}, y)$ and maps it into a $d$-dimensional feature vector. Intuitively, we can think of the elements of $\Psi(\mathbf{x}, y)$ as score functions that assess how well the label $y$ fits the instance $\mathbf{x}$. We will elaborate on $\Psi$ later on. Given $\Psi$ and a vector $\mathbf{w} \in \mathbb{R}^d$, we can define a multiclass predictor, $h : \mathcal{X} \to \mathcal{Y}$, as follows:

$$h(\mathbf{x}) = \operatorname*{argmax}_{y \in \mathcal{Y}} \langle \mathbf{w}, \Psi(\mathbf{x}, y) \rangle.$$

That is, the prediction of $h$ for the input $\mathbf{x}$ is the label that achieves the highest weighted score, where weighting is according to the vector $\mathbf{w}$.
Let $W$ be some set of vectors in $\mathbb{R}^d$, for example, $W = \{\mathbf{w} \in \mathbb{R}^d : \|\mathbf{w}\| \le B\}$, for some scalar $B > 0$. Each pair $(\Psi, W)$ defines a hypothesis class of multiclass predictors:

$$H_{\Psi,W} = \left\{ \mathbf{x} \mapsto \operatorname*{argmax}_{y \in \mathcal{Y}} \langle \mathbf{w}, \Psi(\mathbf{x}, y) \rangle \; : \; \mathbf{w} \in W \right\}.$$

Of course, the immediate question, which we discuss in the sequel, is how to construct a good $\Psi$. Note that if $\mathcal{Y} = \{\pm 1\}$ and we set $\Psi(\mathbf{x}, y) = y\mathbf{x}$ and $W = \mathbb{R}^d$, then $H_{\Psi,W}$ becomes the hypothesis class of homogeneous halfspace predictors for binary classification.
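To make the definition concrete, here is a minimal sketch in Python with NumPy; the names `predict` and `psi_binary` are illustrative, not from the text. It evaluates $\langle \mathbf{w}, \Psi(\mathbf{x}, y) \rangle$ for every label, returns the maximizer, and checks the binary special case $\Psi(\mathbf{x}, y) = y\mathbf{x}$ noted above.

```python
import numpy as np

def predict(w, psi, x, labels):
    """Return the label y in `labels` maximizing <w, psi(x, y)>."""
    scores = [np.dot(w, psi(x, y)) for y in labels]
    return labels[int(np.argmax(scores))]

# Binary special case: psi(x, y) = y * x recovers a homogeneous halfspace,
# since the argmax over y in {+1, -1} of y * <w, x> equals sign(<w, x>).
def psi_binary(x, y):
    return y * x

w = np.array([1.0, -2.0])
x = np.array([0.5, 0.3])
assert predict(w, psi_binary, x, labels=[1, -1]) == np.sign(np.dot(w, x))
```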

17.2.1 How to Construct $\Psi$

As mentioned before, we can think of the elements of $\Psi(\mathbf{x}, y)$ as score functions that assess how well the label $y$ fits the instance $\mathbf{x}$. Naturally, designing a good $\Psi$ is similar to the problem of designing a good feature mapping (as we discussed in Chapter 16 and as we will discuss in more detail in Chapter 25). Two examples of useful constructions are given in the following.

The Multivector Construction:
Let $\mathcal{Y} = \{1, \ldots, k\}$ and let $\mathcal{X} = \mathbb{R}^n$. We define $\Psi : \mathcal{X} \times \mathcal{Y} \to \mathbb{R}^d$, where $d = nk$, as follows:

$$\Psi(\mathbf{x}, y) = [\;\underbrace{0, \ldots, 0}_{\in \mathbb{R}^{(y-1)n}}, \;\underbrace{x_1, \ldots, x_n}_{\in \mathbb{R}^{n}}, \;\underbrace{0, \ldots, 0}_{\in \mathbb{R}^{(k-y)n}}\;]. \qquad (17.2)$$
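A minimal sketch of the mapping in Equation (17.2), assuming labels $1, \ldots, k$ as in the text; the helper name `psi_multivector` is illustrative:

```python
import numpy as np

def psi_multivector(x, y, k):
    """Place x into the y-th of k blocks of a zero vector of length k * n.

    x : array of length n; y : label in {1, ..., k}.
    """
    n = len(x)
    out = np.zeros(k * n)
    out[(y - 1) * n : y * n] = x  # the y-th block holds x, all others stay 0
    return out

# Example with n = 2, k = 3:
# psi_multivector([5., 7.], 2, 3) -> [0., 0., 5., 7., 0., 0.]
```

Note that with this construction $\mathbf{w} \in \mathbb{R}^{nk}$ decomposes into $k$ blocks $\mathbf{w}_1, \ldots, \mathbf{w}_k$, and $\langle \mathbf{w}, \Psi(\mathbf{x}, y) \rangle = \langle \mathbf{w}_y, \mathbf{x} \rangle$, so the predictor effectively maintains one weight vector per class.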