Page 211 - Understanding Machine Learning
P. 211
17.2 Linear Multiclass Predictors 193
17.2 LINEAR MULTICLASS PREDICTORS
In light of the inadequacy of reduction methods, in this section we study a more
direct approach for learning multiclass predictors. We describe the family of linear
multiclass predictors. To motivate the construction of this family, recall that a linear
predictor for binary classification (i.e., a halfspace) takes the form
h(x) = sign( w,x ).
An equivalent way to express the prediction is as follows:
h(x) = argmax w, yx ,
y∈{±1}
where yx is the vector obtained by multiplying each element of x by y.
This representation leads to a natural generalization of halfspaces to multiclass
d
problems as follows. Let : X × Y → R be a class-sensitive feature mapping.That
is, takes as input a pair (x, y) and maps it into a d dimensional feature vector.
Intuitively, we can think of the elements of (x, y) as score functions that assess
how well the label y fits the instance x. We will elaborate on later on. Given
d
and a vector w ∈ R , we can define a multiclass predictor, h : X → Y, as follows:
h(x) = argmax w, (x, y) .
y∈Y
That is, the prediction of h for the input x is the label that achieves the highest
weighted score, where weighting is according to the vector w.
d
d
Let W be some set of vectors in R , for example, W ={w ∈ R : w ≤ B},
for some scalar B > 0. Each pair ( ,W) defines a hypothesis class of multiclass
predictors:
H ,W ={x → argmax w, (x, y) : w ∈ W}.
y∈Y
Of course, the immediate question, which we discuss in the sequel, is how to con-
d
struct a good . Note that if Y ={±1} and we set (x, y) = yx and W = R ,then
H ,W becomes the hypothesis class of homogeneous halfspace predictors for binary
classification.
17.2.1 How to Construct
As mentioned before, we can think of the elements of (x, y) as score functions
that assess how well the label y fits the instance x. Naturally, designing a good
is similar to the problem of designing a good feature mapping (as we discussed in
Chapter 16 and as we will discuss in more detail in Chapter 25). Two examples of
useful constructions are given in the following.
The Multivector Construction:
d
n
Let Y ={1,...,k} and let X = R . We define : X × Y → R ,where d = nk,as
follows
(x, y) = [0,...,0, x 1 ,...,x n , 0,...,0 ]. (17.2)
9 :; < 9 :; < 9 :; <
∈R (y−1)n ∈R n ∈R (k−y)n