Let us denote $\Psi(\bar{x}, v) = \sum_{i=1}^{r} v_i x_i$; it follows that
$$
\begin{aligned}
\pi(h_w(\bar{x})) &= \operatorname*{argmax}_{v \in V} \sum_{i=1}^{r} v_i \langle w, x_i \rangle \\
&= \operatorname*{argmax}_{v \in V} \left\langle w, \sum_{i=1}^{r} v_i x_i \right\rangle \\
&= \operatorname*{argmax}_{v \in V} \langle w, \Psi(\bar{x}, v) \rangle .
\end{aligned}
$$
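In words, $\pi(h_w(\bar{x}))$ can be computed by simply sorting the items according to their scores $\langle w, x_i \rangle$: by the rearrangement inequality, $\sum_{i=1}^{r} v_i \langle w, x_i \rangle$ is maximized by the permutation that assigns larger positions $v_i$ to larger scores. The following minimal sketch illustrates this; the function name and the brute-force check are ours, not part of the text, and we assume the convention (implicit in the identity above) that a larger $v_i$ means a higher-ranked item:

```python
import itertools

import numpy as np


def ranking_permutation(scores):
    # v[i] is the position assigned to item i; larger score -> larger v[i],
    # which maximizes sum_i v[i] * scores[i] by the rearrangement inequality.
    r = len(scores)
    v = np.empty(r, dtype=int)
    v[np.argsort(scores)] = np.arange(1, r + 1)
    return v


# Brute-force check on a tiny random instance.
rng = np.random.default_rng(0)
w, xs = rng.normal(size=3), rng.normal(size=(5, 3))
scores = xs @ w                        # scores[i] = <w, x_i>
best = max(itertools.permutations(range(1, 6)),
           key=lambda v: float(np.dot(v, scores)))
assert tuple(ranking_permutation(scores)) == best
```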
On the basis of this observation, we can use the generalized hinge loss for cost-sensitive multiclass classification as a surrogate loss function for the NDCG loss as follows:
$$
\begin{aligned}
\Delta(h_w(\bar{x}), y) &\le \Delta(h_w(\bar{x}), y) + \langle w, \Psi(\bar{x}, \pi(h_w(\bar{x}))) \rangle - \langle w, \Psi(\bar{x}, \pi(y)) \rangle \\
&\le \max_{v \in V} \left[ \Delta(v, y) + \langle w, \Psi(\bar{x}, v) \rangle - \langle w, \Psi(\bar{x}, \pi(y)) \rangle \right] \\
&= \max_{v \in V} \left[ \Delta(v, y) + \sum_{i=1}^{r} (v_i - \pi(y)_i) \langle w, x_i \rangle \right].
\end{aligned}
\qquad (17.8)
$$
The right-hand side is a convex function with respect to $w$, since it is a maximum of functions that are affine in $w$.
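To make the bound in Equation (17.8) concrete, here is a small brute-force check (feasible only for tiny $r$, since it enumerates all permutations) that the right-hand side indeed upper-bounds the NDCG loss of the predicted ranking. We assume the NDCG loss has the form $\Delta(v,y) = 1 - G(v,y)/G(\pi(y),y)$ with $G(v,y) = \sum_i D(v_i) y_i$, as defined earlier in the chapter; the particular discount $D$ and all variable names are illustrative choices of ours:

```python
import itertools

import numpy as np


def D(j, r):
    # Illustrative monotonically nondecreasing discount: higher-ranked
    # positions (larger v_i) receive larger weight.
    return 1.0 / np.log2(r - j + 2)


def G(v, y):
    r = len(y)
    return sum(D(v[i], r) * y[i] for i in range(r))


def pi(a):
    # Permutation induced by the scores a: larger a_i gets a larger position.
    r = len(a)
    v = np.empty(r, dtype=int)
    v[np.argsort(a)] = np.arange(1, r + 1)
    return v


def ndcg_loss(v, y):
    return 1.0 - G(v, y) / G(pi(y), y)


rng = np.random.default_rng(1)
r, d = 5, 3
w = rng.normal(size=d)
xs = rng.normal(size=(r, d))
y = rng.uniform(size=r)               # "true relevance" scores
scores = xs @ w                        # <w, x_i> for each item i

lhs = ndcg_loss(pi(scores), y)        # NDCG loss of the predicted ranking
rhs = max(ndcg_loss(np.array(v), y) + float(np.dot(np.array(v) - pi(y), scores))
          for v in itertools.permutations(range(1, r + 1)))
assert lhs <= rhs + 1e-12             # Equation (17.8)
```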
We can now solve the learning problem using SGD as described in Section 17.2.5. The main computational bottleneck is calculating a subgradient of the loss function, which is equivalent to finding $v$ that achieves the maximum in Equation (17.8) (see Claim 14.6). Using the definition of the NDCG loss, this is equivalent to solving the problem
$$
\operatorname*{argmin}_{v \in V} \sum_{i=1}^{r} \left( \alpha_i v_i + \beta_i D(v_i) \right),
$$
where $\alpha_i = -\langle w, x_i \rangle$ and $\beta_i = y_i / G(\pi(y), y)$; this follows by substituting the definition of the NDCG loss into Equation (17.8) and dropping the terms that do not depend on $v$ (the sign flip turns the max into a min). We can think of this problem a little bit differently by defining a matrix $A \in \mathbb{R}^{r,r}$ where
$$
A_{i,j} = j\,\alpha_i + D(j)\,\beta_i .
$$
Now, let us think about each $j$ as a "worker," each $i$ as a "task," and $A_{i,j}$ as the cost of assigning task $i$ to worker $j$. With this view, the problem of finding $v$ becomes the problem of finding an assignment of the tasks to workers of minimal cost. This problem is called "the assignment problem" and can be solved efficiently. One particular algorithm is the "Hungarian method" (Kuhn 1955); a small sketch based on an off-the-shelf solver is given below. Another way to solve the assignment problem is using linear programming. To do so, let us first write the
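As a concrete illustration of the reduction just described, the following sketch forms the matrix $A$ and finds the $v$ attaining the maximum in Equation (17.8) with SciPy's `linear_sum_assignment`, an off-the-shelf Hungarian-style solver. The discount $D$ and all names are illustrative assumptions on our part; `G_norm` stands for the normalizer $G(\pi(y), y)$:

```python
import numpy as np
from scipy.optimize import linear_sum_assignment


def D(j, r):
    # Illustrative nondecreasing discount; any discount from the NDCG
    # definition can be substituted here.
    return 1.0 / np.log2(r - j + 2)


def maximizing_permutation(w, xs, y, G_norm):
    # Solve max_v [ Delta(v, y) + sum_i (v_i - pi(y)_i) <w, x_i> ] via the
    # equivalent assignment problem with cost A[i, j] = j*alpha_i + D(j)*beta_i.
    r = xs.shape[0]
    alpha = -(xs @ w)                       # alpha_i = -<w, x_i>
    beta = y / G_norm                       # beta_i  =  y_i / G(pi(y), y)
    positions = np.arange(1, r + 1)         # possible values of v_i
    A = np.outer(alpha, positions) + np.outer(beta, D(positions, r))
    rows, cols = linear_sum_assignment(A)   # minimal-cost task->worker matching
    v = np.empty(r, dtype=int)
    v[rows] = positions[cols]               # v_i = position assigned to item i
    return v
```

Given the resulting $v$, a subgradient of the surrogate loss in Equation (17.8) with respect to $w$ is $\Psi(\bar{x}, v) - \Psi(\bar{x}, \pi(y)) = \sum_{i=1}^{r} (v_i - \pi(y)_i)\, x_i$, which is exactly what the SGD update of Section 17.2.5 requires.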