Let us denote $\Psi(\bar{x}, v) = \sum_{i=1}^{r} v_i x_i$; it follows that
$$
\begin{aligned}
\pi(h_w(\bar{x})) &= \operatorname*{argmax}_{v \in V} \sum_{i=1}^{r} v_i \langle w, x_i \rangle \\
&= \operatorname*{argmax}_{v \in V} \left\langle w, \sum_{i=1}^{r} v_i x_i \right\rangle \\
&= \operatorname*{argmax}_{v \in V} \langle w, \Psi(\bar{x}, v) \rangle .
\end{aligned}
$$
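In words, $\pi(h_w(\bar{x}))$ can be computed by simply sorting the items according to their scores $\langle w, x_i \rangle$: by the rearrangement inequality, $\sum_{i=1}^{r} v_i \langle w, x_i \rangle$ is maximized by the permutation that assigns larger positions $v_i$ to larger scores. The following minimal sketch illustrates this; the function name and the brute-force check are ours, not part of the text, and we assume the convention (implicit in the identity above) that a larger $v_i$ means a higher-ranked item:

```python
import itertools

import numpy as np


def ranking_permutation(scores):
    # v[i] is the position assigned to item i; larger score -> larger v[i],
    # which maximizes sum_i v[i] * scores[i] by the rearrangement inequality.
    r = len(scores)
    v = np.empty(r, dtype=int)
    v[np.argsort(scores)] = np.arange(1, r + 1)
    return v


# Brute-force check on a tiny random instance.
rng = np.random.default_rng(0)
w, xs = rng.normal(size=3), rng.normal(size=(5, 3))
scores = xs @ w                        # scores[i] = <w, x_i>
best = max(itertools.permutations(range(1, 6)),
           key=lambda v: float(np.dot(v, scores)))
assert tuple(ranking_permutation(scores)) == best
```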
On the basis of this observation, we can use the generalized hinge loss for cost-sensitive multiclass classification as a surrogate loss function for the NDCG loss as follows:
$$
\begin{aligned}
\Delta(h_w(\bar{x}), y) &\le \Delta(h_w(\bar{x}), y) + \langle w, \Psi(\bar{x}, \pi(h_w(\bar{x}))) \rangle - \langle w, \Psi(\bar{x}, \pi(y)) \rangle \\
&\le \max_{v \in V} \left[ \Delta(v, y) + \langle w, \Psi(\bar{x}, v) \rangle - \langle w, \Psi(\bar{x}, \pi(y)) \rangle \right] \\
&= \max_{v \in V} \left[ \Delta(v, y) + \sum_{i=1}^{r} (v_i - \pi(y)_i) \langle w, x_i \rangle \right].
\end{aligned}
\qquad (17.8)
$$
The right-hand side is a convex function with respect to $w$, since it is a maximum of functions that are affine in $w$.
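To make the bound in Equation (17.8) concrete, here is a small brute-force check (feasible only for tiny $r$, since it enumerates all permutations) that the right-hand side indeed upper-bounds the NDCG loss of the predicted ranking. We assume the NDCG loss has the form $\Delta(v,y) = 1 - G(v,y)/G(\pi(y),y)$ with $G(v,y) = \sum_i D(v_i) y_i$, as defined earlier in the chapter; the particular discount $D$ and all variable names are illustrative choices of ours:

```python
import itertools

import numpy as np


def D(j, r):
    # Illustrative monotonically nondecreasing discount: higher-ranked
    # positions (larger v_i) receive larger weight.
    return 1.0 / np.log2(r - j + 2)


def G(v, y):
    r = len(y)
    return sum(D(v[i], r) * y[i] for i in range(r))


def pi(a):
    # Permutation induced by the scores a: larger a_i gets a larger position.
    r = len(a)
    v = np.empty(r, dtype=int)
    v[np.argsort(a)] = np.arange(1, r + 1)
    return v


def ndcg_loss(v, y):
    return 1.0 - G(v, y) / G(pi(y), y)


rng = np.random.default_rng(1)
r, d = 5, 3
w = rng.normal(size=d)
xs = rng.normal(size=(r, d))
y = rng.uniform(size=r)               # "true relevance" scores
scores = xs @ w                        # <w, x_i> for each item i

lhs = ndcg_loss(pi(scores), y)        # NDCG loss of the predicted ranking
rhs = max(ndcg_loss(np.array(v), y) + float(np.dot(np.array(v) - pi(y), scores))
          for v in itertools.permutations(range(1, r + 1)))
assert lhs <= rhs + 1e-12             # Equation (17.8)
```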
We can now solve the learning problem using SGD as described in Section 17.2.5. The main computational bottleneck is calculating a subgradient of the loss function, which is equivalent to finding $v$ that achieves the maximum in Equation (17.8) (see Claim 14.6). Using the definition of the NDCG loss, this is equivalent to solving the problem
$$
\operatorname*{argmin}_{v \in V} \sum_{i=1}^{r} \left( \alpha_i v_i + \beta_i D(v_i) \right),
$$
where $\alpha_i = -\langle w, x_i \rangle$ and $\beta_i = y_i / G(\pi(y), y)$; this follows by substituting the definition of the NDCG loss into Equation (17.8) and dropping the terms that do not depend on $v$ (the sign flip turns the max into a min). We can think of this problem a little bit differently by defining a matrix $A \in \mathbb{R}^{r,r}$ where
$$
A_{i,j} = j\,\alpha_i + D(j)\,\beta_i .
$$
Now, let us think about each $j$ as a "worker," each $i$ as a "task," and $A_{i,j}$ as the cost of assigning task $i$ to worker $j$. With this view, the problem of finding $v$ becomes the problem of finding an assignment of the tasks to workers of minimal cost. This problem is called "the assignment problem" and can be solved efficiently. One particular algorithm is the "Hungarian method" (Kuhn 1955); a small sketch based on an off-the-shelf solver is given below. Another way to solve the assignment problem is using linear programming. To do so, let us first write the
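As a concrete illustration of the reduction just described, the following sketch forms the matrix $A$ and finds the $v$ attaining the maximum in Equation (17.8) with SciPy's `linear_sum_assignment`, an off-the-shelf Hungarian-style solver. The discount $D$ and all names are illustrative assumptions on our part; `G_norm` stands for the normalizer $G(\pi(y), y)$:

```python
import numpy as np
from scipy.optimize import linear_sum_assignment


def D(j, r):
    # Illustrative nondecreasing discount; any discount from the NDCG
    # definition can be substituted here.
    return 1.0 / np.log2(r - j + 2)


def maximizing_permutation(w, xs, y, G_norm):
    # Solve max_v [ Delta(v, y) + sum_i (v_i - pi(y)_i) <w, x_i> ] via the
    # equivalent assignment problem with cost A[i, j] = j*alpha_i + D(j)*beta_i.
    r = xs.shape[0]
    alpha = -(xs @ w)                       # alpha_i = -<w, x_i>
    beta = y / G_norm                       # beta_i  =  y_i / G(pi(y), y)
    positions = np.arange(1, r + 1)         # possible values of v_i
    A = np.outer(alpha, positions) + np.outer(beta, D(positions, r))
    rows, cols = linear_sum_assignment(A)   # minimal-cost task->worker matching
    v = np.empty(r, dtype=int)
    v[rows] = positions[cols]               # v_i = position assigned to item i
    return v
```

Given the resulting $v$, a subgradient of the surrogate loss in Equation (17.8) with respect to $w$ is $\Psi(\bar{x}, v) - \Psi(\bar{x}, \pi(y)) = \sum_{i=1}^{r} (v_i - \pi(y)_i)\, x_i$, which is exactly what the SGD update of Section 17.2.5 requires.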