Page 218 - Understanding Machine Learning
P. 218
Multiclass and Ranking
200
values are indicative of a certain letter. The second type of features take the form
r
1
i, j,2 (x,y) = 1 [y t =i] 1 [y t−1 = j] .
r
t=2
That is, we sum the number of times the letter i follows the letter j. Intuitively,
these features can capture rules like “It is likely to see the pair ‘qu’ in a word” or “It
is unlikely to see the pair ‘rz’ in a word.” Of course, some of these features will not
be very useful, so the goal of the learning process is to assign weights to features by
learning the vector w, so that the weighted score will give us a good prediction via
h w (x) = argmax w, (x,y) .
y∈Y
It is left to show how to solve the optimization problem in the definition of h w (x)
efficiently, as well as how to solve the optimization problem in the definition of ˆy in
the SGD algorithm. We can do this by applying a dynamic programming procedure.
We describe the procedure for solving the maximization in the definition of h w and
leave as an exercise the maximization problem in the definition of ˆy in the SGD
algorithm.
To derive the dynamic programming procedure, let us first observe that we can
write
r
(x,y) = φ(x, y t , y t−1 ),
t=1
d
for an appropriate φ : X × [q] × [q] ∪{0}→ R , and for simplicity we assume that y 0
is always equal to 0. Indeed, each feature function i, j,1 canbewritten interms of
φ i, j,1 (x, y t , y t−1 ) = x i,t 1 [y t = j] ,
while the feature function i, j,2 canbewritten interms of
φ i, j,2 (x, y t , y t−1 ) = 1 [y t =i] 1 [y t−1 = j] .
Therefore, the prediction can be written as
r
h w (x) = argmax w,φ(x, y t , y t−1 ) . (17.4)
y∈Y
t=1
In the following we derive a dynamic programming procedure that solves every
problem of the form given in Equation (17.4). The procedure will maintain a matrix
M ∈ R q,r such that
τ
M s,τ = max w,φ(x, y t , y t−1 ) .
(y 1 ,...,y τ ):y τ =s
t=1
Clearly, the maximum of w, (x,y) equals max s M s,r . Furthermore, we can
calculate M in a recursive manner:
2 3
M s,τ = max M s ,τ−1 + w,φ(x,s,s ) . (17.5)
s