17.8 Exercises
17.2 Multiclass Perceptron: Consider the following algorithm:

Multiclass Batch Perceptron
Input:
  A training set (x_1, y_1), ..., (x_m, y_m)
  A class-sensitive feature mapping Ψ : X × Y → R^d
Initialize: w^(1) = (0, ..., 0) ∈ R^d
For t = 1, 2, ...
  If (∃ i and y ≠ y_i s.t. ⟨w^(t), Ψ(x_i, y_i)⟩ ≤ ⟨w^(t), Ψ(x_i, y)⟩) then
    w^(t+1) = w^(t) + Ψ(x_i, y_i) − Ψ(x_i, y)
  else
    output w^(t)
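As a concrete illustration, the pseudocode above can be sketched in Python. The one-hot block feature map used in the demo at the bottom is an assumption chosen for illustration; any class-sensitive feature mapping Ψ would do.

```python
import numpy as np

def multiclass_batch_perceptron(X, y, psi, n_classes, max_iter=1000):
    """Sketch of the Multiclass Batch Perceptron above.

    psi(x, c) is the class-sensitive feature mapping Psi : X x Y -> R^d.
    The loop stops once no pair (i, y != y_i) violates the condition."""
    d = psi(X[0], y[0]).shape[0]
    w = np.zeros(d)                              # w^(1) = (0, ..., 0)
    for _ in range(max_iter):
        update = None
        for i in range(len(X)):
            for c in range(n_classes):
                if c != y[i] and w @ psi(X[i], y[i]) <= w @ psi(X[i], c):
                    update = (i, c)              # violating pair found
                    break
            if update is not None:
                break
        if update is None:
            return w                             # output w^(t)
        i, c = update
        w = w + psi(X[i], y[i]) - psi(X[i], c)   # the w^(t+1) update rule
    return w

# Demo with an assumed one-hot block feature map: psi(x, c) places x in block c,
# so <w, psi(x, c)> is the score of class c under a per-class weight vector.
def psi(x, c, k=3, dim=2):
    v = np.zeros(k * dim)
    v[c * dim:(c + 1) * dim] = x
    return v

X = [np.array([1.0, 0.0]), np.array([0.0, 1.0]), np.array([-1.0, -1.0])]
y = [0, 1, 2]
w = multiclass_batch_perceptron(X, y, psi, n_classes=3)
preds = [max(range(3), key=lambda c: w @ psi(x, c)) for x in X]
```

On this separable toy set the loop halts after a few updates, and the returned w labels every training point correctly, matching the stopping condition in the theorem below.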
Prove the following:
Theorem 17.5. Assume that there exists w* such that for all i and for all y ≠ y_i it holds that ⟨w*, Ψ(x_i, y_i)⟩ ≥ ⟨w*, Ψ(x_i, y)⟩ + 1. Let R = max_{i,y} ‖Ψ(x_i, y_i) − Ψ(x_i, y)‖. Then, the multiclass Perceptron algorithm stops after at most (R ‖w*‖)^2 iterations, and when it stops it holds that ∀ i ∈ [m], y_i = argmax_y ⟨w^(t), Ψ(x_i, y)⟩.
17.3 Generalize the dynamic programming procedure given in Section 17.3 for solving the maximization problem given in the definition of ĥ in the SGD procedure for multiclass prediction. You can assume that Δ(y', y) = Σ_{t=1}^{r} δ(y'_t, y_t) for some arbitrary function δ.
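To see why the decomposability assumption on Δ matters: when δ contributes a per-position term, it can be folded into the per-position scores and a Viterbi-style dynamic program still applies. Below is a minimal sketch under the assumption of a chain-structured score (per-position scores plus a pairwise transition term); the book's general Ψ may decompose differently.

```python
import numpy as np

def loss_augmented_viterbi(scores, trans, y_true, delta):
    """Maximize, over label sequences y', the chain-structured objective
        sum_t scores[t, y'_t] + sum_t trans[y'_{t-1}, y'_t] + sum_t delta(y'_t, y_true[t]).
    Because delta decomposes per position (the exercise's assumption),
    it simply augments each position's score."""
    T, k = scores.shape
    aug0 = np.array([delta(c, y_true[0]) for c in range(k)])
    dp = scores[0] + aug0                      # best value ending in each label at t=0
    back = np.zeros((T, k), dtype=int)         # backpointers (back[0] unused)
    for t in range(1, T):
        aug = np.array([delta(c, y_true[t]) for c in range(k)])
        cand = dp[:, None] + trans             # cand[p, c]: come from p, move to c
        back[t] = np.argmax(cand, axis=0)
        dp = cand[back[t], np.arange(k)] + scores[t] + aug
    # Backtrack the maximizing sequence.
    y = [int(np.argmax(dp))]
    for t in range(T - 1, 0, -1):
        y.append(int(back[t][y[-1]]))
    return y[::-1]
```

For example, with zero scores and δ the 0-1 (Hamming) term, the maximizer is any sequence disagreeing with y_true everywhere; with strong per-position scores, the scores dominate the loss augmentation.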
17.4 Prove that Equation (17.7) holds.
17.5 Show that the two definitions of π as defined in Equation (17.12) and Equation (17.13) are indeed equivalent for all the multivariate performance measures.