Page 229 - Understanding Machine Learning
P. 229

17.8 Exercises  211


              17.2 Multiclass Perceptron: Consider the following algorithm:
                                      Multiclass Batch Perceptron
                           Input:
                            A training set (x 1 , y 1 ),...,(x m , y m )
                            A class-sensitive feature mapping   : X × Y → R d
                           Initialize: w (1)  = (0,...,0) ∈ R d
                            For t = 1, 2,...
                                                 (t)
                                                                (t)
                             If (∃ i and y  = y i s.t.  w , (x i , y i ) ≤  w , (x i , y) )then
                               w (t+1)  = w (t)  +  (x i , y i ) −  (x i , y)
                             else
                               output w (t)

                  Prove the following:

                  Theorem 17.5. Assume that there exists w such that for all i and for all y  = y i it


                  holds that  w , (x i , y i ) ≥  w , (x i , y) + 1.Let R = max i,y   (x i , y i ) −  (x i , y) .
                                                                            2

                  Then, the multiclass Perceptron algorithm stops after at most (R w  ) iterations,
                                                                (t)
                  and when it stops it holds that ∀i ∈ [m], y i = argmax  w , (x i , y) .
                                                            y
              17.3 Generalize the dynamic programming procedure given in Section 17.3 for solv-
                                                                 ˆ
                  ing the maximization problem given in the definition of h in the SGD procedure
                                                                    r

                  for multiclass prediction. You can assume that  (y ,y) =  t=1  δ(y , y t )forsome
                                                                          t
                  arbitrary function δ.
              17.4 Prove that Equation (17.7) holds.
              17.5 Show that the two definitions of π as defined in Equation (17.12)and
                  Equation (17.13) are indeed equivalent for all the multivariate performance
                  measures.
   224   225   226   227   228   229   230   231   232   233   234