
15.1 Margin and Hard-SVM



On the basis of the preceding claim, the closest point in the training set to the separating hyperplane is $\min_{i\in[m]} |\langle w, x_i\rangle + b|$. Therefore, the Hard-SVM rule is
\[
\operatorname*{argmax}_{(w,b):\,\|w\|=1}\ \min_{i\in[m]} |\langle w, x_i\rangle + b| \quad \text{s.t.}\quad \forall i,\ y_i(\langle w, x_i\rangle + b) > 0.
\]
Whenever there is a solution to the preceding problem (i.e., we are in the separable case), we can write an equivalent problem as follows (see Exercise 15.1):
\[
\operatorname*{argmax}_{(w,b):\,\|w\|=1}\ \min_{i\in[m]} y_i(\langle w, x_i\rangle + b). \tag{15.1}
\]
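To make the objective in Equation (15.1) concrete, here is a minimal numerical sketch (the toy data and the candidate $(w, b)$ are my own illustration, not from the text) that evaluates $\min_{i\in[m]} y_i(\langle w, x_i\rangle + b)$ for a unit-norm $w$:

```python
import numpy as np

# Toy linearly separable sample in R^2 (illustrative only, not from the text).
X = np.array([[2.0, 1.0], [1.0, 2.0], [-1.0, -2.0], [-2.0, -1.0]])
y = np.array([1.0, 1.0, -1.0, -1.0])

def margin(w, b, X, y):
    """Objective of Equation (15.1): min_i y_i(<w, x_i> + b).

    This equals the distance of the closest point to the hyperplane only
    when ||w|| = 1; a non-positive value means (w, b) does not separate
    the sample.
    """
    return np.min(y * (X @ w + b))

# A unit-norm candidate halfspace.
w = np.array([1.0, 1.0]) / np.sqrt(2.0)
b = 0.0
print(margin(w, b, X, y))  # about 2.12
```

Note that rescaling $w$ rescales this value, which is exactly why Equation (15.1) restricts the search to $\|w\| = 1$.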
Next, we give another equivalent formulation of the Hard-SVM rule as a quadratic optimization problem.¹
Hard-SVM

input: $(x_1, y_1),\ldots,(x_m, y_m)$

solve:
\[
(w_0, b_0) = \operatorname*{argmin}_{(w,b)} \|w\|^2 \quad \text{s.t.}\quad \forall i,\ y_i(\langle w, x_i\rangle + b) \ge 1 \tag{15.2}
\]
output: $\hat{w} = \dfrac{w_0}{\|w_0\|}$, $\ \hat{b} = \dfrac{b_0}{\|w_0\|}$
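Since Equation (15.2) is a convex quadratic program, it can be handed to any off-the-shelf solver. The following sketch uses cvxpy on the same toy data as above (the library choice and the data are my own, not the book's); it recovers $(w_0, b_0)$ and then rescales to the unit-norm output $(\hat{w}, \hat{b})$:

```python
import cvxpy as cp
import numpy as np

# Toy linearly separable sample (illustrative only, not from the text).
X = np.array([[2.0, 1.0], [1.0, 2.0], [-1.0, -2.0], [-2.0, -1.0]])
y = np.array([1.0, 1.0, -1.0, -1.0])
m, d = X.shape

# Variables of the quadratic program in Equation (15.2).
w = cp.Variable(d)
b = cp.Variable()

# minimize ||w||^2  subject to  y_i(<w, x_i> + b) >= 1 for all i.
constraints = [cp.multiply(y, X @ w + b) >= 1]
problem = cp.Problem(cp.Minimize(cp.sum_squares(w)), constraints)
problem.solve()

w0, b0 = w.value, b.value
norm_w0 = np.linalg.norm(w0)

# Normalized Hard-SVM output: a unit-norm halfspace.
w_hat, b_hat = w0 / norm_w0, b0 / norm_w0
print("w_hat =", w_hat, " b_hat =", b_hat)
print("margin =", np.min(y * (X @ w_hat + b_hat)))  # equals 1/||w0||
```

Lemma 15.2 below guarantees that the printed margin, $1/\|w_0\|$, is the largest margin attainable by any unit-norm separating halfspace on this sample.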
The lemma that follows shows that the output of Hard-SVM is indeed the separating hyperplane with the largest margin. Intuitively, Hard-SVM searches for $w$ of minimal norm among all the vectors that separate the data and for which $|\langle w, x_i\rangle + b| \ge 1$ for all $i$. In other words, we enforce the margin to be 1, but now the units in which we measure the margin scale with the norm of $w$. Therefore, finding the largest margin halfspace boils down to finding $w$ whose norm is minimal. Formally:

              Lemma 15.2. The output of Hard-SVM is a solution of Equation (15.1).


Proof. Let $(w^\star, b^\star)$ be a solution of Equation (15.1) and define the margin achieved by $(w^\star, b^\star)$ to be $\gamma^\star = \min_{i\in[m]} y_i(\langle w^\star, x_i\rangle + b^\star)$. Therefore, for all $i$ we have
\[
y_i(\langle w^\star, x_i\rangle + b^\star) \ge \gamma^\star
\]
or equivalently
\[
y_i\!\left(\left\langle \frac{w^\star}{\gamma^\star}, x_i\right\rangle + \frac{b^\star}{\gamma^\star}\right) \ge 1.
\]
Hence, the pair $\left(\frac{w^\star}{\gamma^\star}, \frac{b^\star}{\gamma^\star}\right)$ satisfies the conditions of the quadratic optimization problem given in Equation (15.2). Therefore, $\|w_0\| \le \left\|\frac{w^\star}{\gamma^\star}\right\| = \frac{1}{\gamma^\star}$. It follows that for all $i$,
\[
y_i(\langle \hat{w}, x_i\rangle + \hat{b}) = \frac{1}{\|w_0\|}\, y_i(\langle w_0, x_i\rangle + b_0) \ge \frac{1}{\|w_0\|} \ge \gamma^\star.
\]
Since $\|\hat{w}\| = 1$, we obtain that $(\hat{w}, \hat{b})$ is an optimal solution of Equation (15.1).
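As a concrete illustration of the rescaling step in the proof (this example is mine, not from the text), consider a one-dimensional sample with $x_1 = 2$, $y_1 = +1$ and $x_2 = -2$, $y_2 = -1$. Equation (15.1) maximizes, over $|w| = 1$,
\[
\min\{\,y_1(w x_1 + b),\ y_2(w x_2 + b)\,\} = \min\{\,2w + b,\ 2w - b\,\},
\]
which is attained at $w^\star = 1$, $b^\star = 0$ with $\gamma^\star = 2$. Equation (15.2) minimizes $w^2$ subject to $2w + b \ge 1$ and $2w - b \ge 1$; adding the two constraints gives $w \ge \tfrac{1}{2}$, so $(w_0, b_0) = (\tfrac{1}{2}, 0)$ and $\|w_0\| = \tfrac{1}{2} = 1/\gamma^\star$. The output $\hat{w} = w_0/\|w_0\| = 1$, $\hat{b} = 0$ indeed achieves margin $2 = \gamma^\star$, as the lemma asserts.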
¹ A quadratic optimization problem is an optimization problem in which the objective is a convex quadratic function and the constraints are linear inequalities.