In the advanced part of the book (Chapter 26), we will prove that the sample
complexity of Hard-SVM depends on (ρ/γ)² and is independent of the dimension
d. In particular, Theorem 26.13 in Section 26.3 states the following:

Theorem 15.4. Let D be a distribution over R^d × {±1} that satisfies the (γ,ρ)-
separability with margin assumption using a homogenous halfspace. Then, with
probability of at least 1 − δ over the choice of a training set of size m, the 0-1 error of
the output of Hard-SVM is at most

\[
  \sqrt{\frac{4(\rho/\gamma)^2}{m}} \;+\; \sqrt{\frac{2\log(2/\delta)}{m}} .
\]
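As a quick numerical illustration of this bound (not from the text), the following sketch plugs in hypothetical values ρ/γ = 10 and δ = 0.01, takes log to be the natural logarithm, and shows how the bound shrinks like 1/√m regardless of the dimension d.

```python
import math

# Illustrative evaluation of the Theorem 15.4 bound with hypothetical values
# (not from the book): rho/gamma = 10, delta = 0.01, log = natural log.
rho_over_gamma = 10.0
delta = 0.01

for m in [1_000, 10_000, 100_000]:
    bound = (math.sqrt(4 * rho_over_gamma**2 / m)
             + math.sqrt(2 * math.log(2 / delta) / m))
    print(f"m = {m:>7}: 0-1 error bound <= {bound:.3f}")
```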
Remark 15.1 (Margin and the Perceptron). In Section 9.1.2 we have described and
analyzed the Perceptron algorithm for finding an ERM hypothesis with respect to
the class of halfspaces. In particular, in Theorem 9.1 we upper bounded the number
of updates the Perceptron might make on a given training set. It can be shown
(see Exercise 15.2) that the upper bound is exactly (ρ/γ)², where ρ is the radius of
the examples and γ is the margin.
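To make the remark concrete, here is a minimal Perceptron sketch (the toy data and variable names are illustrative assumptions, not from the book) that counts updates on a homogeneously separable set; by Theorem 9.1 this count never exceeds (ρ/γ)².

```python
import numpy as np

# Minimal Perceptron sketch on a hypothetical separable toy set.
# It counts mistakes/updates; by Theorem 9.1 this count is at most
# (rho/gamma)^2, where rho = max ||x_i|| and gamma is the margin.
X = np.array([[2.0, 1.0], [1.5, 2.0], [-1.0, -1.5], [-2.0, -0.5]])
y = np.array([1, 1, -1, -1])

w = np.zeros(X.shape[1])
updates = 0
made_mistake = True
while made_mistake:
    made_mistake = False
    for x_i, y_i in zip(X, y):
        if y_i * (w @ x_i) <= 0:    # example misclassified (or on the boundary)
            w += y_i * x_i          # Perceptron update
            updates += 1
            made_mistake = True

print("number of updates:", updates)
```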
              15.2 SOFT-SVM AND NORM REGULARIZATION
              The Hard-SVM formulation assumes that the training set is linearly separable,
              which is a rather strong assumption. Soft-SVM can be viewed as a relaxation of the
              Hard-SVM rule that can be applied even if the training set is not linearly separable.
The optimization problem in Equation (15.2) enforces the hard constraints
y_i(⟨w, x_i⟩ + b) ≥ 1 for all i. A natural relaxation is to allow the constraint to be
violated for some of the examples in the training set. This can be modeled by
introducing nonnegative slack variables, ξ_1, ..., ξ_m, and replacing each constraint
y_i(⟨w, x_i⟩ + b) ≥ 1 by the constraint y_i(⟨w, x_i⟩ + b) ≥ 1 − ξ_i. That is, ξ_i measures
by how much the constraint y_i(⟨w, x_i⟩ + b) ≥ 1 is being violated. Soft-SVM jointly
minimizes the norm of w (corresponding to the margin) and the average of ξ_i (cor-
responding to the violations of the constraints). The tradeoff between the two terms
is controlled by a parameter λ. This leads to the Soft-SVM optimization problem:

                                            Soft-SVM
                      input: (x 1 , y 1 ),...,(x m , y m )
                      parameter: λ> 0
                      solve:

                                              m
                                            1
                                        2
                              min   λ w  +      ξ i
                              w,b,ξ        m                             (15.4)
                                              i=1
                               s.t. ∀i, y i ( w,x i  + b) ≥ 1 − ξ i and ξ i ≥ 0
                      output: w,b
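As one concrete way to see the rule in action, the following sketch solves Equation (15.4) directly with the generic convex solver cvxpy; the toy data, the choice λ = 0.1, and the variable names are illustrative assumptions, not part of the text.

```python
import cvxpy as cp
import numpy as np

# Hypothetical toy data (not from the book): rows of X are examples, y in {+1, -1}.
X = np.array([[2.0, 1.0], [1.0, 2.0], [0.5, 0.5],
              [-1.0, -1.5], [-2.0, -0.5], [-0.5, -0.5]])
y = np.array([1, 1, 1, -1, -1, -1])
m, d = X.shape
lam = 0.1  # tradeoff parameter lambda, chosen arbitrarily for illustration

# Variables of Equation (15.4): weight vector w, bias b, nonnegative slacks xi.
w = cp.Variable(d)
b = cp.Variable()
xi = cp.Variable(m, nonneg=True)

# Objective: lambda * ||w||^2 + (1/m) * sum_i xi_i.
objective = cp.Minimize(lam * cp.sum_squares(w) + cp.sum(xi) / m)
# Constraints: for all i, y_i (<w, x_i> + b) >= 1 - xi_i.
constraints = [cp.multiply(y, X @ w + b) >= 1 - xi]
cp.Problem(objective, constraints).solve()

print("w =", w.value, " b =", b.value)
```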