Understanding Machine Learning
15.1 Margin and Hard-SVM
On the basis of the preceding claim, the distance of the separating hyperplane from the closest point in the training set is $\min_{i\in[m]} |\langle w, x_i\rangle + b|$. Therefore, the Hard-SVM rule is
$$\operatorname*{argmax}_{(w,b):\,\|w\|=1}\ \min_{i\in[m]}\ |\langle w, x_i\rangle + b| \quad \text{s.t.}\quad \forall i,\ y_i(\langle w, x_i\rangle + b) > 0.$$
Whenever there is a solution to the preceding problem (i.e., we are in the separable
case), we can write an equivalent problem as follows (see Exercise 15.1):
$$\operatorname*{argmax}_{(w,b):\,\|w\|=1}\ \min_{i\in[m]}\ y_i(\langle w, x_i\rangle + b). \tag{15.1}$$
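To make Equation (15.1) concrete, the objective can be evaluated directly: on a small data set one can brute-force over unit-norm vectors $w = (\cos\theta, \sin\theta)$ and a grid of biases and take the best margin found. This is a minimal sketch; the four toy points are an assumption chosen for illustration, not an example from the text.

```python
import numpy as np

# Toy separable data set (hypothetical, for illustration only).
X = np.array([[2.0, 1.0], [2.0, -1.0], [-2.0, 1.0], [-2.0, -1.0]])
y = np.array([1.0, 1.0, -1.0, -1.0])

def margin(w, b):
    """min_i y_i(<w, x_i> + b): the quantity maximized in Equation (15.1)."""
    return np.min(y * (X @ w + b))

# Brute-force search over unit-norm w = (cos t, sin t) and a grid of biases.
best = -np.inf
for t in np.arange(0.0, 2 * np.pi, 0.02):
    w = np.array([np.cos(t), np.sin(t)])
    for b in np.arange(-3.0, 3.0, 0.05):
        best = max(best, margin(w, b))

print(best)  # close to 2.0: the halfspace w = (1, 0), b = 0 attains margin 2
```

For these four points the max-margin separator is the vertical hyperplane $w = (1, 0)$, $b = 0$, whose margin is 2, and the grid search recovers that value up to discretization error.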
Next, we give another equivalent formulation of the Hard-SVM rule as a quadratic optimization problem.¹
Hard-SVM

input: $(x_1, y_1), \ldots, (x_m, y_m)$
solve:
$$(w_0, b_0) = \operatorname*{argmin}_{(w,b)}\ \|w\|^2 \quad \text{s.t.}\quad \forall i,\ y_i(\langle w, x_i\rangle + b) \ge 1 \tag{15.2}$$
output: $\hat{w} = \dfrac{w_0}{\|w_0\|}, \quad \hat{b} = \dfrac{b_0}{\|w_0\|}$
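Equation (15.2) is a convex quadratic program, so in practice one would hand it to any QP solver. As a self-contained sketch (the toy data and the quadratic-penalty approximation are my assumptions, not the book's prescription), plain gradient descent on a penalized surrogate approximates $(w_0, b_0)$ and then applies the normalization step:

```python
import numpy as np

# Toy separable data (hypothetical). The max-margin solution here is
# w0 = (0.5, 0), b0 = 0, so the output margin 1/||w0|| equals 2.
X = np.array([[2.0, 1.0], [2.0, -1.0], [-2.0, 1.0], [-2.0, -1.0]])
y = np.array([1.0, 1.0, -1.0, -1.0])

# Quadratic-penalty surrogate for Equation (15.2):
#   min ||w||^2 + C * sum_i max(0, 1 - y_i(<w, x_i> + b))^2
# With large C this approximates the hard constraints y_i(<w,x_i>+b) >= 1.
C, lr, steps = 100.0, 3e-4, 5000
w, b = np.zeros(2), 0.0
for _ in range(steps):
    slack = np.maximum(0.0, 1.0 - y * (X @ w + b))
    grad_w = 2 * w - 2 * C * (slack * y) @ X
    grad_b = -2 * C * np.sum(slack * y)
    w -= lr * grad_w
    b -= lr * grad_b

# Normalize as in the Hard-SVM output step.
w_hat, b_hat = w / np.linalg.norm(w), b / np.linalg.norm(w)
print(w_hat, b_hat)  # approximately (1, 0) and 0
```

The penalty method only approximates the hard constraints; a dedicated QP solver would enforce them exactly, but the normalized output $(\hat{w}, \hat{b})$ already lands close to the unit-norm max-margin separator.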
The lemma that follows shows that the output of Hard-SVM is indeed the separating hyperplane with the largest margin. Intuitively, Hard-SVM searches for $w$ of minimal norm among all the vectors that separate the data and for which $|\langle w, x_i\rangle + b| \ge 1$ for all $i$. In other words, we enforce the margin to be 1, but now the units in which we measure the margin scale with the norm of $w$. Therefore, finding the largest margin halfspace boils down to finding $w$ whose norm is minimal.
Formally:
Lemma 15.2. The output of Hard-SVM is a solution of Equation (15.1).
Proof. Let $(w^\star, b^\star)$ be a solution of Equation (15.1) and define the margin achieved by $(w^\star, b^\star)$ to be $\gamma^\star = \min_{i\in[m]} y_i(\langle w^\star, x_i\rangle + b^\star)$. Therefore, for all $i$ we have
$$y_i(\langle w^\star, x_i\rangle + b^\star) \ge \gamma^\star$$
or equivalently
$$y_i\left(\left\langle \tfrac{w^\star}{\gamma^\star}, x_i\right\rangle + \tfrac{b^\star}{\gamma^\star}\right) \ge 1.$$
Hence, the pair $\left(\tfrac{w^\star}{\gamma^\star}, \tfrac{b^\star}{\gamma^\star}\right)$ satisfies the conditions of the quadratic optimization problem given in Equation (15.2). Therefore, since $\|w^\star\| = 1$,
$$\|w_0\| \le \left\|\tfrac{w^\star}{\gamma^\star}\right\| = \tfrac{1}{\gamma^\star}.$$
It follows that for all $i$,
$$y_i(\langle \hat{w}, x_i\rangle + \hat{b}) = \tfrac{1}{\|w_0\|}\, y_i(\langle w_0, x_i\rangle + b_0) \ge \tfrac{1}{\|w_0\|} \ge \gamma^\star.$$
Since $\|\hat{w}\| = 1$ we obtain that $(\hat{w}, \hat{b})$ is an optimal solution of Equation (15.1).
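The rescaling step at the heart of the proof can be checked numerically: dividing a unit-norm separator by its margin $\gamma^\star$ yields a point that is feasible for Equation (15.2) and has norm $1/\gamma^\star$. The toy data and the chosen $(w^\star, b^\star)$ below are assumptions for illustration only.

```python
import numpy as np

# Toy separable data (hypothetical) and a unit-norm separating pair (w*, b*).
X = np.array([[2.0, 1.0], [2.0, -1.0], [-2.0, 1.0], [-2.0, -1.0]])
y = np.array([1.0, 1.0, -1.0, -1.0])
w_star, b_star = np.array([1.0, 0.0]), 0.0

# gamma* = min_i y_i(<w*, x_i> + b*), the margin achieved by (w*, b*).
gamma = np.min(y * (X @ w_star + b_star))

# Rescaled pair (w*/gamma*, b*/gamma*): feasible for (15.2), norm 1/gamma*.
w_s, b_s = w_star / gamma, b_star / gamma
assert np.all(y * (X @ w_s + b_s) >= 1.0 - 1e-9)  # constraints of (15.2) hold
assert abs(np.linalg.norm(w_s) - 1.0 / gamma) < 1e-9
print(gamma, np.linalg.norm(w_s))
```

Since the solution $(w_0, b_0)$ of (15.2) minimizes the norm over all feasible pairs, its norm can be at most $\|w_s\| = 1/\gamma^\star$, which is exactly the inequality used in the proof.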
¹ A quadratic optimization problem is an optimization problem in which the objective is a convex quadratic function and the constraints are linear inequalities.