
15.1 Margin and Hard-SVM



On the basis of the preceding claim, the closest point in the training set to the separating hyperplane is $\min_{i\in[m]} |\langle w, x_i\rangle + b|$. Therefore, the Hard-SVM rule is
\[
\operatorname*{argmax}_{(w,b):\,\|w\|=1}\ \min_{i\in[m]} |\langle w, x_i\rangle + b| \quad \text{s.t.}\quad \forall i,\ y_i(\langle w, x_i\rangle + b) > 0.
\]
Whenever there is a solution to the preceding problem (i.e., we are in the separable case), we can write an equivalent problem as follows (see Exercise 15.1):
\[
\operatorname*{argmax}_{(w,b):\,\|w\|=1}\ \min_{i\in[m]} y_i(\langle w, x_i\rangle + b). \tag{15.1}
\]
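To make the objective in Equation (15.1) concrete, here is a minimal numerical sketch (the toy data and the candidate $(w, b)$ are my own illustration, not from the text) that evaluates $\min_{i\in[m]} y_i(\langle w, x_i\rangle + b)$ for a unit-norm $w$:

```python
import numpy as np

# Toy linearly separable sample in R^2 (illustrative only, not from the text).
X = np.array([[2.0, 1.0], [1.0, 2.0], [-1.0, -2.0], [-2.0, -1.0]])
y = np.array([1.0, 1.0, -1.0, -1.0])

def margin(w, b, X, y):
    """Objective of Equation (15.1): min_i y_i(<w, x_i> + b).

    This equals the distance of the closest point to the hyperplane only
    when ||w|| = 1; a non-positive value means (w, b) does not separate
    the sample.
    """
    return np.min(y * (X @ w + b))

# A unit-norm candidate halfspace.
w = np.array([1.0, 1.0]) / np.sqrt(2.0)
b = 0.0
print(margin(w, b, X, y))  # about 2.12
```

Note that rescaling $w$ rescales this value, which is exactly why Equation (15.1) restricts the search to $\|w\| = 1$.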
Next, we give another equivalent formulation of the Hard-SVM rule as a quadratic optimization problem.¹
Hard-SVM

input: $(x_1, y_1),\ldots,(x_m, y_m)$

solve:
\[
(w_0, b_0) = \operatorname*{argmin}_{(w,b)} \|w\|^2 \quad \text{s.t.}\quad \forall i,\ y_i(\langle w, x_i\rangle + b) \ge 1 \tag{15.2}
\]
output: $\hat{w} = \dfrac{w_0}{\|w_0\|}$, $\ \hat{b} = \dfrac{b_0}{\|w_0\|}$
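Since Equation (15.2) is a convex quadratic program, it can be handed to any off-the-shelf solver. The following sketch uses cvxpy on the same toy data as above (the library choice and the data are my own, not the book's); it recovers $(w_0, b_0)$ and then rescales to the unit-norm output $(\hat{w}, \hat{b})$:

```python
import cvxpy as cp
import numpy as np

# Toy linearly separable sample (illustrative only, not from the text).
X = np.array([[2.0, 1.0], [1.0, 2.0], [-1.0, -2.0], [-2.0, -1.0]])
y = np.array([1.0, 1.0, -1.0, -1.0])
m, d = X.shape

# Variables of the quadratic program in Equation (15.2).
w = cp.Variable(d)
b = cp.Variable()

# minimize ||w||^2  subject to  y_i(<w, x_i> + b) >= 1 for all i.
constraints = [cp.multiply(y, X @ w + b) >= 1]
problem = cp.Problem(cp.Minimize(cp.sum_squares(w)), constraints)
problem.solve()

w0, b0 = w.value, b.value
norm_w0 = np.linalg.norm(w0)

# Normalized Hard-SVM output: a unit-norm halfspace.
w_hat, b_hat = w0 / norm_w0, b0 / norm_w0
print("w_hat =", w_hat, " b_hat =", b_hat)
print("margin =", np.min(y * (X @ w_hat + b_hat)))  # equals 1/||w0||
```

Lemma 15.2 below guarantees that the printed margin, $1/\|w_0\|$, is the largest margin attainable by any unit-norm separating halfspace on this sample.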
The lemma that follows shows that the output of Hard-SVM is indeed the separating hyperplane with the largest margin. Intuitively, Hard-SVM searches for $w$ of minimal norm among all the vectors that separate the data and for which $|\langle w, x_i\rangle + b| \ge 1$ for all $i$. In other words, we enforce the margin to be 1, but now the units in which we measure the margin scale with the norm of $w$. Therefore, finding the largest margin halfspace boils down to finding $w$ whose norm is minimal. Formally:

              Lemma 15.2. The output of Hard-SVM is a solution of Equation (15.1).


Proof. Let $(w^\star, b^\star)$ be a solution of Equation (15.1) and define the margin achieved by $(w^\star, b^\star)$ to be $\gamma^\star = \min_{i\in[m]} y_i(\langle w^\star, x_i\rangle + b^\star)$. Therefore, for all $i$ we have
\[
y_i(\langle w^\star, x_i\rangle + b^\star) \ge \gamma^\star
\]
or equivalently
\[
y_i\!\left(\left\langle \frac{w^\star}{\gamma^\star}, x_i\right\rangle + \frac{b^\star}{\gamma^\star}\right) \ge 1.
\]
Hence, the pair $\left(\frac{w^\star}{\gamma^\star}, \frac{b^\star}{\gamma^\star}\right)$ satisfies the conditions of the quadratic optimization problem given in Equation (15.2). Therefore, $\|w_0\| \le \left\|\frac{w^\star}{\gamma^\star}\right\| = \frac{1}{\gamma^\star}$. It follows that for all $i$,
\[
y_i(\langle \hat{w}, x_i\rangle + \hat{b}) = \frac{1}{\|w_0\|}\, y_i(\langle w_0, x_i\rangle + b_0) \ge \frac{1}{\|w_0\|} \ge \gamma^\star.
\]
Since $\|\hat{w}\| = 1$, we obtain that $(\hat{w}, \hat{b})$ is an optimal solution of Equation (15.1).
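As a concrete illustration of the rescaling step in the proof (this example is mine, not from the text), consider a one-dimensional sample with $x_1 = 2$, $y_1 = +1$ and $x_2 = -2$, $y_2 = -1$. Equation (15.1) maximizes, over $|w| = 1$,
\[
\min\{\,y_1(w x_1 + b),\ y_2(w x_2 + b)\,\} = \min\{\,2w + b,\ 2w - b\,\},
\]
which is attained at $w^\star = 1$, $b^\star = 0$ with $\gamma^\star = 2$. Equation (15.2) minimizes $w^2$ subject to $2w + b \ge 1$ and $2w - b \ge 1$; adding the two constraints gives $w \ge \tfrac{1}{2}$, so $(w_0, b_0) = (\tfrac{1}{2}, 0)$ and $\|w_0\| = \tfrac{1}{2} = 1/\gamma^\star$. The output $\hat{w} = w_0/\|w_0\| = 1$, $\hat{b} = 0$ indeed achieves margin $2 = \gamma^\star$, as the lemma asserts.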
¹ A quadratic optimization problem is an optimization problem in which the objective is a convex quadratic function and the constraints are linear inequalities.