
Support Vector Machines
                 Now suppose that we flip the order of min and max in the equation. This can only
                 decrease the objective value (see Exercise 15.4), and we have
$$
\min_{w}\ \max_{\alpha\in\mathbb{R}^m:\,\alpha\ge 0}\ \left(\frac{1}{2}\|w\|^{2}+\sum_{i=1}^{m}\alpha_i\bigl(1-y_i\langle w,x_i\rangle\bigr)\right)
\;\ge\;
\max_{\alpha\in\mathbb{R}^m:\,\alpha\ge 0}\ \min_{w}\ \left(\frac{1}{2}\|w\|^{2}+\sum_{i=1}^{m}\alpha_i\bigl(1-y_i\langle w,x_i\rangle\bigr)\right).
$$
                 The preceding inequality is called weak duality. It turns out that in our case, strong
                 duality also holds; namely, the inequality holds with equality. Therefore, the dual
                 problem is
$$
\max_{\alpha\in\mathbb{R}^m:\,\alpha\ge 0}\ \min_{w}\ \left(\frac{1}{2}\|w\|^{2}+\sum_{i=1}^{m}\alpha_i\bigl(1-y_i\langle w,x_i\rangle\bigr)\right). \tag{15.9}
$$
                 We can simplify the dual problem by noting that once α is fixed, the optimization
                 problem with respect to w is unconstrained and the objective is differentiable; thus,
                 at the optimum, the gradient equals zero:
$$
w-\sum_{i=1}^{m}\alpha_i y_i x_i=0 \;\Longrightarrow\; w=\sum_{i=1}^{m}\alpha_i y_i x_i .
$$
                 This shows us that the solution must be in the linear span of the examples, a
                 fact we will use later to derive SVM with kernels. Plugging the preceding into
                 Equation (15.9) we obtain that the dual problem can be rewritten as
                                                                        
$$
\max_{\alpha\in\mathbb{R}^m:\,\alpha\ge 0}\ \left(\frac{1}{2}\Bigl\|\sum_{i=1}^{m}\alpha_i y_i x_i\Bigr\|^{2}+\sum_{i=1}^{m}\alpha_i\Bigl(1-y_i\Bigl\langle\sum_{j}\alpha_j y_j x_j,\,x_i\Bigr\rangle\Bigr)\right). \tag{15.10}
$$
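To see the rearrangement explicitly, expand the squared norm into a double sum:

$$
\Bigl\|\sum_{i=1}^{m}\alpha_i y_i x_i\Bigr\|^{2}=\sum_{i=1}^{m}\sum_{j=1}^{m}\alpha_i\alpha_j y_i y_j\langle x_j,x_i\rangle .
$$

The inner-product term in (15.10) produces the same double sum with a factor of $-1$, so the two quadratic contributions combine with coefficient $\tfrac{1}{2}-1=-\tfrac{1}{2}$.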
                 Rearranging yields the dual problem
                                                                      
$$
\max_{\alpha\in\mathbb{R}^m:\,\alpha\ge 0}\ \left(\sum_{i=1}^{m}\alpha_i-\frac{1}{2}\sum_{i=1}^{m}\sum_{j=1}^{m}\alpha_i\alpha_j y_i y_j\langle x_j,x_i\rangle\right). \tag{15.11}
$$
                 Note that the dual problem only involves inner products between instances and
                 does not require direct access to specific elements within an instance. This prop-
                 erty is important when implementing SVM with kernels, as we will discuss in the
                 next chapter.
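To make this property concrete, the following is a small sketch in Python with NumPy (the names `dual_objective` and `primal_from_dual` are ours, not the book's): it evaluates the objective of (15.11) using only the Gram matrix of pairwise inner products, and recovers w from α via the span representation derived above.

```python
import numpy as np

def dual_objective(alpha, X, y):
    """Objective of (15.11):
    sum_i alpha_i - (1/2) sum_{i,j} alpha_i alpha_j y_i y_j <x_j, x_i>.

    alpha : (m,) nonnegative dual variables
    X     : (m, d) matrix whose rows are the instances x_i
    y     : (m,) labels in {+1, -1}
    """
    G = X @ X.T                 # Gram matrix: G[i, j] = <x_i, x_j>
    a = alpha * y
    return alpha.sum() - 0.5 * (a @ G @ a)

def primal_from_dual(alpha, X, y):
    """Recover w = sum_i alpha_i y_i x_i (the span representation)."""
    return (alpha * y) @ X
```

Because `dual_objective` touches the data only through `G`, replacing `X @ X.T` with any other matrix of pairwise inner products leaves the rest of the computation unchanged; this is exactly the property exploited by kernels in the next chapter.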
                 15.5 IMPLEMENTING SOFT-SVM USING SGD
                 In this section we describe a very simple algorithm for solving the optimization
                 problem of Soft-SVM, namely,
$$
\min_{w}\ \left(\frac{\lambda}{2}\|w\|^{2}+\frac{1}{m}\sum_{i=1}^{m}\max\bigl\{0,\;1-y_i\langle w,x_i\rangle\bigr\}\right). \tag{15.12}
$$
                 We rely on the SGD framework for solving regularized loss minimization problems,
                 as described in Section 14.5.3.
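As a preview, here is a minimal sketch of such a solver in Python with NumPy. The step size $\eta_t = 1/(\lambda t)$ and the averaging of the iterates are the standard choices from that framework; the function name `sgd_soft_svm` and the bookkeeping details are ours, and the pseudocode given below may organize the update differently.

```python
import numpy as np

def sgd_soft_svm(X, y, lam, T, seed=None):
    """SGD for the Soft-SVM objective (15.12).

    Each round samples one example, takes a subgradient of
    (lam/2)*||w||^2 + max{0, 1 - y_i <w, x_i>} at the current
    iterate, and steps with eta_t = 1/(lam * t).  The average
    of the iterates is returned, as in Section 14.5.3.
    """
    rng = np.random.default_rng(seed)
    m, d = X.shape
    w = np.zeros(d)
    w_sum = np.zeros(d)
    for t in range(1, T + 1):
        i = rng.integers(m)            # sample an example uniformly
        eta = 1.0 / (lam * t)          # step size from the SGD analysis
        margin = y[i] * (w @ X[i])     # evaluated at the current iterate
        w = (1.0 - eta * lam) * w      # step along the regularizer gradient
        if margin < 1:                 # hinge term active: add its subgradient part
            w += eta * y[i] * X[i]
        w_sum += w
    return w_sum / T                   # averaged iterate
```

When the constraint $y_i\langle w,x_i\rangle \ge 1$ already holds, the hinge term contributes nothing and the update only shrinks w toward zero; otherwise the sampled example is folded in with weight $\eta_t y_i$. Note that the returned w is therefore, like the dual solution above, a linear combination of the examples.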