


To show that Equation (9.2) holds, we first show that $\langle w^\star, w^{(T+1)} \rangle \ge T$. Indeed, at the first iteration, $w^{(1)} = (0, \ldots, 0)$ and therefore $\langle w^\star, w^{(1)} \rangle = 0$, while on iteration $t$, if we update using example $(x_i, y_i)$ we have that
$$
\langle w^\star, w^{(t+1)} \rangle - \langle w^\star, w^{(t)} \rangle
  = \langle w^\star, w^{(t+1)} - w^{(t)} \rangle
  = \langle w^\star, y_i x_i \rangle
  = y_i \langle w^\star, x_i \rangle
  \ge 1.
$$
              Therefore, after performing T iterations, we get
$$
\langle w^\star, w^{(T+1)} \rangle
  = \sum_{t=1}^{T} \left( \langle w^\star, w^{(t+1)} \rangle - \langle w^\star, w^{(t)} \rangle \right)
  \ge T, \qquad (9.3)
$$
              as required.
Next, we upper bound $\|w^{(T+1)}\|$. For each iteration $t$ we have that
$$
\|w^{(t+1)}\|^2 = \|w^{(t)} + y_i x_i\|^2
  = \|w^{(t)}\|^2 + 2 y_i \langle w^{(t)}, x_i \rangle + y_i^2 \|x_i\|^2
  \le \|w^{(t)}\|^2 + R^2, \qquad (9.4)
$$
where the last inequality is due to the fact that example $i$ is necessarily such that $y_i \langle w^{(t)}, x_i \rangle \le 0$, and the norm of $x_i$ is at most $R$. Now, since $\|w^{(1)}\|^2 = 0$, if we use Equation (9.4) recursively for $T$ iterations, we obtain that
$$
\|w^{(T+1)}\|^2 \le T R^2 \;\Rightarrow\; \|w^{(T+1)}\| \le \sqrt{T}\, R. \qquad (9.5)
$$

Combining Equation (9.3) with Equation (9.5), and using the fact that $\|w^\star\| = B$, we obtain that
$$
\frac{\langle w^{(T+1)}, w^\star \rangle}{\|w^\star\| \, \|w^{(T+1)}\|}
  \ge \frac{T}{B \sqrt{T} R}
  = \frac{\sqrt{T}}{B R}.
$$
              We have thus shown that Equation (9.2) holds, and this concludes our proof.
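
The update rule analyzed in this proof is easy to state as code. Below is a minimal sketch of the Batch Perceptron (a hedged illustration; the function and variable names are ours, not the book's): it repeatedly finds an example with $y_i \langle w^{(t)}, x_i \rangle \le 0$ and applies the update $w^{(t+1)} = w^{(t)} + y_i x_i$. Since the cosine bounded in Equation (9.2) is at most 1, at most $(RB)^2$ such updates can occur on separable data.

    import numpy as np

    def perceptron(X, y):
        # Batch Perceptron sketch. X: (m, d) array of instances,
        # y: (m,) array of labels in {-1, +1}. Assumes the data is
        # separable with margin, so the loop terminates.
        w = np.zeros(X.shape[1])            # w^(1) = (0, ..., 0)
        while True:
            margins = y * (X @ w)           # y_i <w^(t), x_i> for all i
            mistakes = np.where(margins <= 0)[0]
            if mistakes.size == 0:          # every example satisfied
                return w
            i = mistakes[0]                 # any violated example works
            w = w + y[i] * X[i]             # w^(t+1) = w^(t) + y_i x_i

Note that on the first pass all margins equal 0 (since $w^{(1)} = 0$), so the first example triggers an update, matching the base case of the argument above.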
Remark 9.1. The Perceptron is simple to implement and is guaranteed to converge. However, the convergence rate depends on the parameter $B$, which in some situations might be exponentially large in $d$. In such cases, it would be better to implement the ERM problem by solving a linear program, as described in the previous section. Nevertheless, for many natural data sets, the size of $B$ is not too large, and the Perceptron converges quite fast.
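
For concreteness, here is one way the linear-programming alternative mentioned in the remark can look in code (a sketch under our own formulation, using scipy; the exact LP from the previous section may be set up differently). In the realizable case, any $w$ satisfying $y_i \langle w, x_i \rangle \ge 1$ for all $i$ is an ERM solution, so finding one is a feasibility LP with a zero objective:

    import numpy as np
    from scipy.optimize import linprog

    def halfspace_erm_lp(X, y):
        # Feasibility LP: find w with y_i <w, x_i> >= 1 for all i,
        # rewritten as -y_i * x_i^T w <= -1. The objective is zero,
        # since any feasible w separates the data.
        m, d = X.shape
        A_ub = -(y[:, None] * X)            # row i is -y_i * x_i
        b_ub = -np.ones(m)
        res = linprog(np.zeros(d), A_ub=A_ub, b_ub=b_ub,
                      bounds=[(None, None)] * d)
        return res.x if res.success else None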


              9.1.3 The VC Dimension of Halfspaces

              To compute the VC dimension of halfspaces, we start with the homogenous case.
Theorem 9.2. The VC dimension of the class of homogenous halfspaces in $\mathbb{R}^d$ is $d$.
Proof. First, consider the set of vectors $e_1, \ldots, e_d$, where for every $i$ the vector $e_i$ is the all-zeros vector except for a 1 in the $i$'th coordinate. This set is shattered by the class of homogenous halfspaces.
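
The shattering claim is easy to check numerically (a small sketch of ours, not from the book): for any labeling $(y_1, \ldots, y_d) \in \{\pm 1\}^d$, the homogenous halfspace with $w = (y_1, \ldots, y_d)$ satisfies $\langle w, e_i \rangle = y_i$ for every $i$, and hence realizes that labeling.

    import numpy as np
    from itertools import product

    d = 4
    E = np.eye(d)                           # rows are e_1, ..., e_d
    for labels in product([-1, 1], repeat=d):
        w = np.array(labels)                # take w = (y_1, ..., y_d)
        assert np.all(np.sign(E @ w) == labels)
    print(f"all 2^{d} = {2**d} labelings of e_1..e_{d} realized")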