


To show that Equation (9.2) holds, we first show that $\langle w^\star, w^{(T+1)} \rangle \ge T$. Indeed, at the first iteration, $w^{(1)} = (0, \ldots, 0)$ and therefore $\langle w^\star, w^{(1)} \rangle = 0$, while on iteration $t$, if we update using example $(x_i, y_i)$ we have that
$$
\langle w^\star, w^{(t+1)} \rangle - \langle w^\star, w^{(t)} \rangle
  = \langle w^\star, w^{(t+1)} - w^{(t)} \rangle
  = \langle w^\star, y_i x_i \rangle
  = y_i \langle w^\star, x_i \rangle
  \ge 1.
$$
              Therefore, after performing T iterations, we get
$$
\langle w^\star, w^{(T+1)} \rangle
  = \sum_{t=1}^{T} \left( \langle w^\star, w^{(t+1)} \rangle - \langle w^\star, w^{(t)} \rangle \right)
  \ge T, \qquad (9.3)
$$
              as required.
Next, we upper bound $\|w^{(T+1)}\|$. For each iteration $t$ we have that
$$
\|w^{(t+1)}\|^2 = \|w^{(t)} + y_i x_i\|^2
  = \|w^{(t)}\|^2 + 2 y_i \langle w^{(t)}, x_i \rangle + y_i^2 \|x_i\|^2
  \le \|w^{(t)}\|^2 + R^2, \qquad (9.4)
$$
where the last inequality is due to the fact that example $i$ is necessarily such that $y_i \langle w^{(t)}, x_i \rangle \le 0$, and the norm of $x_i$ is at most $R$. Now, since $\|w^{(1)}\|^2 = 0$, if we use Equation (9.4) recursively for $T$ iterations, we obtain that
$$
\|w^{(T+1)}\|^2 \le T R^2 \;\Rightarrow\; \|w^{(T+1)}\| \le \sqrt{T}\, R. \qquad (9.5)
$$

Combining Equation (9.3) with Equation (9.5), and using the fact that $\|w^\star\| = B$, we obtain that
$$
\frac{\langle w^{(T+1)}, w^\star \rangle}{\|w^\star\| \, \|w^{(T+1)}\|}
  \ge \frac{T}{B \sqrt{T} R}
  = \frac{\sqrt{T}}{B R}.
$$
              We have thus shown that Equation (9.2) holds, and this concludes our proof.
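
The update rule analyzed in this proof is easy to state as code. Below is a minimal sketch of the Batch Perceptron (a hedged illustration; the function and variable names are ours, not the book's): it repeatedly finds an example with $y_i \langle w^{(t)}, x_i \rangle \le 0$ and applies the update $w^{(t+1)} = w^{(t)} + y_i x_i$. Since the cosine bounded in Equation (9.2) is at most 1, at most $(RB)^2$ such updates can occur on separable data.

    import numpy as np

    def perceptron(X, y):
        # Batch Perceptron sketch. X: (m, d) array of instances,
        # y: (m,) array of labels in {-1, +1}. Assumes the data is
        # separable with margin, so the loop terminates.
        w = np.zeros(X.shape[1])            # w^(1) = (0, ..., 0)
        while True:
            margins = y * (X @ w)           # y_i <w^(t), x_i> for all i
            mistakes = np.where(margins <= 0)[0]
            if mistakes.size == 0:          # every example satisfied
                return w
            i = mistakes[0]                 # any violated example works
            w = w + y[i] * X[i]             # w^(t+1) = w^(t) + y_i x_i

Note that on the first pass all margins equal 0 (since $w^{(1)} = 0$), so the first example triggers an update, matching the base case of the argument above.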
Remark 9.1. The Perceptron is simple to implement and is guaranteed to converge. However, the convergence rate depends on the parameter $B$, which in some situations might be exponentially large in $d$. In such cases, it would be better to implement the ERM problem by solving a linear program, as described in the previous section. Nevertheless, for many natural data sets, the size of $B$ is not too large, and the Perceptron converges quite fast.
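
For concreteness, here is one way the linear-programming alternative mentioned in the remark can look in code (a sketch under our own formulation, using scipy; the exact LP from the previous section may be set up differently). In the realizable case, any $w$ satisfying $y_i \langle w, x_i \rangle \ge 1$ for all $i$ is an ERM solution, so finding one is a feasibility LP with a zero objective:

    import numpy as np
    from scipy.optimize import linprog

    def halfspace_erm_lp(X, y):
        # Feasibility LP: find w with y_i <w, x_i> >= 1 for all i,
        # rewritten as -y_i * x_i^T w <= -1. The objective is zero,
        # since any feasible w separates the data.
        m, d = X.shape
        A_ub = -(y[:, None] * X)            # row i is -y_i * x_i
        b_ub = -np.ones(m)
        res = linprog(np.zeros(d), A_ub=A_ub, b_ub=b_ub,
                      bounds=[(None, None)] * d)
        return res.x if res.success else None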


              9.1.3 The VC Dimension of Halfspaces

              To compute the VC dimension of halfspaces, we start with the homogenous case.
Theorem 9.2. The VC dimension of the class of homogenous halfspaces in $\mathbb{R}^d$ is $d$.
Proof. First, consider the set of vectors $e_1, \ldots, e_d$, where for every $i$ the vector $e_i$ is the all-zeros vector except for a 1 in the $i$'th coordinate. This set is shattered by the class of homogenous halfspaces.
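
The shattering claim is easy to check numerically (a small sketch of ours, not from the book): for any labeling $(y_1, \ldots, y_d) \in \{\pm 1\}^d$, the homogenous halfspace with $w = (y_1, \ldots, y_d)$ satisfies $\langle w, e_i \rangle = y_i$ for every $i$, and hence realizes that labeling.

    import numpy as np
    from itertools import product

    d = 4
    E = np.eye(d)                           # rows are e_1, ..., e_d
    for labels in product([-1, 1], repeat=d):
        w = np.array(labels)                # take w = (y_1, ..., y_d)
        assert np.all(np.sign(E @ w) == labels)
    print(f"all 2^{d} = {2**d} labelings of e_1..e_{d} realized")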