Page 273 - Understanding Machine Learning
P. 273

21.2 Online Classification in the Unrealizable Case  255


                  (t)
                 w |h i (x t ) − y t |. All in all, we have shown that
                i  i
                                          d
                                             (t)              (t)
                                |p t − y t |=  w |h i (x t ) − y t |= w ,v t  .
                                             i
                                         i=1

              Furthermore, for each i,  t t,i is exactly the number of mistakes hypothesis h i
                                       v
              makes. Applying Theorem 21.11 we obtain
              Corollary 21.12. Let H be a finite hypothesis class. There exists an algorithm for
              online classification, whose predictions come from [0,1], that enjoys the regret bound

                            T               T

                              |p t − y t |− min  |h(x t ) − y t |≤  2log(|H|)T .
                                        h∈H
                            t=1            t=1
                 Next, we consider the case of a general hypothesis class. Previously, we con-
              structed an expert for each individual hypothesis. However, if H is infinite this leads
              to a vacuous bound. The main idea is to construct a set of experts in a more sophis-
              ticated way. The challenge is how to define a set of experts that, on one hand, is
              not excessively large and, on the other hand, contains experts that give accurate
              predictions.
                 We construct the set of experts so that for each hypothesis h ∈ H and every
              sequence of instances, x 1 ,x 2 ,...,x T , there exists at least one expert in the set which
              behaves exactly as h on these instances. For each L ≤ Ldim(H) and each sequence
              1 ≤i 1 <i 2 < ··· <i L ≤ T we define an expert. The expert simulates the game between
              SOA (presented in the previous section) and the environment on the sequence
              of instances x 1 ,x 2 ,...,x T assuming that SOA makes a mistake precisely in rounds
              i 1 ,i 2 ,...,i L . The expert is defined by the following algorithm.

                                         Expert(i 1 ,i 2 ,...,i L )

                       input A hypothesis class H ; Indices i 1 < i 2 < ··· < i L
                       initialize: V 1 = H
                       for t = 1,2,...,T
                         receive x t
                                        (r)
                         for r ∈{0,1} let V t  ={h ∈ V t : h(x t ) = r}

                                                  (r)
                         define ˜y t = argmax Ldim V t
                                         r
                          (incaseofa tieset ˜y t = 0)
                         if t ∈{i 1 ,i 2 ,...,i L }
                               predict ˆy t = 1 −Èy t
                         else
                               predict ˆy t =˜y t
                                       (ˆy t )
                         update V t+1 = V t

                 Note that each such expert can give us predictions at every round t while only
              observing the instances x 1 ,...,x t . Our generic online learning algorithm is now an
              application of the Weighted-Majority algorithm with these experts.
   268   269   270   271   272   273   274   275   276   277   278