Page 272 - Understanding Machine Learning
P. 272

Online Learning
           254

                 and obtain

                                       Z t+1        (t)       2 2

                                    log     ≤−    w i  ηv t,i − η v /2
                                                                t,i
                                        Z t
                                                i
                                                                (t) 2
                                                           2
                                                   (t)
                                            =−η w ,v t  + η    w v /2
                                                                i  t,i
                                                             i
                                                  (t)
                                                           2
                                            ≤−η w ,v t  + η /2.
                 Summing this inequality over t we get
                                              T              T              2
                                                   Z t+1                 T η
                                                                 (t)
                         log(Z T+1 ) − log(Z 1 ) =  log  ≤−η    w ,v t  +    .      (21.3)
                                                    Z t                   2
                                             t=1             t=1
                                                                    (T +1)  −η     v
                 Next, we lower bound Z T +1 . For each i,we can rewrite ˜w  = e  t t,i  and we
                                                                    i
                 get that


                                         −η     v            −η     v
                       log Z T +1 = log  e   t t,i  ≥ log maxe   t t,i  =−ηmin   v t,i .
                                                         i                  i
                                      i                                        t
                 Combining the preceding with Equation (21.3) and using the fact that log(Z 1 ) =
                 log(d) we get that
                                                           T             2
                                                                       T η
                                                               (t)
                                −ηmin    v t,i − log(d) ≤− η   w ,v t  +   ,
                                    i                                   2
                                       t                  t=1
                 which can be rearranged as follows:
                                    T
                                        (t)                 log(d)  η T
                                       w ,v t  − min  v t,i ≤     +    .
                                                 i            η      2
                                   t=1              t
                 Plugging the value of η into the equation concludes our proof.
                 Proof of Theorem 21.10

                 Equipped with the Weighted-Majority algorithm and Theorem 21.11, we are ready to
                 prove Theorem 21.10. We start with the simpler case, in which H is a finite class,
                 and let us write H ={h 1 ,...,h d }. In this case, we can refer to each hypothesis, h i ,
                 as an expert, whose advice is to predict h i (x t ), and whose cost is v t,i =|h i (x t ) − y t |.
                                                                     (t)
                 The prediction of the algorithm will therefore be p t =  w h i (x t ) ∈ [0,1], and the
                                                                  i  i
                 loss is

                                       
 d             
  
 d

                                       
    (t)        
  
    (t)
                                            i                  i
                              |p t − y t |= 
  w h i (x t ) − y t
 = 
  w (h i (x t ) − y t )
.

                                        i=1                i=1
                 Now, if y t = 1, then for all i, h i (x t ) − y t ≤ 0. Therefore, the above equals to
                      (t)
                    w |h i (x t ) − y t |.If y t = 0 then for all i, h i (x t ) − y t ≥ 0, and the above also equals
                    i  i
   267   268   269   270   271   272   273   274   275   276   277