Page 271 - Understanding Machine Learning
P. 271

21. 2  Online  C lassification  in  the  Unrealizable  Case  253


                  ( t)                                             ( t)
                 w  = 1,  and choosing the ith expert with probability w .  After the learner
                i    i                                             i
                                                               d
              chooses an expert, it receives a vector of costs, v t ∈ [0,1] , where v t,i is the cost of


              following the advice of the ith expert. If the learner’s predictions are randomized,
                                                                   ( t)     ( t)
              then its loss is defined to be the averaged cost, namely,  w v t,i =  w ,v  t  . The
                                                                i    i
              algorithm assumes that the number of rounds T is given. In Exercise 21.4 we show

              how to get rid of this dependence using the doubling trick.
                                         Weighted-Majority
                       input: number of experts, d ; number of rounds, T

                       parameter: η =  2log(d)/T
                       initialize: ˜ w (1)  = (1,...,1)
                       for t = 1,2,...
                                   (t)
                         set w (t)  = ˜ w /Z t where Z t =     ˜ w (t)
                                                    i  i
                                                                   (t)
                         choose expert i at random according to P[i] = w
                                                                   i
                         receive costs of all experts v t ∈ [0,1] d
                                   (t)
                         pay cost  w ,v t
                                        (t+1)   (t) −ηv t,i
                         update rule ∀i, ˜w  =˜w e
                                        i       i
                 The following theorem is key for analyzing the regret bound of Weighted-
              Majority.
              Theorem 21.11. Assuming that T > 2log(d),the Weighted-Majority algorithm enjoys
              the bound
                                T               T
                                    (t)
                                   w ,v t  − min  v t,i ≤  2log(d)T .
                                            i∈[d]
                                t=1            t=1
              Proof. We have:


                                              (t)
                                             ˜ w
                                Z t+1         i                (t) −ηv t,i

                            log     = log       e −ηv t,i  = log  w  e  .
                                                               i
                                 Z t         Z t
                                          i                 i
                                             2
              Using the inequality e −a  ≤ 1 − a + a /2, which holds for all a ∈ (0,1), and the fact
                      (t)
              that  w   = 1, we obtain
                    i  i


                                 Z t+1         (t)          2 2
                              log     ≤ log   w i  1 − ηv t,i + η v /2
                                                              t,i
                                  Z t
                                            i

                                                   (t)       2 2
                                      = log(1 −  w    ηv t,i − η v /2 ).
                                                   i           t,i
                                               i
                                              9         :;         <
                                                        def
                                                        = b
              Next, note that b ∈ (0,1). Therefore, taking log of the two sides of the inequality
              1 − b ≤ e −b  we obtain the inequality log(1 − b) ≤−b, which holds for all b ≤ 1,
   266   267   268   269   270   271   272   273   274   275   276